LexPredict LexNLP Versions

LexNLP by LexPredict

corpus/arxiv-abstracts-with-agreement/0.1

1 year ago

ArXiv Dataset Abstract Subsample

Only abstracts containing the substring "agreement"

Date (ISO 8601): 2022-04-15

This dataset contains 69,411 plaintext files, each corresponding to an ArXiv document abstract. Each abstract contains at least one appearance of the substring "agreement".

Each text file in this dataset contains the text of one abstract, extracted from the full JSON Lines-formatted dataset (described below). Each file is named after its ArXiv ID and has the .txt file extension. Where the ArXiv ID contains a forward slash (/), the slash is replaced with an underscore (_). The text files have a median length of 1,057 characters and a mean length of 1,100 characters.

The full ArXiv metadata dataset can be found on Kaggle. It includes additional information alongside each abstract, such as authors, comments, and DOI. The original dataset was distributed under the CC0: Public Domain license, which permits this modification and redistribution.
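The selection and naming rules above can be sketched as follows. This sketch makes several assumptions:

- Each JSON Lines record exposes `id` and `abstract` fields.
- The "agreement" test is case-sensitive. The notes do not say which was used.
- The helper names and the two inline sample records are hypothetical.

```python
import json
from statistics import mean, median
from typing import Iterable, Iterator, List, Tuple


def arxiv_id_to_filename(arxiv_id: str) -> str:
    """Map an ArXiv ID to a file name using the convention described above."""
    # Old-style IDs such as "hep-th/9901001" contain a forward slash,
    # which becomes an underscore; every file gets a .txt extension.
    return arxiv_id.replace("/", "_") + ".txt"


def select_abstracts(jsonl_lines: Iterable[str], needle: str = "agreement") -> Iterator[Tuple[str, str]]:
    """Yield (file name, abstract) for each JSON Lines record whose abstract contains `needle`."""
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        abstract = record["abstract"]
        # Case-sensitive containment test (an assumption, see above).
        if needle in abstract:
            yield arxiv_id_to_filename(record["id"]), abstract


def length_stats(texts: Iterable[str]) -> Tuple[float, float]:
    """Return the (median, mean) character length of the given texts."""
    lengths: List[int] = [len(t) for t in texts]
    return median(lengths), mean(lengths)


# Two made-up records standing in for the Kaggle metadata file.
sample_lines = [
    '{"id": "hep-th/9901001", "abstract": "We reach agreement with earlier results."}',
    '{"id": "2101.00001", "abstract": "Nothing relevant here."}',
]
selected = dict(select_abstracts(sample_lines))
print(selected)
print(length_stats(selected.values()))
```

Because `select_abstracts` accepts any iterable of lines, an open file handle for the full metadata dump can be passed to it directly. The whole file never has to be loaded into memory.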

corpus/atticus-cuad-v1-plaintext/0.1

1 year ago

The Atticus Project: CUAD v1 Dataset (plaintext only)

Date (ISO 8601): 2022-04-16

This is a partial redistribution of The Atticus Project's CUAD v1 dataset of 510 labeled contracts.

Unlike in the original dataset, the plaintext documents have been organized into their respective contract type categories.

The original dataset is licensed under CC BY 4.0.

Notes:

  • The file ADUROBIOTECH,INC_06_02_2020-EX-10.7-CONSULTING AGREEMENT.txt is duplicated as ADUROBIOTECH,INC_06_02_2020-EX-10.7-CONSULTING AGREEMENT(1).txt in both this redistribution and the original dataset.
  • In the original dataset, the file HarpoonTherapeuticsInc_20200312_10-K_EX-10.18_12051356_EX-10.18_Development Agreement.txt has a corresponding PDF named HarpoonTherapeuticsInc_20200312_10-K_EX-10.18_12051356_EX-10.18_Development Agreement_Option Agreement.pdf
  • In the original dataset, the file NETGEAR,INC_04_21_2003-EX-10.16-AMENDMENT TO THE DISTRIBUTOR AGREEMENT BETWEEN INGRAM MICRO AND NETGEAR.txt has a corresponding PDF named NETGEAR,INC_04_21_2003-EX-10.16-AMENDMENT TO THE DISTRIBUTOR AGREEMENT BETWEEN INGRAM MICRO AND NETGEAR-.pdf
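If this redistribution is used as training data, the duplicate noted above can be found and dropped before splitting. This keeps the same contract out of both the training and test sets. The minimal sketch below rests on these assumptions:

- Each contract type is a subdirectory, and its name serves as the label.
- The "(1)" copy is byte-for-byte identical to its original.
- The helper names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple


def load_labeled_texts(root: Path) -> List[Tuple[str, str]]:
    """Return (text, label) pairs, using each file's parent directory name as its label."""
    pairs = []
    for path in sorted(Path(root).glob("*/*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        pairs.append((text, path.parent.name))
    return pairs


def find_duplicate_files(root: Path) -> List[List[Path]]:
    """Group the .txt files under `root` whose contents are byte-for-byte identical."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*.txt")):
        # Hash the raw bytes so that encoding issues cannot hide a duplicate.
        by_digest[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```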

corpus/bonds/0.1

1 year ago

corpus/caselaw-access-project-ark-ill-nc-nm-subset-144million-characters/0.1

1 year ago

Caselaw Access Project

Randomly selected subset

Date (ISO 8601): 2022-04-15

This dataset is a partial redistribution of the case_text_open data available from the Caselaw Access Project.

Specifically, this dataset contains a subset of the files from the original Caselaw Access Project dataset. Files were drawn at random from the original data until the subset reached a total of approximately 144 million characters, excluding newlines and spaces. This was done to approximately match the character count of a different dataset.
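The budgeted draw described above can be sketched as a shuffle followed by accumulation until the character budget is reached. This sketch assumes:

- Drawing is without replacement.
- Only spaces and newlines are excluded from the count, so tabs and carriage returns still count.
- A fixed seed is used for reproducibility.

The function names are hypothetical.

```python
import random
from typing import Dict, List


def count_characters(text: str) -> int:
    """Count characters, excluding spaces and newlines (tabs and carriage returns still count)."""
    return sum(1 for ch in text if ch not in (" ", "\n"))


def sample_until_budget(docs: Dict[str, str], budget: int, seed: int = 0) -> List[str]:
    """Draw document names at random, without replacement, until the running
    character count (per count_characters) reaches `budget` or the pool runs out."""
    rng = random.Random(seed)
    names = sorted(docs)  # sort first so the shuffle is reproducible for a given seed
    rng.shuffle(names)
    chosen: List[str] = []
    total = 0
    for name in names:
        if total >= budget:
            break
        chosen.append(name)
        total += count_characters(docs[name])
    return chosen


# Hypothetical usage with the budget quoted above:
# chosen = sample_until_budget(all_cases, budget=144_000_000)
```

The last document drawn can push the total slightly past the budget. That is consistent with stopping once the subset "reached" the target.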

Permission to redistribute is implied on the Caselaw Access Project's "About" page, under "Usage & access":

Thus far, Illinois, Arkansas, New Mexico, and North Carolina have made this important and positive shift and, as a result, all historical cases from these jurisdictions are freely available to the public without restriction.

This data was downloaded from the Caselaw Access Project in April 2021.

corpus/contract-types/0.1

1 year ago

This dataset contains 2,387 text files from SEC EDGAR, each with "agreement" in its file name. The documents have been sorted into the following categories:

  • ADVISORY AGREEMENT
  • AGENCY AGREEMENT
  • ARBITRATION AGREEMENT
  • ASSIGNMENT AGREEMENT
  • ASSUMPTION AGREEMENT
  • COLLABORATION AGREEMENT
  • CONFIDENTIALITY AGREEMENT
  • CONTRIBUTION AGREEMENT
  • DEALER AGREEMENT
  • DEPOSIT AGREEMENT
  • DEVELOPMENT AGREEMENT
  • DISTRIBUTION AGREEMENT
  • EMPLOYMENT AGREEMENT
  • ENTITY STRUCTURE
  • ESCROW AGREEMENT
  • EXCHANGE AGREEMENT
  • FEE WAIVER AGREEMENT
  • FRANCHISE AGREEMENT
  • FUND ACCOUNTING AGREEMENT
  • INDEMNIFICATION AGREEMENT
  • INTERCREDITOR AGREEMENT
  • INVESTMENT AGREEMENT
  • JOINT FILING AGREEMENT
  • LEASE AGREEMENT
  • LICENSE AGREEMENT
  • LOAN AGREEMENT
  • MANAGEMENT AGREEMENT
  • MANUFACTURING AGREEMENT
  • MERGER & ACQUISITION AGREEMENT
  • NON-DISCLOSURE AGREEMENT
  • NOT A CONTRACT
  • OPERATING AGREEMENT
  • OTHER CONTRACT
  • PLEDGE AGREEMENT
  • PROMISSORY NOTE
  • REGISTRATION RIGHTS AGREEMENT
  • REPURCHASE AGREEMENTS
  • SALES CONTRACT
  • SECURITIES SALES
  • SECURITY AGREEMENT
  • SERVICES AGREEMENT
  • SERVICING AGREEMENT
  • SETTLEMENT AGREEMENT
  • STOCK OPTION AGREEMENT
  • SUBORDINATION AGREEMENT
  • SUPPLY AGREEMENT
  • TAX ALLOCATION AGREEMENT
  • TRUST AGREEMENT
  • UNDERWRITING AGREEMENT
  • WAIVER AGREEMENT
  • WARRANT AGREEMENT
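The file-name criterion above can be paired with a per-category tally, which makes very small classes easy to spot before training. This sketch assumes:

- The "agreement" match ignores case.
- Each category is stored as a subdirectory named like the list entries.

Both are assumptions, and the helper names are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Callable


def has_agreement_in_name(path: Path) -> bool:
    """True if the file name contains "agreement", ignoring case (an assumption)."""
    return "agreement" in path.name.lower()


def count_by_category(root: Path, name_filter: Callable[[Path], bool] = has_agreement_in_name) -> Counter:
    """Count .txt files per category directory, keeping only files that pass `name_filter`."""
    return Counter(p.parent.name for p in Path(root).glob("*/*.txt") if name_filter(p))
```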

corpus/eurlex-sample-10000/0.1

1 year ago

EUR-Lex Document Sample (10,000)

Date (ISO 8601): 2022-04-16

This dataset contains 10,000 EUR-Lex documents downloaded via http://api.epdb.eu/.

  • 5,000 of these documents contain at least one occurrence of the substring "agreement" (case-insensitive).
  • 5,000 of these documents do not contain the substring "agreement" at all (case-insensitive).
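The 5,000/5,000 split above can be sketched as a case-insensitive partition followed by an equal-sized draw from each side. The notes do not say how documents were chosen within each group, so the random draw is an assumption, as are the function names.

```python
import random
from typing import List, Sequence, Tuple


def mentions_agreement(text: str) -> bool:
    """Case-insensitive substring test for "agreement"."""
    return "agreement" in text.casefold()


def balanced_agreement_sample(docs: Sequence[str], per_group: int, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly pick `per_group` documents that mention "agreement" and `per_group` that do not."""
    with_term = [d for d in docs if mentions_agreement(d)]
    without_term = [d for d in docs if not mentions_agreement(d)]
    rng = random.Random(seed)
    # random.Random.sample raises ValueError if a group has fewer than per_group documents.
    return rng.sample(with_term, per_group), rng.sample(without_term, per_group)


# Hypothetical usage matching the counts above:
# positives, negatives = balanced_agreement_sample(eurlex_texts, per_group=5000)
```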

Important excerpts from EUR-Lex's copyright notice are quoted below:

The Commission’s document reuse policy is based on Decision 2011/833/EU. Unless otherwise specified, you can re-use the legal documents published in EUR-Lex for commercial or non-commercial purposes.

The copyright for the editorial content of this website, the summaries of EU legislation and the consolidated texts, which is owned by the EU, is licensed under the Creative Commons Attribution 4.0 International licence.

corpus/govinfo-fr-2021/0.1

1 year ago

GovInfo Federal Register (2021)

Date (ISO 8601): 2022-04-11

Extracted from: https://www.govinfo.gov/bulkdata/FR/2021

Converted to text using Apache Tika.

corpus/uspto-sample/0.1

1 year ago

United States Patent and Trademark Office (USPTO) Dataset

Date (ISO 8601): 2022-04-11

The USPTO backgrounds were downloaded using a derivative of this script: https://github.com/EleutherAI/pile-uspto

This sample contains 4,500 text files distributed evenly across 45 directories (100 files per directory). Each text file contains the background section of a USPTO patent application and has been placed in the directory for the year the patent was granted. These texts were randomly selected from the subset of all backgrounds that are at least 2,000 characters long.
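The selection described above can be sketched as three steps: filter by length, group by grant year, then draw a fixed number per year. This sketch assumes:

- Records arrive as (grant year, background text) pairs.
- Years with too few qualifying backgrounds are skipped.

The function name is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MIN_CHARS = 2000  # backgrounds shorter than this are excluded


def sample_by_year(records: Iterable[Tuple[int, str]], per_year: int, seed: int = 0) -> Dict[int, List[str]]:
    """Randomly pick `per_year` backgrounds of at least MIN_CHARS characters for each grant year.

    Years with fewer than `per_year` qualifying backgrounds are skipped.
    """
    by_year: Dict[int, List[str]] = defaultdict(list)
    for year, text in records:
        if len(text) >= MIN_CHARS:
            by_year[year].append(text)
    rng = random.Random(seed)
    return {
        year: rng.sample(texts, per_year)
        for year, texts in sorted(by_year.items())
        if len(texts) >= per_year
    }
```

With 45 grant years and `per_year=100`, this yields the 4,500 files described above, one directory per year.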

corpus/sec-edgar-forms-3-4-5-8k-10k-sample/0.1

1 year ago

SEC EDGAR Forms 3, 4, 5, 8-K, 10-K

Date (ISO 8601): 2022-04-19

A sample of SEC EDGAR forms from OpenEDGAR stored in plaintext.

Form    Count
3       198
4       198
5       200
8-K     197
10-K    199

pipeline/contract-type/0.1

1 year ago

scikit-learn Pipeline


Name                      Class                     State
transformerpreprocessor   TransformerPreprocessor   head_character_n=0, normalizer=<lexnlp.ml.normalizers.Normalizer object>
transformervectorizer     TransformerVectorizer     vectorizers=(<lexnlp.ml.vectorizers.VectorizerDoc2Vec object>, <lexnlp.ml.vectorizers.VectorizerKeywordSearch object>)
minmaxscaler              MinMaxScaler              feature_range=(-1.0, 1.0)
logisticregressioncv      LogisticRegressionCV
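The step names in the table follow the same pattern that scikit-learn's `make_pipeline` produces: each estimator's class name in lowercase. The sketch below builds an analogous pipeline from stock scikit-learn parts, with these substitutions and assumptions:

- `TfidfVectorizer` stands in for LexNLP's `TransformerPreprocessor` and `TransformerVectorizer` (Doc2Vec plus keyword search).
- The `Cs` grid and the toy data are illustrative only.

It is not the released model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler


def to_dense(X):
    """MinMaxScaler does not accept sparse input, so densify the TF-IDF matrix."""
    return X.toarray()


pipeline = make_pipeline(
    TfidfVectorizer(),  # stand-in for the LexNLP preprocessing and vectorizing steps
    FunctionTransformer(to_dense, accept_sparse=True),
    MinMaxScaler(feature_range=(-1.0, 1.0)),  # same range as in the table above
    LogisticRegressionCV(Cs=[1.0, 10.0, 100.0], cv=3, max_iter=1000),
)

# Toy training data, for illustration only.
texts = ["license of the software"] * 6 + ["employment of the employee"] * 6
labels = ["LICENSE AGREEMENT"] * 6 + ["EMPLOYMENT AGREEMENT"] * 6
pipeline.fit(texts, labels)
print([name for name, _ in pipeline.steps])
print(pipeline.predict(["software license"]))
```

The densifying step is needed only because TF-IDF output is sparse. A vectorizer that already emits dense vectors, as Doc2Vec does, could feed `MinMaxScaler` directly.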

Training Data

Dataset                                Description                                                   Hyperlink
corpus/contract-types/0.1              A sample of labeled contract types obtained from SEC EDGAR    https://www.sec.gov/edgar.shtml
corpus/atticus-cuad-v1-plaintext/0.1   Atticus CUAD v1 contracts                                     https://www.atticusprojectai.org/cuad

Metrics

LOGISTICREGRESSIONCV
                               precision    recall  f1-score   support

           ADVISORY AGREEMENT       0.64      0.69      0.67        13
          AFFILIATE AGREEMENT       0.67      1.00      0.80         2
             AGENCY AGREEMENT       0.82      0.64      0.72        14
        ARBITRATION AGREEMENT       1.00      1.00      1.00         1
         ASSIGNMENT AGREEMENT       0.25      0.40      0.31         5
         ASSUMPTION AGREEMENT       0.33      0.30      0.32        10
      COLLABORATION AGREEMENT       0.53      0.59      0.56        17
    CONFIDENTIALITY AGREEMENT       0.67      0.91      0.77        11
       CONTRIBUTION AGREEMENT       0.85      0.79      0.81        14
        CO_BRANDING AGREEMENT       0.67      0.50      0.57         4
             DEALER AGREEMENT       1.00      1.00      1.00        13
            DEPOSIT AGREEMENT       0.71      1.00      0.83        10
        DEVELOPMENT AGREEMENT       0.44      0.44      0.44        18
       DISTRIBUTION AGREEMENT       0.67      0.67      0.67        18
         EMPLOYMENT AGREEMENT       0.82      0.71      0.76        65
        ENDORSEMENT AGREEMENT       0.80      0.80      0.80         5
   ENTITY STRUCTURE AGREEMENT       0.00      0.00      0.00         3
             ESCROW AGREEMENT       0.90      0.75      0.82        12
           EXCHANGE AGREEMENT       1.00      0.85      0.92        13
          FRANCHISE AGREEMENT       0.93      0.87      0.90        15
            HOSTING AGREEMENT       0.50      0.75      0.60         4
    INDEMNIFICATION AGREEMENT       0.91      0.91      0.91        11
      INTERCREDITOR AGREEMENT       0.89      0.81      0.85        21
         INVESTMENT AGREEMENT       0.50      0.50      0.50         6
                 IP AGREEMENT       1.00      0.67      0.80         3
       JOINT FILING AGREEMENT       0.50      0.67      0.57         3
      JOINT VENTURE AGREEMENT       0.00      0.00      0.00         2
              LEASE AGREEMENT       0.67      0.67      0.67         3
            LICENSE AGREEMENT       0.50      0.40      0.44        10
               LOAN AGREEMENT       0.74      0.52      0.61        27
        MAINTENANCE AGREEMENT       0.44      0.57      0.50         7
         MANAGEMENT AGREEMENT       0.50      1.00      0.67         3
      MANUFACTURING AGREEMENT       0.29      0.50      0.36         4
          MARKETING AGREEMENT       0.33      0.33      0.33         3
MERGER & ACQUISTION AGREEMENT       0.67      0.56      0.61        18
     NON-DISCLOSURE AGREEMENT       0.56      0.71      0.63         7
     NOT A CONTRACT AGREEMENT       0.63      0.83      0.72        23
     OTHER CONTRACT AGREEMENT       0.00      0.00      0.00         3
        OUTSOURCING AGREEMENT       0.00      0.00      0.00         4
          PROMOTION AGREEMENT       0.00      0.00      0.00         2
REGISTRATION RIGHTS AGREEMENT       0.40      1.00      0.57         2
           RESELLER AGREEMENT       0.00      0.00      0.00         2
     SALES CONTRACT AGREEMENT       0.56      0.50      0.53        10
   SECURITIES SALES AGREEMENT       0.00      0.00      0.00         2
           SECURITY AGREEMENT       0.50      0.40      0.44         5
           SERVICES AGREEMENT       0.50      0.46      0.48        13
          SERVICING AGREEMENT       0.67      0.67      0.67         3
         SETTLEMENT AGREEMENT       0.57      0.67      0.62        12
        SPONSORSHIP AGREEMENT       1.00      0.83      0.91         6
       STOCK OPTION AGREEMENT       0.56      0.79      0.66        24
 STRATEGIC ALLIANCE AGREEMENT       1.00      1.00      1.00         6
      SUBORDINATION AGREEMENT       0.57      0.67      0.62         6
             SUPPLY AGREEMENT       0.43      0.43      0.43         7
     TAX ALLOCATION AGREEMENT       0.88      1.00      0.93         7
     TRANSPORTATION AGREEMENT       1.00      0.67      0.80         3
              TRUST AGREEMENT       0.00      0.00      0.00         3
       UNDERWRITING AGREEMENT       1.00      0.88      0.93         8
             WAIVER AGREEMENT       0.72      0.87      0.79        15
            WARRANT AGREEMENT       0.80      0.86      0.83        14

                     accuracy                           0.68       575
                    macro avg       0.58      0.61      0.59       575
                 weighted avg       0.68      0.68      0.67       575
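The summary rows follow the definitions used by scikit-learn's `classification_report`:

- The macro average is the unweighted mean of the per-class scores.
- The weighted average weights each class by its support, meaning the number of test documents in that class.

Because each document carries exactly one label, weighted-average recall equals accuracy, which is why both read 0.68 above. The gap between macro F1 (0.59) and weighted F1 (0.67) suggests that the smaller classes tend to score lower.

The minimal sketch below computes both averages for two rows copied from the report. The printed per-class values are rounded to two decimals, so recomputing the summary rows from the full table will only approximately reproduce them.

```python
from typing import Sequence, Tuple


def macro_and_weighted(rows: Sequence[Tuple[float, int]]) -> Tuple[float, float]:
    """Given (score, support) per class, return (macro average, support-weighted average)."""
    macro = sum(score for score, _ in rows) / len(rows)
    total_support = sum(support for _, support in rows)
    weighted = sum(score * support for score, support in rows) / total_support
    return macro, weighted


# F1 and support for ADVISORY AGREEMENT and AFFILIATE AGREEMENT from the report above.
macro, weighted = macro_and_weighted([(0.67, 13), (0.80, 2)])
print(round(macro, 4), round(weighted, 4))
```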


Usage


import cloudpickle
from sklearn.pipeline import Pipeline

from lexnlp.extract.en.contracts.predictors import ProbabilityPredictorContractType

# Load the serialized scikit-learn pipeline from disk.
with open('pipeline_contract_type_classifier.cloudpickle', 'rb') as f:
    pipeline_contract_type_classifier: Pipeline = cloudpickle.load(f)

# Wrap the pipeline in LexNLP's contract type predictor.
probability_predictor_contract_type: ProbabilityPredictorContractType = \
    ProbabilityPredictorContractType(pipeline=pipeline_contract_type_classifier)

# Classify the example texts with min_probability=0.5.
probability_predictor_contract_type.detect_contract_type(
    text=['This is a sentence.', 'LICENSE AGREEMENT', 'The owner shall be responsible for the license of this software.'],
    min_probability=0.5,
)