Making Patent Citations Uncool Again
v0.3.0
`bibliographical_reference` schema (harmonize grobid & crossref) for seamless analysis
`intext.patent`
(`norm_standard`, `database`, `wiki`)
README.md
Special thanks to:
v0.2-npl
The v0.2-npl release introduces 2 major improvements:
The `npl_class` field. This field is predicted using a multi-class text classification model based on the spaCy TextCategorizer, with the NPL text as input. See the focus and model binaries below.
`ISSN` extended, using `title_j`, to bibliographical references with the same `title_j` but no match.
npl_class
en_core_web_sm_npl-class-ensemble-0.8
Ensemble model (bow+cnn with bagging). See models/npl_class_training/ for more details.
accuracy | precision | recall | f1 |
---|---|---|---|
0.9 | 0.89 | 0.88 | 0.88 |

class | precision | recall | f1 |
---|---|---|---|
BIBLIOGRAPHICAL_REFERENCE | 0.92 | 0.95 | 0.93 |
SEARCH_REPORT | 1.0 | 0.92 | 0.96 |
OFFICE_ACTION | 0.99 | 0.93 | 0.96 |
DATABASE | 0.89 | 0.73 | 0.8 |
WEBPAGE | 0.53 | 0.53 | 0.53 |
PATENT | 0.91 | 0.94 | 0.93 |
NA | 1.0 | 1.0 | 1.0 |
PRODUCT_DOCUMENTATION | 0.44 | 0.43 | 0.44 |
NORM_STANDARD | 0.86 | 0.6 | 0.71 |
LITIGATION | 0.25 | 0.11 | 0.15 |
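
As a quick sanity check on the per-class table above, the f1 column can be recomputed from precision and recall as their harmonic mean. The snippet below is a minimal sketch, not part of the project's code: the helper name `f1_score` is hypothetical, and the inputs are the rounded values copied from the table, so a recomputed score can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded per-class precision and recall copied from the table above.
reported = {
    "BIBLIOGRAPHICAL_REFERENCE": (0.92, 0.95),
    "DATABASE": (0.89, 0.73),
    "LITIGATION": (0.25, 0.11),
}

for label, (p, r) in reported.items():
    print(f"{label}: f1 = {f1_score(p, r):.2f}")
# BIBLIOGRAPHICAL_REFERENCE: f1 = 0.93
# DATABASE: f1 = 0.80
# LITIGATION: f1 = 0.15
```

The low scores for classes such as LITIGATION follow directly from low recall: a harmonic mean is pulled toward the smaller of its two inputs.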
en_core_web_sm_npl-class-ensemble-1.0
Same as `en_core_web_sm_npl-class-ensemble-0.8` but trained on the full dataset to maximize performance. This model is used to create the `npl_class` field.
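
To illustrate how a class could be read off such a model, here is a minimal sketch. It assumes the model binary has been downloaded and is loadable with `spacy.load`; the local path and the helper names `predict_npl_class` and `load_and_predict` are hypothetical. It relies only on `Doc.cats`, the dictionary of label scores that spaCy's text categorizer fills in.

```python
def predict_npl_class(nlp, text):
    """Return (label, score) for the highest-scoring category of `text`.

    `nlp` is any callable that returns an object with a `cats` dict,
    e.g. a spaCy pipeline containing a text categorizer.
    """
    doc = nlp(text)
    label = max(doc.cats, key=doc.cats.get)
    return label, doc.cats[label]


def load_and_predict(model_path, text):
    """Load a spaCy model from `model_path` (hypothetical location) and classify `text`."""
    import spacy  # imported lazily so the helper above has no hard dependency

    nlp = spacy.load(model_path)
    return predict_npl_class(nlp, text)


# Example call, assuming the binary was unpacked locally (path is an assumption):
# load_and_predict("./en_core_web_sm_npl-class-ensemble-1.0", "some NPL citation text")
```

Keeping the score alongside the label makes it possible to filter out low-confidence predictions downstream, which may matter for the weaker classes in the table above.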