PatCit Versions

Making Patent Citations Uncool Again

0.3.1

3 years ago

Data

Major improvement of intext.patent

Validation

Validation of the intext.patent table

Thanks

Special thanks to:

  • Gabriele Cristelli (EPFL)
  • Kyle Higham (Hitotsubashi University)
  • Lucas Violon (HEC Paris)

0.3.0

3 years ago

Data

  • Major improvement of bibliographical_reference schema (harmonize grobid & crossref) for seamless analysis
  • Enrichment of intext.patent
  • Add domain specific front page tables (norm_standard, database, wiki)

Community

  • Revisit BQ project architecture
  • Add Colab notebooks integration
  • Revisit README.md

Code

  • Lighter API
  • Lighter dependencies

Models

  • Add information extraction models
  • Add models and training data DVC support

Validation

  • Validation of in-text extraction models

Thanks

Special thanks to:

  • Gabriele Cristelli (EPFL)
  • Kyle Higham (Hitotsubashi University)
  • Lucas Violon (HEC Paris)

v0.2-npl

4 years ago

The v0.2-npl release introduces two major improvements:

  • npl_class field. This field is predicted by a multi-class text classification model built on spaCy's TextCategorizer, with the NPL text as input. See the focus section and model binaries below.
  • ISSN propagation. The ISSN is propagated through title_j: a bibliographical reference with no ISSN match inherits the ISSN of matched references that share the same title_j.
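
The ISSN propagation step can be pictured with a minimal Python sketch. It is an illustration only, not the project's actual pipeline: the `issn` field name, the `propagate_issn` helper, the exact-string matching on title_j and the sample records are all assumptions made for the example.

```python
def propagate_issn(references: list[dict]) -> list[dict]:
    """Fill missing ISSNs from other references sharing the same title_j.

    Hypothetical helper: matches title_j by exact string equality.
    """
    # First pass: remember one known ISSN per journal title.
    issn_by_title: dict[str, str] = {}
    for ref in references:
        title, issn = ref.get("title_j"), ref.get("issn")
        if title and issn:
            issn_by_title.setdefault(title, issn)

    # Second pass: copy the ISSN onto references that lack one.
    propagated = []
    for ref in references:
        filled = dict(ref)  # leave the input records untouched
        if not filled.get("issn") and filled.get("title_j") in issn_by_title:
            filled["issn"] = issn_by_title[filled["title_j"]]
        propagated.append(filled)
    return propagated


# Illustrative records (not taken from the PatCit data).
refs = [
    {"title_j": "Nature", "issn": "0028-0836"},
    {"title_j": "Nature", "issn": None},
    {"title_j": "Unknown Journal", "issn": None},
]
result = propagate_issn(refs)
```

Here the second record inherits the ISSN of the first, while the third stays empty because no reference with the same title_j carries an ISSN.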

Focus on npl_class

en_core_web_sm_npl-class-ensemble-0.8

  • ensemble model (bow+cnn with bagging)
  • trained on 80% of the "gold" dataset and evaluated on the remaining 20% (hold-out)

See models/npl_class_training/ for more details.
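
As a rough illustration of the 80/20 hold-out evaluation described above, the sketch below shuffles a labelled dataset and sets aside 20% for testing. The `holdout_split` helper, the fixed seed and the toy examples are assumptions; the actual training scripts may split the gold dataset differently (for instance with stratification).

```python
import random


def holdout_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples and return (train, test) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy stand-in for the "gold" dataset: (text, label) pairs.
gold = [(f"npl text {i}", "PATENT" if i % 2 else "WEBPAGE") for i in range(10)]
train, test = holdout_split(gold)
```

The model is fitted on `train` only, and the reported metrics are computed on `test`, which the model never sees during training.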

Average performance

accuracy  precision  recall  f1
0.9       0.89       0.88    0.88

Class performance

class                      precision  recall  f1
BIBLIOGRAPHICAL_REFERENCE  0.92       0.95    0.93
SEARCH_REPORT              1.0        0.92    0.96
OFFICE_ACTION              0.99       0.93    0.96
DATABASE                   0.89       0.73    0.8
WEBPAGE                    0.53       0.53    0.53
PATENT                     0.91       0.94    0.93
NA                         1.0        1.0     1.0
PRODUCT_DOCUMENTATION      0.44       0.43    0.44
NORM_STANDARD              0.86       0.6     0.71
LITIGATION                 0.25       0.11    0.15
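
The f1 column follows the usual definition of F1 as the harmonic mean of precision and recall. The short Python check below recomputes it for three rows of the class performance table; the `f1_score` helper is written for this example and is not part of the PatCit code base.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied from the class performance table.
reported = {
    "DATABASE": (0.89, 0.73, 0.8),
    "NORM_STANDARD": (0.86, 0.6, 0.71),
    "LITIGATION": (0.25, 0.11, 0.15),
}
for label, (p, r, f1) in reported.items():
    print(label, round(f1_score(p, r), 2), f1)
```

Because the table values are already rounded to two decimals, a recomputed score can differ from the reported one in the last digit for some classes.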

en_core_web_sm_npl-class-ensemble-1.0

Same as en_core_web_sm_npl-class-ensemble-0.8, but trained on the full dataset to maximize performance. This is the model used to create the npl_class field.