Grobid Versions

A machine learning software for extracting information from scholarly documents

0.5.4

5 years ago

Changes:

  • transparent usage of DeLFT deep learning models (usual BidLSTM-CRF) instead of Wapiti CRF models, native integration via JEP

  • support of biblio-glutton as DOI/metadata matching service, alternative to crossref REST API

  • improvement of citation context identification and matching (+9% recall with similar precision, for PMC sample 1943 articles, from 43.35 correct citation contexts per article to 49.98 correct citation contexts per article)

  • citation callouts are now also identified in abstracts, figure captions and table captions

  • structured abstract (including update of TEI schema)

  • bug fixes and some more parameters: by default using all available threads when training and possibility to load models at the start of the service
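
The per-article figures quoted above for citation contexts can be turned into an absolute and a relative gain with simple arithmetic. The sketch below uses only the numbers given in the note; the variable names are illustrative, and because the note does not say how the +9% recall figure itself was computed, that figure is not reproduced here.

```python
# Figures quoted in the 0.5.4 note (PMC sample of 1943 articles).
articles = 1943
contexts_before = 43.35  # correct citation contexts per article, previous version
contexts_after = 49.98   # correct citation contexts per article, version 0.5.4

# Absolute gain in correct citation contexts per article.
gain_per_article = contexts_after - contexts_before

# Relative gain with respect to the previous per-article count.
relative_gain = gain_per_article / contexts_before

# Approximate number of additional correct contexts over the whole sample.
total_gain = gain_per_article * articles

print(f"gain per article: {gain_per_article:.2f}")
print(f"relative gain:    {relative_gain:.1%}")
print(f"total gain:       {total_gain:.0f}")
```

Note that the relative increase in the per-article count is a different quantity from a change in recall, which also depends on the number of expected citation contexts.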

0.5.3

5 years ago

Changes:

  • Improvement of consolidation options and processing (better handling of CrossRef API, but the best is coming soon ;)
  • Better recall for figure and table identification (thanks to @detonator413)
  • Support of proxy for calling crossref with Apache HttpClient
  • Minor bug fixes

0.5.2

5 years ago

Changes:

  • Restored the correct status code from the REST API when no engine is available (503 is returned again to tell the client to wait; it was removed by mistake in versions 0.5.0 and 0.5.1 for the PDF processing services only, see the REST API documentation)
  • Added metrics in the REST entrypoint (accessible via http://localhost:8071)
  • Added Grobid clients for Java, Python and NodeJS
  • Added counters for consolidation tasks and consolidation results
  • Added a case-sensitivity option in lexicon/FastMatcher
  • Updated documentation
  • Bug fixes: #339, #322, #300, and others
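
Since a 503 response tells the client to wait and try again, a client can wrap its request in a small retry loop. The following is a minimal sketch of that idea, not the behaviour of the official Grobid clients: `call_with_retry`, its parameters and the `send` callable are hypothetical names introduced for illustration, and `send` stands in for whatever function actually performs the HTTP request and returns a status code and body.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    wait_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a status other than 503 or attempts run out.

    `send` is a hypothetical callable that performs one request and returns
    (status_code, body). The last response is returned in every case.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    status, body = send()
    for _ in range(max_attempts - 1):
        if status != 503:
            break
        # 503: no engine is available right now, so wait before asking again.
        sleep(wait_seconds)
        status, body = send()
    return status, body
```

With a real HTTP library, `send` would be a small function that posts the PDF to the service and returns the response status and text. A fixed wait is used here for simplicity; an increasing delay between attempts is a common alternative.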

0.5.1

6 years ago

Bug fixes

0.5.0

6 years ago

The latest stable release of GROBID is version 0.5.0. As compared to previous version 0.4.3, this version brings:

  • Migration from Maven to Gradle for a faster, more flexible and more stable build, release, etc.
  • Usage of Dropwizard for web services
  • Grobid service manual moved to Read the Docs
  • (thanks to @detonator413 and @lfoppiano for this release! future work in versions 0.5.* will focus again on improving PDF parsing and structuring accuracy)

grobid-parent-0.4.4

6 years ago

Fixed an issue that prevented the release build from working

grobid-parent-0.4.3

6 years ago

The latest stable release of GROBID is version 0.4.3. As compared to previous version 0.4.2, this version brings:

  • New models: f-score improvement on the PubMed Central sample, bibliographical references +2.5%, header +7%
  • New training data and features for bibliographical references, in particular for covering HEP domain (INSPIRE), arXiv identifier, DOI and url (thanks @iorala and @michamos !)
  • Support for CrossRef REST API (instead of the slow OpenURL-style API which requires a CrossRef account), in particular for multithreading usage (thanks @Vi-dot)
  • Improve training data generation and documentation (thanks @jfix)
  • Unicode normalisation and more robust body extraction (thanks @aoboturov)
  • Fixes, tests, documentation and update of the pdf2xml fork for Windows (thanks @lfoppiano)

grobid-parent-0.4.2

6 years ago

Version 0.4.2 of GROBID

grobid-parent-0.4.1

7 years ago

grobid-parent-0.3.9

8 years ago

Latest stable version of the 0.3.* series of GROBID