Paperetl Versions

📄 ⚙️ ETL processes for medical and scientific papers

v2.2.1

8 months ago

This release adds the following enhancements and bug fixes:

  • Update setup.py to show only the standard image on PyPI (#48)

v2.2.0

8 months ago

This release adds the following enhancements and bug fixes:

  • Add example notebook (#43)
  • Update CORD-19 scripts (#44)
  • Update minimum Python version to 3.8 (#47)

v2.1.0

1 year ago

This release adds the following enhancements and bug fixes:

  • Issue processing into Elasticsearch (#41)
  • Improve PMB filtering logic (#42)

v2.0.0

2 years ago

This release adds the following enhancements and bug fixes:

  • Add PubMed as source (#16)
  • Add arXiv as source (#17)
  • Detect month changes in CORD-19 entry date process (#33)
  • Remove study attribute and design models and all related dependencies (#34)
  • Add pre-commit checks (#35)
  • Remove legacy merge logic (#36)
  • Add database flag to determine if database should be replaced (#37)
  • Add multiprocessing support to files process (#38)
  • Support reading compressed files (#39)
  • Require Python 3.7+ (#40)
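The notes above list compressed-file support (#39) only by title. As a generic illustration, not paperetl's implementation, the following minimal sketch opens plain or gzip-compressed text files transparently. It assumes that compression is signalled by a `.gz` suffix. `open_text` is a hypothetical helper name.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a text file for reading, decompressing it first if it ends in .gz.

    Hypothetical helper: compression is detected from the file suffix only,
    which is an assumption made for this illustration.
    """
    path = Path(path)
    if path.suffix == ".gz":
        # "rt" mode returns decoded text, just like the built-in open()
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Callers can then use `with open_text(path) as f:` the same way for both file types. Content-based detection (checking the gzip magic bytes) would be a sturdier alternative to checking the suffix.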

v1.6.0

3 years ago

This release adds the following enhancements and bug fixes:

  • Improve sample size extraction (#29)
  • Add generic CSV source (#30)
  • Add common method for accessing Grammar object (#31)
  • Update CORD-19 entry dates source (#32)
  • Limit docker and setup.py to spaCy 2.x until the attribute/design models are rebuilt

v1.5.0

3 years ago

This release adds the following enhancements and bug fixes:

  • Add Dockerfile for building a paperetl environment (#9)
  • Add component to build entry-dates.csv (#18)
  • Add pre-trained study design models to GitHub (#19)
  • Update README to correct and improve documentation (#20)
  • Ensure section length is less than the maximum NLP length (#27)

v1.4.0

3 years ago

This release adds the following enhancements and bug fixes:

  • Handle PDF parsing exceptions (#22)
  • Increase test coverage (#23)
  • Modify merge method to handle no update merges (#24)
  • Fix bug with JSON export (#25)
  • Fix bug with study model training (#26)

v1.3.0

3 years ago

This release adds the following enhancements and bug fixes:

  • Add file name as source for file process (#12)
  • Use XML id for file figure processing (#13)
  • Filter duplicate ids (#14)
  • Build test suite (#15)
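Duplicate-id filtering (#14) is commonly done by keeping the first record seen for each id and skipping any later ones. The following is a generic sketch of that approach, not paperetl's code. The `id` key and the `unique_by_id` name are assumptions.

```python
def unique_by_id(records, key="id"):
    """Yield records whose id has not been seen before, keeping the first occurrence.

    Hypothetical helper; records are assumed to be dicts with an id field.
    """
    seen = set()
    for record in records:
        uid = record[key]
        if uid in seen:
            continue  # duplicate id: skip it
        seen.add(uid)
        yield record
```

Because this is a generator, it streams records and only holds the set of ids in memory, not the records themselves.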

v1.2.0

3 years ago

This release adds the following enhancements:

  • Support recursive directory processing (#7)
  • Improve publication date parsing (#8)
  • Add incremental database updates (#10)
  • Remove citations (#11)
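Recursive directory processing (#7) is typically done in Python with `pathlib.Path.rglob` or `os.walk`. The following sketch uses `rglob` and is illustrative only. The `find_files` helper and its default suffix filter are hypothetical and are not paperetl options.

```python
from pathlib import Path


def find_files(root, suffixes=(".pdf", ".xml")):
    """Return all files under root, including subdirectories, whose suffix matches.

    Hypothetical helper; suffix matching is case-insensitive and results are sorted.
    """
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in suffixes
    )
```

Sorting the results makes the processing order deterministic across runs, which simplifies testing.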

v1.1.1

3 years ago

Minor README update noting that the package can be installed from PyPI