Tesseract.js Versions

Pure Javascript OCR for more than 100 Languages 📖🎉🖥

v4.0.6

11 months ago

What's Changed

  • Invalid langData (.traineddata files) are now cleared from cache (#753)
    • Note: setting cacheMethod: 'none' or cacheMethod: 'refresh' to prevent invalid files from being cached should no longer be necessary
  • Added source maps to esm build (#761)
  • Various updates to documentation

Full Changelog: https://github.com/naptha/tesseract.js/compare/v4.0.5...v4.0.6

v4.0.5

1 year ago

What's Changed

  • No changes to code
    • Removed unnecessary files to reduce the size of the npm package

Full Changelog: https://github.com/naptha/tesseract.js/compare/v4.0.4...v4.0.5

v4.0.4

1 year ago

What's Changed

  • Added SIMD-detection when corePath is manually specified (#735)
    • Important note for users who set corePath: for significantly faster performance, set corePath to a directory that includes both tesseract-core.wasm.js and tesseract-core-simd.wasm.js
    • See this comment for explanation
  • Improved auto-rotate feature (rotateAuto: true) (#747)
  • Switched default CDN from unpkg to jsdelivr (#743)
  • Updated various dependencies (#729, #736, #737, #739, #741)
  • Reduced size of npm package (#731, #734, #740)

Full Changelog: https://github.com/naptha/tesseract.js/compare/v4.0.3...v4.0.4

v4.0.3

1 year ago

What's Changed

  • Updated Tesseract to v5.3.0
    • This resolves bug with inverted (white on black) text recognition (#717)
  • Minor documentation fixes (#612, #614, #682, #673)
  • Better types for addJob by @nathanbabcock in https://github.com/naptha/tesseract.js/pull/719

Full Changelog: https://github.com/naptha/tesseract.js/compare/v4.0.2...v4.0.3

v4.0.2

1 year ago

What's Changed

  • Fixed bug breaking compatibility with certain devices (#701)

Full Changelog: https://github.com/naptha/tesseract.js/compare/v4.0.1...v4.0.2

v4.0.1

1 year ago

What's Changed

  • Running recognize or detect with an invalid image argument now throws an error (#699)
  • Fixed bug with custom langdata paths (#697)
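
Since an invalid image argument now surfaces as an error, calling code may want to catch it. The following is a minimal sketch, not the library's own error handling; tryRecognize is a hypothetical wrapper, and it assumes the rejection value may be either an Error or a plain string.

```javascript
// Hypothetical wrapper around worker.recognize that reports failures
// as a result object instead of letting the rejection propagate.
async function tryRecognize(worker, image) {
  try {
    const { data } = await worker.recognize(image);
    return { ok: true, text: data.text };
  } catch (err) {
    // Normalize the rejection value, which may be an Error or a string.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}
```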

Full Changelog: https://github.com/naptha/tesseract.js/compare/v4.0.0...v4.0.1

v4.0.0

1 year ago

Breaking Changes

  1. createWorker is now async
    1. In most code this means worker = Tesseract.createWorker() should be replaced with worker = await Tesseract.createWorker()
    2. Calling with invalid workerPath or corePath now produces error/rejected promise (#654)
  2. worker.load is no longer needed (createWorker now returns worker pre-loaded)
  3. getPDF function replaced by pdf recognize option (#488)
    1. This allows PDFs to be created when using a scheduler
    2. See browser and node examples for usage
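
Taken together, the breaking changes above suggest a migration along these lines. This is a sketch under stated assumptions rather than the official example: recognizeWithPdf is a hypothetical name, and requesting the PDF through a third (output) argument of worker.recognize is an assumption about how the pdf option is passed.

```javascript
// Sketch of the v4 call sequence. `Tesseract` is passed in so the same
// function works with a <script> global or with require('tesseract.js').
async function recognizeWithPdf(Tesseract, image) {
  // v4: createWorker returns a promise and the worker arrives pre-loaded,
  // so there is no separate worker.load() call.
  const worker = await Tesseract.createWorker();
  try {
    await worker.loadLanguage('eng');
    await worker.initialize('eng');
    // Assumption: PDF output is requested via the third (output) argument,
    // replacing the old worker.getPDF() call.
    const { data } = await worker.recognize(image, {}, { pdf: true });
    return { text: data.text, pdf: data.pdf };
  } finally {
    // Always release the worker, even if recognition fails.
    await worker.terminate();
  }
}
```

Because the PDF is part of the recognize result, the same request can be submitted as a scheduler job, which getPDF did not allow.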

Major New Features

  1. Processed images created by Tesseract can be retrieved using imageColor, imageGrey, and imageBinary options (#588)
    1. See image-processing.html example for usage
  2. Image rotation options rotateAuto and rotateRadians have been added, which significantly improve accuracy on certain documents
    1. See issue #648 for an example of how auto-rotation improves accuracy
    2. See image-processing.html example for usage of rotateAuto option
  3. Tesseract parameters (usually set using worker.setParameters) can now be set for single jobs using worker.recognize options (#665)
    1. For example, a single job can be set to recognize only numbers using worker.recognize(image, {tessedit_char_whitelist: "0123456789"})
    2. As these settings are reverted after the job, this allows for using different parameters for specific jobs when working with schedulers
  4. Initialization parameters (e.g. load_system_dawg, load_number_dawg, and load_punc_dawg) can now be set (#613)
    1. The third argument to worker.initialize now accepts either (1) an object with key/value pairs or (2) a string containing contents to write to a config file
    2. For example, both of these lines set load_number_dawg to 0:
      1. worker.initialize('eng', "0", {load_number_dawg: "0"});
      2. worker.initialize('eng', "0", "load_number_dawg 0");
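
The features above could be exercised roughly as follows. The three helper names are hypothetical, each takes an already-created worker, and passing the image outputs through a third (output) argument of worker.recognize is an assumption; the parameter names and values themselves are taken from the notes above.

```javascript
// Per-job parameter: restrict this one job to digits. The setting is
// reverted after the job, so other jobs on the same worker are unaffected.
function recognizeDigits(worker, image) {
  return worker.recognize(image, { tessedit_char_whitelist: '0123456789' });
}

// Auto-rotate the page before recognition and also request the processed images.
// Assumption: image outputs are requested via the third (output) argument.
function recognizeRotated(worker, image) {
  return worker.recognize(
    image,
    { rotateAuto: true },
    { imageColor: true, imageGrey: true, imageBinary: true }
  );
}

// Initialization parameter passed as key/value pairs in the third argument.
function initWithoutNumberDawg(worker) {
  return worker.initialize('eng', '0', { load_number_dawg: '0' });
}
```

Each helper returns the promise from the underlying worker call, so callers await them exactly as they would await worker.recognize or worker.initialize.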

Other Changes

  1. loadLanguage now resolves without error when language is loaded but writing to cache fails
    1. This allows for running in Firefox incognito mode using default settings (#609)
  2. detect returns null values when orientation and script detection (OSD) fails rather than throwing an error (#526)
  3. Memory leak causing crashes fixed (#678)
  4. Cache corruption should now be much less common (#666)
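
Because detect now reports failure through null values, callers need a null check where they previously needed a try/catch. A minimal sketch follows; describeOrientation is a hypothetical helper, and the field names data.script and data.orientation_degrees are assumptions about the shape of the detect result.

```javascript
// Hypothetical helper: returns a short description of the detected
// orientation, or null when detection failed.
// Assumption: the result exposes data.script and data.orientation_degrees.
async function describeOrientation(worker, image) {
  const { data } = await worker.detect(image);
  // Loose equality also covers fields that are undefined.
  if (data.orientation_degrees == null || data.script == null) {
    return null; // detection failure yields null fields, not an exception
  }
  return `${data.script}, rotated ${data.orientation_degrees} degrees`;
}
```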

Full Changelog: https://github.com/naptha/tesseract.js/compare/v3.0.3...v4.0.0

v3.0.3

1 year ago

What's Changed

  • Invalid language data now throws error at initialize step (#602)
  • Recognition progress logging fixed (#655)
  • Minor changes to types, documentation

Full Changelog: https://github.com/naptha/tesseract.js/compare/v3.0.2...v3.0.3

v3.0.2

1 year ago

Full Changelog: https://github.com/naptha/tesseract.js/compare/v2.1.5...v3.0.2

v2.1.5

2 years ago

  • Add language constants (thanks to @stonefruit)
  • Add user job id to logger (thanks to @miguelm3)
  • Fix env selection bug in electron (thanks to @LoginovIlya)