:snake: :alembic: Optical Character Recognition using tesseract within Frappe.
:alembic: Experimental Frappe OCR application with tesseract.
This project is a fork of ERPNext-OCR by John Vincent Fiel. It aims to fix and clean up the original source code and add some new features.
Check out more on ERPNext Discuss.
See CHANGELOG
See Taiga.io
Install tesseract-ocr, plus imagemagick and ghostscript (needed to work with PDF files), using this command on Debian:
sudo apt-get install tesseract-ocr imagemagick libmagickwand-dev ghostscript
bench get-app --branch develop erpnext_ocr https://github.com/Monogramm/erpnext_ocr
bench install-app erpnext_ocr
When you install the Frappe app, the following Python requirements are installed:
python binding for tesseract, tesserocr
image processing library in python, pillow
HTTP library in python, requests
python binding for imagemagick, wand
File Being Read:
Sample Screenshot:
In order to use OCR with different languages, you need to install the appropriate trained data files. Check tesseract Wiki for details: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
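As a rough sketch, you can check whether the trained data for a language is already installed before running OCR. The helper names below are hypothetical and not part of this app; the sketch assumes tesserocr is importable and relies on its get_languages() function, which returns the tessdata path and the list of available language codes.

```python
def missing_languages(requested, available):
    """Return the codes in a Tesseract language string that are not installed.

    `requested` uses Tesseract's "eng+fra" syntax; `available` is a list of
    installed language codes.
    """
    installed = set(available)
    return [code for code in requested.split("+") if code and code not in installed]


def installed_languages():
    """List the language codes Tesseract can see (requires tesserocr)."""
    # Imported lazily so the pure helper above works without tesserocr.
    import tesserocr

    _tessdata_path, languages = tesserocr.get_languages()
    return languages
```

For example, missing_languages("eng+fra", installed_languages()) returns an empty list when both English and French data files are present.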
If you wish to develop or just test this application locally, you can run docker-compose up -d at the root of this repository.
You can then access your ERPNext OCR dev environment at http://localhost:8080.
wand.exceptions.PolicyError: not authorized '/opt/sample.pdf' @ error/constitute.c/ReadImage/412
This can happen when the ImageMagick security configuration prevents it from reading PDF files.
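A commonly reported fix for this PolicyError is to relax the PDF coder rule in ImageMagick's policy.xml, for instance by changing rights="none" to rights="read|write" for the PDF pattern. The sketch below only detects such rules and does not modify anything. The function name is hypothetical, and the file location mentioned afterwards is an assumption for Debian with ImageMagick 6.

```python
import xml.etree.ElementTree as ET


def pdf_blocking_policies(root):
    """Return the <policy> elements that deny read access to the PDF coder."""
    blocking = []
    for policy in root.iter("policy"):
        if policy.get("domain") != "coder":
            continue
        # The pattern may be a single coder ("PDF") or a brace list ("{PS,PDF}").
        pattern = policy.get("pattern", "")
        coders = {name.strip().upper() for name in pattern.strip("{}").split(",")}
        rights = policy.get("rights", "").lower()
        if "PDF" in coders and "read" not in rights:
            blocking.append(policy)
    return blocking
```

On a Debian system you could pass it ET.parse("/etc/ImageMagick-6/policy.xml").getroot() (assumed path); a non-empty result suggests the policy is what blocks PDF reading.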
Reference:
wand.exceptions.WandRuntimeError: MagickReadImage returns false, but did raise ImageMagick exception. This can occurs when a delegate is missing, or returns EXIT_SUCCESS without generating a raster.
This might happen if a dependency needed to convert PDF files is missing, most often ghostscript.
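A quick way to confirm the missing delegate is to check whether the Ghostscript executable is on the PATH. This is a minimal sketch using only the standard library; the function name is hypothetical, and it assumes Ghostscript is installed under its usual executable name, gs.

```python
import shutil


def missing_tools(tools=("gs",)):
    """Return the executables from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

If missing_tools() returns ["gs"], installing the ghostscript package as described in the installation step above is the likely fix.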
References:
OSError: encoder error -2 when writing image file
Fax3SetupState: Bits/sample must be 1 for Group 3/4 encoding/decoding.
This usually happens when the TIFF image compression is not valid or not recognized.
bench run-tests --app erpnext_ocr
Monogramm
John Vincent Fiel
Contributions, issues and feature requests are welcome!
Feel free to check issues page.
Check the contributing guide.
Give a :star: if this project helped you!
Copyright © 2019 Monogramm.
This project is MIT licensed.
This README was generated with :heart: by readme-md-generator