docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
This patch release fixes issues with the preprocessor and greatly improves text detection models.
Brought to you by @fg-mindee & @charlesmindee
Note: doctr 0.2.1 requires TensorFlow 2.4.0 or higher.
With this iteration, DocTR brings you a set of newly pretrained parameters for db_resnet50
which was trained using a much wider range of data augmentations!
architecture | FUNSD recall | FUNSD precision | CORD recall | CORD precision |
---|---|---|---|---|
db_resnet50 + crnn_vgg16_bn (v0.2.0) | 64.8 | 70.3 | 67.7 | 78.4 |
db_resnet50 + crnn_vgg16_bn (v0.2.1) | 70.08 | 74.77 | 82.19 | 79.67 |
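Since the table reports recall and precision separately, a single balanced figure can be derived per row with the usual F1 harmonic mean (a quick sketch; this is not a metric doctr reports itself):

```python
# F1 (harmonic mean of precision and recall) for the v0.2.1 FUNSD row above
def f1_score(recall, precision):
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(70.08, 74.77), 2))  # -> 72.35
```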
Users might be tempted to filter text recognition predictions, which was previously not easy without a prediction confidence. We harmonized our recognition models to provide the sequence prediction probability.
Using a sample word image, the following snippet:
from doctr.documents import DocumentFile
from doctr.models import recognition_predictor
predictor = recognition_predictor(pretrained=True)
doc = DocumentFile.from_images("path/to/reco_sample.jpg")
print(predictor(doc))
will get you a list of tuples (word value, sequence confidence):
[('invite', 0.9302278757095337)]
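With the confidence exposed, filtering low-quality words becomes a one-liner. A minimal sketch, assuming the output format shown above (the 0.5 threshold is illustrative, not a doctr default):

```python
# Keep only words whose sequence confidence clears a (hypothetical) threshold
predictions = [("invite", 0.9302278757095337), ("bl0b", 0.41)]
threshold = 0.5
kept = [word for word, confidence in predictions if confidence >= threshold]
print(kept)  # -> ['invite']
```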
For those who play around with the predictor's components, understanding their composition is valuable. To provide a cleaner interface, we improved the representation of all predictor components.
The following snippet:
from doctr.models import ocr_predictor
print(ocr_predictor())
now yields a much cleaner representation of the predictor composition
OCRPredictor(
(det_predictor): DetectionPredictor(
(pre_processor): PreProcessor(
(resize): Resize(output_size=(1024, 1024), method='bilinear')
(normalize): Compose(
(transforms): [
LambdaTransformation(),
Normalize(mean=[0.7979999780654907, 0.7850000262260437, 0.7720000147819519], std=[0.2639999985694885, 0.27489998936653137, 0.28700000047683716]),
]
)
)
(model): DBNet(
(feat_extractor): IntermediateLayerGetter()
(fpn): FeaturePyramidNetwork(channels=128)
(probability_head): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f645f58e0>
(threshold_head): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7ce15310>
(postprocessor): DBPostProcessor(box_thresh=0.1, max_candidates=1000)
)
)
(reco_predictor): RecognitionPredictor(
(pre_processor): PreProcessor(
(resize): Resize(output_size=(32, 128), method='bilinear', preserve_aspect_ratio=True, symmetric_pad=False)
(normalize): Compose(
(transforms): [
LambdaTransformation(),
Normalize(mean=[0.5, 0.5, 0.5], std=[1.0, 1.0, 1.0]),
]
)
)
(model): CRNN(
(feat_extractor): <doctr.models.backbones.vgg.VGG object at 0x7f6f7d866040>
(decoder): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7cce2430>
(postprocessor): CTCPostProcessor(vocab_size=118)
)
)
(doc_builder): DocumentBuilder(resolve_lines=False, resolve_blocks=False, paragraph_break=0.035)
)
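As a rough illustration of what the Normalize step in the detection PreProcessor above does, here is the per-channel operation on a single RGB pixel (a plain-Python sketch; doctr itself applies this as TensorFlow ops on whole batches):

```python
# Per-channel normalization as applied by the detection pre-processor above,
# shown on a single RGB pixel already rescaled to [0, 1]
mean = (0.798, 0.785, 0.772)
std = (0.264, 0.2749, 0.287)

pixel = (0.5, 0.5, 0.5)
normalized = [(channel - m) / s for channel, m, s in zip(pixel, mean, std)]
print([round(channel, 3) for channel in normalized])  # -> [-1.129, -1.037, -0.948]
```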
Renamed ExactMatch to TextMatch, since the metric now offers several levels of flexibility for evaluation. Additionally, the constructor flags have been deprecated, since the summary provides all evaluation variants.
0.2.0 | 0.2.1 |
---|---|
>>> from doctr.utils.metrics import ExactMatch >>> metric = ExactMatch(ignore_case=True) >>> metric.update(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"]) >>> print(metric.summary()) 0.75 |
>>> from doctr.utils.metrics import TextMatch >>> metric = TextMatch() >>> metric.update(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"]) >>> print(metric.summary()) {'raw': 0.5, 'caseless': 0.75, 'unidecode': 0.5, 'unicase': 0.75} |
Here raw is the exact match, caseless the exact match of lower-case counterparts, unidecode the exact match of unidecoded counterparts, and unicase the exact match of unidecoded lower-case counterparts.
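The four levels can be sketched with a minimal re-implementation (not doctr's actual code; the accent-stripping step is stubbed out since the sample strings are plain ASCII):

```python
# Minimal re-implementation sketch of TextMatch's summary (not doctr's code).
def text_match_summary(preds, targets):
    # Real unidecoding strips accents; a no-op stand-in suffices for ASCII input
    unidecode = lambda s: s
    raw = caseless = unidec = unicase = 0
    for pred, target in zip(preds, targets):
        raw += pred == target
        caseless += pred.lower() == target.lower()
        unidec += unidecode(pred) == unidecode(target)
        unicase += unidecode(pred).lower() == unidecode(target).lower()
    n = len(preds)
    return {"raw": raw / n, "caseless": caseless / n,
            "unidecode": unidec / n, "unicase": unicase / n}

print(text_match_summary(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"]))
# -> {'raw': 0.5, 'caseless': 0.75, 'unidecode': 0.5, 'unicase': 0.75}
```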
Deep learning model building and inference
- db_resnet50 (#277)

Utility features relevant to the library use cases
Data transformations operations
Verifications of the package well-being before release
Online resources for potential users
Reference training scripts
Other tools and implementations

Fixes & improvements
- OCRDataset (#270)
- OCRMetric update edge case (#267)
- Resize when preserving aspect ratio (#266)
- RandomSaturation (#277)
- OCRDataset (#274)
- doctr.documents.elements (#274)
- ignore_case and ignore_accents from recognition postprocessors (#284)
- OCRDataset (#278)
- DocumentBuilder and recognition models (#284)

This release improves model performance and extends library features considerably (including a minimal API template, new datasets, newly trained models).
Release handled by @fg-mindee & @charlesmindee
Note: doctr 0.2.0 requires TensorFlow 2.4.0 or higher.
Enjoy our newly trained detection and recognition models with improved robustness and performance! Check the full benchmark in the documentation for further details.
This release comes with a large improvement of line detection. While it is only done in post-processing for now, we considered many cases to make sure you get a consistent and helpful result:
Before | After |
---|---|
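As a toy illustration of what such post-processing involves (not doctr's actual implementation), words can be grouped into lines by vertical proximity of their box centers:

```python
def group_into_lines(boxes, y_tol=0.02):
    """Group word boxes (xmin, ymin, xmax, ymax), in relative coords, into lines."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        y_center = (box[1] + box[3]) / 2
        if lines:
            prev = lines[-1]
            prev_center = sum((b[1] + b[3]) / 2 for b in prev) / len(prev)
            # close enough vertically -> same line
            if abs(y_center - prev_center) <= y_tol:
                prev.append(box)
                continue
        lines.append([box])
    # order words left to right within each line
    return [sorted(line, key=lambda b: b[0]) for line in lines]

words = [
    (0.50, 0.10, 0.60, 0.14),  # right-hand word of the first line
    (0.10, 0.11, 0.20, 0.15),  # left-hand word of the first line
    (0.10, 0.30, 0.30, 0.34),  # a second line, further down the page
]
print(group_into_lines(words))
# -> [[(0.1, 0.11, 0.2, 0.15), (0.5, 0.1, 0.6, 0.14)], [(0.1, 0.3, 0.3, 0.34)]]
```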
You can now read images or PDFs from files, binary streams, or even URLs. We completely revamped our document reading pipeline with the new DocumentFile class methods:
from doctr.documents import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
# Web page
webpage_doc = DocumentFile.from_url("https://www.yoursite.com").as_images()
If your PDF is a source file (web pages are converted into such PDFs) rather than a scanned version, you will also be able to read the information inside:
from doctr.documents import DocumentFile
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Retrieve bounding box and text information
words = pdf_doc.get_words()
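The exact structure of `words` may vary across versions; assuming a list of (bounding box, value) pairs per page (an assumption for illustration), the raw text could be recovered like this:

```python
# Hypothetical output structure, for illustration only; check the actual
# return type of get_words() in your doctr version.
words = [  # one list per page
    [((0.10, 0.10, 0.30, 0.15), "Hello"), ((0.35, 0.10, 0.50, 0.15), "world")],
]
# Flatten all pages into a plain string, discarding the bounding boxes
text = " ".join(value for page in words for _, value in page)
print(text)  # -> Hello world
```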
By adding multithreaded dataloaders and transformations to DocTR, we can now provide you with reference training scripts to train models on your own!
Text detection script (additional details available in README)
python references/detection/train.py /path/to/dataset db_resnet50 -b 8 --input-size 512 --epochs 20
Text recognition script (additional details available in README)
python references/recognition/train.py /path/to/dataset crnn_vgg16_bn -b 8 --epochs 20
If you enjoy DocTR, you might want to integrate it into your API. For your convenience, we added a minimal API template with routes for text detection, text recognition, or plain OCR!
Run it as follows in a docker container:
PORT=8050 docker-compose up -d --build
Your API is now running locally on port 8050! Navigate to http://localhost:8050/redoc to check the documentation, or start making your first request!
import requests
import io
with open('/path/to/your/image.jpeg', 'rb') as f:
data = f.read()
response = requests.post("http://localhost:8050/recognition", files={'file': io.BytesIO(data)})
In order to ensure that all compression features are fully functional in DocTR, support for TensorFlow < 2.4.0 has been dropped.
OCRPredictor used to take a list of documents as input; it now only takes a list of pages.
0.1.1 | 0.2.0 |
---|---|
>>> predictor = ... >>> page = np.zeros((h, w, 3), dtype=np.uint8) >>> out = predictor([[page]]) |
>>> predictor = ... >>> page = np.zeros((h, w, 3), dtype=np.uint8) >>> out = predictor([page]) |
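Migrating is mostly a matter of flattening: if existing code still holds a list of documents (each a list of pages), it can be unrolled before calling the new predictor. A plain-Python sketch with placeholder pages:

```python
# Placeholder "pages"; in practice these are HxWx3 uint8 numpy arrays
documents = [["page1", "page2"], ["page3"]]

# 0.1.1 code called predictor(documents); 0.2.0 expects a flat list of pages
pages = [page for document in documents for page in document]
print(pages)  # -> ['page1', 'page2', 'page3']
```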
To gain more flexibility on the training side, the model's call method was changed to yield a dictionary with multiple entries:
0.1.1 | 0.2.0 |
---|---|
>>> from doctr.models import db_resnet50, DBPostProcessor >>> model = db_resnet50(pretrained=True) >>> postprocessor = DBPostProcessor() >>> prob_map = model(input_t, training=False) >>> boxes = postprocessor(prob_map) |
>>> from doctr.models import db_resnet50 >>> model = db_resnet50(pretrained=True) >>> out = model(input_t, training=False) >>> boxes = out['boxes'] |
Easy-to-use datasets for OCR
- DataLoader as a dataset wrapper for parallel high-performance data reading (#198, #201)
- OCRDataset (#244)

Deep learning model building and inference
- crnn_resnet31 recognition model (#160)

Utility features relevant to the library use cases
Data transformations operations
- Compose, Resize, Normalize & LambdaTransformation (#205)

Verifications of the package well-being before release
- OCRDataset (#244)

Online resources for potential users
Other tools and implementations
This patch release fixes several bugs, introduces OCR datasets, and improves model performance.
Release handled by @fg-mindee & @charlesmindee
Note: doctr 0.1.1 requires TensorFlow 2.3.0 or higher.
Whether this is for training or evaluation purposes, DocTR provides you with objects to easily download and manipulate datasets. Access OCR datasets within a few lines of code:
from doctr.datasets import FUNSD
train_set = FUNSD(train=True, download=True)
img, target = train_set[0]
While DocTR 0.1.0 gave you access to pretrained models, you had no way to assess the performance of these models apart from computing it yourself. As of now, we have added a performance benchmark in our documentation for all our models and made the evaluation script available for seamless reproducibility:
python scripts/evaluate.py ocr_db_crnn_vgg
Since we want to make DocTR a convenience for you to build OCR-related applications and services, we made a minimal Streamlit demo app to showcase its text detection capabilities. You can run the demo with the following commands:
streamlit run demo/app.py
Here is how it renders when performing text detection on a sample document:
For improved clarity, the evaluation metrics' methods were renamed.
0.1.0 | 0.1.1 |
---|---|
>>> from doctr.utils import ExactMatch >>> metric = ExactMatch() >>> metric.update_state(['Hello', 'world'], ['hello', 'world']) >>> metric.result() |
>>> from doctr.utils import ExactMatch >>> metric = ExactMatch() >>> metric.update(['Hello', 'world'], ['hello', 'world']) >>> metric.summary() |
As the range of backbones and combinations evolves, we have updated the name of high-level predictors:
0.1.0 | 0.1.1 |
---|---|
>>> from doctr.models import ocr_db_crnn |
>>> from doctr.models import ocr_db_crnn_vgg |
Easy-to-use datasets for OCR
- FUNSD dataset (#136, #141)

Deep learning model building and inference
Utility features relevant to the library use cases
Verifications of the package well-being before release
- crnn_resnet31 (#148), and OCR predictors (#150)

Online resources for potential users
- FUNSD in documentation (#143, #149, #150, #155)
- sar_resnet31 to recognition models documentation (#150)

Other tools and implementations
- analyze.py script runs (#142)
- bitmap_to_boxes method (#155)
- ExactMatch (#120)
- VisionDataset and FUNSD (#147)
- max_length and input_shape of SAR (#143)
- NestedObject when they have no children (#137)
- FUNSD (#154)

This first release adds pretrained models for end-to-end OCR and document manipulation utilities.
Release handled by @fg-mindee & @charlesmindee
Note: doctr 0.1.0 requires TensorFlow 2.3.0 or newer.
Since document processing is at the core of this project, being able to read documents efficiently is a priority. In this release, we considered PDF and image-based files.
PDF reading is a wrapper around the PyMuPDF back-end for fast file reading:
from doctr.documents import read_pdf
# from path
doc = read_pdf("path/to/your/doc.pdf")
# from stream
with open("path/to/your/doc.pdf", 'rb') as f:
doc = read_pdf(f.read())
while image reading uses the OpenCV back-end:
from doctr.documents import read_img
page = read_img("path/to/your/img.jpg")
Whether you conduct text detection, text recognition, or end-to-end OCR, this release brings you pretrained models and advanced predictors (which take care of all preprocessing, model inference, and post-processing for you) for easy-to-use Pythonic features.
Currently, only DBNet-based architectures are supported, more to come in the next releases!
from doctr.documents import read_pdf
from doctr.models import db_resnet50_predictor
model = db_resnet50_predictor(pretrained=True)
doc = read_pdf("path/to/your/doc.pdf")
result = model(doc)
There are two architectures implemented for recognition: CRNN and SAR
from doctr.models import crnn_vgg16_bn_predictor
model = crnn_vgg16_bn_predictor(pretrained=True)
Simply combining two models into a two-stage architecture, OCR predictors bring you the easiest way to analyze your document
from doctr.documents import read_pdf
from doctr.models import ocr_db_crnn
model = ocr_db_crnn(pretrained=True)
doc = read_pdf("path/to/your/doc.pdf")
result = model([doc])
Document reading and manipulation
Deep learning model building and inference
Utility features relevant to the library use cases.
Verifications of the package well-being before release
Online resources for potential users
Other tools and implementations
This release is only a mirror for pretrained detection & recognition models.