docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Note: doctr 0.8.1 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.
Fixed the conda recipe and CI jobs for conda and PyPI releases
Fixed some broken links
Pre-release: FAST text detection model from "FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation" -> checkpoints will be provided with the next release
Note: doctr 0.8.0 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.
db_resnet50_rotation (PyTorch) and linknet_resnet18_rotation (TensorFlow) are removed (all models can handle rotated documents now)
.show(doc) changed to .show()
WildReceipt dataset added by @HamzaGbada
Added hooks to ocr_predictor so you can manipulate the detection predictions in the middle of the pipeline to your needs, by @felixdittrich92:
from doctr.models import ocr_predictor
class CustomHook:
    def __call__(self, loc_preds):
        # Manipulate the location predictions here
        # 1. The output structure needs to be the same as the input location predictions
        # 2. Be aware that the coordinates are relative and need to be between 0 and 1
        return loc_preds

my_hook = CustomHook()

predictor = ocr_predictor(pretrained=True)
# Add a hook in the middle of the pipeline
predictor.add_hook(my_hook)
# You can also add multiple hooks which will be executed sequentially
for hook in [my_hook, my_hook, my_hook]:
    predictor.add_hook(hook)
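As a concrete illustration, a hypothetical hook that drops tiny detections might look like this (assuming `loc_preds` arrives as a numpy array of shape `(N, 4, 2)` holding relative polygon coordinates; check your detector's actual output structure before relying on this):

```python
import numpy as np

class MinAreaHook:
    """Drop detected boxes whose relative area falls below a threshold."""

    def __init__(self, min_area=1e-4):
        self.min_area = min_area

    def __call__(self, loc_preds):
        # loc_preds: (N, 4, 2) polygons with coordinates in [0, 1]
        widths = loc_preds[:, :, 0].max(axis=1) - loc_preds[:, :, 0].min(axis=1)
        heights = loc_preds[:, :, 1].max(axis=1) - loc_preds[:, :, 1].min(axis=1)
        keep = (widths * heights) >= self.min_area
        return loc_preds[keep]

# Sanity check on dummy predictions: one normal box, one degenerate box
preds = np.array([
    [[0.1, 0.1], [0.4, 0.1], [0.4, 0.2], [0.1, 0.2]],                      # area 0.03
    [[0.5, 0.5], [0.5001, 0.5], [0.5001, 0.5001], [0.5, 0.5001]],          # negligible area
])
filtered = MinAreaHook(min_area=1e-4)(preds)
print(filtered.shape)  # -> (1, 4, 2)
```

Note that the filtered output keeps the same structure as the input (an array of relative polygons), as required by the hook contract above.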
tqdm instead of fastprogress in reference scripts by @odulcy-mindee in https://github.com/mindee/doctr/pull/1389
Added WILDRECEIPT to the docs and fixed README.md by @odulcy-mindee in https://github.com/mindee/doctr/pull/1363
Full Changelog: https://github.com/mindee/doctr/compare/v0.7.0...v0.8.0
Note: doctr 0.7.0 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.
Note: We will release the missing PyTorch checkpoints with 0.7.1.
The preserve_aspect_ratio parameter now defaults to True, in https://github.com/mindee/doctr/pull/1279
=> To restore the old behaviour, pass preserve_aspect_ratio=False to the predictor instance
The KIE predictor is a more flexible predictor compared to OCR, as your detection model can detect multiple classes in a document. For example, you can have a detection model that detects just dates and addresses in a document.
The KIE predictor makes it possible to combine a multi-class detection model with a recognition model, with the whole pipeline already set up for you.
from doctr.io import DocumentFile
from doctr.models import kie_predictor

# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)

predictions = result.pages[0].predictions
for class_name in predictions.keys():
    list_predictions = predictions[class_name]
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")
The KIE predictor's per-page results come as a dictionary, with each key a class name and its value the list of predictions for that class.
Removed the tensorflow_addons dependency by @felixdittrich92 in https://github.com/mindee/doctr/pull/1252
Full Changelog: https://github.com/mindee/doctr/compare/v0.6.0...v0.7.0
Note: doctr 0.6.0 requires either TensorFlow >= 2.9.0 or PyTorch >= 1.8.0.
from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub
image = DocumentFile.from_images(['data/example.jpg'])
# Load a custom detection model from huggingface hub
det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
# Load a custom recognition model from huggingface hub
reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
# You can easily plug these models into the OCR predictor
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
result = predictor(image)
from doctr.models import recognition, login_to_hub, push_to_hf_hub
login_to_hub()
my_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)
push_to_hf_hub(my_awesome_model, model_name='doctr-crnn-mobilenet-v3-large-french-v1', task='recognition', arch='crnn_mobilenet_v3_large')
Documentation: https://mindee.github.io/doctr/using_doctr/sharing_models.html
from doctr.datasets import CORD
# Crop boxes as is (crops can be irregular)
train_set = CORD(train=True, download=True, recognition_task=True)
# Crop rotated boxes (always regular)
train_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True)
img, target = train_set[0]
Documentation: https://mindee.github.io/doctr/using_doctr/using_datasets.html
NOTE: a full production pipeline with ONNX export / build is planned for 0.7.0 (for now, models can only be exported up to the logits, without any post-processing included)
using_doctr by @odulcy-mindee in https://github.com/mindee/doctr/pull/993
Updated io/pdf.py to the new pypdfium2 API by @mara004 in https://github.com/mindee/doctr/pull/944
Full Changelog: https://github.com/mindee/doctr/compare/v0.5.1...v0.6.0
This minor release includes: documentation improvements thanks to @felixdittrich92, bug fixes, rotation support extended to the TensorFlow backend, a switch from PyMuPDF to pypdfium2, and a nice integration with the Hugging Face Hub thanks to @fg-mindee!
Note: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
The documentation has been improved with a new theme and illustrations, and the docstrings have been completed and expanded. This is how it renders:
We provide weights for the linknet_resnet18_rotation model, which has been deeply modified: we implemented a new loss (based on Dice Loss and Focal Loss), we changed the computation of the targets so that polygons are shrunken the same way as in DBNet (which greatly improves the precision of the segmenter), and we trained the model while preserving the aspect ratio of the images.
All these improvements led to much better results, and the pretrained model is now very robust.
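For intuition, here is a rough numpy sketch of such a combined Dice + Focal objective (illustrative only; not docTR's exact formulation, and the weighting is a made-up assumption):

```python
import numpy as np

def dice_loss(probs, targets, eps=1e-8):
    # Overlap-based loss on the segmentation heatmap
    inter = (probs * targets).sum()
    return 1.0 - (2.0 * inter + eps) / (probs.sum() + targets.sum() + eps)

def focal_loss(probs, targets, gamma=2.0, eps=1e-8):
    # Cross-entropy down-weighted for easy, well-classified pixels
    pt = np.where(targets == 1, probs, 1.0 - probs)
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt + eps)))

def combined_loss(probs, targets, alpha=0.5):
    return alpha * dice_loss(probs, targets) + (1 - alpha) * focal_loss(probs, targets)

perfect = combined_loss(np.ones(16), np.ones(16))
noisy = combined_loss(np.full(16, 0.5), np.ones(16))
print(perfect < noisy)  # -> True
```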
You can now choose to preserve the aspect ratio in the detection_predictor:
>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor('db_resnet50_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
This option can also be activated in the high level end-to-end predictor:
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor('linknet_resnet18_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
The artefact detection model is now available on the Hugging Face Hub, which is amazing:
On docTR, you can now use the .from_hub() method, so that these two snippets are equivalent:
# Pretrained
from doctr.models.obj_detection import fasterrcnn_mobilenet_v3_large_fpn
model = fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
and:
# HF Hub
from doctr.models.obj_detection.factory import from_hub
model = from_hub("mindee/fasterrcnn_mobilenet_v3_large_fpn")
We replaced the PyMuPDF dependency with pypdfium2 due to a license-compatibility issue. As a result, we lose the word and object extraction from source PDFs that was done with PyMuPDF. It wasn't used in any models, so it is not a big issue, but we will work on re-integrating such a feature in the future.
Full Changelog: https://github.com/mindee/doctr/compare/v0.5.0...v0.5.1
This release adds support of rotated documents, and extends both the model & dataset zoos.
Note: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
It's no secret: this release's focus was to bring the same level of performance to rotated documents!
docTR is meant to be your best tool for seamless document processing, and it couldn't do without supporting a very natural & common augmentation of input documents. This large project was subdivided into three parts:
Developing a heuristic-based method to estimate the page skew, and rotate it before forwarding it to any deep learning model. Our thanks to @Rob192 for his contribution on this part :pray:
This behaviour can be enabled to avoid retraining the text detection models. However, the heuristics approach has its limits in terms of robustness.
The core of this project was to enable our text detection models to produce non-degraded heatmaps & localization candidates when processing a rotated page.
Finally, once the localization candidates have been extracted, there is no guarantee that a given candidate reads from left to right. To remove this doubt, a lightweight image orientation classifier was added to refine the crops that are sent to text recognition!
The stability of training complex deep learning tasks has mostly been helped by leveraging transfer learning. As such, OCR tasks usually require a backbone as a feature extractor. For this reason, all checkpoints of classification models in both PyTorch & TensorFlow have been updated :rocket: They were trained using our synthetic character classification dataset; for more details, cf. Character classification training
Thanks to @felixdittrich92, the list of supported datasets has grown considerably :partying_face: It includes widely popular datasets used for benchmarks on OCR-related tasks; you can find the full list over here :point_right: #587
Additionally, we followed up on the existing CharGenerator by introducing WordGenerator:
Below are some samples using font_size=32:
Two new notebooks have made their way into the documentation:
With the retraining of all classification backbones, several changes have been introduced:
linknet16 --> linknet_resnet18
In order to unify our data pipelines, we forced the conversion to relative coordinates on all datasets!
| 0.4.1 | 0.5.0 |
|---|---|
| `>>> from doctr.datasets import FUNSD`<br>`>>> ds = FUNSD(train=True, download=True)`<br>`>>> img, target = ds[0]`<br>`>>> print(target['boxes'].dtype, target['boxes'].max())`<br>`(dtype('int64'), 862)` | `>>> from doctr.datasets import FUNSD`<br>`>>> ds = FUNSD(train=True, download=True)`<br>`>>> img, target = ds[0]`<br>`>>> print(target['boxes'].dtype, target['boxes'].max())`<br>`(dtype('float32'), 0.98341835)` |
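The conversion itself amounts to dividing pixel coordinates by the page dimensions; a minimal sketch (hypothetical helper, not docTR's internal code):

```python
import numpy as np

def to_relative(boxes, height, width):
    """Convert absolute (xmin, ymin, xmax, ymax) pixel boxes to relative coords in [0, 1]."""
    boxes = boxes.astype(np.float32)  # work on a float copy
    boxes[:, [0, 2]] /= width
    boxes[:, [1, 3]] /= height
    return boxes

abs_boxes = np.array([[100, 50, 300, 150]], dtype=np.int64)
rel = to_relative(abs_boxes, height=200, width=400)
print(rel)  # -> [[0.25 0.25 0.75 0.75]]
```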
Full Changelog: https://github.com/mindee/doctr/compare/v0.4.1...v0.5.0
This patch release brings the support of AMP for PyTorch training to docTR along with artefact object detection.
Note: doctr 0.4.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
Training scripts with the PyTorch backend now benefit from AMP to reduce the RAM footprint and potentially increase the maximum batch size! This comes in especially handy for text detection, which requires high-resolution inputs!
Document understanding goes beyond textual elements, as information can be encoded in other visual forms. For this reason, we have extended the range of supported tasks by adding object detection. This will be focused on non-textual elements in documents, including QR codes, barcodes, ID pictures, and logos.
Here are some early results:
This release comes with a training & validation set, DocArtefacts, and a reference training script. Keep an eye out for the models we will be releasing in the next version!
You've been waiting for it: from now on, we will regularly add new tutorials for docTR in the form of Jupyter notebooks that you can open and run locally or on Google Colab, for instance!
Check the new page in the documentation to have an updated list of all our community notebooks: https://mindee.github.io/doctr/latest/notebooks.html
Float precision can be leveraged in deep learning to decrease the RAM footprint of training. The common data type float32 has a lower-resolution counterpart, float16, which is usually only supported on GPU for common deep learning operations. Initially, we were planning to make all our operations available in both types to reduce the memory footprint.
However, with the latest developments in deep learning frameworks and their Automatic Mixed Precision mechanisms, this isn't required anymore and only adds constraints on the development side. We have thus deprecated this feature in our datasets and predictors:
| 0.4.0 | 0.4.1 |
|---|---|
| `>>> from doctr.datasets import FUNSD`<br>`>>> ds = FUNSD(train=True, download=True, fp16=True)`<br>`>>> print(getattr(ds, "fp16"))`<br>`True` | `>>> from doctr.datasets import FUNSD`<br>`>>> ds = FUNSD(train=True, download=True)`<br>`>>> print(getattr(ds, "fp16"))`<br>`None` |
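The resolution gap between the two types is easy to inspect with numpy:

```python
import numpy as np

# Machine epsilon: the smallest step distinguishable from 1.0
print(np.finfo(np.float16).eps)  # ~0.000977
print(np.finfo(np.float32).eps)  # ~1.19e-07
```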
`OCRPredictor.__repr__` in #595 (@RBMindee)

Our thanks & warm welcome to the following people for their first contributions: @mzeidhassan @k-for-code @felixdittrich92 @SiddhantBahuguna @RBMindee @thentgesMindee :pray:
Full Changelog: https://github.com/mindee/doctr/compare/v0.4.0...v0.4.1
This release brings the support of PyTorch out of beta, makes text recognition more robust, and provides light architectures for complex tasks.
Note: doctr 0.4.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
Some documents, such as French ID cards, include very long strings that can be challenging to transcribe:
This release introduces a smart split/merge strategy for wide crops to avoid performance drops. Previously, the whole crop was analyzed at once; now it is split into reasonably sized crops, inference is performed in batches, and the predictions are merged back together.
The following snippet:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor
doc = DocumentFile.from_images('path/to/img.png')
predictor = ocr_predictor(pretrained=True)
print(predictor(doc).pages[0])
used to yield:
Page(
dimensions=(447, 640)
(blocks): [Block(
(lines): [Line(
(words): [
Word(value='1XXXXXX', confidence=0.0023),
Word(value='1XXXX', confidence=0.0018),
]
)]
(artefacts): []
)]
)
and now yields:
Page(
dimensions=(447, 640)
(blocks): [Block(
(lines): [Line(
(words): [
Word(value='IDFRABERTHIER<<<<<<<<<<<<<<<<<<<<<<', confidence=0.49),
Word(value='8806923102858CORINNE<<<<<<<6512068F6', confidence=0.22),
]
)]
(artefacts): []
)]
)
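The splitting step of this strategy can be sketched in pure Python (illustrative parameters and helper, not docTR's actual implementation):

```python
def split_wide_crop(width, max_ratio=8, height=32, overlap=0.1):
    """Split a crop of the given pixel width into horizontal chunks.

    Crops wider than max_ratio * height are cut into overlapping
    pieces so each chunk stays within the recognition model's
    comfortable aspect ratio.
    """
    max_width = max_ratio * height
    if width <= max_width:
        return [(0, width)]
    step = int(max_width * (1 - overlap))
    chunks = []
    start = 0
    while start + max_width < width:
        chunks.append((start, start + max_width))
        start += step
    chunks.append((width - max_width, width))  # last chunk flush with the right edge
    return chunks

print(split_wide_crop(256))  # -> [(0, 256)]
print(split_wide_crop(600))  # several overlapping chunks covering the full width
```

Each returned (start, end) pair is a horizontal slice of the original crop; after batched recognition, the per-chunk predictions can be merged back together.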
PyTorch support is now out of beta, and we have made efforts to unify switching from one deep learning backend to another :raised_hands: Predictors are designed to be the recommended interface for inference with your models!
| 0.3.1 (TensorFlow) | 0.3.1 (PyTorch) | 0.4.0 |
|---|---|---|
| `>>> from doctr.models import detection_predictor`<br>`>>> predictor = detection_predictor(pretrained=True)`<br>`>>> out = predictor(doc, training=False)` | `>>> from doctr.models import detection_predictor`<br>`>>> import torch`<br>`>>> predictor = detection_predictor(pretrained=True)`<br>`>>> predictor.model.eval()`<br>`>>> with torch.no_grad(): out = predictor(doc)` | `>>> from doctr.models import detection_predictor`<br>`>>> predictor = detection_predictor(pretrained=True)`<br>`>>> out = predictor(doc)` |
As PyTorch goes out of beta, we have bridged the gap between PyTorch & TensorFlow pretrained models' availability. Additionally, by leveraging our integration of light backbones, this release comes with lighter architectures for text detection and text recognition:
The full list of supported architectures is available :point_right: here
If you have enjoyed the Streamlit demo but prefer not to run it on your own hardware, feel free to check out the online version on Hugging Face Spaces:
Courtesy of @osanseviero for deploying it, and HuggingFace for hosting & serving :pray:
After going over backbone compatibility and re-assessing whether all combinations should be trained, docTR is focusing on reproducing the paper authors' intent or improving upon it. As such, we have deprecated the following recognition models (which had no pretrained params): crnn_resnet31, sar_vgg16_bn.
Since doctr.models.export was specific to TensorFlow and didn't bring much more value than the TensorFlow tutorials, we added instructions in the documentation and deprecated the submodule.
Resources to access data in efficient ways

Features to manipulate input & outputs
- `.synthesize` method to `Page` and `Document` #472 (@fg-mindee)

Deep learning model building and inference
- `db_mobilenet_v3_large` #485 #487, `crnn_vgg16_bn` #487, `db_resnet50` #489, `crnn_mobilenet_v3_small` & `crnn_mobilenet_v3_large` #517 #516 (@charlesmindee)

Utility features relevant to the library use cases.

Data transformations operations
- `RandomCrop` transformation #448 (@charlesmindee)

Verifications of the package well-being before release
- `RandomCrop` #448 (@charlesmindee)

Online resources for potential users
- `RandomCrop` #448 (@charlesmindee)
- `db_mobilenet_v3_large` #485 in the documentation (@charlesmindee)

Reference training scripts

Other tools and implementations
- `RandomCrop` #473 (@fg-mindee)
- `DocDataset` & `OCRDataset` #474 (@charlesmindee)
- `DetectionDataset` label format #491 (@fg-mindee)
- `doctr.models.export` #463 (@fg-mindee)
- `crnn_resnet31` & `sar_vgg16_bn` recognition models #468 (@fg-mindee)
- `DocumentBuilder` to `doctr.models.builder`, split predictor into framework-specific objects #481 (@fg-mindee)
- `DocumentBuilder` & refactored crop preparation and result processing in OCR predictors #497 (@fg-mindee)
- `doctr.models.export` #463 (@fg-mindee)
- `doctr.utils.font` submodule #472 (@fg-mindee)
- `author_email` in setup #493 (@fg-mindee)
- `common`, `pytorch` & `tensorflow` #498 #503 #506 (@fg-mindee)

Many thanks to our contributors, we are delighted to see that there are more every week!
This release stabilizes the support for PyTorch backend while extending the range features (new task, superior pretrained models, speed ups).
Brought to you by @fg-mindee & @charlesmindee
Note: doctr 0.3.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
With each release, we hope to bring you improved models and more comprehensive evaluation results. As part of the 0.3.1 release, we provide you with:
crnn_vgg16_bn & sar_resnet31
Without any surprise, just like many other libraries, docTR's future will involve balancing speed and raw performance. To make this choice available to you, we added support for MobileNet V3 and pretrained it for character classification in both PyTorch & TensorFlow.
Whether you are a user looking for inference speed, or a dedicated model trainer looking for optimal data loading, you will be thrilled to know that we have greatly improved our data loading/processing by leveraging multi-threading!
We value the accessibility of this project and thus commit to improving tools for entry-level users. Deploying a demo from a Python library is not the expertise of every developer, so this release improves the existing demo:
Page selection was added for multi-page documents, the predictions are used to produce a synthesized version of the initial document, and you get the JSON export! We're looking forward to your feedback :hugs:
As DocTR continues to move forward with more complex tasks, paving the way for a consistent training procedure will become necessary. Pretraining has shown potential in many deep learning tasks, and we want to explore opportunities to make training for OCR even more accessible.
So this release makes a big step forward by adding an on-the-fly character generator and training scripts, which allow you to train a character classifier without any pre-existing data :hushed:
In order to harmonize data processing between frameworks, the default data type of dataloaders has been switched to float32 for the TensorFlow backend:
| 0.3.0 | 0.3.1 |
|---|---|
| `>>> from doctr.datasets import FUNSD`<br>`>>> ds = FUNSD()`<br>`>>> img, target = ds[0]`<br>`>>> print(img.dtype)`<br>`<dtype: 'uint8'>`<br>`>>> print(img.numpy().min(), img.numpy().max())`<br>`0 255` | `>>> from doctr.datasets import FUNSD`<br>`>>> ds = FUNSD()`<br>`>>> img, target = ds[0]`<br>`>>> print(img.dtype)`<br>`<dtype: 'float32'>`<br>`>>> print(img.numpy().min(), img.numpy().max())`<br>`0.0 1.0` |
Whether it is for exporting predictions or loading input data, the library lets you play around with inputs and outputs using minimal code. Since its usage is constantly expanding, the doctr.documents module was repurposed into doctr.io.
| 0.3.0 | 0.3.1 |
|---|---|
| `>>> from doctr.documents import DocumentFile`<br>`>>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()` | `>>> from doctr.io import DocumentFile`<br>`>>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()` |
It now also includes an image submodule for easy tensor <--> numpy conversion for all supported data types.
As multithreading is increasingly used to boost performance throughout the library, it has been moved from the TF-only dataset utilities to doctr.utils.multithreading:
| 0.3.0 | 0.3.1 |
|---|---|
| `>>> from doctr.datasets.multithreading import multithread_exec`<br>`>>> results = multithread_exec(lambda x: x ** 2, [1, 4, 8])` | `>>> from doctr.utils.multithreading import multithread_exec`<br>`>>> results = multithread_exec(lambda x: x ** 2, [1, 4, 8])` |
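Under the hood, such a helper is essentially a thin wrapper over a thread pool; a minimal stdlib sketch (illustrative, not docTR's exact implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def multithread_exec(func, seq, threads=4):
    """Apply func to every element of seq using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(func, seq))

print(multithread_exec(lambda x: x ** 2, [1, 4, 8]))  # -> [1, 16, 64]
```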
Resources to access data in efficient ways

Features to manipulate input & outputs
- `Element` creation from dictionary (#386)

Deep learning model building and inference
- `crnn_resnet31` as a recognition model (#361)
- `crnn_vgg16_bn` in TF (#395)
- `master` in TF (#396)
- `sar_resnet31` in TF (#395)

Utility features relevant to the library use cases.

Data transformations operations
- `rotate` function (#358) and its corresponding augmentation module (#363)

Verifications of the package well-being before release
- `RandomRotate` (#363)

Online resources for potential users
- `RandomRotate` (#363)
- `CharacterGenerator` (#412)

Reference training scripts

Other tools and implementations
- `PIL` version due to issues with version 8.3 (#362)
- `weasyprint` version due to issues with version 53.0 (#404)
- `matplotlib` version due to issues with version 3.4.3 (#413)
- `doctr.datasets` (#354)
- `tf.float32` instead of `tf.uint8` (#367, #375)
- `doctr.documents` to `doctr.io` (#390)
- `doctr.utils` (#371)
- `doctr.models._utils.rotate_page` to `doctr.utils.geometry.rotate_image` (#371)
- `doctr.documents` to `doctr.io` in documentation and README (#390)
- `setup.py` and in README (#444)
- `doctr.documents` to `doctr.io` (#390)
- `tf.float32` by default for datasets (#367)

This release adds support for PyTorch backend & rotated text elements.
Release brought to you by @fg-mindee & @charlesmindee
Note: doctr 0.3.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
This release comes with exciting news: we added support of PyTorch for the whole library!
If you have both TensorFlow & PyTorch installed, simply switch the docTR backend using the USE_TORCH and USE_TF environment variables.
export USE_TORCH='1'
Then DocTR will do the rest for you to play along with PyTorch:
import torch
from doctr.models import db_resnet50
model = db_resnet50(pretrained=True).eval()
with torch.no_grad():
    out = model(torch.rand(1, 3, 1024, 1024))
More pretrained models to come in the next releases!
Users might be tempted to filter text recognition predictions, which was not easy previously without a prediction confidence. We harmonized our recognition models to provide the sequence prediction probability.
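One common way to obtain such a sequence probability is to aggregate the per-character probabilities from the decoder, e.g. by taking their product or minimum (illustrative; not necessarily docTR's exact scheme):

```python
import math

def sequence_confidence(char_probs, mode="product"):
    """Aggregate per-character probabilities into one sequence score."""
    if mode == "product":
        return math.prod(char_probs)
    return min(char_probs)

probs = [0.99, 0.97, 0.95]
print(round(sequence_confidence(probs), 4))    # -> 0.9123
print(sequence_confidence(probs, mode="min"))  # -> 0.95
```

A product penalizes every uncertain character, while the minimum reflects only the single weakest one; either makes it easy to threshold and filter low-confidence predictions.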
Following up on some feedback about the lack of clarity for visualization of dense predictions, we added a page reconstruction feature.
import matplotlib.pyplot as plt
from doctr.utils.visualization import synthesize_page
from doctr.documents import DocumentFile
from doctr.models import ocr_predictor
model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Analyze
result = model(doc)
# Reconstruct the first page
reconstructed_page = synthesize_page(result.export()[0])
plt.imshow(reconstructed_page); plt.show()
Using the predictions from our models, we try to synthesize the document with only its textual information!
While the paper doesn't introduce different versions of the LinkNet architecture, we want to keep the possibility of adding more. In order to stabilize the interface early on, we renamed linknet to linknet16:
| 0.2.1 | 0.3.0 |
|---|---|
| `>>> from doctr.models import linknet`<br>`>>> model = linknet(pretrained=True)` | `>>> from doctr.models import linknet16`<br>`>>> model = linknet16(pretrained=True)` |
Resources to access data in efficient ways

Features to manipulate document information

Deep learning model building and inference
- `conv_sequence` & parameter loading (#323), `resnet31` (#327), `vgg16_bn` (#328), CRNN (#318), SAR (#333), MASTER (#329, #335, #340, #342)

Utility features relevant to the library use cases.

Data transformations operations
- `Resize` in PyTorch (#313), `ColorInversion` (#322)

Verifications of the package well-being before release

Online resources for potential users

Reference training scripts

Other tools and implementations
- `ColorInversion` unittest (#298, #339)
- `wandb` in the detection script (#288)
- `wandb` config for training scripts (#302)
- `OCRDataset` and `CORD` (#289, #299)

:pray: Thanks to our contributors :pray: @Rob192