Read and extract text and other content from PDFs in C# (port of PDFBox)
A curated list of resources for the Document Understanding (DU) topic
This repository provides training and testing code, a dataset, detection and recognition annotation...
Official PyTorch implementation of LiLT: A Simple yet Effective Language...
RObust document image BINarization
Document Visual Question Answering
(ICFHR 2020 oral) Code for "docExtractor: An off-the-shelf historical do...
An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointl...
Improving Document Binarization via Adversarial Noise-Texture Augmentati...