The Archives Unleashed Toolkit is an open-source toolkit for analyzing w...
PDF Reader Library for Native Julia.
Apache Tika bindings for PHP: extract text and metadata from documents, ...
Simple app to extract text from pictures using Tesseract
Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.
Labeled examples from wiki dumps in Python
A project about benchmarking and evaluating existing PDF extraction tool...
Text extraction for Wagtail document search