Best 24 Text Extraction Open Source Projects

The Archives Unleashed Toolkit is an open-source toolkit for analyzing w...

PDF Reader Library for Native Julia.

Apache Tika bindings for PHP: extract text and metadata from documents, ...

Simple app to extract text from pictures using Tesseract

Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.

Labeled examples from wiki dumps in Python

A project about benchmarking and evaluating existing PDF extraction tool...

Text extraction for Wagtail document search