Best 13 Extract Text Open Source Projects

Node.js module for extracting text from html, pdf, doc, docx, xls, xlsx,...

🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

⚠️ ARCHIVED ⚠️ Search across and get full text for OA & cl...

Python-based open source ETL tools for file crawling, document processin...

Use the Java Tika text extraction library on the .NET platform

Text extraction from multiple and large PDF documents.

Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.

Read PDF files in JavaScript

C# and VB.NET samples for Docotic.Pdf library

R wrapper for the antiword utility

R Interface to Apache Tika

Build search across multiple documents client-side in your file storage

Simple rule-based named entity recognition