Node.js module for extracting text from html, pdf, doc, docx, xls, xlsx,...
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
:warning: ARCHIVED :warning: Search across and get full text for OA & cl...
Python-based open source ETL tools for file crawling, document processin...
Use the Java Tika text extraction library on the .NET platform
Text extraction from multiple and large PDF documents.
Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.
Read PDF files in JavaScript
C# and VB.NET samples for Docotic.Pdf library
R wrapper for antiword utility
R Interface to Apache Tika
Build search across multiple documents client-side in your file storage
Simple rule-based named entity recognition