extract text from any document. no muss. no fuss.
Several updates. See changelog for details
fix the msg parser and update the Travis CI build
update dependencies and make pocketsphinx optional
documentation build fixes
psv/tsv parsers, user-provided filename extensions, audio parsing with pocketsphinx, and several other bug fixes
python 3 compatibility, improved docx extraction, improved image extraction, and more.
pdf layout preservation, extensionless file support, and several bug fixes
Added .rtf and .msg support
Includes support for tiff files and a new --option/-O command line option to pass in arbitrary keyword arguments to parsers, like the language for tesseract OCR
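
As a rough illustration of how a repeatable --option/-O flag could be turned into keyword arguments for a parser, here is a minimal sketch using the standard library's argparse. The function names (parse_option, build_parser, options_to_kwargs), the KEY=VALUE syntax, and the example values are assumptions made for this example; this is not the project's actual implementation.

```python
import argparse


def parse_option(pair):
    """Split one KEY=VALUE string into a (key, value) tuple."""
    key, sep, value = pair.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError("expected KEY=VALUE, got %r" % pair)
    return key, value


def build_parser():
    # Hypothetical parser: the real command line tool defines more arguments.
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("filename")
    parser.add_argument(
        "-O", "--option",
        action="append",      # the flag may be given several times
        default=[],
        type=parse_option,    # each occurrence becomes a (key, value) tuple
        metavar="KEY=VALUE",
        help="keyword argument forwarded to the parser (repeatable)",
    )
    return parser


def options_to_kwargs(argv):
    """Return (filename, kwargs) for a list of command line arguments."""
    args = build_parser().parse_args(argv)
    return args.filename, dict(args.option)


# Assumed example: request a specific OCR language for a scanned image.
filename, kwargs = options_to_kwargs(["scan.tiff", "-O", "language=nor"])
```

The resulting dictionary can then be unpacked with `**kwargs` into whichever parser function handles the file, so new parser settings need no dedicated command line flag.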
support for a variety of formats, including audio (.wav, .mp3, .ogg), csv, scanned pdfs, and htm plus various bug fixes and internal improvements.