Textract Versions

extract text from any document. no muss. no fuss.

v1.6.4

2 years ago

Several updates. See changelog for details

v1.6.3

4 years ago

fix the msg parser and update the Travis CI build

v1.6.2

4 years ago

update dependencies and make pocketsphinx optional

v1.6.1

6 years ago

documentation build fixes

v1.6.0

7 years ago

psv/tsv parsers, user-provided filename extensions, audio parsing with pocketsphinx, and several other bug fixes

v1.5.0

7 years ago

python 3 compatibility, improved docx extraction, improved image extraction, and more.

v1.4.0

8 years ago

pdf layout preservation, extensionless file support, and several bug fixes

v1.3.0

8 years ago

Added .rtf and .msg support

v1.2.0

9 years ago

Includes support for tiff files and a new --option/-O command line option to pass in arbitrary keyword arguments to parsers, like the language for tesseract OCR
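
The idea behind a repeatable `-O`/`--option` flag can be sketched as follows. This is a minimal illustration and not textract's actual command line code: the helper names `parse_option_pairs` and `build_parser`, the `KEYWORD=VALUE` pair format, and the file name `scan.tiff` are assumptions made for the example. It only shows how such pairs could be collected into keyword arguments for a parser.

```python
import argparse


def parse_option_pairs(pairs):
    """Convert ['language=nor'] into {'language': 'nor'} (hypothetical helper)."""
    options = {}
    for pair in pairs:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEYWORD=VALUE, got {pair!r}")
        options[key] = value
    return options


def build_parser():
    """Build an illustrative argument parser with a repeatable -O/--option flag."""
    parser = argparse.ArgumentParser(description="illustrative wrapper, not textract's CLI")
    parser.add_argument("filename")
    parser.add_argument(
        "-O", "--option",
        action="append",
        default=[],
        metavar="KEYWORD=VALUE",
        help="keyword argument to forward to the underlying parser",
    )
    return parser


# Simulate a command line that asks for a specific OCR language.
args = build_parser().parse_args(["-O", "language=nor", "scan.tiff"])
parser_kwargs = parse_option_pairs(args.option)
print(parser_kwargs)

# With textract installed, the collected options could then be forwarded
# as keyword arguments (left commented out so the sketch runs without it):
# import textract
# text = textract.process(args.filename, **parser_kwargs)
```

Values arrive as strings in this sketch, so any parser expecting another type would need to convert them itself.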

v1.1.0

9 years ago

support for a variety of formats, including audio (.wav, .mp3, .ogg), csv, scanned pdfs, and htm, plus various bug fixes and internal improvements.