Extraction Toolkit
ETK is a Python library for high precision information extraction from many document formats. It proivdes a flexible framework of composable extractors that enables you to combine a host of predefined extractors provided in ETK with custom extractors that you may need to develop for your application. It supports extraction from HTML pages, text documents, CSV and Excel files and JSON documents. ETK is open-source software, released under the MIT license.
Read the documentation here