news-please - an integrated web crawler and information extractor for ne...
ChatWeb can crawl web pages, read PDF, DOCX, and TXT files, and extract the main c...
A fork of Dragnet that also extracts author, headline, date, keywords fro...