Module for automatic summarization of text documents and HTML pages.
Reworked https://www.readability.com/ parsing library (now https://mercu...
Automatically extract the main text content (and more) from an HTML docu...
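Optimized general-purpose algorithm for extracting the main text of web pages, based on the line-block distribution function; implemented in Python.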