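Chinese Text Summarization / Keyword Extraction
Extracts summaries and keywords from Chinese text. The algorithm's time complexity has been reduced: finding the maximum-weight node in the graph now takes O(n) instead of O(n^2). On a limited test set (10 articles), it runs about 8 times faster than the textrank4zh package. For the principles behind the algorithm, see the Zhihu article.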
Dependencies: Numpy>=1.14.5, gensim>=3.5.0
Install: pip install FastTextRank==1.1
For details, see the ./FastTextRank/test folder:
KeyWord.py: keyword extraction example
Sentence.py: summary extraction example
If you have optimizations to suggest, pull requests are welcome.
If you run into problems, please open an issue.
Extract summaries and keywords from Chinese text, using an optimized iterative algorithm to improve running speed and, optionally, word vectors to improve accuracy.
PageRank is a web page ranking algorithm from Google.
PageRank was originally used to calculate the importance of web pages. The entire World Wide Web can be seen as a directed graph in which each node is a web page and each edge is a hyperlink.
The algorithm calculates the importance of every node from its connections.
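To make the idea concrete, here is a minimal sketch of PageRank by power iteration on a tiny directed graph. It is an illustration only, not the code used in FastTextRank: the function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the four-page example graph are all assumptions chosen for this example.

```python
def pagerank(graph, damping=0.85, max_iter=100, tol=1e-8):
    """Score the nodes of a directed graph by power iteration.

    graph maps each node to the list of nodes it links to. Every link
    target is assumed to also appear as a key (a simplification made
    for this sketch).
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform score

    for _ in range(max_iter):
        # Score held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}

        # Each node passes an equal share of its score along every out-link.
        for node in nodes:
            out_links = graph[node]
            if out_links:
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share

        # Stop once the scores have effectively stopped changing.
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page web: A links to B and C, B to C, C to A, D to C.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this example page C ranks highest because three pages link to it, while D ranks lowest because nothing links to it. The scores always sum to 1, so they can be read as a probability distribution over pages.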