Wikipedia2vec Versions

A tool for learning vector representations of words and entities from Wikipedia

v2.0.0

4 months ago
  • Enhanced the text extraction parser of Wikipedia pages
  • Enhanced the detection of category and disambiguation pages
  • Converted Cython’s *.pyx code to *.py by adopting pure Python mode
  • Added support for multi-step Wikipedia redirects
  • Fixed an issue related to mmap (#79)
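
As an illustration of what multi-step redirect support involves, here is a minimal sketch, not the library's actual implementation. The function name `resolve_redirect`, the plain-dict `redirects` mapping, and the `max_hops` limit are all hypothetical and chosen only for this example.

```python
def resolve_redirect(title, redirects, max_hops=10):
    """Follow a chain of redirects until a non-redirect title is reached.

    `redirects` is a hypothetical dict mapping a redirect title to its target.
    `max_hops` and the cycle check are assumed safeguards against bad data.
    """
    seen = {title}
    for _ in range(max_hops):
        target = redirects.get(title)
        if target is None:
            # The current title is not a redirect, so it is the final page.
            return title
        if target in seen:
            # Redirect cycle detected; stop at the current title.
            return title
        seen.add(target)
        title = target
    return title


# A two-step chain: "USA" -> "United States of America" -> "United States"
redirects = {
    "USA": "United States of America",
    "United States of America": "United States",
}
final_title = resolve_redirect("USA", redirects)
```

A single-step resolver would stop at "United States of America" here; following the chain to its end is what lets a link to "USA" map to the same entity as a link to "United States".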

v1.0.5

3 years ago
  • Fixed a bug in MentionDB that occurred when case_sensitive=True (#67)

v1.0.4

4 years ago

Changelog:

  • Pages that belong to the Module and TimedText namespaces are now ignored when creating DumpDB
  • Improved the normalization rules for entity titles of Wikipedia links (#36, #38)
  • Fixed the Jieba tokenizer (#27)
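
The namespace filtering mentioned above can be pictured with a small sketch. It is an assumption-laden illustration, not the library's code: it matches namespaces by title prefix, and the names `IGNORED_PREFIXES` and `is_ignored_page` are hypothetical.

```python
# Hypothetical title prefixes for the namespaces named in the changelog.
IGNORED_PREFIXES = ("Module:", "TimedText:")


def is_ignored_page(title):
    """Return True if the page title belongs to an ignored namespace."""
    # str.startswith accepts a tuple and checks each prefix in turn.
    return title.startswith(IGNORED_PREFIXES)


titles = ["Tokyo", "Module:Citation/CS1", "TimedText:Example.ogv.en.srt"]
kept = [t for t in titles if not is_ignored_page(t)]
```

Pages in these namespaces hold Lua modules and subtitle files rather than article text, so skipping them keeps non-article content out of the database.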