Hazm Versions

Persian NLP Toolkit

v0.10.0

3 months ago
  • Added SpacyPOSTagger class for utilizing the hazm deep learning transformer-based model in POS tagging. @MortezaMahdaviMortazavi
  • Added SpacyChunker class for leveraging the hazm deep learning transformer-based model in chunking. @MortezaMahdaviMortazavi
  • Added SpacyDependencyParser class for employing the hazm deep learning transformer-based model in dependency parsing. @MortezaMahdaviMortazavi
  • Added 160,000 new words to improve normalizer and lemmatizer. @sir-kokabi
  • Added FaSpellReader to read FAspell corpus. @sir-kokabi
  • Added ArmanReader to read ArmanPersoNERCorpus. @sir-kokabi
  • Added PnSummaryReader to read pn-summary corpus. @sir-kokabi
  • Removed unnecessary old Stanford dependencies. @sir-kokabi

Download pretrained-models

Full Changelog: https://github.com/roshan-research/hazm/compare/v0.9.4...v0.10.0

v0.9.4

7 months ago
  • Added join_abbreviations to keep abbreviations from being split during tokenization, using ParsiNorm's abbreviation lists. #216 @optimopium @sir-kokabi.
  • Added MizanReader to read Mizan corpus. @sir-kokabi.
  • Added NaabReader to read Naab corpus. @sir-kokabi.
  • Added NerReader to read NER corpus. @sir-kokabi.
  • Improved Normalizer by adding support for normalizing words with the suffix 'هایی'. @sir-kokabi.
  • Fixed #298: Incompatibility issues with numpy. @mhdi707 @sir-kokabi
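
The idea behind the join_abbreviations entry above can be illustrated with a small standalone sketch. This is not Hazm's implementation: the tokenize function, the three-item abbreviation list, and the placeholder scheme are all hypothetical and use only the Python standard library. It shows why a punctuation-splitting tokenizer breaks dotted abbreviations and how protecting them first keeps each one as a single token.

```python
import re

# Hypothetical abbreviation list for illustration only;
# per the changelog, Hazm takes its real lists from ParsiNorm.
ABBREVIATIONS = ["ه.ش", "ق.م", "ه.ق"]


def tokenize(text, join_abbreviations=False):
    """Split text into word tokens and single punctuation marks."""
    protected = {}
    if join_abbreviations:
        # Swap each known abbreviation for a placeholder so that its
        # internal dots are not treated as separate punctuation tokens.
        for i, abbr in enumerate(ABBREVIATIONS):
            if abbr in text:
                key = f"ABBR{i}"
                text = text.replace(abbr, key)
                protected[key] = abbr
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Restore the original abbreviation wherever a placeholder survived.
    return [protected.get(token, token) for token in tokens]


split_tokens = tokenize("سال 1350 ه.ش")
joined_tokens = tokenize("سال 1350 ه.ش", join_abbreviations=True)
```

Without the flag the abbreviation is split into three tokens (letter, dot, letter); with it, the abbreviation comes back as one token.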

Download pretrained-models

Full Changelog: https://github.com/roshan-research/hazm/compare/v0.9.3...v0.9.4

v0.9.3

9 months ago

Fixed

  • Fix critical bug in Lemmatizer that caused incorrect lemmatization of certain words. @sir-kokabi.
  • Fix a bug that caused WikipediaReader to stop working as before #287. @sir-kokabi.
  • Fix missing imports for WikipediaReader and PersianPlainTextReader #286. @sir-kokabi.
  • Fix some issues in the demo to make it compatible with the latest version of Hazm. @sir-kokabi.
  • Fix a few issues related to tests and mkdocs build. @sir-kokabi.
  • Improve documentation. @sir-kokabi.
  • Improve dependency tree visualization on the demo page. @sir-kokabi.

Download pretrained-models

Full Changelog: https://github.com/roshan-research/hazm/compare/v0.9.2...v0.9.3

v0.9.2

10 months ago

Added

  • Add pretrained DependencyParser models. @E-Ghafour.
  • Add UniversalDadeganReader class to process and read the Universal Persian Dependency Treebank corpus. @E-Ghafour, @imani.
  • Add 400+ new words to improve Normalizer, Lemmatizer and Tokenizer. @sir-kokabi.

Fixed

  • Fix DependencyParser issue #282. @E-Ghafour, @imani.
  • Fix some test issues. @E-Ghafour.

Download pretrained-models

Full Changelog: https://github.com/roshan-research/hazm/compare/v0.9...v0.9.2

v0.9

11 months ago

Added

  • Windows compatibility by using Python-crfsuite instead of Wapiti. @E-Ghafour.
  • Pretrained Chunker and POSTagger models with Python-crfsuite. @E-Ghafour.
  • New parameters in Normalizer for better text processing. @sir-kokabi.
  • Three regex patterns in Normalizer to fix ZWNJs and spacing issues. @sir-kokabi.
  • 400 non-standard Unicode characters to be replaced in Normalizer. @sir-kokabi.
  • 40,000+ new words to improve Lemmatizer and Tokenizer. @sir-kokabi.
  • train function for Word2vec and Sent2vec modules in Embedding. @E-Ghafour.
  • Implement keywordExtraction with the embedRank approach as a sample of Hazm usage. @E-Ghafour.
  • Support Universal tags in POSTagger. @E-Ghafour.
  • Support universal POS mapper in PeykareReader & DadeganReader (#239). @phsfr.
  • PersianPlainTextReader to process raw text datasets (#120). @mhbashari.
  • Support EZ tag in PeykareReader. @E-Ghafour.
  • Slash & backslash (/ \) support in Tokenizer (#102). @elahimanesh.
  • Conjugation class to handle verb conjugation. @sir-kokabi.
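
The embedRank approach mentioned in the keyword extraction entry above ranks candidate phrases by how close their embeddings are to the embedding of the whole document. The sketch below shows only that ranking step, under loud assumptions: the embed_rank and cosine helpers are hypothetical names, and the three-dimensional vectors are invented toy values rather than output from Hazm's embedding modules.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed_rank(doc_vector, candidates, top_n=2):
    """Return the top_n candidate phrases most similar to the document."""
    scored = [(phrase, cosine(vec, doc_vector)) for phrase, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [phrase for phrase, _ in scored[:top_n]]


# Toy vectors, invented for illustration only.
doc_vector = [0.9, 0.1, 0.3]
candidates = {
    "natural language": [0.8, 0.2, 0.3],
    "weather report": [0.1, 0.9, 0.0],
    "text processing": [0.7, 0.0, 0.4],
}
keywords = embed_rank(doc_vector, candidates)
```

In a real pipeline the candidate phrases would come from a tagger or chunker and the vectors from a trained sentence embedding model; the ranking logic stays the same.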

Fixed

  • Improve the accuracy of POSTagger and Chunker. @E-Ghafour.
  • Improve InformalNormalizer #219. @riasati.
  • Fix pep8 issues. (#135). @hadifar.
  • Fix some test issues. @sir-kokabi @E-Ghafour.
  • Fix Stemmer issues with multiple suffixes. @sir-kokabi.
  • Fix various reported issues.

Changed

  • Drop Python 2 support and migrate all code to Python 3. @sir-kokabi.
  • Use data_maker function instead of patterns in SequenceTagger. @E-Ghafour.
  • Refactor IOBTagger and POSTagger to be compatible with data_maker. @E-Ghafour.
  • Change می روم to می‌روم in example (#203). @SMSadegh19.
  • Overhaul the project structure and GitHub repo. @sir-kokabi.
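
The ZWNJ and spacing fixes listed for this release, including the می روم to می‌روم correction, come down to replacing an ordinary space with a zero-width non-joiner (U+200C) between a word and its affix. The sketch below is a rough standard-library illustration, not Hazm's Normalizer: the function name fix_affix_spacing and both patterns are assumptions, and they cover only the verbal prefixes می / نمی and the plural suffixes ها / های / هایی.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner


def fix_affix_spacing(text):
    """Replace the space around a few common Persian affixes with a ZWNJ."""
    # Attach the verbal prefixes می / نمی to the word that follows them.
    text = re.sub(r"(?<!\S)(ن?می) (?=\S)", r"\1" + ZWNJ, text)
    # Attach the plural suffixes ها / های / هایی to the word before them.
    text = re.sub(r"(?<=\S) (ها(?:ی|یی)?)(?!\S)", ZWNJ + r"\1", text)
    return text


verb = fix_affix_spacing("می روم")
plural = fix_affix_spacing("کتاب هایی")
```

These patterns are purely heuristic: a standalone word that happens to look like an affix would also be joined, which is one reason a production normalizer needs word lists in addition to regular expressions.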

Download Pretrained models

Full Changelog: https://github.com/roshan-research/hazm/compare/v0.8.2...v0.9

v0.8.2

1 year ago

Release notes:

  • Add WordEmbedding (a pre-trained fastText model is available for download)
  • Add SentenceEmbedding (a pre-trained model is available for download)
  • Add documentation webpage
  • Improve normalizer, informal normalizer, and tokenizer
  • Add Degarbayan and MirasText corpus reader

Full Changelog: https://github.com/sobhe/hazm/compare/v0.7...v0.8.2

v0.7

5 years ago

v0.5

9 years ago

v0.4

9 years ago

v0.3

9 years ago