Persian NLP Toolkit
SpacyPOSTagger class for utilizing the hazm deep learning transformer-based model in POS tagging. @MortezaMahdaviMortazavi
SpacyChunker class for leveraging the hazm deep learning transformer-based model in chunking. @MortezaMahdaviMortazavi
SpacyDependencyParser class for employing the hazm deep learning transformer-based model in dependency parsing. @MortezaMahdaviMortazavi
normalizer and lemmatizer. @sir-kokabi
FaSpellReader to read FAspell corpus. @sir-kokabi
ArmanReader to read ArmanPersoNERCorpus. @sir-kokabi
PnSummaryReader to read pn-summary corpus. @sir-kokabi
Full Changelog: https://github.com/roshan-research/hazm/compare/v0.9.4...v0.10.0
join_abbreviations to skip abbrs tokenizing using ParsiNorm's abbreviation lists. #216 @optimopium @sir-kokabi.
MizanReader to read Mizan corpus. @sir-kokabi.
NaabReader to read Naab corpus. @sir-kokabi.
NerReader to read NER corpus. @sir-kokabi.
Normalizer by adding support for normalizing words with the suffix 'هایی'. @sir-kokabi.
Full Changelog: https://github.com/roshan-research/hazm/compare/v0.9.3...v0.9.4
Lemmatizer that caused incorrect lemmatization of certain words. @sir-kokabi.
WikipediaReader to not work as before #287. @sir-kokabi.
WikipediaReader and PersianPlainTextReader #286. @sir-kokabi.
Full Changelog: https://github.com/roshan-research/hazm/compare/v0.9.2...v0.9.3
DependencyParser models. @E-Ghafour.
UniversalDadeganReader class for processing and reading the Universal Persian Dependency Treebank corpus. @E-Ghafour, @imani.
Normalizer, Lemmatizer and Tokenizer. @sir-kokabi.
DependencyParser issue #282. @E-Ghafour, @imani.
Full Changelog: https://github.com/roshan-research/hazm/compare/v0.9...v0.9.2
Python-crfsuite instead of Wapiti. @E-Ghafour.
Chunker and POSTagger models with Python-crfsuite. @E-Ghafour.
Normalizer. @sir-kokabi.
Lemmatizer and Tokenizer. @sir-kokabi.
train function for Word2vec and Sent2vec modules in Embedding. @E-Ghafour.
keywordExtraction with the embedRank approach as a sample of Hazm usage. @E-Ghafour.
Universal tags in POSTagger. @E-Ghafour.
PeykareReader & DadeganReader (#239). @phsfr.
PersianPlainTextReader to process raw text datasets (#120). @mhbashari.
EZ tag in PeykareReader. @E-Ghafour.
Tokenizer (#102). @elahimanesh.
Conjugation class to handle verb conjugation. @sir-kokabi.
POSTagger and Chunker. @E-Ghafour.
InformalNormalizer #219. @riasati.
Stemmer issues with multiple suffixes. @sir-kokabi.
data_maker function instead of patterns in SequenceTagger. @E-Ghafour.
IOBTagger and POSTagger to be compatible with data_maker. @E-Ghafour.
Full Changelog: https://github.com/roshan-research/hazm/compare/v0.8.2...v0.9
Full Changelog: https://github.com/sobhe/hazm/compare/v0.7...v0.8.2