VNLP Versions

State-of-the-art, lightweight NLP tools for the Turkish language. Developed by VNGRS.

1 year ago

cyhunspell has been replaced by spylls. As a result, VNLP now supports Python 3.10, while Python 3.6 support has been dropped.
Newer versions of TensorFlow no longer rely on Keras-Preprocessing. This caused issues because our tokenizers were saved via pickle. They are now stored as JSON and loaded in a way that does not depend on the TensorFlow version.
TensorFlow warnings are suppressed.
The Read the Docs build and files have been updated to resolve tensorboard, protobuf and grpcio dependency issues.
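
The move from pickle to JSON for tokenizers can be illustrated with a minimal sketch. It assumes a simple word-level tokenizer; the class name SimpleTokenizer, its fields and the JSON layout are hypothetical and are not VNLP's actual tokenizer or file format. The point is only that a plain JSON payload carries no reference to library classes, so it can be read back regardless of which TensorFlow or Keras version is installed.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SimpleTokenizer:
    """Hypothetical word-level tokenizer used for illustration only."""

    word_index: dict = field(default_factory=dict)
    oov_token: str = "<OOV>"

    def fit(self, texts):
        # Index 1 is given to the out-of-vocabulary token; 0 is left free for padding.
        self.word_index = {self.oov_token: 1}
        for text in texts:
            for word in text.lower().split():
                if word not in self.word_index:
                    self.word_index[word] = len(self.word_index) + 1
        return self

    def encode(self, text):
        # Unknown words map to the out-of-vocabulary index.
        oov_id = self.word_index[self.oov_token]
        return [self.word_index.get(w, oov_id) for w in text.lower().split()]

    def to_json(self):
        # Only plain data is stored, so no pickled class paths are involved.
        payload = {"word_index": self.word_index, "oov_token": self.oov_token}
        return json.dumps(payload, ensure_ascii=False)

    @classmethod
    def from_json(cls, payload):
        data = json.loads(payload)
        return cls(word_index=data["word_index"], oov_token=data["oov_token"])


tokenizer = SimpleTokenizer().fit(["bugün hava çok güzel"])
restored = SimpleTokenizer.from_json(tokenizer.to_json())
print(restored.encode("bugün hava"))
```

A pickled object, by contrast, stores the import path of its class, which is why a tokenizer pickled against Keras-Preprocessing may fail to load once that package is no longer present.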

1 year ago

SentencePiece Unigram Context (SPUContext) models are added for Named Entity Recognition, Dependency Parsing, Part of Speech Tagging and Sentiment Analysis. These are the default models now.
SPUContext models are even more compact, up to 4x faster and perform significantly better. See the metrics table on the main page for a comparison.
SPUContext models use SentencePiece Unigram tokenization.
The wheel file is now 80% smaller, and each model downloads its weights the first time it is initialized.
To evaluate a deep learning based model, pass the "evaluate = True" flag when initializing it, e.g., NamedEntityRecognizer(model = 'CharNER', evaluate = True). This loads the weights that were NOT trained on the test sets.
The former Python API has become a generic user API that provides an abstraction over the implemented methods. The desired model can be selected with the "model" argument, e.g., NamedEntityRecognizer(model = 'CharNER').