[ALL] Introduced APIs that return an ImmutableSentencePieceText struct, which holds each token's string, id, and UTF-8 byte offsets together. The new API is available from both C++ and Python.
[ALL] Allowed the tab character '\t' in user-defined symbols.
[ALL] Added an NFKD normalization rule. The NFKD rule is provided as a TSV file.
[ALL] Added an option to emit the unknown symbol instead of the raw symbol.
[Python] Batch encode/decode requests now run natively on multiple threads.
[Python] Supports passing a custom log stream during training.
[Python] Added a module-level version variable: spm.__version__
[Python] Builds a wheel package as a Mac universal binary.
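As a rough illustration of the UTF-8 byte offsets mentioned in the first item above, the following plain-Python sketch computes byte spans for a list of token surfaces. It does not call sentencepiece; the helper name byte_spans and the example segmentation are hypothetical.

```python
def byte_spans(text, surfaces):
    """Return (surface, begin, end) tuples, where begin/end are UTF-8 byte offsets into text.

    Hypothetical helper for illustration; assumes the surfaces concatenate to text.
    """
    if "".join(surfaces) != text:
        raise ValueError("surfaces must concatenate to the original text")
    spans = []
    pos = 0
    for surface in surfaces:
        size = len(surface.encode("utf-8"))  # byte length, not character count
        spans.append((surface, pos, pos + size))
        pos += size
    return spans


text = "héllo wörld"
spans = byte_spans(text, ["hé", "llo", " wör", "ld"])

# Slicing the encoded bytes with each span recovers the surface string.
raw = text.encode("utf-8")
for surface, begin, end in spans:
    assert raw[begin:end].decode("utf-8") == surface
    print(surface, begin, end)
```

Because the offsets count bytes rather than characters, a span must be applied to the encoded byte string, not to the Python str.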
Bug fixes & minor changes
Uses the efficient encoding algorithm by default. Removed the option for switching the Viterbi tokenization algorithm.
Made the output of Encode and the 1-best result of NBestEncode identical.
Uses std::string_view wherever possible.
[Python] Removed the pip packages for the ppc64le and s390x architectures because cibuildwheel doesn't support them.
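For the NFKD normalization rule listed among the new features, a small sketch using Python's standard unicodedata module shows what NFKD does to a few characters. This is an illustration only: it does not use the TSV rule shipped with sentencepiece, which may differ in detail from the Unicode data bundled with a given Python version.

```python
import unicodedata

# Input character -> expected NFKD form (compatibility decomposition).
samples = {
    "\u00e9": "e\u0301",  # e with acute -> e + combining acute accent
    "\ufb01": "fi",       # "fi" ligature -> two ASCII letters
    "\uff76": "\u30ab",   # halfwidth katakana KA -> regular katakana KA
    "\u2460": "1",        # circled digit one -> plain digit
}

normalized = {src: unicodedata.normalize("NFKD", src) for src in samples}

for src, out in normalized.items():
    print(repr(src), "->", repr(out))

# Unlike NFKC, NFKD leaves accents decomposed; NFKC recomposes them.
recomposed = unicodedata.normalize("NFKC", "e\u0301")
```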
v0.1.96
2 years ago
Updates
Improves the performance of unigram training
Updated the NFKC normalization with the latest ICU module.
Stopped treating the zero-width joiner as whitespace.
New features
Added a new algorithm for sampling without replacement.
Added an API for the new sampling and for perplexity calculation.
Added the allow_whitespace_only_pieces mode.
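To make "sampling without replacement" concrete, here is a minimal sketch of one generic technique, the exponential-race (Efraimidis-Spirakis) method for weighted sampling without replacement. It is not taken from the sentencepiece source and is not claimed to be the algorithm the library uses; the function name, candidate segmentations, and weights are all hypothetical.

```python
import random


def sample_without_replacement(weights, k, rng):
    """Pick k distinct indices, favoring larger weights.

    Each index i gets a key Exp(1) / weights[i]; the k smallest keys win.
    Hypothetical helper for illustration only.
    """
    if k > len(weights):
        raise ValueError("k cannot exceed the number of candidates")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    keyed = [(rng.expovariate(1.0) / w, i) for i, w in enumerate(weights)]
    keyed.sort()
    return [i for _, i in keyed[:k]]


# Made-up candidate segmentations with made-up probabilities.
candidates = ["▁hello ▁world", "▁hel lo ▁world", "▁hello ▁wor ld", "▁h ello ▁world"]
weights = [0.5, 0.25, 0.15, 0.10]

picked = sample_without_replacement(weights, 2, random.Random(0))
for i in picked:
    print(candidates[i])
```

Unlike drawing k independent samples, this never returns the same candidate twice, so every sample contributes a distinct segmentation.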
v0.1.95
3 years ago
Updates
Added support for building sentencepiece with the external (official) abseil library.
Upgraded protobuf to 3.14.0.
Changed the type of input_sentence_size from int32 to uint64.
v0.1.94
3 years ago
Updates
Added the SetRandomGeneratorSeed function to set the seed value for the random generator. This makes sampling reproducible.
Validates the range of the vocab id in the Python module.
Changed the directory layout of the Python module.
Added a protobuf Python module.
Bug fixes
Added support for building the Python wheel from the source package.
v0.1.93
3 years ago
Bug fixes
Fixed the regression bug around the flag --minloglevel
Fixed minor bugs.
Updates
Used manylinux2014 to build the PyPI packages.
Added support for the arm64, ppc64le, and s390x architectures in the PyPI packages.