The most accurate natural language detection library for Rust, suitable for long and short text alike
The new method LanguageDetector.detect_multiple_languages_of() has been introduced. It makes it possible to detect multiple languages in mixed-language text. (#1)
The new method LanguageDetectorBuilder.with_low_accuracy_mode() has been introduced. Activating it reduces detection accuracy for short text in exchange for a smaller memory footprint and faster detection. (#119)
The new method LanguageDetector.compute_language_confidence() has been introduced. Given the input text, it retrieves the confidence value for one specific language only. (#102)
The computation of the confidence values has been revised: the softmax function is now applied to the values, which makes them easier to compare because they behave more like real probabilities. (#120)
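To illustrate what applying softmax does to raw values, here is a minimal sketch in Rust. It is not lingua's implementation: the function name softmax and the example scores are hypothetical, and the raw per-language scores are assumed to be arbitrary real numbers such as summed log-probabilities. The output values are all positive, keep the original ranking and sum to 1.

```rust
/// Applies the softmax function to a slice of raw scores.
/// Subtracting the maximum before exponentiating avoids overflow
/// in `exp` and does not change the result.
fn softmax(scores: &[f64]) -> Vec<f64> {
    if scores.is_empty() {
        return Vec::new();
    }
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    // Normalize so that all values add up to 1.
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // Hypothetical raw scores for three candidate languages
    // (assumed values, not real lingua output).
    let raw = [-20.0, -21.0, -25.0];
    let confidences = softmax(&raw);
    for c in &confidences {
        println!("{:.4}", c);
    }
}
```

Because softmax depends only on the differences between scores, a language whose raw score is far below the best one ends up with a confidence close to 0.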
The WASM API has been revised. It now uses the same builder pattern as the Rust API. (#122)
The language model files are now compressed with the Brotli algorithm, which reduces the file size by 15 % on average. (#189)
The language model ngrams are now stored in a CompactString type, which reduces the amount of consumed memory by 20 %. (#198)
Several performance optimizations have been applied, which make the library nearly twice as fast as the previous version. Big thanks go out to @serega and @koute for their help. (#82, #148, #177)
The enums IsoCode639_1 and IsoCode639_3 now implement some new traits such as Copy, Hash and Serde's Serialize and Deserialize. The enum Language now implements Copy as well. (#175)
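What Copy and Hash offer in practice can be sketched with a small stand-in enum. The enum Lang and the helper functions describe and count_detections are hypothetical and not part of lingua; the sketch only shows that a Copy enum can be passed by value without being moved, and that an enum implementing Hash and Eq can serve as a HashMap key. Serde's Serialize and Deserialize are left out because they require the external serde crate.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for an enum like `Language`; not lingua's definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Lang {
    English,
    French,
    German,
}

// Takes `Lang` by value; because `Lang` is `Copy`, the caller's
// variable stays usable after the call.
fn describe(lang: Lang) -> String {
    format!("{:?}", lang)
}

// Counts how often each language occurs; `Hash + Eq` allow `Lang` as a key.
fn count_detections(detections: &[Lang]) -> HashMap<Lang, usize> {
    let mut counts = HashMap::new();
    for &lang in detections {
        // The `&lang` pattern copies each value out of the slice.
        *counts.entry(lang).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let lang = Lang::French;
    let name = describe(lang);
    // `lang` was copied, not moved, so it can still be printed here.
    println!("{} {:?}", name, lang);

    let counts = count_detections(&[Lang::English, Lang::German, Lang::English]);
    println!("{:?}", counts.get(&Lang::English));
}
```

Without the Copy derive, the call describe(lang) would move lang, and using it afterwards would fail to compile.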
Language, IsoCode639_1 and IsoCode639_3 now implement std::str::FromStr in order to instantiate enum variants from string values. This comes in handy for JavaScript bindings and the like. (#15)
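Implementing std::str::FromStr is what enables calls such as "fr".parse(). The sketch below uses a hypothetical three-variant enum IsoCode and a hypothetical error type ParseIsoCodeError; lingua's actual enums have many more variants, and the exact strings they accept are not stated here, so the case-insensitive matching is an assumption of this sketch.

```rust
use std::str::FromStr;

// Hypothetical ISO 639-1 enum; not lingua's IsoCode639_1.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum IsoCode {
    EN,
    FR,
    DE,
}

// Hypothetical error type carrying the rejected input.
#[derive(Debug, PartialEq, Eq)]
struct ParseIsoCodeError(String);

impl FromStr for IsoCode {
    type Err = ParseIsoCodeError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: codes are accepted regardless of letter case.
        match s.to_ascii_lowercase().as_str() {
            "en" => Ok(IsoCode::EN),
            "fr" => Ok(IsoCode::FR),
            "de" => Ok(IsoCode::DE),
            _ => Err(ParseIsoCodeError(s.to_string())),
        }
    }
}

fn main() {
    // `str::parse` delegates to `FromStr::from_str`.
    let code: IsoCode = "fr".parse().unwrap();
    println!("{:?}", code);
    println!("{:?}", "DE".parse::<IsoCode>());
    println!("{:?}", "xx".parse::<IsoCode>());
}
```

Returning a Result instead of panicking lets bindings such as a JavaScript wrapper turn unknown codes into a regular error value.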