Lapis Bayes Versions

Naive Bayes classifier for use in Lua

v1.3.0

2 months ago

https://luarocks.org/modules/leafo/lapis-bayes

This update has no functional differences but includes refactors to internal function calls. If you are using the documented interface in the README then no changes are necessary.

Changes

Removed the increment_text method from the Categories model. The Categories model is no longer tied to any particular classifier; a classifier or tokenizer instance must be used to convert text to a list of words.
Categories:increment_words can also take an array of words, in addition to the hash table format
Remove the lapis.bayes.tokenizer module, which included a global instance of a tokenizer. Use a classifier or tokenizer directly to tokenize text. The tokenize_text function previously available on this module is now the tokenize_text method of any classifier.
Added new methods to BaseClassifier class in lapis.bayes.classifiers.base. The base classifier now has methods that mirror the functions provided in the lapis.bayes module
- Add find_word_classifications method that can be used to override the query for looking up the word counts
- Add classify_text method
- Add tokenize_text method
Add support for regconfig option to the PostgresTextTokenizer. Previously it was hard-coded to english.
Updates to test suite
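
The two input shapes accepted by increment_words (an array of words, or a hash table of word counts) can be illustrated with a small standalone sketch. The helper to_word_counts below is hypothetical and is not part of lapis-bayes; it only shows one plausible way to normalize both shapes into a single word-to-count table, assuming that a word repeated in the array counts once per occurrence.

```lua
-- Hypothetical helper, not part of lapis-bayes: normalizes either an array of
-- words or a { word = count } table into a single { word = count } table.
local function to_word_counts(words)
  local counts = {}
  for key, value in pairs(words) do
    if type(key) == "number" then
      -- Array entry: the value is the word; each occurrence adds one.
      counts[value] = (counts[value] or 0) + 1
    else
      -- Hash entry: the key is the word and the value is its count.
      counts[key] = (counts[key] or 0) + value
    end
  end
  return counts
end

-- Both calls describe the same training data.
local from_array = to_word_counts({ "cheap", "prom", "dress", "cheap" })
local from_hash = to_word_counts({ cheap = 2, prom = 1, dress = 1 })
```

Whether the library itself counts repeated array entries this way is an assumption made for the sketch; check the README before relying on it.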

Full Changelog: https://github.com/leafo/lapis-bayes/compare/v1.2.0...v1.3.0

v1.2.0

2 years ago

https://luarocks.org/modules/leafo/lapis-bayes

Update atomic operations to be compatible with modern versions of Lapis
Bump minimum Lapis version to 1.8.2

Full Changelog: https://github.com/leafo/lapis-bayes/compare/v1.1.0...v1.2.0

v1.1.0

3 years ago

Don't repeat a word in the query when counting tokens if it appears multiple times
Word coverage is now calculated correctly when there are duplicate words
Update CI to use GitHub Actions
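
The duplicate-word fix in this release amounts to collapsing repeated tokens before they are used to build the lookup query. A minimal sketch of that idea follows; unique_words is a hypothetical name chosen for illustration, and the code is not taken from the lapis-bayes source.

```lua
-- Hypothetical helper, not taken from lapis-bayes: returns the distinct words
-- of a token list in first-seen order, so each word is queried only once.
local function unique_words(tokens)
  local seen, out = {}, {}
  for _, word in ipairs(tokens) do
    if not seen[word] then
      seen[word] = true
      out[#out + 1] = word
    end
  end
  return out
end

local words = unique_words({ "buy", "cheap", "buy", "dress", "cheap" })
print(table.concat(words, ", ")) -- prints "buy, cheap, dress"
```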

v1.0.1

4 years ago

Use Lapis' trim function instead of a custom one

v1.0.0

4 years ago

The dev version has been in production for quite some time now, so I'm packaging a 1.0 release.

Recent updates

Added docs about tokenizers
Training categories can take an array of words to use directly as tokens
Update the word insertion query to use an upsert (INSERT ... ON CONFLICT DO UPDATE)