Lapis Bayes Versions

Naive Bayes classifier for use in Lua

v1.3.0

2 months ago

https://luarocks.org/modules/leafo/lapis-bayes

This release has no functional changes, but it refactors several internal function calls. If you only use the documented interface in the README, no changes are necessary.

Changes

  • Removed the increment_text method from the Categories model. The Categories model is no longer tied to any particular classifier, so a classifier or tokenizer instance must be used to convert text into a list of words.

  • Categories:increment_words can now take an array of words in addition to the hash table format

  • Removed the lapis.bayes.tokenizer module, which included a global tokenizer instance. Use a classifier or tokenizer directly to tokenize text. The tokenize_text function previously available in this module is now the tokenize_text method of every classifier.

  • Added new methods to the BaseClassifier class in lapis.bayes.classifiers.base. The base classifier now has methods that mirror the functions provided by the lapis.bayes module:

    • Added the find_word_classifications method, which can be overridden to change the query used to look up word counts
    • Added the classify_text method
    • Added the tokenize_text method
  • Added support for a regconfig option to PostgresTextTokenizer. Previously, the text search configuration was hard-coded to english.

  • Updated the test suite
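Categories:increment_words accepts either a hash table mapping words to counts or a plain array of words. The sketch below shows how the two shapes can describe the same increments by converting either one into the hash form. The helper normalize_word_counts is hypothetical and not part of lapis-bayes, and it assumes each array entry counts as one occurrence; it illustrates the idea rather than the library's implementation.

```lua
-- Hypothetical helper (not part of lapis-bayes): convert word input in
-- either accepted shape into a { word = count } table.
-- Assumption: each entry in the array form counts as one occurrence.
local function normalize_word_counts(words)
  local counts = {}
  for key, value in pairs(words) do
    local word, count
    if type(key) == "string" then
      -- hash form: { word = count }
      word, count = key, value
    else
      -- array form: { "word", "word", ... }
      word, count = value, 1
    end
    counts[word] = (counts[word] or 0) + count
  end
  return counts
end

-- Both shapes describe the same increments
local from_array = normalize_word_counts({ "cheap", "dress", "cheap" })
local from_hash = normalize_word_counts({ cheap = 2, dress = 1 })
print(from_array.cheap, from_hash.cheap) -- both are 2
```

Summing into a single table also means a word that repeats in the array form produces one combined increment rather than several separate ones.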

Full Changelog: https://github.com/leafo/lapis-bayes/compare/v1.2.0...v1.3.0

v1.2.0

2 years ago

https://luarocks.org/modules/leafo/lapis-bayes

  • Update atomic operations to be compatible with modern versions of Lapis
  • Bump minimum Lapis version to 1.8.2

Full Changelog: https://github.com/leafo/lapis-bayes/compare/v1.1.0...v1.2.0

v1.1.0

3 years ago
  • Don't repeat a word in the query when counting tokens if it appears multiple times
  • Word coverage is now calculated correctly when the text contains duplicate words
  • Update CI to use GitHub Actions
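Listing each word only once in the lookup query amounts to deduplicating the token list first. A minimal sketch of order-preserving deduplication follows; unique_words is a hypothetical helper for illustration and is not the library's actual code.

```lua
-- Hypothetical helper (not part of lapis-bayes): remove repeated tokens
-- while keeping the order in which each word first appears, so a lookup
-- query lists every word only once.
local function unique_words(tokens)
  local seen = {}
  local result = {}
  for _, word in ipairs(tokens) do
    if not seen[word] then
      seen[word] = true
      result[#result + 1] = word
    end
  end
  return result
end

local words = unique_words({ "buy", "cheap", "dress", "cheap", "buy" })
print(table.concat(words, ", ")) -- buy, cheap, dress
```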

v1.0.1

4 years ago
  • Use Lapis' trim function instead of a custom one

v1.0.0

4 years ago

The dev version has been in production for quite some time now, so I'm packaging a 1.0 release.

Recent updates

  • Added docs about tokenizers
  • Training categories can take an array of words to use directly as tokens
  • Update the word insertion query to use a Postgres upsert (INSERT with ON CONFLICT DO UPDATE)
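An upsert lets a single statement either insert a new word count or add to an existing one. The sketch below builds such a query string in Lua. The table name word_counts and its columns are hypothetical and do not reflect the library's schema; the query assumes a unique constraint on (category_id, word), which ON CONFLICT needs in order to detect the conflicting row. The quoting helper assumes standard_conforming_strings is on; real code should rely on its database library's escaping.

```lua
-- Hypothetical sketch: table and column names are illustrative only.
-- Quote a string as a SQL literal by doubling single quotes
-- (assumes standard_conforming_strings is on, the Postgres default).
local function quote_literal(str)
  return "'" .. (str:gsub("'", "''")) .. "'"
end

-- Build an upsert that inserts a (category_id, word) row, or adds
-- `count` to the existing row when that pair already exists.
local function build_upsert(category_id, word, count)
  return string.format(
    "INSERT INTO word_counts (category_id, word, count) VALUES (%d, %s, %d) " ..
    "ON CONFLICT (category_id, word) DO UPDATE " ..
    "SET count = word_counts.count + EXCLUDED.count",
    category_id, quote_literal(word), count)
end

print(build_upsert(1, "don't", 2))
```

EXCLUDED refers to the row that was proposed for insertion, so the update adds the new count to the stored one instead of overwriting it.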