Classifier Reborn Versions Save

A general classifier module to allow Bayesian and other types of classifications. A fork of cardmagic/classifier.

v2.3.0

1 year ago

Minor Enhancements

  • Separate tokenizer from hasher, allowing custom tokenizers. (#162)
  • Improved handling of Zero division and Zero vector normalization. (#173)
  • Support Numo Gem for performing SVD (#198)

Development Fixes

  • remove deprecated has_rdoc in gemspec
  • auto-gen-config for Rubocop
  • auto-correct offenses
  • Freeze all Ruby string literals (#190)
  • Migrate TravisCI to GitHub Actions & Update Tested Ruby Versions (#195)
  • Test Native and GSL Implementations (#196)
  • Actually Use GSL in CI Tests (#197)

Documentation

  • Updated Bayes docs for filters (#191)

v2.2.0

6 years ago

Major Enhancements

  • JRuby Support, thanks to @mach-kernel (#168)
  • Add support to reset trained classifiers to their initial state (#143)
  • Classifier evaluation and validation (#142)
  • Abbility to add custom stopwords at classifier initialization (#129)
  • Don't train/untrain the Bayesian classifier with empty word hashes (#132)
  • Enable auto categorization if no initial categories (#128)
  • Bayes integration test of Memory and Redis backends with real data (#92)
  • Memory and Redis backend support (#84)

Minor Enhancements

  • improved turkish stopwords (#159)
  • Set Redis keys only if they don't exist (#156)
  • Require bayes_redis_backend (#157)
  • Validation documentation improvements (#150)
  • Updated Dokcer image to Ruby 2.4 (#149)
  • Classifier validation user documenation (#145)
  • Fixed persistance for BayesMemoryBackend (#147)
  • Fixed error on requiring 'classifier-reborn' without using Redis (#146)
  • Removed magic train untrain methods from docs, (#141)
  • Links corrected to point to the new domain (#139)
  • Minor docs improvements (#138)
  • Return the status of the training/untraining when run (#137)
  • Refactoring of backend tests to move duplicate login in the common file (#134)
  • Deal with Infinity score in test (#133)
  • README file cleaned up to point to the documentation site (#121)
  • Added and corrected RDoc for ceratin classes and methods (#122)
  • Added favicon link and forced display (#120)
  • Updated the truncated LICENSE file (#116)
  • Docs visual improvement and refactoring (#119)
  • Fixed relative URL issue on nav links and added benchmark data (#118)
  • Added custom layout with navigation (#117)
  • Created a static site for documentation (#115)
  • Removed redis gem from Dockerfile as it is added in gemspec (#113)
  • Speed up Docker image rebilding (#112)
  • Improved Docker based development documentation (#106)
  • Benchmark refactoring, improving efficiency, enhanced reporting (#107)
  • Add Vietnamese stopwords (#110)
  • Added stop words for Arabic, Bengali, Chinese, Hindi, and Russian (#105)
  • Dockerfile and documentation (#104)
  • Remove hard dep on Redis and update bin (#96)
  • Documented Redis backend performance (#103)
  • Rename Bayes memory test class (#102)
  • Added Bayes backend benchmarks (#98)
  • Disabled Redis disc persistence and refactored integration test (#97)
  • Removed useless intermediate variables (#90)

v2.1.0

7 years ago

Major Enhancements

  • Fix breaking changes in LSI api. Displays errors instead of raising where possible. #87

v2.0.5

7 years ago

Major Enhancements

  • Stopwords get encoded to utf8 (#83)
  • Fix searching issues where no document is added to lsi (#77)
  • Added method to add custom path to user-created stopword directory (#73)

Minor Enhancements

  • Test newer rubies (#85)
  • Fixed errors in README (#68, #79, #80)
  • Added an option to the bayesian classifier to disable word stemming (#61)
  • Added missing parens and renamed some variables (#59)

2.0.4

8 years ago

Major Enhancements

  • Classification thresholds can be enabled or disabled. The default is disabled. The threshold value can be set at initialization time or dynamically during processing (#47)
  • Made auto-categorization optional, defaulting to false (#45)
  • Added the ability to handle an array of classifications to the constructor (#44)
  • Classification with a threshold has been added to the api (#39)

Minor Enhancements

  • Documentation around threshold usage (#54)
  • Fixed UTF-8 encoding for hasher.rb (#50)
  • Removed some unnecessary methods (#43)
  • Add optional CachedContentNode (GSL only) (#43)
  • Caches the transposed search_vector (#43)
  • Added custom marshal_ methods to not save the cache when dumping/loading (#43)
  • Optimized some numeric comparisons and iterators (#43)
  • Added cached calculation table when computing raw_vectors (#43)
  • If a category name is already a symbol, just return it (#45)
  • Various Hash improvements (#45)
  • Eliminated several Ruby :warning:s when run with RUBYOPT="-w" (#38)
  • Simple performance improvements for the Hasher process (#41)
  • Fixes for broken regex splitting for non-ascii characters and removal of the unused punctuation filter (#41)
  • Add multiple language stopwords with customizable stop word paths (#40)

Bug Fixes

  • Fixed the bug where adding the same category a second time would clobber the category that was already there (#45)
  • Fixed deprecation warning for <=> in ls.rb (#33)
  • Remove references to Madeline in the README and replace it with Marshal or Redis (#32)

Development Fixes

  • Added development dependency on mini_test and added 2.2 to travis.yml (#36)

v2.0.2

9 years ago

Minor Enhancements

  • Remove Array#sum monkey patch in favour of #reduce(0, :+) (#20)
  • Cache total word counts per category for speed (#4)

Development Fixes

  • Add a test for Bayes#untrain_*. (#21)
  • Fix link to rb-gsl gem (#24)
  • Add helper scripts per Jekyll convention (#25)

Many thanks to @Ch4s3 for all his work on this release!

v2.0.1

9 years ago

Bug Fixes

  • Count total unique words using methods supported by Vector and GSL::Vector (#11)

v2.0.0

9 years ago

Bug Fixes

  • Remove mathn dependency (#8)
  • Only perform first order transform if total UNIQUE words is greater than 1 (#3)
  • Update LSI#remove_item such that they will work with the @items hash. (#2)

Development Fixes

  • Exclude Gemfile.lock in .gitignore (#7)