Tomotopy Versions

Python package of Tomoto, the Topic Modeling Tool

v0.12.7

4 months ago
  • New features
    • Added a topic model viewer, tomotopy.viewer.open_viewer()
    • Optimized the performance of tomotopy.utils.Corpus.process()
  • Bug fixes
    • tomotopy.Document.span now returns ranges in character units rather than byte units.

v0.12.6

5 months ago
  • New features
    • Added some convenience features to tomotopy.LDAModel.train and tomotopy.LDAModel.set_word_prior:
      • tomotopy.LDAModel.train now has new arguments callback, callback_interval and show_progress to monitor the training progress.
      • tomotopy.LDAModel.set_word_prior now accepts a Dict[int, float] as its argument prior.

v0.12.5

9 months ago
  • New features
    • Added support for Linux ARM64 architecture.

v0.12.4

1 year ago
  • New features
    • Added support for macOS ARM64 architecture.
    • Added support for Python 3.11
  • Bug fixes
    • Fixed an issue where tomotopy.Document.get_sub_topic_dist() raised a bad argument exception.
    • Fixed an issue where raising an exception sometimes caused a crash.

v0.12.3

1 year ago
  • New features
    • Inserting an empty document with tomotopy.LDAModel.add_doc() now ignores it instead of raising an exception. If the newly added argument ignore_empty_words is set to False, an exception is raised as before. (#161)
    • Added tomotopy.HDPModel.purge_dead_topics() to remove non-live topics from the model. (#152)
  • Bug fixes
    • Fixed an issue that prevented setting user-defined values for nuSq in tomotopy.SLDAModel (by @jucendrero). (#174)
    • Fixed an issue where tomotopy.utils.Coherence did not work for tomotopy.DTModel. (#164)
    • Fixed a frequent crash when make_doc() was called before train(). (#166)
    • Fixed an issue where tomotopy.DMRModel and tomotopy.GDMRModel produced different results even with a fixed seed. (#63)
    • Improved the parameter optimization process of tomotopy.DMRModel and tomotopy.GDMRModel.
    • Fixed an occasional crash when calling tomotopy.PTModel.copy().

v0.12.2

2 years ago
  • An issue where calling convert_to_lda of tomotopy.HDPModel with min_cf > 0, min_df > 0 or rm_top > 0 causes a crash has been fixed.
  • A new argument from_pseudo_doc is added to tomotopy.Document.get_topics and tomotopy.Document.get_topic_dist. It is only valid for documents of tomotopy.PTModel and controls which source is used to compute the topic distribution.
  • The default value of the argument p of tomotopy.PTModel has been changed to k * 10.
  • Using documents generated by make_doc without calling infer no longer causes a crash; warning messages are printed instead.
  • An issue where the internal C++ code failed to compile in a clang C++17 environment has been fixed.

v0.12.1

2 years ago
  • An issue where tomotopy.LDAModel.set_word_prior() causes a crash has been fixed.
  • Now tomotopy.LDAModel.perplexity and tomotopy.LDAModel.ll_per_word return accurate values when TermWeight is not ONE.
  • tomotopy.LDAModel.used_vocab_weighted_freq was added, which returns term-weighted frequencies of words.
  • Now tomotopy.LDAModel.summary() shows not only the entropy of words, but also the entropy of term-weighted words.

v0.12.0

3 years ago
  • Now tomotopy.DMRModel and tomotopy.GDMRModel support multiple values of metadata (see https://github.com/bab2min/tomotopy/blob/main/examples/dmr_multi_label.py )
  • The performance of tomotopy.GDMRModel was improved.
  • A copy() method has been added for all topic models to do a deep copy.
  • An issue was fixed where words excluded from training (by min_cf or min_df) had incorrect topic ids. Now all excluded words have -1 as their topic id.
  • Now all exceptions and warnings generated by tomotopy follow standard Python types.
  • Compiler requirements have been raised to C++14.

v0.11.1

3 years ago
  • A critical bug of asymmetric alphas was fixed. Due to this bug, version 0.11.0 has been removed from releases.

v0.11.0

3 years ago
  • A new topic model tomotopy.PTModel for short texts was added into the package.
  • An issue was fixed where tomotopy.HDPModel.infer sometimes caused a segmentation fault.
  • A NumPy API version mismatch was fixed.
  • Now asymmetric document-topic priors are supported.
  • Serializing topic models to bytes in memory is supported.
  • An argument normalize was added to get_topic_dist(), get_topic_word_dist() and get_sub_topic_dist() for controlling normalization of results.
  • Now tomotopy.DMRModel.lambdas and tomotopy.DMRModel.alpha give correct values.
  • Support for categorical metadata was added to tomotopy.GDMRModel (see https://github.com/bab2min/tomotopy/blob/main/examples/gdmr_both_categorical_and_numerical.py ).
  • Python 3.5 support was dropped.