Tomotopy Versions

Python package of Tomoto, the Topic Modeling Tool

4 months ago

New features
- Added Topic Model Viewer tomotopy.viewer.open_viewer()
- Optimized the performance of tomotopy.utils.Corpus.process()
Bug fixes
- Document.span now returns ranges in character units, not byte units.
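
The difference between character and byte units only shows up when the text contains non-ASCII characters. The following is a minimal sketch in plain Python, not tomotopy code; the sample text and variable names are made up for illustration.

```python
# Illustrative only: compare character offsets with UTF-8 byte offsets.
text = "naïve café topics"
word = "café"

# Character-based span: indices into the Python str.
char_start = text.index(word)
char_span = (char_start, char_start + len(word))

# Byte-based span: indices into the UTF-8 encoding of the same text.
encoded = text.encode("utf-8")
encoded_word = word.encode("utf-8")
byte_start = encoded.index(encoded_word)
byte_span = (byte_start, byte_start + len(encoded_word))

# Only the character span can slice the original string directly.
recovered = text[char_span[0]:char_span[1]]
print(char_span, byte_span, recovered)
```

Because "ï" and "é" each take two bytes in UTF-8, the two spans differ, and slicing the original string with byte offsets would return the wrong substring.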

5 months ago

New features
- Added some convenience features to tomotopy.LDAModel.train and tomotopy.LDAModel.set_word_prior.
- LDAModel.train now has new arguments callback, callback_interval and show_progress to monitor the training progress.
- LDAModel.set_word_prior now can accept Dict[int, float] type as its argument prior.
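
A progress callback for LDAModel.train could look roughly like the sketch below. It assumes the callback receives the model, the current iteration and the total number of iterations, which should be checked against the tomotopy documentation. The function name log_progress and the stand-in model object are hypothetical and exist only so the sketch runs without tomotopy installed.

```python
from types import SimpleNamespace


def log_progress(model, current_iter, total_iter):
    """Hypothetical callback: report the log-likelihood per word."""
    line = f"iteration {current_iter}/{total_iter}: ll_per_word={model.ll_per_word:.4f}"
    print(line)
    return line


# With tomotopy installed, the callback would be passed along these lines
# (assumed usage, not verified here):
#   mdl.train(1000, callback=log_progress, callback_interval=100)

# Stand-in object exposing only the attribute the callback reads.
fake_model = SimpleNamespace(ll_per_word=-8.1234)
message = log_progress(fake_model, 100, 1000)
```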

9 months ago

1 year ago

New features
- Added support for macOS ARM64 architecture.
- Added support for Python 3.11.
Bug fixes
- Fixed an issue where tomotopy.Document.get_sub_topic_dist() raises a bad argument exception.
- Fixed an issue where raising an exception sometimes caused a crash.

1 year ago

New features
- Inserting an empty document using tomotopy.LDAModel.add_doc() is now ignored instead of raising an exception. If the newly added argument ignore_empty_words is set to False, an exception is raised as before. (#161)
- Added the tomotopy.HDPModel.purge_dead_topics() method, which removes non-live topics from the model. (#152)
Bug fixes
- Fixed an issue that prevented setting user-defined values for nuSq in tomotopy.SLDAModel (by @jucendrero). (#174)
- Fixed an issue where tomotopy.utils.Coherence did not work for tomotopy.DTModel. (#164)
- Fixed an issue that often caused a crash when calling make_doc() before calling train(). (#166)
- Resolved the problem that the results of tomotopy.DMRModel and tomotopy.GDMRModel differed even when the seed was fixed. (#63)
- Improved the parameter optimization process of tomotopy.DMRModel and tomotopy.GDMRModel.
- Fixed an issue that sometimes caused a crash when calling tomotopy.PTModel.copy().

2 years ago

An issue where calling convert_to_lda of tomotopy.HDPModel with min_cf > 0, min_df > 0 or rm_top > 0 causes a crash has been fixed.
A new argument from_pseudo_doc has been added to tomotopy.Document.get_topics and tomotopy.Document.get_topic_dist. This argument is only valid for documents of PTModel. It controls the source used to compute the topic distribution.
The default value of argument p of tomotopy.PTModel has been changed. The new default value is k * 10.
Using documents generated by make_doc without calling infer no longer causes a crash; it only prints warning messages.
An issue where the internal C++ code failed to compile with clang in a C++17 environment has been fixed.

2 years ago

An issue where tomotopy.LDAModel.set_word_prior() causes a crash has been fixed.
Now tomotopy.LDAModel.perplexity and tomotopy.LDAModel.ll_per_word return the accurate value when TermWeight is not ONE.
tomotopy.LDAModel.used_vocab_weighted_freq was added, which returns term-weighted frequencies of words.
Now tomotopy.LDAModel.summary() shows not only the entropy of words, but also the entropy of term-weighted words.
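
The two metrics mentioned above, perplexity and ll_per_word, are usually related by perplexity = exp(-ll_per_word). The sketch below assumes tomotopy follows this common definition; the helper name is hypothetical and the input value is made up.

```python
import math


def perplexity_from_ll_per_word(ll_per_word):
    """Common definition (assumed here): perplexity = exp(-log-likelihood per word)."""
    return math.exp(-ll_per_word)


# A model that assigns every word probability 1/1000 has
# ll_per_word = log(1/1000), so its perplexity should be about 1000.
value = perplexity_from_ll_per_word(-math.log(1000.0))
print(value)
```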

3 years ago

Now tomotopy.DMRModel and tomotopy.GDMRModel support multiple values of metadata (see https://github.com/bab2min/tomotopy/blob/main/examples/dmr_multi_label.py )
The performance of tomotopy.GDMRModel was improved.
A copy() method has been added for all topic models to do a deep copy.
An issue was fixed where words that are excluded from training (by min_cf, min_df) have incorrect topic id. Now all excluded words have -1 as topic id.
Now all exceptions and warnings generated by tomotopy follow standard Python types.
Compiler requirements have been raised to C++14.
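
Since words excluded from training now carry -1 as their topic id, downstream code can filter them out with a simple check. The sketch below uses plain lists as stand-ins for a document's word ids and topic ids; the values are made up for illustration.

```python
# Stand-in data: parallel lists of word ids and assigned topic ids.
# A topic id of -1 marks a word excluded from training (e.g. by min_cf or min_df).
word_ids = [3, 17, 42, 8]
topic_ids = [0, -1, 2, -1]

# Keep only the words that actually took part in training.
kept = [(w, z) for w, z in zip(word_ids, topic_ids) if z != -1]
print(kept)
```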

3 years ago

A critical bug of asymmetric alphas was fixed. Due to this bug, version 0.11.0 has been removed from releases.

3 years ago

A new topic model tomotopy.PTModel for short texts was added into the package.
An issue was fixed where tomotopy.HDPModel.infer causes a segmentation fault sometimes.
A mismatch of numpy API version was fixed.
Now asymmetric document-topic priors are supported.
Serializing topic models to bytes in memory is supported.
An argument normalize was added to get_topic_dist(), get_topic_word_dist() and get_sub_topic_dist() for controlling normalization of results.
Now tomotopy.DMRModel.lambdas and tomotopy.DMRModel.alpha give correct values.
Support for categorical metadata in tomotopy.GDMRModel was added (see https://github.com/bab2min/tomotopy/blob/main/examples/gdmr_both_categorical_and_numerical.py ).
Python 3.5 support was dropped.
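
As a rough illustration of what the normalize argument mentioned above controls, the sketch below turns unnormalized topic weights into a probability distribution. It is plain Python, not tomotopy code; the helper name and the weights are hypothetical, and what tomotopy returns when normalization is disabled is not specified here.

```python
def normalize_weights(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


raw = [6.0, 3.0, 1.0]          # made-up unnormalized topic weights
dist = normalize_weights(raw)  # the same weights as a distribution
print(dist)
```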