Python package of Tomoto, the Topic Modeling Tool
tomotopy.viewer.open_viewer()
tomotopy.utils.Corpus.process()
Document.span now returns the ranges in character units, not in byte units.
tomotopy.LDAModel.train and tomotopy.LDAModel.set_word_prior
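The character-versus-byte distinction in the Document.span entry only shows up for non-ASCII text. The following is a minimal sketch using plain Python strings rather than tomotopy itself; it assumes half-open (start, end) offsets and a UTF-8 encoding for the byte-based variant, neither of which is stated in this changelog.

```python
# Sketch only: plain Python, no tomotopy calls.
# Assumptions: spans are half-open (start, end) pairs; bytes means UTF-8.
text = "naïve café latte"
word = "café"

# Character-based span: offsets index directly into the str object.
char_start = text.index(word)
char_span = (char_start, char_start + len(word))

# Byte-based span: offsets index into the UTF-8 encoded bytes instead.
data = text.encode("utf-8")
encoded_word = word.encode("utf-8")
byte_start = data.index(encoded_word)
byte_span = (byte_start, byte_start + len(encoded_word))

# Slicing the str with the character span recovers the word.
recovered = text[char_span[0]:char_span[1]]

print(char_span, byte_span, recovered)
```

The two spans diverge as soon as a multi-byte character appears before or inside the word, so byte offsets cannot be used to slice the original string.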
LDAModel.train now has new arguments callback, callback_interval and show_progress to monitor the training progress.
LDAModel.set_word_prior can now accept a Dict[int, float] as its argument prior.
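A dictionary prior names only the topics of interest, where a list has to spell out every topic. The sketch below shows the idea with a hypothetical helper, dense_prior, that expands such a dictionary into a full per-topic list. It is not part of tomotopy, and the weight given to topics missing from the dictionary (default) is purely an assumption, since this changelog does not say how the library treats them.

```python
from typing import Dict, List


def dense_prior(sparse: Dict[int, float], k: int, default: float) -> List[float]:
    """Expand {topic_id: weight} into a list of length k.

    Hypothetical helper for illustration; `default` is an assumed
    weight for topics that the dictionary does not mention.
    """
    for topic_id in sparse:
        if not 0 <= topic_id < k:
            raise ValueError(f"topic id {topic_id} is outside 0..{k - 1}")
    return [sparse.get(topic_id, default) for topic_id in range(k)]


# Boost topics 0 and 3 for some word, leave the rest at the assumed default.
prior = dense_prior({0: 1.0, 3: 0.5}, k=5, default=0.01)
print(prior)
```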
tomotopy.Document.get_sub_topic_dist() raises a bad argument exception.
tomotopy.LDAModel.add_doc() just ignores it instead of raising an exception. If the newly added argument ignore_empty_words is set to False, an exception is raised as before. (#161)
tomotopy.HDPModel.purge_dead_topics() method is added to remove non-live topics from the model. (#152)
tomotopy.SLDAModel (by @jucendrero). (#174)
tomotopy.utils.Coherence did not work for tomotopy.DTModel. (#164)
make_doc() before calling train(). (#166)
tomotopy.DMRModel and tomotopy.GDMRModel are different even when the seed is fixed. (#63)
tomotopy.DMRModel and tomotopy.GDMRModel has been improved.
tomotopy.PTModel.copy()
An issue where convert_to_lda of tomotopy.HDPModel with min_cf > 0, min_df > 0 or rm_top > 0 causes a crash has been fixed.
from_pseudo_doc is added to tomotopy.Document.get_topics and tomotopy.Document.get_topic_dist. This argument is only valid for documents of PTModel; it controls the source used for computing the topic distribution.
The default value of p of tomotopy.PTModel has been changed. The new default value is k * 10.
make_doc without calling infer doesn't cause a crash anymore, but just prints warning messages.
An issue where tomotopy.LDAModel.set_word_prior() causes a crash has been fixed.
tomotopy.LDAModel.perplexity and tomotopy.LDAModel.ll_per_word return the accurate value when TermWeight is not ONE.
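To see why term weighting affects these two values, the sketch below computes a weighted per-word log-likelihood and derives perplexity from it using the usual definition, exp(-ll_per_word). This is an illustrative assumption about how the quantities relate, not tomotopy's internal formula; the function names and the per-token inputs are hypothetical.

```python
import math
from typing import Sequence


def weighted_ll_per_word(log_likelihoods: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Weighted mean of per-token log-likelihoods (hypothetical helper)."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    weighted_sum = sum(w * ll for w, ll in zip(weights, log_likelihoods))
    return weighted_sum / total_weight


def perplexity_from_ll(ll_per_word: float) -> float:
    """Standard definition: perplexity = exp(-ll_per_word)."""
    return math.exp(-ll_per_word)


token_ll = [math.log(0.5), math.log(0.25)]

# Unit weights: a plain average over tokens.
plain = perplexity_from_ll(weighted_ll_per_word(token_ll, [1.0, 1.0]))

# Non-unit weights: the first token counts three times as much.
weighted = perplexity_from_ll(weighted_ll_per_word(token_ll, [3.0, 1.0]))

print(plain, weighted)
```

With non-unit weights, dividing by the raw token count instead of the total weight would give a different number, which is one way such values can drift.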
tomotopy.LDAModel.used_vocab_weighted_freq was added, which returns term-weighted frequencies of words.
tomotopy.LDAModel.summary() shows not only the entropy of words, but also the entropy of term-weighted words.
tomotopy.DMRModel and tomotopy.GDMRModel support multiple values of metadata (see https://github.com/bab2min/tomotopy/blob/main/examples/dmr_multi_label.py).
tomotopy.GDMRModel was improved.
copy() method has been added for all topic models to do a deep copy.
min_cf, min_df) have incorrect topic id. Now all excluded words have -1 as topic id.
tomotopy follow standard Python types.
tomotopy.PTModel for short texts was added into the package.
tomotopy.HDPModel.infer causes a segmentation fault sometimes.
bytes in memory is supported.
normalize was added to get_topic_dist(), get_topic_word_dist() and get_sub_topic_dist() for controlling normalization of results.
tomotopy.DMRModel.lambdas and tomotopy.DMRModel.alpha give correct values.
tomotopy.GDMRModel were added (see https://github.com/bab2min/tomotopy/blob/main/examples/gdmr_both_categorical_and_numerical.py).
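The normalize argument mentioned above for get_topic_dist(), get_topic_word_dist() and get_sub_topic_dist() switches between raw and normalized results. The sketch below illustrates that switch with a hypothetical stand-alone function; it assumes the unnormalized result is a list of non-negative raw weights, which this changelog does not confirm.

```python
from typing import List, Sequence


def topic_dist(raw_weights: Sequence[float], normalize: bool = True) -> List[float]:
    """Return raw weights as-is, or scaled so that they sum to 1.

    Hypothetical stand-in for illustration, not a tomotopy function.
    """
    if not normalize:
        return list(raw_weights)
    total = sum(raw_weights)
    if total == 0:
        raise ValueError("cannot normalize weights that sum to zero")
    return [w / total for w in raw_weights]


raw = [2.0, 6.0]
as_probabilities = topic_dist(raw)               # sums to 1
as_raw = topic_dist(raw, normalize=False)        # unchanged

print(as_probabilities, as_raw)
```

Raw values are useful when results from several documents are to be added together before normalizing once at the end.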