Kmodes Versions

Python implementations of the k-modes and k-prototypes clustering algorithms, for clustering categorical data

1 year ago

What's changed

Improve estimation of gamma for k-prototypes (https://github.com/nicodv/kmodes/pull/186)

Full Changelog: https://github.com/nicodv/kmodes/compare/0.12.1...0.12.2
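
For context on the gamma change above: in k-prototypes, gamma sets how much the categorical mismatches count relative to the numerical distance. The following is a minimal plain-Python sketch of that role, following the usual k-prototypes formulation (squared Euclidean distance plus gamma times the number of mismatches). `mixed_dissim` is a hypothetical helper for illustration, not part of the kmodes API, and the sketch does not reproduce how the library estimates gamma.

```python
def mixed_dissim(num_a, num_b, cat_a, cat_b, gamma):
    """Illustrative mixed dissimilarity (hypothetical helper, not the kmodes API).

    Squared Euclidean distance on the numerical parts plus gamma times
    the number of mismatching categorical attributes.
    """
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


# One point and one centroid: the numerical parts differ by 2 in the second
# attribute, and one of the two categorical attributes differs.
point_num, point_cat = [1.0, 2.0], ["red", "small"]
centroid_num, centroid_cat = [1.0, 4.0], ["red", "large"]

low = mixed_dissim(point_num, centroid_num, point_cat, centroid_cat, gamma=0.5)
high = mixed_dissim(point_num, centroid_num, point_cat, centroid_cat, gamma=5.0)
print(low, high)  # 4.5 9.0
```

A larger gamma makes the single categorical mismatch dominate the result, which is why a poorly scaled gamma can let one data type drown out the other.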

2 years ago

Fix for broken fit_predict on KPrototypes (https://github.com/nicodv/kmodes/pull/176)
Improved validation of sample weights (https://github.com/nicodv/kmodes/pull/176)

Full Changelog: https://github.com/nicodv/kmodes/compare/0.12.0...0.12.1

2 years ago

Support for sample weights for both k-modes and k-prototypes algorithms, courtesy of @kklein (https://github.com/nicodv/kmodes/pull/174, https://github.com/nicodv/kmodes/pull/171)
Add official support for Python 3.10 (https://github.com/nicodv/kmodes/pull/170)
Bugfix for algorithm convergence (https://github.com/nicodv/kmodes/commit/370d64b1067331b413d641103a52bd4c636ac702)
Switch internally to pytest from nose (https://github.com/nicodv/kmodes/pull/170)
Some small fixes and cleanups

Full Changelog: https://github.com/nicodv/kmodes/compare/0.11.1...0.12.0
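
To illustrate what sample weights can mean for a mode-based algorithm, here is a small plain-Python sketch of a weighted mode, where each sample contributes its weight rather than a count of one. This is an assumption about the general idea, not the library's implementation; `weighted_mode` is a hypothetical helper.

```python
from collections import defaultdict


def weighted_mode(values, weights):
    """Return the value with the largest total weight (hypothetical helper)."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    # Ties are resolved in favor of the value seen first.
    return max(totals, key=totals.get)


colors = ["red", "blue", "blue"]
unweighted = weighted_mode(colors, [1, 1, 1])
weighted = weighted_mode(colors, [5, 1, 1])
print(unweighted, weighted)  # blue red
```

With equal weights the most frequent value wins. Giving the single "red" sample a weight of 5 makes it outweigh the two "blue" samples.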

2 years ago

#155: Make _labels_cost function public by @nicodv in https://github.com/nicodv/kmodes/pull/156
Fix iterations running one more time than expected by @nicodv in https://github.com/nicodv/kmodes/pull/160
Change feature array initialization dtype to uint32 by @rggelles in https://github.com/nicodv/kmodes/pull/166. This reduces memory footprint significantly.
Drop support for missing values, following scikit-learn: https://github.com/nicodv/kmodes/commit/a20f6ed6567f4c0d5c5c9ad70ca86a6b77ab522f

Full Changelog: https://github.com/nicodv/kmodes/compare/0.11.0...0.11.1

3 years ago

Python 3.9 support
Minimum scikit-learn version raised to 0.22
Default init method for k-prototypes is now the Cao method (same as k-modes and in line with documentation), courtesy of @larroy
Optimizations

4 years ago

Categorical variables are now automatically encoded and decoded between original data values and integers (used internally by k-modes). Users no longer need to apply the categorical variable mapping themselves when looking at the cluster centroids.
Support for custom dissimilarity measures
Python 3.6 support
More robust manual initialization
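
The automatic encoding and decoding of categorical values described above can be pictured with the following plain-Python sketch. It only illustrates the round trip between original values and integer codes. The function names are hypothetical and this is not the library's internal code.

```python
def fit_encoder(column):
    """Map each distinct value to an integer code, in order of first appearance."""
    mapping = {}
    for value in column:
        mapping.setdefault(value, len(mapping))
    return mapping


def encode(column, mapping):
    """Replace original values with their integer codes."""
    return [mapping[value] for value in column]


def decode(codes, mapping):
    """Translate integer codes back to the original values."""
    inverse = {code: value for value, code in mapping.items()}
    return [inverse[code] for code in codes]


column = ["red", "blue", "red", "green"]
mapping = fit_encoder(column)
codes = encode(column, mapping)
print(codes)                   # [0, 1, 0, 2]
print(decode(codes, mapping))  # ['red', 'blue', 'red', 'green']
```

Because decoding is applied to the results, cluster centroids can be read in terms of the original values rather than integer codes.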

4 years ago

Huge speedup for k-prototypes, especially for large numbers of samples (#45). A k-prototypes benchmark script is included in examples now.
Offer an implementation of Ng's dissimilarity measure, which could improve convergence (#37).
Allow pandas DataFrames to be presented to the algorithm, instead of just numpy arrays (#40).
Improved handling of dependencies (#49, #53).
Various small bugfixes and improvements.

4 years ago

Support for more than 256 clusters
Optional parallel execution of the multiple initialization runs (courtesy of @rphes)
Enhanced error checking when using pandas DataFrames as inputs to the algorithms
Various bug fixes and improvements
Semantic versioning from now on