Genie: Fast and Robust Hierarchical Clustering with Noise Point Detection - in Python and R
[BACKWARD INCOMPATIBILITY] [Python and R] Inequality measures are no longer referred to as inequity measures.
[BACKWARD INCOMPATIBILITY] [Python and R] Some external cluster validity measures were renamed (as per the major revision of https://doi.org/10.48550/arXiv.2209.02935): adjusted_asymmetric_accuracy -> normalized_clustering_accuracy, normalized_accuracy -> normalized_pivoted_accuracy.
[BACKWARD INCOMPATIBILITY] [Python] compare_partitions2 has been removed, as compare_partitions and other partition similarity scores now support both pairs of label vectors (x, y) and confusion matrices (x=C, y=None).
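The two calling conventions can be pictured with the following minimal sketch. It is not the genieclust implementation: the helper names are hypothetical, and the plain Rand index is used only because it is short to write down. The point is that a score defined on a confusion matrix can accept label vectors simply by cross-tabulating them first.

```python
from collections import Counter
from math import comb


def confusion_matrix_from_labels(x, y):
    """Cross-tabulate two label vectors of equal length (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rows = sorted(set(x))
    cols = sorted(set(y))
    counts = Counter(zip(x, y))
    return [[counts[(r, c)] for c in cols] for r in rows]


def rand_index_from_confusion(C):
    """Rand index computed solely from a confusion matrix C (at least 2 points)."""
    n = sum(sum(row) for row in C)
    pairs = comb(n, 2)
    same_both = sum(comb(v, 2) for row in C for v in row)
    same_x = sum(comb(sum(row), 2) for row in C)
    same_y = sum(comb(sum(col), 2) for col in zip(*C))
    # Agreements = pairs grouped together in both partitions
    #            + pairs separated in both partitions.
    return (pairs + 2 * same_both - same_x - same_y) / pairs


def rand_index(x, y=None):
    """Accept either label vectors (x, y) or a confusion matrix (x=C, y=None)."""
    C = x if y is None else confusion_matrix_from_labels(x, y)
    return rand_index_from_confusion(C)


labels_a = [0, 0, 1, 1]
labels_b = [1, 1, 0, 0]  # the same partition with the labels swapped
C = confusion_matrix_from_labels(labels_a, labels_b)
print(rand_index(labels_a, labels_b) == rand_index(C))  # True
```

Dispatching on y being None keeps a single entry point, which is presumably why a separate confusion-matrix variant is no longer needed.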
[Python and R] New parameter to pair_sets_index: clipped.
In normalizing_permutation and external cluster validity measures, the input matrices can now be of the type double.
[BUGFIX] [Python] #80: Fixed adjustment for nmslib_n_neighbors in small samples.
[BUGFIX] [Python] #82: The cluster_validity submodule was not imported.
[BUGFIX] Some external cluster validity measures now handle NaNs better and are slightly less prone to round-off errors.
[Python and R] adjusted_asymmetric_accuracy now accepts confusion matrices with fewer columns than rows. Such "missing" columns are now treated as if they were filled with 0s.
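To make the zero-filling concrete, here is a small sketch of the idea. The helper name pad_missing_columns is hypothetical and the code only illustrates the described behaviour; it is not taken from the package.

```python
def pad_missing_columns(C):
    """Append zero columns to C until it has as many columns as rows.

    Hypothetical helper: rows that are already wide enough are left unchanged.
    """
    n_rows = len(C)
    padded = []
    for row in C:
        missing = max(0, n_rows - len(row))
        padded.append(list(row) + [0] * missing)
    return padded


# Three reference classes, but only two predicted clusters were found.
C = [[10, 0],
     [0, 5],
     [3, 2]]
print(pad_missing_columns(C))  # [[10, 0, 0], [0, 5, 0], [3, 2, 0]]
```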
[Python and R] pair_sets_index
, and normalized_accuracy
return
the same results for non-symmetric confusion matrices and transposes thereof.
[GENERAL] The cluster validity measures are discussed in more detail at https://clustering-benchmarks.gagolewski.com.
[Python and R] New function: compare_partitions.adjusted_asymmetric_accuracy.
[Python and R] Implementations of the so-called internal cluster validity measures discussed in DOI: 10.1016/j.ins.2021.10.004; see our (GitHub-only) CVI package for R. In particular, the generalised Dunn indices are based on the code originally authored by Maciej Bartoszuk. Thanks.
Functions added (to the cluster_validity module in the Python version): calinski_harabasz_index, dunnowa_index, generalised_dunn_index, negated_ball_hall_index, negated_davies_bouldin_index, negated_wcss_index, silhouette_index, silhouette_w_index, wcnn_index.
[BACKWARD INCOMPATIBILITY] compare_partitions.normalized_confusion_matrix now solves the maximal assignment problem instead of applying a primitive partial pivoting.
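The difference between the two strategies can be seen on a tiny example. The sketch below is an assumption-laden illustration, not the package's code: it assumes that columns are the ones being reordered, it solves the assignment problem by exhaustive search (fine only for small matrices; practical solvers use, e.g., the Hungarian algorithm), and greedy_pivoting is a hypothetical stand-in for a row-by-row pivoting scheme.

```python
from itertools import permutations


def diagonal_sum(C, perm):
    """Sum of C[i][perm[i]], i.e. the diagonal after reordering columns by perm."""
    return sum(C[i][perm[i]] for i in range(len(C)))


def best_column_permutation(C):
    """Exhaustive search for the column order that maximises the diagonal sum."""
    k = len(C)
    return max(permutations(range(k)), key=lambda p: diagonal_sum(C, p))


def greedy_pivoting(C):
    """Hypothetical row-by-row pivoting: row i takes its largest unused column."""
    unused = list(range(len(C)))
    perm = []
    for row in C:
        j = max(unused, key=lambda c: row[c])
        perm.append(j)
        unused.remove(j)
    return tuple(perm)


C = [[5, 4],
     [4, 0]]
greedy = greedy_pivoting(C)
optimal = best_column_permutation(C)
print(greedy, diagonal_sum(C, greedy))    # (0, 1) 5
print(optimal, diagonal_sum(C, optimal))  # (1, 0) 8
```

Here the greedy choice locks row 0 onto its largest entry and leaves row 1 with a zero, whereas the optimal assignment trades a slightly smaller entry in row 0 for a much larger one in row 1.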
[Python and R] New function: compare_partitions.normalizing_permutation.
[R] New function: normalized_confusion_matrix.
[Python and R] New parameter to compare_partitions.pair_sets_index: simplified.
[Python] New parameters to plots.plot_scatter: axis, title, xlabel, ylabel, xlim, ylim.
[GENERAL] A paper on the genieclust package has appeared in SoftwareX, see https://doi.org/10.1016/j.softx.2021.100722.
[Python] plot_scatter now uses a more accessible default palette (from R 4.0.0).
[Python] New function: inequity.devergottini_index.
[R] New function: devergottini_index.
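For readers unfamiliar with this measure, the following pure-Python sketch shows one way to compute it. It assumes the normalised definition in which the i-th largest observation receives the weight 1/i + 1/(i+1) + ... + 1/n and the result is rescaled to lie in [0, 1]; the function below is an illustration written for this note and may differ in detail from the package's implementation.

```python
def devergottini_sketch(x):
    """Normalised De Vergottini index of a non-negative sample (illustrative)."""
    n = len(x)
    total = sum(x)
    if n < 2 or min(x) < 0 or total <= 0:
        raise ValueError("need at least 2 non-negative values with a positive sum")

    desc = sorted(x, reverse=True)  # largest observation first

    # weights[i] = 1/(i+1) + 1/(i+2) + ... + 1/n (0-based i), built from the end.
    weights = [0.0] * n
    acc = 0.0
    for i in range(n - 1, -1, -1):
        acc += 1.0 / (i + 1)
        weights[i] = acc

    weighted = sum(w * v for w, v in zip(weights, desc))
    norm = sum(1.0 / j for j in range(2, n + 1))
    return (weighted / total - 1.0) / norm


print(round(devergottini_sketch([1, 1, 1, 1]), 12))  # perfect equality
print(round(devergottini_sketch([0, 0, 0, 1]), 12))  # one element holds everything
```

Under the assumed definition, a sample of identical values gives 0 and a sample where a single element holds the whole total gives 1, up to floating-point round-off.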
Fixed build errors on Solaris: _X is a reserved identifier.
[Python] Python >= 3.7 is now required (implied by numpy).
[R] Use RcppMLPACK directly instead of via a wrapper, emstreeR.