Clustering for mixed-type data
- `evaluate` helper function to compute both the DBCV and Calinski-Harabasz scores
- Method `evaluate`: now scores via DBCV and coverage, and returns labels
- `use_gpu` can be set to `False` or `True`
- Add `predict` method, based on the combine method, for ensemble
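The sketch below shows what an `evaluate`-style helper might compute. It assumes HDBSCAN's convention that noise points are labelled `-1`. It reports coverage (the share of non-noise points) and computes a pure-Python Calinski-Harabasz score over the clustered points only. DBCV is left out because it needs density-based distances; the `hdbscan` package exposes it as `hdbscan.validity.validity_index`. The function names here are hypothetical and are not DenseClus's API.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def coverage(labels: Sequence[int]) -> float:
    """Fraction of points assigned to a cluster (noise is labelled -1)."""
    if not labels:
        return 0.0
    return sum(1 for lab in labels if lab != -1) / len(labels)


def calinski_harabasz(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Calinski-Harabasz score computed on non-noise points only."""
    # Drop noise points (-1) before scoring.
    pairs = [(p, lab) for p, lab in zip(points, labels) if lab != -1]
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for p, lab in pairs:
        groups[lab].append(p)
    n, k = len(pairs), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more points than clusters")
    dim = len(pairs[0][0])

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid([p for p, _ in pairs])
    between = within = 0.0
    for rows in groups.values():
        c = centroid(rows)
        between += len(rows) * sq_dist(c, overall)      # between-cluster dispersion
        within += sum(sq_dist(r, c) for r in rows)      # within-cluster dispersion
    if within == 0.0:
        return 1.0  # scikit-learn returns 1.0 in this degenerate case
    return (between / (k - 1)) / (within / (n - k))


def evaluate(points, labels) -> Dict[str, float]:
    """Hypothetical evaluate-style helper: coverage plus Calinski-Harabasz."""
    return {
        "coverage": coverage(labels),
        "calinski_harabasz": calinski_harabasz(points, labels),
    }
```

For example, `evaluate([[0, 0], [0, 1], [10, 0], [10, 1], [5, 5]], [0, 0, 1, 1, -1])` gives a coverage of 0.8 and a Calinski-Harabasz score of 200.0. Excluding noise from the score is a design choice: it keeps unclustered points from being treated as one extra, very spread-out cluster.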
When `ensemble` is selected, DenseClus does not combine the UMAPs; instead, it fits a clusterer for each UMAP. When `predict` is called, it uses `approximate_predict` in HDBSCAN and then votes on the cluster assignment.
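The voting step can be sketched as a per-point majority vote. This assumes each per-UMAP clusterer has already labelled the new points, for instance via `hdbscan.approximate_predict`, which requires the clusterer to have been fitted with `prediction_data=True`. The `vote` function is hypothetical. It also assumes that a given cluster ID means the same thing across clusterers. Independently fitted HDBSCAN models do not guarantee this, so a real implementation may need to align labels first.

```python
from collections import Counter
from typing import List, Sequence


def vote(per_model_labels: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote across models.

    per_model_labels[m][i] is the label that model m assigned to point i.
    Assumes label IDs are already aligned across models.
    """
    if not per_model_labels:
        return []
    n_points = len(per_model_labels[0])
    if any(len(labels) != n_points for labels in per_model_labels):
        raise ValueError("every model must label the same points")
    result = []
    for i in range(n_points):
        counts = Counter(labels[i] for labels in per_model_labels)
        # most_common(1) keeps first-seen order among ties,
        # so a tie goes to the earliest model's label.
        result.append(counts.most_common(1)[0][0])
    return result
```

With three models, `vote([[0, 1, -1], [0, 2, -1], [1, 2, 0]])` returns `[0, 2, -1]`. Noise (`-1`) takes part in the vote like any other label, so a point stays unassigned when most models call it noise.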
Other changes
- `jaccard` for later RAPIDS integration
- `prediction_data=False` for combined UMAPs, `True` for ensemble

A few minor tweaks to the library, primarily to help with maintenance.
- `__repr__` and `__str__` methods so they don't return the whole fitted dataframe

Adding a feature to auto-impute. It calls simple imputation under the hood for both categorical and numerical features. The user can configure these to non-defaults with keyword arguments.
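The sketch below shows simple column-wise imputation. It assumes missing values are `None` and columns are plain Python lists. Numeric columns are filled with their mean (or median), and categorical columns with their most frequent value. This mirrors the common simple-imputer strategies. However, the function `impute` and its keyword arguments `numeric_strategy` and `categorical_fill` are hypothetical stand-ins for whatever keyword arguments DenseClus actually accepts.

```python
from collections import Counter
from statistics import mean, median
from typing import Any, Dict, List, Optional


def impute(
    columns: Dict[str, List[Any]],
    numeric_strategy: str = "mean",
    categorical_fill: Optional[str] = None,
) -> Dict[str, List[Any]]:
    """Fill None values column by column (hypothetical helper).

    Numeric columns: mean or median of the observed values.
    Categorical columns: most frequent observed value, or categorical_fill if given.
    """
    strategies = {"mean": mean, "median": median}
    if numeric_strategy not in strategies:
        raise ValueError(f"unknown numeric_strategy: {numeric_strategy!r}")
    out: Dict[str, List[Any]] = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        if not observed:
            out[name] = list(values)  # nothing to learn from; leave unchanged
            continue
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
        )
        if is_numeric:
            fill = strategies[numeric_strategy](observed)
        elif categorical_fill is not None:
            fill = categorical_fill
        else:
            fill = Counter(observed).most_common(1)[0][0]
        out[name] = [fill if v is None else v for v in values]
    return out
```

Fill values are learned from the observed entries only, so a column that is entirely missing is passed through unchanged rather than guessed.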
In addition, updated HDBSCAN so that the parameter search comes first, as DenseClus converges to the optimal solution for DBCV. I don't know why.
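The sketch below shows a parameter search that keeps the setting with the best validity score. `score_fn` is a hypothetical callback. It would fit HDBSCAN with the given `min_cluster_size` and `min_samples` (both real HDBSCAN parameters) and return a DBCV-style score, where higher is better. The source does not state which parameters DenseClus actually searches, or over what ranges.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def grid_search(
    score_fn: Callable[[int, int], float],
    min_cluster_sizes: Iterable[int],
    min_samples_values: Iterable[int],
) -> Tuple[Dict[str, int], float]:
    """Return the (params, score) pair with the highest score.

    score_fn(min_cluster_size, min_samples) is a hypothetical callback that
    fits a clusterer and returns a validity score such as DBCV.
    """
    best_params, best_score = None, float("-inf")
    for mcs, ms in product(min_cluster_sizes, min_samples_values):
        score = score_fn(mcs, ms)
        if score > best_score:  # strict '>' keeps the first of any tied settings
            best_params = {"min_cluster_size": mcs, "min_samples": ms}
            best_score = score
    if best_params is None:
        raise ValueError("empty search grid")
    return best_params, best_score
```

An exhaustive grid is the simplest choice and is cheap when each fit is fast. For larger grids, a random or Bayesian search over the same callback would be a natural swap.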
PS: This really should be semantic version 2, but I am going this route instead.
Description of changes:
**New Feature: Configure underlying algorithms**
Update: Now supported for Python 3.11 (and only Python 3.11)
Other Updates