Amazon DenseClus Versions

Clustering for mixed-type data

v0.2.2

2 months ago

Updated the evaluate helper function to compute both DBCV and Calinski-Harabasz scores
Added a new notebook for exploring clustering on SageMaker JumpStart
Dependency version bumps
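
For readers unfamiliar with the second metric, here is a minimal pure-Python sketch of the Calinski-Harabasz score (the ratio of between-cluster to within-cluster dispersion). It is not the library's evaluate helper. The function name and the handling of zero within-cluster dispersion are assumptions made for illustration, and every label (including any noise label) is treated as its own cluster.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Illustrative Calinski-Harabasz score for points given as lists of floats."""
    n = len(points)
    dim = len(points[0])

    # Group the points by their cluster label.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 and at most n - 1 clusters")

    def centroid(rows):
        return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0  # dispersion of cluster centroids around the overall centroid
    within = 0.0   # dispersion of points around their own cluster centroid
    for rows in groups.values():
        center = centroid(rows)
        between += len(rows) * sq_dist(center, overall)
        within += sum(sq_dist(row, center) for row in rows)

    if within == 0.0:
        # Assumption for this sketch: perfectly tight clusters score infinity.
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))
```

Higher values indicate tighter, better-separated clusters. Unlike DBCV, the score is centroid-based, so it tends to favor convex clusters.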

v0.2.1

5 months ago

Split up modules: numerical and categorical now have their own files for future enhancements
Changed the score method to evaluate; it now scores via DBCV and coverage and returns labels
Consolidated GPU settings: now just set use_gpu to True or False
Added a version file for automated setup
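
The notes do not define "coverage". A common reading, assumed here, is the fraction of points assigned to a cluster rather than labeled as noise (HDBSCAN uses -1 for noise). The function below is a hypothetical sketch of that reading, not the library's implementation.

```python
def coverage(labels, noise_label=-1):
    """Fraction of points that received a cluster label (assumed definition)."""
    if not labels:
        raise ValueError("labels must not be empty")
    clustered = sum(1 for label in labels if label != noise_label)
    return clustered / len(labels)
```

A high DBCV score with low coverage can mean the clusterer found a few clean clusters by discarding most of the data as noise, which is why reporting both numbers is useful.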

v0.2.0

5 months ago

Summary

Add a predict method based on the combine method for ensemble. When ensemble is selected, DenseClus does not combine the UMAPs; instead it fits a clusterer for each UMAP. When predict is called, it uses approximate_predict in HDBSCAN and then votes on the cluster assignment.
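
The voting step can be pictured with a small sketch. It assumes each clusterer has already produced one label per point (for example from HDBSCAN's approximate_predict, which returns labels and membership strengths) and that labels are comparable across clusterers, which is not guaranteed in general. The function name and the tie-breaking rule are assumptions; the notes do not describe how DenseClus resolves ties.

```python
from collections import Counter


def vote(labels_per_model):
    """Majority vote per point across several clusterers' label sequences."""
    n_points = len(labels_per_model[0])
    if any(len(labels) != n_points for labels in labels_per_model):
        raise ValueError("every clusterer must label the same number of points")

    voted = []
    for point_labels in zip(*labels_per_model):
        counts = Counter(point_labels)
        top = max(counts.values())
        # Assumed tie-break for this sketch: the smallest label among the winners.
        voted.append(min(label for label, count in counts.items() if count == top))
    return voted
```

For example, with three clusterers labeling three points, `vote([[0, 1, -1], [0, 1, 1], [1, 1, -1]])` keeps the label that at least two of the three agree on for each point.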

Other changes

Change default method from 'contrast' to 'intersection'
Change default distance metric for categoricals to Jaccard for later RAPIDS integration
Increase overall test coverage
Set prediction_data=False for combined UMAPs and True for ensemble
Update examples to reflect changes
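
As a reminder of what the Jaccard metric measures, here is a minimal sketch on binary vectors. It assumes the categorical features have been one-hot encoded, which these notes do not state, and the function name is hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length binary vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    either = sum(1 for x, y in zip(a, b) if x or y)  # positions set in a or b
    both = sum(1 for x, y in zip(a, b) if x and y)   # positions set in both
    if either == 0:
        # Two all-zero vectors are treated as identical in this sketch.
        return 0.0
    return 1.0 - both / either
```

Positions where both vectors are zero are ignored, so two rows are not considered similar merely because they both lack the same categories.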

v0.1.2

5 months ago

A few minor tweaks to the library primarily to help with maintenance.

Added a Continuous Deployment (CD) workflow to publish directly to PyPI when changes are merged into main
Fixed __repr__ and __str__ methods so they don't return the whole fitted dataframe
Fixed coverage runs and made tox a single call
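
The __repr__ fix follows a common pattern: show the configuration, not the fitted data. The class below is a hypothetical stand-in, not DenseClus itself, and its parameter names are chosen only for illustration.

```python
class ClusterModel:
    """Hypothetical estimator used only to illustrate a compact __repr__."""

    def __init__(self, random_state=42, use_gpu=False):
        self.random_state = random_state
        self.use_gpu = use_gpu
        self.data_ = None  # populated by fit; deliberately left out of the repr

    def fit(self, rows):
        self.data_ = list(rows)
        return self

    def __repr__(self):
        # Report constructor settings only, so the output stays one line long.
        return (
            f"{type(self).__name__}("
            f"random_state={self.random_state!r}, use_gpu={self.use_gpu!r})"
        )
```

Because no separate __str__ is defined here, str() falls back to __repr__, so printing a fitted model gives the same short line.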

v0.1.1

6 months ago

Added an auto-impute feature. It calls simple imputation under the hood for both categorical and numerical features. The user can override the defaults with keyword arguments.
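
To make "simple imputation" concrete, the sketch below fills missing numerical values with the mean and missing categorical values with the most frequent category. Both strategies and both function names are assumptions for illustration; the notes do not say which defaults the library uses.

```python
from collections import Counter
from statistics import mean


def impute_numeric(values):
    """Replace None with the mean of the observed values (assumed strategy)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(present)
    return [fill if v is None else v for v in values]


def impute_categorical(values):
    """Replace None with the most frequent observed category (assumed strategy)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot impute a column with no observed values")
    fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]
```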

In addition, updated the HDBSCAN step so that parameter search comes first, as DenseClus then converges to the optimal solution for DBCV. I don't know why.

PS: Really should be semantic version 2 but I am going this route instead.

https://github.com/awslabs/amazon-denseclus/issues/23

v0.1.0

6 months ago

Description of changes:

New Feature: Configure underlying algorithms

Update: Now supported on Python 3.11 (and only Python 3.11)

Other Updates

Move to using Ruff for linting
Address some bugs and user warnings in the package code
Update and lint notebooks
Refactor unit tests with fixtures
Update tox, pre-commit, etc. to run on latest Python
Refactor Makefile to support all of the above
Better error handling
Update workflows in GHA to remove redundancy
Better issue tracking templates