Amazon Denseclus Versions

Clustering for mixed-type data

v0.2.2

2 months ago
  • Updated the evaluate helper function to compute both DBCV and Calinski-Harabasz scores
  • Added new Notebook for exploring clustering on SageMaker Jumpstart
  • Dependency version bumps
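For readers unfamiliar with the second metric named above, here is a minimal pure-Python sketch of the Calinski-Harabasz index (between-cluster dispersion divided by within-cluster dispersion, each scaled by its degrees of freedom). It is not the library's evaluate helper: the function name `calinski_harabasz` is hypothetical, and how DenseClus treats HDBSCAN noise points (label -1) is not stated in these notes, so this sketch simply treats every distinct label as a cluster.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Illustrative Calinski-Harabasz index for a labelled set of points.

    points: list of equal-length numeric tuples
    labels: list of cluster labels, one per point
    """
    n = len(points)
    dim = len(points[0])

    # Group points by their cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")

    # Centroid of the whole data set.
    overall = [sum(p[d] for p in points) / n for d in range(dim)]

    between = 0.0  # size-weighted squared distance of centroids to overall mean
    within = 0.0   # squared distance of points to their own centroid
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )

    if within == 0.0:
        # Assumption of this sketch: perfectly tight clusters score infinity.
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


score = calinski_harabasz([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1])
```

Higher values indicate tighter, better-separated clusters. Unlike DBCV, this index is centroid-based, so it tends to favour convex clusters.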

v0.2.1

5 months ago
  • Split up modules: numerical and categorical now have their own files for future enhancements
  • Changed score method to evaluate; it now scores via DBCV and coverage, and returns labels
  • Consolidated GPU settings; now just set use_gpu to True or False
  • add version file for automated setup

v0.2.0

5 months ago

Summary

Add predict method based on the combine method for ensemble. When ensemble is selected, DenseClus does not combine the UMAPs; instead it fits a clusterer for each UMAP. When predict is called, it uses approximate_predict in HDBSCAN to vote on the cluster assignment.
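The voting step can be illustrated with a small sketch. The release note does not describe the exact voting rule, so the following assumes a plain majority vote, assumes the per-clusterer labels have already been aligned so that the same integer means the same cluster, and falls back to the noise label -1 on ties. The function name `vote_labels` is hypothetical and not part of the library.

```python
from collections import Counter


def vote_labels(label_sets, noise=-1):
    """Majority vote across several clusterers' predicted labels.

    label_sets: list of label lists, one per clusterer, all the same length.
    Returns one voted label per sample; ties resolve to `noise`.
    """
    voted = []
    # zip(*label_sets) yields, for each sample, the labels from every clusterer.
    for sample_labels in zip(*label_sets):
        counts = Counter(sample_labels)
        top_label, top_count = counts.most_common(1)[0]
        # A tie means no label has a strict plurality, so mark the sample as noise.
        tied = [label for label, count in counts.items() if count == top_count]
        voted.append(top_label if len(tied) == 1 else noise)
    return voted


# Three hypothetical clusterers, three samples.
predictions = [
    [0, 1, 1],
    [0, 1, -1],
    [1, 1, -1],
]
result = vote_labels(predictions)
```

In practice, each inner list would come from calling HDBSCAN's approximate_predict on one fitted clusterer with the corresponding embedding of the new points.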

Other changes

  • Change default method from 'contrast' to 'intersection'
  • Change default distance metric for categoricals to jaccard for later rapids integration
  • Increase overall test coverage
  • prediction_data=False for combined UMAPs, True for ensemble
  • Update examples to reflect changes

v0.1.2

5 months ago

A few minor tweaks to the library primarily to help with maintenance.

  1. Added a Continuous Deployment (CD) workflow to publish directly to PyPI when changes are merged into main
  2. Fixed __repr__ and __str__ methods so they don't return the whole fitted dataframe
  3. Fixed coverage runs and made tox a single call
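As a sketch of the idea behind item 2, a compact __repr__ reports a few settings and the size of the fitted data rather than the data itself. The class `FittedModel` and its attributes are hypothetical stand-ins, not the library's actual class.

```python
class FittedModel:
    """Hypothetical model that holds fitted data but prints a short summary."""

    def __init__(self, rows, random_state=42):
        self.rows = rows
        self.random_state = random_state

    def __repr__(self):
        # Summarise instead of dumping every fitted row.
        return (
            f"{type(self).__name__}"
            f"(random_state={self.random_state}, n_rows={len(self.rows)})"
        )

    # Reuse the same short summary for str().
    __str__ = __repr__


model = FittedModel([[1, 2]] * 1000, random_state=7)
summary = repr(model)
```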

v0.1.1

6 months ago

Added an auto-impute feature. It calls simple imputation under the hood for both categorical and numerical features. Users can override the defaults with keyword arguments.
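To make "simple imputation" concrete, here is a minimal standard-library sketch. The default strategies are not stated in this note, so the choices below (mean for numerical columns, most frequent value for categorical columns) are assumptions, and the function name `impute_column` is hypothetical.

```python
from statistics import mean, mode


def impute_column(values, kind):
    """Fill missing entries (None) in one column.

    kind: "numerical" fills with the mean of the observed values;
          "categorical" fills with the most frequent observed value.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if kind == "numerical":
        fill = mean(observed)
    elif kind == "categorical":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown column kind: {kind!r}")
    return [fill if v is None else v for v in values]


ages = impute_column([1.0, None, 3.0], "numerical")
colors = impute_column(["red", None, "blue", "red"], "categorical")
```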

In addition, updated the HDBSCAN step so that parameter search comes first, as DenseClus converges to the optimal solution for DBCV. I don't know why.

PS: Really should be semantic version 2 but I am going this route instead.

https://github.com/awslabs/amazon-denseclus/issues/23

v0.1.0

6 months ago

Description of changes:

New Feature: Configure underlying algorithms

Update: Now supported for Python 3.11 (and only Python 3.11)

Other Updates

  • Move to using Ruff for linting
  • Address some bugs and user warnings in the package code
  • Update and lint notebooks
  • Refactor unit tests with fixtures
  • Update tox, precommit, etc to run on latest Python
  • Refactor of Makefile to support all above
  • Better error handling
  • Update workflows in GHA to remove redundancy
  • Better issues tracking templates