Imbalanced Learn Versions

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

0.7.0

3 years ago

A release that bumps the minimum version of scikit-learn to 0.23 and includes a couple of bug fixes. Check the what's new page for more information.

0.6.2

4 years ago

This is a bug-fix release to resolve some issues regarding the handling of the input and output formats of the arrays.

Changelog

  • Allow column vectors to be passed as targets. #673 by @chkoar.
  • Better input/output handling for pandas, numpy and plain lists. #681 by @chkoar.

0.6.1

4 years ago

This is a bug-fix release to primarily resolve some packaging issues in version 0.6.0. It also includes minor documentation improvements and some bug fixes.

Changelog

Bug fixes

  • Fix a bug in :class:imblearn.ensemble.BalancedRandomForestClassifier that led to a wrong number of samples being used during fitting due to max_samples, and therefore to a bad computation of the OOB score. :pr:656 by :user:Guillaume Lemaitre <glemaitre>.

0.6.0

4 years ago

Changelog

Changed models

The following models might produce different samples due to changes in scikit-learn:

  • :class:imblearn.under_sampling.ClusterCentroids
  • :class:imblearn.under_sampling.InstanceHardnessThreshold

The following samplers will give different results due to changes in the internal usage of the random state:

  • :class:imblearn.over_sampling.SMOTENC

Bug fixes

  • :class:imblearn.under_sampling.InstanceHardnessThreshold now takes the random_state into account and gives deterministic results. In addition, cross_val_predict is used to take advantage of parallelism. :pr:599 by :user:Shihab Shahriar Khan <Shihab-Shahriar>.

  • Fix a bug in :class:imblearn.ensemble.BalancedRandomForestClassifier leading to a wrong computation of the OOB score. :pr:656 by :user:Guillaume Lemaitre <glemaitre>.

Maintenance

  • Update imports from scikit-learn after some modules were made private. The following imports have been changed: :class:sklearn.ensemble._base._set_random_states, :class:sklearn.ensemble._forest._parallel_build_trees, :class:sklearn.metrics._classification._check_targets, :class:sklearn.metrics._classification._prf_divide, :class:sklearn.utils.Bunch, :class:sklearn.utils._safe_indexing, :class:sklearn.utils._testing.assert_allclose, :class:sklearn.utils._testing.assert_array_equal, :class:sklearn.utils._testing.SkipTest. :pr:617 by :user:Guillaume Lemaitre <glemaitre>.

  • Synchronize :mod:imblearn.pipeline with :mod:sklearn.pipeline. :pr:620 by :user:Guillaume Lemaitre <glemaitre>.

  • Synchronize :class:imblearn.ensemble.BalancedRandomForestClassifier and add parameters max_samples and ccp_alpha. :pr:621 by :user:Guillaume Lemaitre <glemaitre>.

Enhancement

  • :class:imblearn.under_sampling.RandomUnderSampler, :class:imblearn.over_sampling.RandomOverSampler and :class:imblearn.datasets.make_imbalance accept a Pandas DataFrame as input and will output a Pandas DataFrame. Similarly, they accept a Pandas Series as input and will output a Pandas Series. :pr:636 by :user:Guillaume Lemaitre <glemaitre>.

  • :class:imblearn.FunctionSampler accepts a parameter validate that controls whether the inputs X and y are checked. :pr:637 by :user:Guillaume Lemaitre <glemaitre>.

  • :class:imblearn.under_sampling.RandomUnderSampler and :class:imblearn.over_sampling.RandomOverSampler can resample when non-finite values are present in X. :pr:643 by :user:Guillaume Lemaitre <glemaitre>.

  • All samplers will output a Pandas DataFrame if a Pandas DataFrame was given as an input. :pr:644 by :user:Guillaume Lemaitre <glemaitre>.

  • The sample generation in :class:imblearn.over_sampling.SMOTE, :class:imblearn.over_sampling.BorderlineSMOTE, :class:imblearn.over_sampling.SVMSMOTE, :class:imblearn.over_sampling.KMeansSMOTE and :class:imblearn.over_sampling.SMOTENC is now vectorized, giving an additional speed-up when X is sparse. :pr:596 by :user:Matt Eding <MattEding>.

Deprecation

  • The following classes have been removed after 2 deprecation cycles: ensemble.BalanceCascade and ensemble.EasyEnsemble. :pr:617 by :user:Guillaume Lemaitre <glemaitre>.

  • The following functions have been removed after 2 deprecation cycles: utils.check_ratio. :pr:617 by :user:Guillaume Lemaitre <glemaitre>.

  • The parameters ratio and return_indices have been removed from all samplers. :pr:617 by :user:Guillaume Lemaitre <glemaitre>.

  • The parameters m_neighbors, out_step, kind and svm_estimator have been removed from :class:imblearn.over_sampling.SMOTE. :pr:617 by :user:Guillaume Lemaitre <glemaitre>.

0.5.0

4 years ago

Version 0.5.0

Changed models

The following models or functions might give different results even if the data X and y are the same.

  • :class:imblearn.ensemble.RUSBoostClassifier default estimator changed from :class:sklearn.tree.DecisionTreeClassifier with full depth to a decision stump (i.e., tree with max_depth=1).

Documentation

  • Correct the definition of the ratio when using a float as the sampling strategy for over-sampling and under-sampling. :issue:525 by :user:Ariel Rossanigo <arielrossanigo>.

  • Add :class:imblearn.over_sampling.BorderlineSMOTE and :class:imblearn.over_sampling.SVMSMOTE in the API documentation. :issue:530 by :user:Guillaume Lemaitre <glemaitre>.
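As a rough illustration of the float definition mentioned in the first item above: for a binary problem, a float sampling strategy is understood here as the ratio of minority to majority samples after resampling. The helper below is hypothetical and is not part of imbalanced-learn. It only works through the arithmetic under that assumption, and the truncation to an integer is also an assumption.

```python
from collections import Counter


def counts_after_resampling(y, ratio, kind):
    """Per-class counts implied by a float ratio on a binary target.

    Hypothetical helper for illustration; not an imbalanced-learn function.
    """
    (maj, n_maj), (mino, n_min) = Counter(y).most_common(2)
    if kind == "over":
        # ratio = n_minority_after / n_majority: the minority class grows
        return {maj: n_maj, mino: int(ratio * n_maj)}
    if kind == "under":
        # ratio = n_minority / n_majority_after: the majority class shrinks
        return {maj: int(n_min / ratio), mino: n_min}
    raise ValueError("kind must be 'over' or 'under'")


y = [0] * 100 + [1] * 10
print(counts_after_resampling(y, 0.5, "over"))   # {0: 100, 1: 50}
print(counts_after_resampling(y, 0.5, "under"))  # {0: 20, 1: 10}
```

With 100 majority and 10 minority samples, a ratio of 0.5 means growing the minority class to 50 when over-sampling, or shrinking the majority class to 20 when under-sampling.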

Enhancement

  • Add parallelisation for SMOTEENN and SMOTETomek. :pr:547 by :user:Michael Hsieh <Microsheep>.

  • Add :class:imblearn.utils._show_versions. Updated the contribution guide and issue template showing how to print system and dependency information from the command line. :pr:557 by :user:Alexander L. Hayes <batflyer>.

  • Add :class:imblearn.over_sampling.KMeansSMOTE, an over-sampler that clusters points before applying SMOTE. :pr:435 by :user:Stephan Heijl <StephanHeijl>.

Maintenance

  • Make it possible to import imblearn and access its submodules. :pr:500 by :user:Guillaume Lemaitre <glemaitre>.

  • Remove support for Python 2, remove deprecation warning from scikit-learn 0.21. :pr:576 by :user:Guillaume Lemaitre <glemaitre>.

Bug

  • Fix wrong usage of :class:keras.layers.BatchNormalization in porto_seguro_keras_under_sampling.py example. The batch normalization was moved before the activation function and the bias was removed from the dense layer. :pr:531 by :user:Guillaume Lemaitre <glemaitre>.

  • Fix a bug in which sparse matrices were converted to COO format when stacking them in :class:imblearn.over_sampling.SMOTENC. This bug only affected old scipy versions. :pr:539 by :user:Guillaume Lemaitre <glemaitre>.

  • Fix bug in :class:imblearn.pipeline.Pipeline where None could be the final estimator. :pr:554 by :user:Oliver Rausch <orausch>.

  • Fix bug in :class:imblearn.over_sampling.SVMSMOTE and :class:imblearn.over_sampling.BorderlineSMOTE where the default parameter of n_neighbors was not set properly. :pr:578 by :user:Guillaume Lemaitre <glemaitre>.

  • Fix bug by changing the default depth in :class:imblearn.ensemble.RUSBoostClassifier to get a decision stump as a weak learner as in the original paper. :pr:545 by :user:Christos Aridas <chkoar>.

  • Allow importing keras directly from tensorflow in :mod:imblearn.keras. :pr:531 by :user:Guillaume Lemaitre <glemaitre>.

0.4.3

5 years ago

Mainly a bug fix in SMOTENC.

0.4.2

5 years ago

Version 0.4.2

Bug fixes

  • Fix a bug in imblearn.over_sampling.SMOTENC in which the median of the standard deviation was used instead of half of the median of the standard deviation. By Guillaume Lemaitre in #491.
  • Raise an error when passing a target that is not supported, i.e. a regression target or multilabel targets. Imbalanced-learn does not support these cases. By Guillaume Lemaitre in #490.
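The quantity named in the SMOTENC fix above can be sketched in a few lines. This is only an illustration of "half of the median of the standard deviation", computed per feature over a set of continuous-valued samples. The helper name, the use of the population standard deviation, and the choice of which samples to pass in are assumptions, not the library's code.

```python
import statistics


def half_median_std(rows):
    """Half of the median of the per-feature standard deviations.

    `rows` is a list of samples restricted to continuous features.
    Hypothetical helper; population standard deviation is an assumption.
    """
    columns = list(zip(*rows))  # transpose samples into feature columns
    stds = [statistics.pstdev(col) for col in columns]
    return statistics.median(stds) / 2


rows = [[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]]
print(half_median_std(rows))  # per-feature stds are 1, 2, 3, so this prints 1.0
```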

0.4.1

5 years ago

Version 0.4

October, 2018

Version 0.4 is the last version of imbalanced-learn to support Python 2.7 and Python 3.4. Imbalanced-learn 0.5 will require Python 3.5 or higher.

Highlights

This release brings a set of new features as well as some API changes to strengthen the foundation of imbalanced-learn.

As a new feature, two new modules, imblearn.keras and imblearn.tensorflow, have been added, in which imbalanced-learn samplers can be used to generate balanced mini-batches.

The module imblearn.ensemble has been consolidated with new classifiers: imblearn.ensemble.BalancedRandomForestClassifier, imblearn.ensemble.EasyEnsembleClassifier, imblearn.ensemble.RUSBoostClassifier.

Support for strings has been added in imblearn.over_sampling.RandomOverSampler and imblearn.under_sampling.RandomUnderSampler. In addition, a new class imblearn.over_sampling.SMOTENC allows generating samples from data sets containing both continuous and categorical features.

imblearn.over_sampling.SMOTE has been simplified and broken down into two additional classes: imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE.

There are also some changes regarding the API: the parameter sampling_strategy has been introduced to replace the ratio parameter. In addition, the return_indices argument has been deprecated and all samplers will expose a sample_indices_ attribute whenever this is possible.
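To make the role of sample_indices_ more concrete, here is a minimal pure-Python sketch of random under-sampling that also reports which original rows were kept. The function name and return shape are assumptions made for illustration. The library exposes the indices as an attribute on the fitted sampler rather than as a return value.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Balance classes by randomly keeping n_minority rows per class.

    Hypothetical helper; returns the resampled data plus the kept indices.
    """
    rng = random.Random(seed)
    n_min = min(Counter(y).values())
    kept = []
    for label in sorted(set(y)):
        idx = [i for i, target in enumerate(y) if target == label]
        kept.extend(rng.sample(idx, n_min))  # sample without replacement
    kept.sort()  # keep the original row order
    return [X[i] for i in kept], [y[i] for i in kept], kept


X = [[i] for i in range(12)]
y = [0] * 9 + [1] * 3
X_res, y_res, sample_indices = random_under_sample(X, y)
print(Counter(y_res))   # three samples of each class
print(sample_indices)   # positions of the kept rows in the original X
```

Keeping the indices makes it possible to map each resampled row back to the original data, for example to carry along sample weights or metadata.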

0.4.0

5 years ago

Version 0.4

October, 2018

Warning: Version 0.4 is the last version of imbalanced-learn to support Python 2.7 and Python 3.4. Imbalanced-learn 0.5 will require Python 3.5 or higher.

Highlights

This release brings a set of new features as well as some API changes to strengthen the foundation of imbalanced-learn.

As a new feature, two new modules, imblearn.keras and imblearn.tensorflow, have been added, in which imbalanced-learn samplers can be used to generate balanced mini-batches.

The module imblearn.ensemble has been consolidated with new classifiers: imblearn.ensemble.BalancedRandomForestClassifier, imblearn.ensemble.EasyEnsembleClassifier, imblearn.ensemble.RUSBoostClassifier.

Support for strings has been added in imblearn.over_sampling.RandomOverSampler and imblearn.under_sampling.RandomUnderSampler. In addition, a new class imblearn.over_sampling.SMOTENC allows generating samples from data sets containing both continuous and categorical features.

imblearn.over_sampling.SMOTE has been simplified and broken down into two additional classes: imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE.

There are also some changes regarding the API: the parameter sampling_strategy has been introduced to replace the ratio parameter. In addition, the return_indices argument has been deprecated and all samplers will expose a sample_indices_ attribute whenever this is possible.

0.3.4

5 years ago

Just for switching documentation