A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
A release to bump the minimum version of scikit-learn to 0.23 with a couple of bug fixes. See the what's new section for more information.
This is a bug-fix release primarily to resolve some packaging issues in version 0.6.0. It also includes minor documentation improvements and some bug fixes.
Fix a bug in :class:`imblearn.ensemble.BalancedRandomForestClassifier` leading to a wrong number of samples used during fitting due to ``max_samples``, and therefore to a wrong computation of the OOB score. :pr:`656` by :user:`Guillaume Lemaitre <glemaitre>`.
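
The link between ``max_samples`` and the OOB score can be sketched as follows. This is a minimal NumPy illustration, not imbalanced-learn's code: the helper ``bootstrap_and_oob`` is hypothetical, and it assumes scikit-learn's convention for ``max_samples`` (``None`` means all samples, an integer is an absolute count, a float is a fraction). Assuming each tree draws its bootstrap from the (possibly under-sampled) set it is trained on, ``n_samples`` is the size of that set, and every row of that set that is never drawn is out-of-bag.

```python
import numpy as np


def bootstrap_and_oob(n_samples, max_samples=None, seed=0):
    """Draw one bootstrap sample; return (in-bag indices, out-of-bag mask).

    max_samples follows scikit-learn's convention: None -> n_samples draws,
    int -> that many draws, float in (0, 1] -> that fraction of n_samples.
    """
    if max_samples is None:
        n_draws = n_samples
    elif isinstance(max_samples, int):
        n_draws = max_samples
    else:
        n_draws = max(round(n_samples * max_samples), 1)

    rng = np.random.default_rng(seed)
    # Sampling with replacement: some rows are drawn several times.
    in_bag = rng.integers(0, n_samples, size=n_draws)
    # Rows never drawn are out-of-bag and can be used to score this tree.
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[in_bag] = False
    return in_bag, oob_mask


in_bag, oob_mask = bootstrap_and_oob(n_samples=100, max_samples=0.5)
print(len(in_bag), int(oob_mask.sum()))
```

If ``n_samples`` were taken from the wrong set (for instance the full data instead of the per-tree subset), both the number of draws and the OOB mask would be wrong, which is the kind of mismatch the fix above addresses.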

Changed models
..............

The following models might give different sampling results due to changes in scikit-learn:

- :class:`imblearn.under_sampling.ClusterCentroids`
- :class:`imblearn.under_sampling.InstanceHardnessThreshold`

The following samplers will give different results due to changes in the internal usage of the random state:

- :class:`imblearn.over_sampling.SMOTENC`

Bug fixes
.........

:class:`imblearn.under_sampling.InstanceHardnessThreshold` now takes into account the ``random_state`` and gives deterministic results. In addition, ``cross_val_predict`` is used to take advantage of parallelism. :pr:`599` by :user:`Shihab Shahriar Khan <Shihab-Shahriar>`.

Fix a bug in :class:`imblearn.ensemble.BalancedRandomForestClassifier` leading to a wrong computation of the OOB score. :pr:`656` by :user:`Guillaume Lemaitre <glemaitre>`.

Maintenance
...........

Update imports from scikit-learn after some modules were made private. The following imports have been changed: :class:`sklearn.ensemble._base._set_random_states`, :class:`sklearn.ensemble._forest._parallel_build_trees`, :class:`sklearn.metrics._classification._check_targets`, :class:`sklearn.metrics._classification._prf_divide`, :class:`sklearn.utils.Bunch`, :class:`sklearn.utils._safe_indexing`, :class:`sklearn.utils._testing.assert_allclose`, :class:`sklearn.utils._testing.assert_array_equal`, :class:`sklearn.utils._testing.SkipTest`. :pr:`617` by :user:`Guillaume Lemaitre <glemaitre>`.

Synchronize :mod:`imblearn.pipeline` with :mod:`sklearn.pipeline`. :pr:`620` by :user:`Guillaume Lemaitre <glemaitre>`.

Synchronize :class:`imblearn.ensemble.BalancedRandomForestClassifier` and add the parameters ``max_samples`` and ``ccp_alpha``. :pr:`621` by :user:`Guillaume Lemaitre <glemaitre>`.

Enhancement
...........

:class:`imblearn.under_sampling.RandomUnderSampler`, :class:`imblearn.over_sampling.RandomOverSampler`, and :func:`imblearn.datasets.make_imbalance` accept a pandas DataFrame as input and will output a pandas DataFrame. Similarly, they accept a pandas Series as input and will output a pandas Series. :pr:`636` by :user:`Guillaume Lemaitre <glemaitre>`.

:class:`imblearn.FunctionSampler` accepts a parameter ``validate`` controlling whether the inputs ``X`` and ``y`` are checked. :pr:`637` by :user:`Guillaume Lemaitre <glemaitre>`.

:class:`imblearn.under_sampling.RandomUnderSampler` and :class:`imblearn.over_sampling.RandomOverSampler` can resample when non-finite values are present in ``X``. :pr:`643` by :user:`Guillaume Lemaitre <glemaitre>`.
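
A rough sketch of why non-finite values are not a problem for random under-sampling: the sampler only selects row indices per class and never computes anything from the values in ``X``. The function ``random_under_sample`` below is a hypothetical NumPy illustration, not the library's implementation; it balances every class down to the size of the smallest one and also returns the selected indices, in the spirit of the ``sample_indices_`` attribute.

```python
import numpy as np


def random_under_sample(X, y, seed=0):
    """Randomly keep, for every class, as many rows as the smallest class has."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    # Only row indices are drawn; the values stored in X are never read,
    # so NaN or infinite entries cannot break the selection.
    indices = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    return X[indices], y[indices], indices


X = np.array([[0.0, 1.0], [np.inf, 2.0], [3.0, np.nan], [4.0, 5.0],
              [np.nan, 6.0], [7.0, 8.0]])
y = np.array([0, 0, 0, 0, 1, 1])
X_res, y_res, idx = random_under_sample(X, y)
```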

All samplers will output a pandas DataFrame if a pandas DataFrame was given as input. :pr:`644` by :user:`Guillaume Lemaitre <glemaitre>`.

Sample generation in :class:`imblearn.over_sampling.SMOTE`, :class:`imblearn.over_sampling.BorderlineSMOTE`, :class:`imblearn.over_sampling.SVMSMOTE`, :class:`imblearn.over_sampling.KMeansSMOTE`, and :class:`imblearn.over_sampling.SMOTENC` is now vectorized, giving an additional speed-up when ``X`` is sparse. :pr:`596` by :user:`Matt Eding <MattEding>`.
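
The vectorized generation can be illustrated with NumPy: instead of looping over new samples, all seed rows, neighbours, and interpolation steps are drawn at once and combined in a single array expression ``x_new = x_i + step * (x_nn - x_i)``. This is a simplified sketch (dense input, brute-force neighbour search, hypothetical helper ``smote_generate``), not the library's implementation.

```python
import numpy as np


def smote_generate(X, n_samples, k_neighbors=5, seed=0):
    """Generate n_samples synthetic rows from the rows of X (one class)."""
    X = np.asarray(X, dtype=float)
    if not 0 < k_neighbors < X.shape[0]:
        raise ValueError("need 0 < k_neighbors < number of rows in X")
    rng = np.random.default_rng(seed)

    # Brute-force k nearest neighbours (excluding the row itself).
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k_neighbors]

    # Draw every seed row, neighbour and step at once (no Python loop).
    rows = rng.integers(0, X.shape[0], size=n_samples)
    cols = rng.integers(0, k_neighbors, size=n_samples)
    steps = rng.uniform(size=(n_samples, 1))

    # x_new = x_i + step * (x_nn - x_i), computed for all new rows together.
    return X[rows] + steps * (X[neighbours[rows, cols]] - X[rows])


X_minority = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
X_new = smote_generate(X_minority, n_samples=10, k_neighbors=3)
```

Because every synthetic point lies on a segment between two minority samples, the points generated from this collinear toy data stay on the line ``x0 == x1``.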

Deprecation
...........

The following classes have been removed after 2 deprecation cycles: ``ensemble.BalanceCascade`` and ``ensemble.EasyEnsemble``. :pr:`617` by :user:`Guillaume Lemaitre <glemaitre>`.

The following function has been removed after 2 deprecation cycles: ``utils.check_ratio``. :pr:`617` by :user:`Guillaume Lemaitre <glemaitre>`.

The parameters ``ratio`` and ``return_indices`` have been removed from all samplers. :pr:`617` by :user:`Guillaume Lemaitre <glemaitre>`.

The parameters ``m_neighbors``, ``out_step``, ``kind``, and ``svm_estimator`` have been removed from :class:`imblearn.over_sampling.SMOTE`. :pr:`617` by :user:`Guillaume Lemaitre <glemaitre>`.

The following models might give different results even when fitted on the same data ``X`` and ``y``:

- :class:`imblearn.ensemble.RUSBoostClassifier`: the default estimator changed from :class:`sklearn.tree.DecisionTreeClassifier` with full depth to a decision stump (i.e., a tree with ``max_depth=1``).

Correct the definition of the ratio when using a ``float`` in the sampling strategy for over-sampling and under-sampling. :issue:`525` by :user:`Ariel Rossanigo <arielrossanigo>`.
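
To make the float definition concrete, the sketch below computes target class sizes for a binary problem. It assumes the definition given in the imbalanced-learn user guide: for over-sampling, the float is the ratio of minority samples after resampling to majority samples; for under-sampling, it is the ratio of minority samples to majority samples after resampling. The helper ``float_sampling_targets`` is hypothetical.

```python
def float_sampling_targets(n_minority, n_majority, sampling_strategy, kind):
    """Return (minority_size, majority_size) after resampling a binary problem."""
    if not 0 < sampling_strategy <= 1:
        raise ValueError("sampling_strategy must be in (0, 1]")
    if kind == "over":
        # Over-sampling grows the minority class until
        # n_minority_after / n_majority == sampling_strategy.
        return int(n_majority * sampling_strategy), n_majority
    if kind == "under":
        # Under-sampling shrinks the majority class until
        # n_minority / n_majority_after == sampling_strategy.
        return n_minority, int(n_minority / sampling_strategy)
    raise ValueError("kind must be 'over' or 'under'")


print(float_sampling_targets(10, 100, 0.5, "over"))   # (50, 100)
print(float_sampling_targets(10, 100, 0.5, "under"))  # (10, 20)
```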

Add :class:`imblearn.over_sampling.BorderlineSMOTE` and :class:`imblearn.over_sampling.SVMSMOTE` to the API documentation. :issue:`530` by :user:`Guillaume Lemaitre <glemaitre>`.

Add parallelisation for SMOTEENN and SMOTETomek. :pr:`547` by :user:`Michael Hsieh <Microsheep>`.

Add :class:`imblearn.utils._show_versions`. Update the contribution guide and issue template to show how to print system and dependency information from the command line. :pr:`557` by :user:`Alexander L. Hayes <batflyer>`.

Add :class:`imblearn.over_sampling.KMeansSMOTE`, an over-sampler that clusters points before applying SMOTE. :pr:`435` by :user:`Stephan Heijl <StephanHeijl>`.

Make it possible to import ``imblearn`` and access its submodules. :pr:`500` by :user:`Guillaume Lemaitre <glemaitre>`.

Remove support for Python 2 and remove deprecation warnings from scikit-learn 0.21. :pr:`576` by :user:`Guillaume Lemaitre <glemaitre>`.

Fix the wrong usage of :class:`keras.layers.BatchNormalization` in the ``porto_seguro_keras_under_sampling.py`` example. The batch normalization was moved before the activation function and the bias was removed from the dense layer. :pr:`531` by :user:`Guillaume Lemaitre <glemaitre>`.
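
The reason for dropping the bias can be checked numerically. In training mode, batch normalization subtracts the per-feature batch mean, so any constant bias added by the dense layer cancels out, and the learnable shift (``beta``) takes over its role. The NumPy sketch below uses hypothetical helpers and no Keras: it applies a dense layer, then batch normalization, then ReLU, and compares the output with and without a bias.

```python
import numpy as np


def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-5):
    # Training-mode batch normalization: standardize each feature over the
    # batch, then apply the learnable scale (gamma) and shift (beta).
    mean = z.mean(axis=0)
    var = z.var(axis=0)
    return gamma * (z - mean) / np.sqrt(var + eps) + beta


def dense_bn_relu(x, W, b=None):
    # Dense layer (optionally with bias) -> batch normalization -> ReLU.
    z = x @ W
    if b is not None:
        z = z + b
    return np.maximum(batch_norm(z), 0.0)


rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))
W = rng.normal(size=(4, 3))
b = rng.normal(size=3)
out_without_bias = dense_bn_relu(x, W)
out_with_bias = dense_bn_relu(x, W, b)
print(np.allclose(out_without_bias, out_with_bias))
```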

Fix a bug related to the conversion of sparse matrices to COO format when stacking the matrices in :class:`imblearn.over_sampling.SMOTENC`. This bug only affected old SciPy versions. :pr:`539` by :user:`Guillaume Lemaitre <glemaitre>`.

Fix a bug in :class:`imblearn.pipeline.Pipeline` where ``None`` could be the final estimator. :pr:`554` by :user:`Oliver Rausch <orausch>`.

Fix a bug in :class:`imblearn.over_sampling.SVMSMOTE` and :class:`imblearn.over_sampling.BorderlineSMOTE` where the default value of ``n_neighbors`` was not set properly. :pr:`578` by :user:`Guillaume Lemaitre <glemaitre>`.

Fix a bug by changing the default depth in :class:`imblearn.ensemble.RUSBoostClassifier` to get a decision stump as a weak learner, as in the original paper. :pr:`545` by :user:`Christos Aridas <chkoar>`.

Allow importing ``keras`` directly from ``tensorflow`` in :mod:`imblearn.keras`. :pr:`531` by :user:`Guillaume Lemaitre <glemaitre>`.

Mainly bug fixes in SMOTE-NC.
Version 0.4.2
Bug fixes
October, 2018
.. warning::

   Version 0.4 is the last version of imbalanced-learn to support Python 2.7
   and Python 3.4. Imbalanced-learn 0.5 will require Python 3.5 or higher.

This release brings its set of new features as well as some API changes to strengthen the foundation of imbalanced-learn.

As new features, 2 new modules, :mod:`imblearn.keras` and :mod:`imblearn.tensorflow`, have been added, in which imbalanced-learn samplers can be used to generate balanced mini-batches.
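
A minimal NumPy sketch of the balanced mini-batch idea, assuming random under-sampling as the balancing strategy: rows of every class are subsampled to the size of the smallest class, shuffled, and then sliced into batches. The generator ``balanced_batches`` is hypothetical and only illustrates the concept; the actual :mod:`imblearn.keras` and :mod:`imblearn.tensorflow` utilities take an imbalanced-learn sampler and are not reproduced here.

```python
import numpy as np


def balanced_batches(X, y, batch_size, seed=0):
    """Yield (X_batch, y_batch) from a randomly under-sampled, balanced copy of the data."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = counts.min()
    # Random under-sampling: keep n_per_class rows from every class.
    indices = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False)
        for c in classes
    ])
    # Shuffle so that each mini-batch mixes the classes.
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]


X = np.arange(200, dtype=float).reshape(100, 2)
y = np.array([0] * 90 + [1] * 10)
batches = list(balanced_batches(X, y, batch_size=8))
print([len(y_batch) for _, y_batch in batches])  # [8, 8, 4]
```

Re-running the generator with a different ``seed`` at each epoch would draw a different subset of the majority class, so the model eventually sees more of the majority data while every epoch stays balanced.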

The module :mod:`imblearn.ensemble` has been consolidated with new classifiers: :class:`imblearn.ensemble.BalancedRandomForestClassifier`, :class:`imblearn.ensemble.EasyEnsembleClassifier`, and :class:`imblearn.ensemble.RUSBoostClassifier`.

Support for strings has been added in :class:`imblearn.over_sampling.RandomOverSampler` and :class:`imblearn.under_sampling.RandomUnderSampler`. In addition, a new class, :class:`imblearn.over_sampling.SMOTENC`, allows generating samples for data sets containing both continuous and categorical features.
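
The way SMOTE-NC handles mixed data can be sketched for a single synthetic sample: continuous features are interpolated as in SMOTE, while each categorical feature takes the most frequent category among the nearest neighbours. This is a simplified illustration with a hypothetical helper ``smotenc_sample``; it assumes the neighbours are already known and leaves out how SMOTE-NC measures distances between mixed-type samples.

```python
from collections import Counter

import numpy as np


def smotenc_sample(x_cont, nn_cont, nn_cat, neighbour, step):
    """Build one synthetic sample from a seed sample and its k nearest neighbours.

    x_cont    : (n_cont,) continuous features of the seed sample
    nn_cont   : (k, n_cont) continuous features of the neighbours
    nn_cat    : (k, n_cat) categorical features of the neighbours
    neighbour : index (0..k-1) of the neighbour to interpolate towards
    step      : interpolation factor in [0, 1]
    """
    # Continuous part: SMOTE interpolation between the seed and one neighbour.
    new_cont = x_cont + step * (nn_cont[neighbour] - x_cont)
    # Categorical part: most frequent category among the neighbours, per column.
    new_cat = [Counter(column.tolist()).most_common(1)[0][0] for column in nn_cat.T]
    return new_cont, new_cat


x_cont = np.array([1.0, 10.0])
nn_cont = np.array([[3.0, 14.0], [2.0, 12.0], [0.0, 8.0]])
nn_cat = np.array([["red", "S"], ["red", "M"], ["blue", "M"]])
new_cont, new_cat = smotenc_sample(x_cont, nn_cont, nn_cat, neighbour=0, step=0.5)
print(new_cont, new_cat)
```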

The class :class:`imblearn.over_sampling.SMOTE` has been simplified and broken down into 2 additional classes: :class:`imblearn.over_sampling.SVMSMOTE` and :class:`imblearn.over_sampling.BorderlineSMOTE`.

There are also some changes regarding the API: the parameter ``sampling_strategy`` has been introduced to replace the ``ratio`` parameter. In addition, the ``return_indices`` argument has been deprecated, and all samplers will expose a ``sample_indices_`` attribute whenever possible.
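
The string and dict forms of ``sampling_strategy`` can be sketched as a resolution from class counts to the number of samples to generate per class. The helper below is hypothetical and follows over-sampling semantics as assumed here: ``'auto'`` and ``'not majority'`` bring every class except the majority up to the majority size, ``'all'`` also targets the majority class (which then needs no new sample), and a dict gives the desired final count per class.

```python
from collections import Counter


def resolve_over_sampling_strategy(y, sampling_strategy="auto"):
    """Map each targeted class to the number of samples to generate."""
    counts = Counter(y)
    majority_class, n_majority = counts.most_common(1)[0]
    if isinstance(sampling_strategy, dict):
        # Dict form: desired number of samples per class after resampling.
        n_new = {c: n - counts[c] for c, n in sampling_strategy.items()}
        if any(v < 0 for v in n_new.values()):
            raise ValueError("over-sampling cannot reduce a class")
        return n_new
    if sampling_strategy in ("auto", "not majority"):
        # Every class except the majority is brought up to the majority size.
        return {c: n_majority - n for c, n in counts.items() if c != majority_class}
    if sampling_strategy == "all":
        # Same, but the majority class is targeted too (it needs 0 new samples).
        return {c: n_majority - n for c, n in counts.items()}
    raise ValueError(f"unsupported sampling_strategy: {sampling_strategy!r}")


y = [0] * 10 + [1] * 3 + [2] * 5
print(resolve_over_sampling_strategy(y))          # {1: 7, 2: 5}
print(resolve_over_sampling_strategy(y, {1: 6}))  # {1: 3}
```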
Just for switching documentation