ML hyperparameter tuning and feature selection using evolutionary algorithms.
This release brings support for Python 3.10; it also comes with several API updates and algorithm optimizations.
GAFeatureSelectionCV now mimics the scikit-learn FeatureSelection algorithms API instead of the Grid Search one; this makes it easier to use as a selection method and keeps it closer to the scikit-learn API.
Improved candidate generation in GAFeatureSelectionCV when max_features is set; it also ensures that at least one feature is selected.
crossover_probability and mutation_probability are now correctly passed to the mate and mutation functions inside GAFeatureSelectionCV.
Thanks to the people who contributed with their ideas and suggestions.
This release comes with new features and general performance improvements.
Introducing Adaptive Schedulers to enable adaptive mutation and crossover probabilities; the currently supported schedulers are ConstantAdapter, ExponentialAdapter, InverseAdapter, and PotentialAdapter.
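As a rough illustration of what such a scheduler does, here is a minimal sketch of an exponentially decaying probability. This is a conceptual re-implementation, not the library's code; the class name, constructor arguments, and exact decay formula are assumptions.

```python
import math

class ExponentialDecaySketch:
    """Decays a value from initial_value toward end_value on each step.

    Conceptual sketch of an adaptive scheduler; the real adapters live
    in the library and may use a different parameterization.
    """

    def __init__(self, initial_value, end_value, adaptive_rate):
        self.initial_value = initial_value
        self.end_value = end_value
        self.adaptive_rate = adaptive_rate
        self.iteration = 0

    def step(self):
        # exponential interpolation between the initial and final values
        value = self.end_value + (self.initial_value - self.end_value) * math.exp(
            -self.adaptive_rate * self.iteration
        )
        self.iteration += 1
        return value

# e.g. start mutation high for exploration and let it decay across generations
mutation_schedule = ExponentialDecaySketch(0.9, 0.2, 0.1)
probs = [mutation_schedule.step() for _ in range(50)]
```

The first value equals the initial probability and later values approach the end value, so early generations explore more aggressively than later ones.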
Added the random_state parameter (default=None) to the Continuous, Categorical and Integer classes from the space module, to fix the random seed during hyperparameter sampling.
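The reproducibility idea can be sketched with a toy stand-in for one of these space classes; the class below is hypothetical and only illustrates how a fixed seed makes sampling repeatable.

```python
import random

class ContinuousSketch:
    """Toy stand-in for a Continuous search-space dimension.

    Passing a fixed random_state seeds the sampler, so two runs with
    the same seed draw identical hyperparameter candidates.
    """

    def __init__(self, lower, upper, random_state=None):
        self.lower = lower
        self.upper = upper
        self._rng = random.Random(random_state)  # None -> nondeterministic

    def sample(self):
        return self._rng.uniform(self.lower, self.upper)

# two spaces built with the same seed produce the same sample sequence
a = ContinuousSketch(1e-4, 1e-1, random_state=42)
b = ContinuousSketch(1e-4, 1e-1, random_state=42)
draws_a = [a.sample() for _ in range(3)]
draws_b = [b.sample() for _ in range(3)]
```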
Changed the default values of mutation_probability and crossover_probability to 0.2 and 0.8, respectively.
The weighted_choice function used in GAFeatureSelectionCV was re-written to give more probability to a number of features closer to the max_features parameter.
Removed the unused and broken function plot_parallel_coordinates().
This release changes how the initial population is sampled when the max_features parameter of GAFeatureSelectionCV is set: the sampling now gives more probability to solutions with fewer than max_features features.
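The biased sampling could be sketched as follows. This is not the library's weighting; the 4x factor and the helper name are arbitrary assumptions chosen only to show the idea of favoring small feature counts while still allowing larger ones.

```python
import random

def sample_individual(n_features, max_features, rng):
    """Sample a binary feature mask, favoring solutions that use at
    most max_features features (sketch; weights are illustrative)."""
    counts = list(range(1, n_features + 1))
    # counts within the bound are 4x as likely as larger counts
    weights = [4.0 if k <= max_features else 1.0 for k in counts]
    k = rng.choices(counts, weights=weights, k=1)[0]
    chosen = set(rng.sample(range(n_features), k))
    return [1 if i in chosen else 0 for i in range(n_features)]

rng = random.Random(0)
population = [sample_individual(20, 5, rng) for _ in range(200)]
# most individuals use few features, and every one uses at least one
small = sum(1 for ind in population if sum(ind) <= 5)
```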
This release comes with some requested features and enhancements.
Class GAFeatureSelectionCV now has a max_features parameter, int, default=None. If it is not None, individuals with more features than max_features are penalized, putting a "soft" upper bound on the number of features to be selected.
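One way to picture a "soft" bound is a fitness penalty that grows with the excess feature count, as in the sketch below. The linear penalty form and its 0.05 weight are illustrative assumptions, not the library's actual formula; the point is that exceeding max_features lowers fitness instead of being forbidden outright.

```python
def penalized_fitness(cv_score, n_selected, max_features=None, penalty=0.05):
    """Soft upper bound on the number of selected features (sketch).

    Individuals within the bound keep their raw cv-score; those over
    it lose fitness proportionally to the excess, so they can survive
    but are selected against.
    """
    if max_features is None or n_selected <= max_features:
        return cv_score
    return cv_score - penalty * (n_selected - max_features)

# within the bound: raw cv-score is kept
within = penalized_fitness(0.90, n_selected=4, max_features=5)
# over the bound: penalized, but not discarded
over = penalized_fitness(0.90, n_selected=8, max_features=5)
```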
Classes GASearchCV and GAFeatureSelectionCV now support multi-metric evaluation the same way scikit-learn does; you will see this reflected in the logbook and cv_results_ objects, where you now get results for each metric. As in scikit-learn, if multi-metric evaluation is used, the refit parameter must be a str specifying the metric used to evaluate the cv-scores.
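The scikit-learn convention this follows can be sketched with a toy results dict: with several metrics available, refit names the one used to pick the best candidate. The dict shape and helper below are illustrative, not the library's internals.

```python
# Toy cv_results_-like dict: one entry per candidate, one score list per
# metric, mirroring scikit-learn's "mean_test_<metric>" naming convention
cv_results = {
    "params": [{"max_depth": 2}, {"max_depth": 4}, {"max_depth": 8}],
    "mean_test_accuracy": [0.81, 0.88, 0.84],
    "mean_test_f1": [0.78, 0.83, 0.86],
}

def select_best(cv_results, refit):
    """Pick best_index_ by the metric named in refit (sketch of the
    scikit-learn multi-metric convention)."""
    if not isinstance(refit, str):
        raise ValueError("With multi-metric scoring, refit must be a str")
    scores = cv_results[f"mean_test_{refit}"]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, cv_results["params"][best_index]

# different refit metrics can pick different best candidates
best_index_acc, best_params_acc = select_best(cv_results, refit="accuracy")
best_index_f1, best_params_f1 = select_best(cv_results, refit="f1")
```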
Training gracefully stops if interrupted by one of these exceptions: KeyboardInterrupt, SystemExit, StopIteration. When one of them is raised, the model finishes the current generation and saves the best model found so far. This only works if at least one generation has been completed.
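The control flow can be sketched as below. This is a simplification: here the interrupted generation is simply abandoned, whereas the text above says the current generation is finished first; the function names are hypothetical.

```python
def evolve(n_generations, run_generation):
    """Run generations, stopping gracefully on interrupt-style
    exceptions and keeping the best score found so far (sketch)."""
    best = None
    completed = 0
    for gen in range(n_generations):
        try:
            candidate = run_generation(gen)
        except (KeyboardInterrupt, SystemExit, StopIteration):
            # stop here: whatever best was found so far is preserved,
            # which requires at least one completed generation
            break
        if best is None or candidate > best:
            best = candidate
        completed += 1
    return best, completed

def flaky_generation(gen):
    if gen == 3:  # simulate the user interrupting the run
        raise KeyboardInterrupt
    return 0.5 + 0.1 * gen  # fake fitness improving each generation

best, completed = evolve(10, flaky_generation)
```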
The following parameters changed their default values to create more extensive and different models with better results:
population_size from 10 to 50
generations from 40 to 80
mutation_probability from 0.1 to 0.2
This is an exciting release! It introduces feature selection capabilities to the package.
Added the GAFeatureSelectionCV class for feature selection along with any scikit-learn classifier or regressor. It optimizes the cv-score while minimizing the number of features to select. This class is compatible with the mlflow and tensorboard integrations, the Callbacks, and the plot_fitness_evolution function.
The mlflow module was renamed to mlflow_log to avoid unexpected errors in name resolution.
This is a minor release that fixes a couple of bugs and adds some minor options.
Added the generations parameter to DeltaThreshold. It now compares the maximum and minimum values of a metric over the last generations, instead of just the current and previous ones. The default value is 2, so the behavior remains the same as in previous versions.
tools.cxSimulatedBinaryBounded.
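The windowed check that DeltaThreshold performs could be sketched as below; the function shape is an assumption, only the max-minus-min idea over the last generations comes from the description above.

```python
def delta_threshold_stop(metric_history, threshold, generations=2):
    """Stop when the spread (max - min) of the metric over the last
    `generations` values falls below threshold. With generations=2
    this reduces to comparing the current and previous values
    (a sketch of the DeltaThreshold idea, not the library's code)."""
    window = metric_history[-generations:]
    if len(window) < generations:
        return False  # not enough history yet
    return max(window) - min(window) < threshold

history = [0.60, 0.70, 0.74, 0.745, 0.746]
# default window of 2: only the last two values matter
stop_default = delta_threshold_stop(history, threshold=0.01)
# wider window: earlier, larger jumps keep the run going
stop_wide = delta_threshold_stop(history, threshold=0.01, generations=4)
```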
Fixed a bug in the Continuous class: with boundaries lower and upper, a uniform distribution with limits [lower, lower + upper] was sampled; now it is properly sampled using a [lower, upper]
limit.
This is a big release with several new features and enhancements! 🎊
Added the ProgressBar callback; it uses a tqdm progress bar to show how many generations are left in the training progress.
Added the TensorBoard
callback to log the generation metrics, watch in real-time while the models are trained, and compare different runs in your TensorBoard instance.
Added the TimerStopping callback to stop the iterations after a total (threshold) fitting time has elapsed.
Added new parallel coordinates plot using plot_parallel_coordinates
by @Raul9595
Now, if one or more callbacks decide to stop the algorithm, their class names are printed so you know which callbacks were responsible for stopping.
Added support for extra methods coming from scikit-learn's BaseSearchCV, like cv_results_, best_index_ and refit_time_, among others.
Added the on_start and on_end methods to BaseCallback. The algorithms now check for the callbacks like this:
on_start: When the evolutionary algorithm is called from the GASearchCV.fit method.
on_step: When the evolutionary algorithm finishes a generation (no change here).
on_end: At the end of the last generation.
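The lifecycle above can be sketched with a minimal callback and a toy fit loop. Only the three hook names come from the description; the signatures, return convention, and runner function here are illustrative assumptions.

```python
class CallbackSketch:
    """Minimal stand-in for a BaseCallback subclass, recording when
    each hook fires (the hook names match the list above; everything
    else is illustrative)."""

    def __init__(self):
        self.events = []

    def on_start(self, estimator=None):
        self.events.append("start")

    def on_step(self, record=None, logbook=None, estimator=None):
        self.events.append("step")
        return False  # returning True would ask the algorithm to stop

    def on_end(self, logbook=None, estimator=None):
        self.events.append("end")

def run_with_callback(n_generations, callback):
    """Toy fit loop showing where each hook fires."""
    callback.on_start()           # once, when fit begins
    for _ in range(n_generations):
        if callback.on_step():    # once per finished generation
            break
    callback.on_end()             # once, after the last generation

cb = CallbackSketch()
run_with_callback(3, cb)
# cb.events == ["start", "step", "step", "step", "end"]
```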
sklearn_genetic.plots and sklearn_genetic.mlflow.MLflowConfig now require an explicit installation of seaborn and mlflow; those are optionally installed using pip install sklearn-genetic-opt[all].
Added return_train_score: bool, default=False. As in scikit-learn, it controls whether cv_results_ should include the training scores.
self.best_params_ for the position 0, to be consistent with the scikit-learn API and parameters like self.best_index_
Thanks to new contributors for helping in this project! @Raul9595 @Turtle24
Built-in integration with MLflow using the class sklearn_genetic.mlflow.MLflowConfig and the new parameter log_config from the class sklearn_genetic.GASearchCV.
Implemented the callback sklearn_genetic.callbacks.LogbookSaver, which saves the estimator.logbook object with all the fitted hyperparameters and their cross-validation scores.
Added the parameter estimator to all the functions in the module sklearn_genetic.callbacks.
Added sklearn_genetic.callbacks.base.BaseCallback, from which all callbacks must inherit.