A library of extension and helper modules for Python's data analysis and machine learning libraries.
[Source code (zip)](https://github.com/rasbt/mlxtend/archive/v0.21.1.zip)
[Source code (tar.gz)](https://github.com/rasbt/mlxtend/archive/v0.22.1.tar.gz)

- Removed the deprecated `normalize` parameter from the scikit-learn `LinearRegression` model used in the test. ([#1036](https://github.com/rasbt/mlxtend/issues/1036))
- Added `pyproject.toml` to support PEP 518 builds. ([#1065](https://github.com/rasbt/mlxtend/issues/1065) via [jmahlik](https://github.com/jmahlik))
- `pyproject.toml` ([#1065](https://github.com/rasbt/mlxtend/issues/1065) via [jmahlik](https://github.com/jmahlik))
- `mlxtend.image` submodule with face recognition functions due to poor `dlib` support in modern environments.
- `SequentialFeatureSelector` and multiclass ROC AUC.
- When `ExhaustiveFeatureSelector` is run with `n_jobs == 1`, joblib is now disabled, which enables more immediate (live) feedback when the `verbose` mode is enabled. (#985 via Nima Sarajpoor)
- `EnsembleVoteClassifier` (#941)
- The `mlxtend.frequent_patterns.association_rules` function has a new metric, Zhang's metric, which measures both association and dissociation. (#980)
- Code improvement in `mlxtend.frequent_patterns.fpmax` that avoids casting a sparse DataFrame into a dense NumPy array. (#1000 via Tim Kellogg)
- The `plot_decision_regions` function now has an `n_jobs` parameter to parallelize the computation. In one use case on a small dataset, this gave a 21x speed-up (449 seconds vs. 21 seconds on a local HPC instance with 36 cores). (#998 via Khalid ElHaj)
- Added `mlxtend.frequent_patterns.hmine`, an algorithm and documentation for mining frequent itemsets using the H-Mine algorithm. (#1020 via Fatih Sen)
- The `mlxtend.evaluate.feature_importance_permutation` function has a new `feature_groups` argument to treat user-specified feature groups as single features, which is useful for one-hot encoded features. (#955)
- `mlxtend.feature_selection.ExhaustiveFeatureSelector` and `SequentialFeatureSelector` also gained support for `feature_groups`, with behavior similar to the one described above. (#957 and #965 via Nima Sarajpoor)
- The `custom_feature_names` parameter was removed from the `ExhaustiveFeatureSelector` due to redundancy and to simplify the code base. The `ExhaustiveFeatureSelector` documentation illustrates how to achieve the same behavior and outcome using pandas DataFrames. (#957)
- The `mlxtend.evaluate.bootstrap_point632_score` function now supports `fit_params`. (#861)
- The function in `mlxtend/plotting/decision_regions.py` now has a `contourf_kwargs` parameter for matplotlib to change the look of the decision boundaries if desired. (#881 via pbloem)
- Added a `norm_colormap` parameter to `mlxtend.plotting.plot_confusion_matrix` to allow normalizing the colormap, e.g., using `matplotlib.colors.LogNorm()`. (#895)
- Added a `GroupTimeSeriesSplit` class for evaluation in time series tasks, with support for custom groups and additional parameters compared with scikit-learn's `TimeSeriesSplit`. (#915 via Dmitry Labazkin)
- `mlxtend.plotting.heatmap` and `mlxtend.plotting.plot_confusion_matrix` (#872)
- `apriori`, `fpmax`, and `fpgrowth`. (#934 via NimaSarajpoor)
- `evaluate.accuracy_score` gained an option, in addition to the existing "average" option, to compute the scikit-learn-style balanced accuracy. (#764)
- Added a `scatter_hist` function to `mlxtend.plotting` for generating a scattered histogram. (#757 via Maitreyee Mhasaka)
- The `evaluate.permutation_test` function now accepts a `paired` argument to support paired permutation/randomization tests. (#768)
- `StackingCVRegressor` now also supports multi-dimensional targets, similar to `StackingRegressor`, via `StackingCVRegressor(..., multi_output=True)`. (#802 via Marco Tiraboschi)
- `StackingRegressor` now requires setting `StackingRegressor(..., multi_output=True)` if the target is multi-dimensional; this allows for better input validation. (#802)
- Removed the `res` argument from `plot_decision_regions`. (#803)
- Added a `title_fontsize` parameter to `plot_learning_curves` for controlling the title font size; the plot style is now also the matplotlib default. (#818)
- Now uses `'c': 'none'` instead of `'c': ''` in the scatterplot highlights of `mlxtend.plotting.plot_decision_regions` to stay compatible with Matplotlib 3.4 and newer. (#822)
- Added a `fontcolor_threshold` parameter to the `mlxtend.plotting.plot_confusion_matrix` function as an additional option for setting the font color cut-off manually. (#827)
- `frequent_patterns.association_rules` now raises a `ValueError` if an empty frequent itemset DataFrame is passed. (#843)
- The `mlxtend.evaluate.bootstrap_point632_score` function now uses the whole training set for the resubstitution weighting term instead of the internal training set, which is a new bootstrap sample in each round. (#844)
- The `bias_variance_decomp` function now supports optional `fit_params` for the estimators that are fit on bootstrap samples. (#748)
- The `bias_variance_decomp` function now supports Keras estimators. (#725 via @hanzigs)
- Added the `mlxtend.classifier.OneRClassifier` (One Rule Classifier) class, a simple rule-based classifier that is often used as a performance baseline or as a simple interpretable model. (#726)
- Added a `create_counterfactual` method for creating counterfactuals to explain model predictions. (#740)
- `permutation_test` (`mlxtend.evaluate.permutation`) is corrected to give the proportion of permutations whose statistic is at least as extreme as the observed one. (#721 via Florian Charlier)
- The loss in `LogisticRegression` that is computed for logging purposes didn't include the L2 penalty for the first weight in the weight vector (this is not the bias unit). However, since this loss was only used for logging and the gradient remains correct, this does not affect the main code. (#741)
- Fixed a bug in `bias_variance_decomp` where, when the `mse` loss was used, downcasting to integers caused imprecise results for small numbers. (#749)
- Added a `predict_proba` keyword argument to the bootstrap methods to allow bootstrapping of scoring functions that take probability values. (#700 via Adam Li)
- Added a `cell_values` parameter to `mlxtend.plotting.heatmap()` to optionally suppress cell annotations by setting `cell_values=False`. (#703)
- `use_clones` and `fit_base_estimators` (previously `refit` in `EnsembleVoteClassifier`) for `EnsembleVoteClassifier` and `StackingClassifier`. (#670 via Katrina Ni)
- `mlxtend.text` to prevent a deprecation warning in Python 3.8. (#688)
- `meshgrid` in the `no_information_rate` function used by the `bootstrap_point632_score` function for the .632+ estimate. (#688)
- Fixed a bug in `fpmax` that could lead to incorrect support values. (#692 via Steve Harenberg)
- `OnehotTransactions` has been removed in favor of the `TransactionEncoder`.
- Removed `SparseDataFrame` support in the frequent pattern mining functions in favor of pandas >=1.0's new way of working with sparse data. If you used `SparseDataFrame` formats, please see pandas' migration guide at https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating (#667)
- `SequentialFeatureSelector` now supports using pre-specified feature sets via the `fixed_features` parameter. (#578)
- Added an `accuracy_score` function to `mlxtend.evaluate` for computing basic classification accuracy, per-class accuracy, and average per-class accuracy. (#624 via Deepan Das)
- `StackingClassifier` and `StackingCVClassifier` now have a `decision_function` method, which is preferred over `predict_proba` when calculating `roc_auc` and `average_precision` scores if the meta estimator is a linear model or support vector classifier. (#634 via Qiang Gu)
- Improved the runtime performance of the `apriori` frequent itemset generating function when `low_memory=True`. Setting `low_memory=False` (default) is still faster for small itemsets, but `low_memory=True` can be much faster for large itemsets and requires less memory. Also, input validation for `apriori`, `fpgrowth`, and `fpmax` took a significant amount of time when the input pandas DataFrame was large; this is now dramatically reduced when the input contains boolean values (and not zeros/ones), which is the case when using `TransactionEncoder`. (#619 via Denis Barbier)
- `apriori`, `fpgrowth`, and `fpmax` run much faster on sparse DataFrames when the input pandas DataFrame contains integer values. (#621 via Denis Barbier)
- `fpgrowth` and `fpmax` now work directly on sparse DataFrames; previously, these were converted into dense NumPy arrays. (#622 via Denis Barbier)
- Fixed a bug in `mlxtend.plotting.plot_pca_correlation_graph` that caused the explained variances not to sum up to 1. The fix also improves the runtime performance of the correlation computation and adds a missing function argument for the explained variances (eigenvalues) if users provide their own principal components. (#593 via Gabriel Azevedo Ferreira)
- Made the behavior of `fpgrowth` and `apriori` consistent for edge cases such as `min_support=0`. (#573 via Steve Harenberg)
- `fpmax` now returns an empty DataFrame instead of raising an error if the frequent itemset set is empty. (#573 via Steve Harenberg)
- Fixed an issue in `mlxtend.plotting.plot_confusion_matrix` where the font-color choice for medium-dark cells was not ideal and hard to read. (#588 via sohrabtowfighi)
- The `svd` mode of `mlxtend.feature_extraction.PrincipalComponentAnalysis` now also uses n-1 degrees of freedom instead of n when computing the eigenvalues, matching the behavior of `eigen`. (#595)
- `StackingCVClassifier` because it causes issues if pipelines are used as input. (#606)
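
The Zhang's metric entry for `association_rules` (#980) can be illustrated with a small computation from relative supports. The sketch below is not mlxtend's code. It assumes the commonly used support-based definition of Zhang's metric for a rule A -> C. The helper name `zhangs_metric` is hypothetical. Values lie in [-1, 1]: positive values indicate association, 0 indicates independence, and negative values indicate dissociation.

```python
def zhangs_metric(support_a: float, support_c: float, support_ac: float) -> float:
    """Zhang's metric for the rule A -> C from relative supports.

    Hypothetical helper for illustration (not mlxtend's implementation),
    using the support-based definition:
        (s(AC) - s(A)s(C)) / max(s(AC)(1 - s(A)), s(A)(s(C) - s(AC)))
    """
    numerator = support_ac - support_a * support_c
    denominator = max(support_ac * (1.0 - support_a),
                      support_a * (support_c - support_ac))
    if denominator == 0:
        # e.g. s(A) == 0: the metric is undefined, so this sketch refuses.
        raise ValueError("Zhang's metric is undefined for these supports")
    return numerator / denominator


# A and C each occur in half of all transactions.
print(zhangs_metric(0.5, 0.5, 0.25))  # independent: 0.0
print(zhangs_metric(0.5, 0.5, 0.5))   # always together: 1.0
print(zhangs_metric(0.5, 0.5, 0.0))   # never together: -1.0
```

The three calls cover the extremes and the independence case, which is why the metric is described as measuring both association and dissociation.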
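
The `paired` argument of `evaluate.permutation_test` (#768) and the corrected p-value (#721) rest on a standard idea. Under the null hypothesis of no difference, the sign of each paired difference is exchangeable. The exact sign-flip test below is a generic sketch, not mlxtend's implementation. The function name `paired_permutation_test_exact` and the absolute mean difference as test statistic are assumptions. The p-value is the proportion of sign assignments whose statistic is at least as extreme as the observed one, following the wording of #721.

```python
from itertools import product
from typing import Sequence


def paired_permutation_test_exact(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact two-sided paired permutation (sign-flip) test.

    Hypothetical helper for illustration. It enumerates all 2**n sign
    patterns of the paired differences, so it is only practical for small n.
    """
    if not x or len(x) != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)  # statistic: |mean paired difference|

    at_least_as_extreme = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        if stat >= observed:
            at_least_as_extreme += 1
    # Proportion of permutations at least as extreme as the observed one.
    return at_least_as_extreme / 2 ** n


x = [5, 7, 6, 9, 8, 7, 6, 8, 9, 7]
y = [v - 1 for v in x]  # every pair differs by exactly 1
print(paired_permutation_test_exact(x, y))  # 0.001953125 (= 2/1024)
```

With every difference equal to 1, only the all-positive and all-negative sign patterns reach the observed statistic. That gives 2 of 1024 patterns.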
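
The `accuracy_score` entries (#624, #764) distinguish standard accuracy, average per-class accuracy, and scikit-learn-style balanced accuracy. The pure-Python sketch below shows how these definitions differ on the same predictions. It is not mlxtend's implementation, and the helper names are hypothetical. It assumes that average per-class accuracy is the mean of one-vs-rest binary accuracies. Balanced accuracy is taken as the mean per-class recall, as in scikit-learn's `balanced_accuracy_score`. Classes are taken from `y_true`.

```python
from typing import Hashable, Sequence


def standard_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of exactly matching labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_per_class_accuracy(y_true: Sequence[Hashable],
                               y_pred: Sequence[Hashable]) -> float:
    """Mean of one-vs-rest binary accuracies (assumed definition)."""
    classes = sorted(set(y_true))
    per_class = [
        # Binary accuracy when class c is "positive" and everything else "negative".
        sum((t == c) == (p == c) for t, p in zip(y_true, y_pred)) / len(y_true)
        for c in classes
    ]
    return sum(per_class) / len(per_class)


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall, as in scikit-learn's balanced_accuracy_score."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        hits = [p == c for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(hits) / len(hits))
    return sum(recalls) / len(recalls)


y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 2, 2]
print(standard_accuracy(y_true, y_pred))           # 5/6, about 0.833
print(average_per_class_accuracy(y_true, y_pred))  # 8/9, about 0.889
print(balanced_accuracy(y_true, y_pred))           # 2/3, about 0.667
```

The minority class 1 is never predicted correctly. Balanced accuracy penalizes that miss most strongly, while the one-vs-rest average still credits the many correct negatives.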
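
The .632 bootstrap entries (#844, #861) refer to weighting an out-of-bag accuracy against a resubstitution accuracy. The sketch below shows only that final weighting step. It assumes each bootstrap round provides an out-of-bag accuracy and the accuracy of that round's model on the whole training set (the resubstitution term described in #844). It fits no models, omits the .632+ variant, and the helper name `point632_estimate` is hypothetical.

```python
from statistics import mean
from typing import Sequence

WEIGHT = 0.632  # weight of the out-of-bag term in the .632 estimate


def point632_estimate(oob_accuracies: Sequence[float],
                      resub_accuracies: Sequence[float]) -> float:
    """Mean over rounds of 0.632 * out-of-bag acc + 0.368 * resubstitution acc.

    Hypothetical helper for illustration; inputs are per-round accuracies.
    """
    if not oob_accuracies or len(oob_accuracies) != len(resub_accuracies):
        raise ValueError("need one resubstitution accuracy per bootstrap round")
    return mean(
        WEIGHT * oob + (1.0 - WEIGHT) * resub
        for oob, resub in zip(oob_accuracies, resub_accuracies)
    )


# Two rounds: out-of-bag accuracies, and each round's model evaluated
# on the whole training set.
print(point632_estimate([0.8, 0.9], [1.0, 1.0]))  # about 0.9052
```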
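
The `OneRClassifier` entry (#726) refers to the One Rule (1R) idea, sketched below in pure Python. For each feature, the sketch maps every observed value to its majority class and counts the resulting training errors. It then keeps the feature with the fewest errors. It assumes categorical (already discretized) features, and the helper names `one_r_fit` and `one_r_predict` are hypothetical. Two behaviors are choices of this sketch, not necessarily mlxtend's: ties go to the first feature with the minimum error, and unseen values map to `None`.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

OneRModel = Tuple[int, Dict[Hashable, Hashable], int]  # (feature, rules, errors)


def one_r_fit(X: Sequence[Sequence[Hashable]], y: Sequence[Hashable]) -> OneRModel:
    """Choose the single feature whose value -> majority-class rules err least."""
    if not X or not X[0]:
        raise ValueError("X must contain at least one sample and one feature")
    best: Optional[OneRModel] = None
    for j in range(len(X[0])):
        # Class counts for every observed value of feature j.
        counts: Dict[Hashable, Counter] = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[row[j]][label] += 1
        # One rule per value: predict that value's most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Training errors: samples outside the majority class of their value.
        errors = sum(sum(c.values()) - c[rules[value]] for value, c in counts.items())
        # Strict "<" keeps the first feature among ties.
        if best is None or errors < best[2]:
            best = (j, rules, errors)
    return best


def one_r_predict(model: OneRModel,
                  X: Sequence[Sequence[Hashable]]) -> List[Optional[Hashable]]:
    feature, rules, _ = model
    # Values not seen during fitting have no rule; this sketch returns None.
    return [rules.get(row[feature]) for row in X]


X = [[0, "a"], [1, "a"], [0, "b"], [1, "b"]]
y = ["no", "no", "yes", "yes"]
model = one_r_fit(X, y)
print(model)                                     # (1, {'a': 'no', 'b': 'yes'}, 0)
print(one_r_predict(model, [[1, "b"], [0, "a"]]))  # ['yes', 'no']
```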
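
The `feature_groups` entries (#955, #957, #965) describe treating several columns, such as the one-hot columns of one variable, as a single feature. For permutation importance, that amounts to shuffling all columns of a group with one shared row permutation. The sketch below is a pure-Python illustration under that assumption, not mlxtend's implementation. The helper `group_permutation_importance`, the accuracy metric, and the single permutation per group are all choices of this sketch.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def group_permutation_importance(
    predict: Callable[[List[Row]], List[float]],
    X: Sequence[Row],
    y: Sequence[float],
    feature_groups: Sequence[Sequence[int]],
    seed: int = 0,
) -> List[float]:
    """Accuracy drop when each group of column indices is permuted jointly.

    Hypothetical helper for illustration; not mlxtend's implementation.
    """
    rng = random.Random(seed)

    def accuracy(rows: List[Row]) -> float:
        predictions = predict(rows)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)

    baseline = accuracy([list(row) for row in X])
    importances = []
    for group in feature_groups:
        order = list(range(len(X)))
        rng.shuffle(order)
        permuted = [list(row) for row in X]
        # One shared row permutation for all columns in the group, so the
        # group (e.g., the one-hot columns of one variable) moves as a unit.
        for i, src in enumerate(order):
            for j in group:
                permuted[i][j] = X[src][j]
        importances.append(baseline - accuracy(permuted))
    return importances


def predict_first_column(rows: List[Row]) -> List[float]:
    """Toy model that only looks at column 0."""
    return [row[0] for row in rows]


# Column 0 equals the label; columns 1 and 2 one-hot encode parity (unused).
X = [[i, i % 2, 1 - i % 2] for i in range(10)]
y = list(range(10))
print(group_permutation_importance(predict_first_column, X, y, [[0], [1, 2]]))
```

Because the toy model ignores columns 1 and 2, permuting that group leaves accuracy unchanged, so its importance is exactly 0.0.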