A library of extension and helper modules for Python's data analysis and machine learning libraries.
- `iris_data()` was updated such that both the UCI Repository version of the Iris dataset as well as the corrected, original version of the dataset can be loaded, which has a slight difference in two data points (consistent with Fisher's paper; this is also the same as in R). (#539 via janismdhanbad)
- Added an optional `groups` parameter to the `SequentialFeatureSelector` and `ExhaustiveFeatureSelector` `fit()` methods for forwarding to sklearn CV. (#537 via arc12)
- Added a new `plot_pca_correlation_graph` function to the `mlxtend.plotting` submodule for plotting a PCA correlation graph. (#544 via Gabriel-Azevedo-Ferreira)
- Added a `zoom_factor` parameter to the `mlxtend.plotting.plot_decision_regions` function that allows users to zoom in and out of the decision region plots. (#545)
- Added a new `fpgrowth` function that implements the FP-Growth algorithm for mining frequent itemsets as a drop-in replacement for the existing `apriori` algorithm. (#550 via Steve Harenberg)
- Added a new `heatmap` function in `mlxtend.plotting`. (#552)
- Added a new `fpmax` function that implements the FP-Max algorithm for mining maximal itemsets as a drop-in replacement for the `fpgrowth` algorithm. (#553 via Steve Harenberg)
- Added a `figsize` parameter for the `plot_decision_regions` function in `mlxtend.plotting`. (#555 via Mirza Hasanbasic)
- Added a new `low_memory` option for the `apriori` frequent itemset generating function. Setting `low_memory=False` (default) uses a substantially optimized version of the algorithm that is 3-6x faster than the original implementation (`low_memory=True`). (#567 via jmayse)
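To make the contract of the itemset-mining functions above concrete, here is a minimal, stdlib-only sketch of support counting; the naive enumeration and the helper name are illustrative only, not mlxtend's optimized `apriori`/`fpgrowth` implementations:

```python
from itertools import combinations

def naive_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support.

    Naive candidate enumeration; apriori prunes candidates and FP-Growth
    avoids candidate generation entirely, but the result is the same.
    """
    items = sorted({item for t in transactions for item in t})
    n = len(transactions)
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for cand in combinations(items, k):
            support = sum(1 for t in transactions if set(cand) <= t) / n
            if support >= min_support:
                result[frozenset(cand)] = support
                found_any = True
        if not found_any:  # no frequent k-itemset => no frequent (k+1)-itemset
            break
    return result

transactions = [{"milk", "bread"}, {"milk", "eggs"},
                {"milk", "bread", "eggs"}, {"bread"}]
itemsets = naive_frequent_itemsets(transactions, min_support=0.5)
# e.g. itemsets[frozenset({"milk"})] == 0.75
```

The early `break` is the apriori downward-closure argument in miniature: if no k-itemset is frequent, no superset can be.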
- Now uses the latest `joblib` library under the hood instead of the deprecated `sklearn.externals.joblib`. (#547)
- Changed the `StackingCVClassifier` and `StackingCVRegressor` such that first-level models are allowed to generate output of non-numeric type. (#562)
- Updated the documentation of `iris_data()` under `iris.py` by adding a note about differences in the iris data in R and the UCI machine learning repo.
- Made sure that if the `'svd'` mode is used in PCA, the number of eigenvalues is the same as when using `'eigen'` (zeros are appended in that case). (#565)
- `StackingCVClassifier` and `StackingCVRegressor` now support a `random_state` parameter, which, together with `shuffle`, controls the randomness in the CV splitting. (#523 via Qiang Gu)
- `StackingCVClassifier` and `StackingCVRegressor` now have a new `drop_last_proba` parameter. If `True`, it drops the last "probability" column in the feature set, because it is redundant: p(y_c) = 1 - p(y_1) - p(y_2) - ... - p(y_{c-1}). This can be useful for meta-classifiers that are sensitive to perfectly collinear features. (#532)
- `StackingClassifier`, `StackingCVClassifier`, and `StackingRegressor` support grid search over the `regressors` and even a single base regressor. (#522 via Qiang Gu)
- Adds multiprocessing support to `StackingCVClassifier`. (#522 via Qiang Gu)
- Adds multiprocessing support to `StackingCVRegressor`. (#512 via Qiang Gu)
- `StackingCVRegressor` also enables grid search over the `regressors` and even a single base regressor. When there are level-mixed parameters, `GridSearchCV` will try to replace hyperparameters in a top-down order (see the documentation for example details). (#515 via Qiang Gu)
- Added a new `verbose` parameter to `apriori` to show the current iteration number as well as the itemset size currently being sampled. (#519)
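Since class probabilities sum to one, the column that `drop_last_proba` removes carries no extra information; a quick stdlib-only illustration with toy numbers (not mlxtend code):

```python
# Each row of class probabilities sums to 1, so the last column is
# linearly determined by the others: p(y_c) = 1 - p(y_1) - ... - p(y_{c-1}).
proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]
reduced = [row[:-1] for row in proba]                 # what dropping keeps
reconstructed = [1.0 - sum(row) for row in reduced]   # the dropped column
```

Keeping the redundant column makes the meta-feature matrix perfectly collinear, which is exactly what some meta-classifiers are sensitive to.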
- Added a new `class_name` parameter to the confusion matrix function to display class names on the axes as tick marks. (#487 via sandpiturtle)
- When tuning hyperparameters (e.g., via `GridSearchCV`, etc.), the `StackingCVRegressor`'s meta regressor is now being accessed via `'meta_regressor__*'` in the parameter grid. E.g., if a `RandomForestRegressor` as meta-regressor was previously tuned via `'randomforestregressor__n_estimators'`, this has now changed to `'meta_regressor__n_estimators'`. (#515 via Qiang Gu)
- The same parameter-naming change was applied to `StackingClassifier`, `StackingCVClassifier`, and `StackingRegressor`. (#522 via Qiang Gu)
- `feature_selection.ColumnSelector` now also supports column names of type `int` (in addition to `str` names) if the input is a pandas DataFrame. (#500 via tetrar124)
- Fixed unreadable labels in `plot_confusion_matrix` for imbalanced datasets if `show_absolute=True` and `show_normed=True`. (#504)
- Raises a more informative error if a `SparseDataFrame` is passed to `apriori` and the dataframe has integer column names that don't start with `0`, due to current limitations of the `SparseDataFrame` implementation in pandas. (#503)
- `mlxtend.evaluate.feature_importance_permutation` now correctly accepts scoring functions with proper function signature as the `metric` argument. (#528)
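The confusion-matrix entries above all operate on the same two quantities: absolute counts and row-normalized counts. A stdlib-only sketch of both (hypothetical helper names, not mlxtend's plotting API):

```python
def confusion_matrix(y_true, y_pred, classes):
    """Absolute counts: rows = true class, columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix

def normalize_rows(matrix):
    """Row-normalized counts, i.e. per-class recall on the diagonal."""
    return [[v / max(sum(row), 1) for v in row] for row in matrix]

y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
absolute = confusion_matrix(y_true, y_pred, ["cat", "dog"])
normed = normalize_rows(absolute)
```

On imbalanced data the normalized view is what keeps the minority-class cells readable, which is why showing both at once needed the label fix above.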
- Added a new transformer class to `mlxtend.image`, `EyepadAlign`, that aligns face images based on the location of the eyes. (#466 by Vahid Mirjalili)
- New `mlxtend.evaluate.bias_variance_decomp` function that decomposes the loss of a regressor or classifier into bias and variance terms. (#470)
- Added a `whitening` parameter to `PrincipalComponentAnalysis` to optionally whiten the transformed data such that the features have unit variance. (#475)
- Changed the default solver of `PrincipalComponentAnalysis` to `'svd'` instead of `'eigen'` to improve numerical stability. (#474)
- `mlxtend.image.extract_face_landmarks` now returns `None` if no facial landmarks were detected, instead of an array of all zeros. (#466)
- Added a `scatterplotmatrix` function to the `plotting` module. (#437)
- Added a `sample_weight` option to `StackingRegressor`, `StackingClassifier`, `StackingCVRegressor`, `StackingCVClassifier`, and `EnsembleVoteClassifier`. (#438)
- Added a `RandomHoldoutSplit` class to perform a random train/valid split without rotation in `SequentialFeatureSelector`, scikit-learn `GridSearchCV`, etc. (#442)
- Added a `PredefinedHoldoutSplit` class to perform a train/valid split, based on user-specified indices, without rotation in `SequentialFeatureSelector`, scikit-learn `GridSearchCV`, etc. (#443)
- Added a new `mlxtend.image` submodule for working on image processing-related tasks. (#457)
- Added a new convenience function, `extract_face_landmarks`, based on `dlib`, to `mlxtend.image`. (#458)
- Added a `method='oob'` option to the `mlxtend.evaluate.bootstrap_point632_score` method to compute the classic out-of-bag bootstrap estimate. (#459)
- Added a `method='.632+'` option to the `mlxtend.evaluate.bootstrap_point632_score` method to compute the .632+ bootstrap estimate that addresses the optimism bias of the .632 bootstrap. (#459)
- New `mlxtend.evaluate.ftest` function to perform an F-test for comparing the accuracies of two or more classification models. (#460)
- New `mlxtend.evaluate.combined_ftest_5x2cv` function to perform a combined 5x2cv F-test for comparing the performance of two models. (#461)
- New `mlxtend.evaluate.difference_proportions` test for comparing two proportions (e.g., classifier accuracies). (#462)
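The out-of-bag and .632 weighting behind the `bootstrap_point632_score` options above can be sketched in a few lines of stdlib Python. This is a toy illustration of the estimator-agnostic idea only; the function name and the trivial majority-class "model" are made up, and it is not mlxtend's implementation:

```python
import random

def point632_score(X, y, fit, predict, n_splits=20, seed=0):
    """0.632 bootstrap: 0.632 * out-of-bag accuracy + 0.368 * training accuracy."""
    rng = random.Random(seed)
    n = len(X)
    scores = []
    for _ in range(n_splits):
        boot = [rng.randrange(n) for _ in range(n)]          # sample with replacement
        oob = [i for i in range(n) if i not in set(boot)]    # rows never drawn
        if not oob:
            continue
        model = fit([X[i] for i in boot], [y[i] for i in boot])
        acc_train = sum(predict(model, X[i]) == y[i] for i in boot) / len(boot)
        acc_oob = sum(predict(model, X[i]) == y[i] for i in oob) / len(oob)
        scores.append(0.632 * acc_oob + 0.368 * acc_train)
    return sum(scores) / len(scores)

# Trivial majority-class "model" keeps the sketch dependency-free:
fit = lambda X, y: max(set(y), key=y.count)
predict = lambda model, x: model
score = point632_score(list(range(12)), [0] * 8 + [1] * 4, fit, predict)
```

The classic `method='oob'` estimate averages only `acc_oob`; the .632 mix counteracts the pessimism of pure out-of-bag evaluation.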
- … `mlxtend.plotting.plot_confusion_matrix`. (#428)
- … `SequentialFeatureSelector`. (#377)
- `SequentialFeatureSelector` now accepts custom feature names via the `fit` method for more interpretable feature subset reports. (#379)
- `SequentialFeatureSelector` is now also compatible with Pandas DataFrames and uses DataFrame column names for more interpretable feature subset reports. (#379)
- `ColumnSelector` now works with Pandas DataFrame columns. (#378 by Manuel Garrido)
- The `ExhaustiveFeatureSelector` estimator in `mlxtend.feature_selection` is now safely stoppable mid-process via control+c. (#380)
- Two new functions, `vectorspace_orthonormalization` and `vectorspace_dimensionality`, were added to `mlxtend.math` to use the Gram-Schmidt process to convert a set of linearly independent vectors into a set of orthonormal basis vectors, and to compute the dimensionality of a vectorspace, respectively. (#382)
- `mlxtend.frequent_patterns.apriori` now supports pandas `SparseDataFrame`s to generate frequent itemsets. (#404 via Daniel Morales)
- The `plot_confusion_matrix` function now has the ability to show normalized confusion matrix coefficients in addition to or instead of absolute confusion matrix coefficients, with or without a colorbar. The text display method has been changed so that the full range of the colormap is used. The default size is also now set based on the number of classes.
- Training the meta-model on the original input features is now also supported in `StackingRegressor` (via `use_features_in_secondary`), like it is already supported in the other Stacking classes. (#418)
- Added a new `support_only` parameter to the `association_rules` function, which allows constructing association rules (based on the support metric only) for cropped input DataFrames that don't contain a complete set of antecedent and consequent support values. (#421)
- Itemsets generated by `apriori` are now `frozenset`s. (#393 by William Laney and #394)
- Now raises an error if the input DataFrame to `apriori` contains non-0/1 or non-`True`/`False` values. (#419)
- Allow mlxtend estimators to be cloned via scikit-learn's `clone` function. (#374)
- Fixed a bug to allow `refit=False` in `StackingRegressor` and `StackingCVRegressor`. (#384 and #385 by selay01)
- Allow `StackingClassifier` to work with sparse matrices when `use_features_in_secondary=True`. (#408 by Floris Hoogenbook)
- Allow `StackingCVRegressor` to work with sparse matrices when `use_features_in_secondary=True`. (#416)
- Allow `StackingCVClassifier` to work with sparse matrices when `use_features_in_secondary=True`. (#417)
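The `support_only` flag above exists because every other rule metric needs the antecedent and consequent supports; confidence, for example, divides by the antecedent support. A stdlib-only sketch of that dependency with toy numbers (hypothetical helper, not the `association_rules` API):

```python
def confidence(support_both, support_antecedent):
    """confidence(A -> C) = support(A ∪ C) / support(A)"""
    return support_both / support_antecedent

# Supports as a frequent-itemset step might report them (toy values):
support = {
    frozenset({"milk"}): 0.75,
    frozenset({"bread"}): 0.75,
    frozenset({"milk", "bread"}): 0.5,
}
conf_milk_bread = confidence(support[frozenset({"milk", "bread"})],
                             support[frozenset({"milk"})])
# Without support[{"milk"}] (a "cropped" input), only the support metric
# itself could be reported -- the situation support_only=True handles.
```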
- New `feature_importance_permutation` function to compute the feature importance in classifiers and regressors via the permutation importance method. (#358)
- The `ExhaustiveFeatureSelector` now optionally accepts `**fit_params` for the estimator that is used for the feature selection. (#354 by Zach Griffith)
- The `SequentialFeatureSelector` now optionally accepts `**fit_params` for the estimator that is used for the feature selection. (#350 by Zach Griffith)
- Replaced the `plot_decision_regions` colors by a colorblind-friendly palette and added contour lines for decision regions. (#348)
- All stacking estimators now raise `NonFittedError`s if any method for inference is called prior to fitting the estimator. (#353)
- Renamed the `refit` parameter of both the `StackingClassifier` and `StackingCVClassifier` to `use_clones` to be more explicit and less misleading. (#368)
- Fixed a bug where the `StackingCVClassifier`'s meta features were not stored in the original order when `shuffle=True`. (#370)
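The permutation-importance idea behind `feature_importance_permutation` (listed above) fits in a short stdlib sketch: shuffle one feature column, re-score, and call the average drop in score that feature's importance. The function and the toy model below are illustrative, not mlxtend's API:

```python
import random

def permutation_importance(X, y, predict, score, n_rounds=10, seed=0):
    """Importance of feature j = mean drop in score after shuffling column j."""
    rng = random.Random(seed)
    baseline = score([predict(row) for row in X], y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_rounds):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score([predict(row) for row in X_perm], y))
        importances.append(sum(drops) / n_rounds)
    return importances

predict = lambda row: row[0]                      # toy model: copies feature 0
accuracy = lambda preds, y: sum(p == t for p, t in zip(preds, y)) / len(y)
X = [[0, 1], [1, 0], [0, 0], [1, 1]] * 5
y = [row[0] for row in X]
importances = permutation_importance(X, y, predict, accuracy)
# Feature 0 is all the toy model uses; feature 1 is ignored entirely,
# so shuffling it leaves the score unchanged.
```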
- New function (`paired_ttest_resampled`) to compare the performance of two models (also called k-hold-out paired t-test). (#323)
- New function (`paired_ttest_kfold_cv`) to compare the performance of two models (also called k-hold-out paired t-test). (#324)
- New function (`paired_ttest_5x2cv`) implementing the 5x2cv paired t-test proposed by Dietterich (1998) to compare the performance of two models. (#325)
- A `refit` parameter was added to the stacking classes (similar to the `refit` parameter in the `EnsembleVoteClassifier`) to support classifiers and regressors that follow the scikit-learn API but are not compatible with scikit-learn's `clone` function. (#325)
- The `ColumnSelector` now has a `drop_axis` argument to use it in pipelines with `CountVectorizers`. (#333)
- Raises a more informative error message if `predict` or `predict_meta_features` is called prior to calling the `fit` method in `StackingRegressor` and `StackingCVRegressor`. (#315)
- The `plot_decision_regions` function now automatically determines the optimal setting based on the feature dimensions and supports anti-aliasing. The old `res` parameter has been deprecated. (#309 by Guillaume Poirier-Morency)
- Apriori code is faster due to optimization in the `onehot` transformation and the amount of candidates generated by the `apriori` algorithm. (#327 by Jakub Smid)
- The `OnehotTransactions` class (which is typically used in combination with the `apriori` function for association rule mining) is now more memory efficient as it uses boolean arrays instead of integer arrays. In addition, the `OnehotTransactions` class can now be provided with a `sparse` argument to generate sparse representations of the `onehot` matrix to further improve memory efficiency. (#328 by Jakub Smid)
- The `OneHotTransactions` class has been deprecated and replaced by the `TransactionEncoder`. (#332)
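The one-hot encoding that `OnehotTransactions`/`TransactionEncoder` perform can be sketched with plain Python, using boolean cells as in the memory-efficiency change above; this shows the idea only, not the class's actual interface:

```python
def encode_transactions(transactions):
    """One-hot encode transactions: one boolean column per distinct item."""
    columns = sorted({item for t in transactions for item in t})
    rows = [[item in t for item in columns] for t in transactions]
    return columns, rows

transactions = [["milk", "bread"], ["bread"], ["milk", "eggs"]]
columns, onehot = encode_transactions(transactions)
# columns == ['bread', 'eggs', 'milk']; each row is a list of booleans
```

Booleans (or a sparse representation of them) are what make this layout cheap: the matrix has one column per distinct item, so most cells are False.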
- The `plot_decision_regions` function now has three new parameters, `scatter_kwargs`, `contourf_kwargs`, and `scatter_highlight_kwargs`, that can be used to modify the plotting style. (#342 by James Bourbeau)
- Fixed a bug in the `EnsembleVoteClassifier` when `refit` was set to `False`. (#322)
- … `plot_decision_regions` function. (#337)
- New `store_train_meta_features` parameter for `fit` in the `StackingCVRegressor`. If `True`, the train meta-features are stored in `self.train_meta_features_`.
- New `pred_meta_features` method for the `StackingCVRegressor`; users can obtain the test meta-features with this method. (#294 via takashioya)
- The new `store_train_meta_features` attribute and `pred_meta_features` method for the `StackingCVRegressor` were also added to the `StackingRegressor`, `StackingClassifier`, and `StackingCVClassifier`. (#299 & #300)
- New function (`evaluate.mcnemar_tables`) for creating multiple 2x2 contingency tables from model prediction arrays that can be used in multiple McNemar (post-hoc) tests or Cochran's Q or F tests, etc. (#307)
- New function (`evaluate.cochrans_q`) for performing Cochran's Q test to compare the accuracy of multiple classifiers. (#310)
- … `requirements.txt` to `setup.py`. (#304 via Colin Carrol)
- New `mlxtend.evaluate.bootstrap_point632_score` function to evaluate the performance of estimators using the .632 bootstrap. (#283)
- New `max_len` parameter for the frequent itemset generation via the `apriori` function to allow for early stopping. (#270)
- The feature subsets returned by the `SequentialFeatureSelector` are now in sorted order. (#262)
- The `SequentialFeatureSelector` now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994). Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases. (#262)
- `utils.Counter` now accepts a `name` variable to help distinguish between multiple counters, time precision can be set with the `'precision'` kwarg, and the new attribute `end_time` holds the time the last iteration completed. (#278 via Mathew Savage)
- New `evaluate.permutation_test` function, a permutation test for hypothesis testing (or A/B testing) to test if two samples come from the same distribution; in other words, a procedure to test the null hypothesis that two groups are not significantly different (e.g., a treatment and a control group). (#250)
- Added `'leverage'` and `'conviction'` as evaluation metrics to the `frequent_patterns.association_rules` function. (#246 & #247)
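The `'leverage'` and `'conviction'` metrics added above have closed forms that are easy to check by hand. A stdlib-only sketch with toy supports (helper names are illustrative, not the `association_rules` output columns):

```python
def leverage(support_both, support_a, support_c):
    """leverage(A -> C) = support(A ∪ C) - support(A) * support(C)"""
    return support_both - support_a * support_c

def conviction(support_c, confidence):
    """conviction(A -> C) = (1 - support(C)) / (1 - confidence(A -> C))"""
    if confidence == 1.0:
        return float("inf")          # rule is never violated
    return (1.0 - support_c) / (1.0 - confidence)

support_a, support_c, support_both = 0.6, 0.5, 0.4   # toy supports
conf = support_both / support_a                       # confidence(A -> C) = 2/3
lev = leverage(support_both, support_a, support_c)    # 0.4 - 0.3 = 0.1
conv = conviction(support_c, conf)                    # 0.5 / (1/3) = 1.5
```

Leverage of 0 means antecedent and consequent are independent; conviction greater than 1 means the rule is violated less often than independence would predict.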
- Added a `loadings_` attribute to `PrincipalComponentAnalysis` to compute the factor loadings of the features on the principal components. (#251)
- Added a `make_multiplexer_dataset` function that creates a dataset generated by an n-bit Boolean multiplexer for evaluating supervised learning algorithms. (#263)
- New `BootstrapOutOfBag` class, an implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. (#265)
- The parameters for `StackingClassifier`, `StackingCVClassifier`, `StackingRegressor`, `StackingCVRegressor`, and `EnsembleVoteClassifier` can now be tuned using scikit-learn's `GridSearchCV`. (#254 via James Bourbeau)
- The `'support'` column returned by `frequent_patterns.association_rules` was changed to compute the support of "antecedant union consequent", and new `'antecedant support'` and `'consequent support'` columns were added to avoid ambiguity. (#245)
- Allow the `OnehotTransactions` to be cloned via scikit-learn's `clone` function, which is required by, e.g., scikit-learn's `FeatureUnion` or `GridSearchCV` (via Iaroslav Shcherbatyi). (#249)
- Fixed the missing `self._init_time` parameter in `_IterativeModel` subclasses. (#256)
- Fixed a bug in `plot_ecdf` when run on Python 2.7. (#264)
- … `PrincipalComponentAnalysis` are now being scaled so that the eigenvalues via `solver='eigen'` and `solver='svd'` have the same magnitudes. (#251)