Mlxtend Versions Save

A library of extension and helper modules for Python's data analysis and machine learning libraries.

v0.17.0

4 years ago
New Features
  • Added an enhancement to the existing iris_data() such that both the UCI Repository version of the Iris dataset as well as the corrected, original version of the dataset can be loaded, which has a slight difference in two data points (consistent with Fisher's paper; this is also the same as in R). (via #539 via janismdhanbad)
  • Added optional groups parameter to SequentialFeatureSelector and ExhaustiveFeatureSelector fit() methods for forwarding to sklearn CV (#537 via arc12)
  • Added a new plot_pca_correlation_graph function to the mlxtend.plotting submodule for plotting a PCA correlation graph. (#544 via Gabriel-Azevedo-Ferreira)
  • Added a zoom_factor parameter to the mlxten.plotting.plot_decision_region function that allows users to zoom in and out of the decision region plots. (#545)
  • Added a function fpgrowth that implements the FP-Growth algorithm for mining frequent itemsets as a drop-in replacement for the existing apriori algorithm. (#550 via Steve Harenberg)
  • New heatmap function in mlxtend.plotting. (#552)
  • Added a function fpmax that implements the FP-Max algorithm for mining maximal itemsets as a drop-in replacement for the fpgrowth algorithm. (#553 via Steve Harenberg)
  • New figsize parameter for the plot_decision_regions function in mlxtend.plotting. (#555 via Mirza Hasanbasic)
  • New low_memory option for the apriori frequent itemset generating function. Setting low_memory=False (default) uses a substantially optimized version of the algorithm that is 3-6x faster than the original implementation (low_memory=True). (#567 via jmayse)
Changes
  • Now uses the latest joblib library under the hood for multiprocessing instead of sklearn.externals.joblib. (#547)
  • Changes to StackingCVClassifier and StackingCVRegressor such that first-level models are allowed to generate output of non-numeric type. (#562)
Bug Fixes
  • Fixed documentation of iris_data() under iris.py by adding a note about differences in the iris data in R and UCI machine learning repo.
  • Make sure that if the 'svd' mode is used in PCA, the number of eigenvalues is the same as when using 'eigen' (append 0's zeros in that case) (#565)

v0.16.0

5 years ago
New Features
  • StackingCVClassifier and StackingCVRegressor now support random_state parameter, which, together with shuffle, controls the randomness in the cv splitting. (#523 via Qiang Gu)
  • StackingCVClassifier and StackingCVRegressor now have a new drop_last_proba parameter. It drops the last "probability" column in the feature set since if True, because it is redundant: p(y_c) = 1 - p(y_1) + p(y_2) + ... + p(y_{c-1}). This can be useful for meta-classifiers that are sensitive to perfectly collinear features. (#532)
  • Other stacking estimators, including StackingClassifier, StackingCVClassifier and StackingRegressor, support grid search over the regressors and even a single base regressor. (#522 via Qiang Gu)
  • Adds multiprocessing support to StackingCVClassifier. (#522 via Qiang Gu)
  • Adds multiprocessing support to StackingCVRegressor. (#512 via Qiang Gu)
  • Now, the StackingCVRegressor also enables grid search over the regressors and even a single base regressor. When there are level-mixed parameters, GridSearchCV will try to replace hyperparameters in a top-down order (see the documentation for examples details). (#515 via Qiang Gu)
  • Adds a verbose parameter to apriori to show the current iteration number as well as the itemset size currently being sampled. (#519
  • Adds an optional class_name parameter to the confusion matrix function to display class names on the axis as tick marks. (#487 via sandpiturtle)
Changes
  • Due to new features, restructuring, and better scikit-learn support (for GridSearchCV, etc.) the StackingCVRegressor's meta regressor is now being accessed via 'meta_regressor__* in the parameter grid. E.g., if a RandomForestRegressor as meta- egressor was previously tuned via 'randomforestregressor__n_estimators', this has now changed to 'meta_regressor__n_estimators'. (#515 via Qiang Gu)
  • The same change mentioned above is now applied to other stacking estimators, including StackingClassifier, StackingCVClassifier and StackingRegressor. (#522 via Qiang Gu)
Bug Fixes
  • The feature_selection.ColumnSelector now also supports column names of type int (in addition to str names) if the input is a pandas DataFrame. (#500 via tetrar124
  • Fix unreadable labels in plot_confusion_matrix for imbalanced datasets if show_absolute=True and show_normed=True. (#504)
  • Raises a more informative error if a SparseDataFrame is passed to apriori and the dataframe has integer column names that don't start with 0 due to current limitations of the SparseDataFrame implementation in pandas. (#503)
  • SequentialFeatureSelector now supports DataFrame as input for all operating modes (forward/backward/floating). #506
  • mlxtend.evaluate.feature_importance_permutation now correctly accepts scoring functions with proper function signature as metric argument. #528

v0.15.0

5 years ago
New Features
  • Adds a new transformer class to mlxtend.image, EyepadAlign, that aligns face images based on the location of the eyes. (#466 by Vahid Mirjalili)
  • Adds a new function, mlxtend.evaluate.bias_variance_decomp that decomposes the loss of a regressor or classifier into bias and variance terms. (#470)
  • Adds a whitening parameter to PrincipalComponentAnalysis, to optionally whiten the transformed data such that the features have unit variance. (#475)
Changes
  • Changed the default solver in PrincipalComponentAnalysis to 'svd' instead of 'eigen' to improve numerical stability. (#474)
  • The mlxtend.image.extract_face_landmarks now returns None if no facial landmarks were detected instead of an array of all zeros. (#466)
Bug Fixes
  • The eigenvectors maybe have not been sorted in certain edge cases if solver was 'eigen' in PrincipalComponentAnalysis and LinearDiscriminantAnalysis. (#477, #478)

v0.14.0

5 years ago
New Features
  • Added a scatterplotmatrix function to the plotting module. (#437)
  • Added sample_weight option to StackingRegressor, StackingClassifier, StackingCVRegressor, StackingCVClassifier, EnsembleVoteClassifier. (#438)
  • Added a RandomHoldoutSplit class to perform a random train/valid split without rotation in SequentialFeatureSelector, scikit-learn GridSearchCV etc. (#442)
  • Added a PredefinedHoldoutSplit class to perform a train/valid split, based on user-specified indices, without rotation in SequentialFeatureSelector, scikit-learn GridSearchCV etc. (#443)
  • Created a new mlxtend.image submodule for working on image processing-related tasks. (#457)
  • Added a new convenience function extract_face_landmarks based on dlib to mlxtend.image. (#458)
  • Added a method='oob' option to the mlxtend.evaluate.bootstrap_point632_score method to compute the classic out-of-bag bootstrap estimate (#459)
  • Added a method='.632+' option to the mlxtend.evaluate.bootstrap_point632_score method to compute the .632+ bootstrap estimate that addresses the optimism bias of the .632 bootstrap (#459)
  • Added a new mlxtend.evaluate.ftest function to perform an F-test for comparing the accuracies of two or more classification models. (#460)
  • Added a new mlxtend.evaluate.combined_ftest_5x2cv function to perform an combined 5x2cv F-Test for comparing the performance of two models. (#461)
  • Added a new mlxtend.evaluate.difference_proportions test for comparing two proportions (e.g., classifier accuracies) (#462)
Changes
  • Addressed deprecations warnings in NumPy 0.15. (#425)
  • Because of complications in PR (#459), Python 2.7 was now dropped; since official support for Python 2.7 by the Python Software Foundation is ending in approx. 12 months anyways, this re-focussing will hopefully free up some developer time with regard to not having to worry about backward compatibility
Bug Fixes
  • Fixed an issue with a missing import in mlxtend.plotting.plot_confusion_matrix. (#428)

0.13.0

5 years ago

Version 0.13.0 (07/20/2018)

New Features
  • A meaningful error message is now raised when a cross-validation generator is used with SequentialFeatureSelector. (#377)
  • The SequentialFeatureSelector now accepts custom feature names via the fit method for more interpretable feature subset reports. (#379)
  • The SequentialFeatureSelector is now also compatible with Pandas DataFrames and uses DataFrame column-names for more interpretable feature subset reports. (#379)
  • ColumnSelector now works with Pandas DataFrames columns. (#378 by Manuel Garrido)
  • The ExhaustiveFeatureSelector estimator in mlxtend.feature_selection now is safely stoppable mid-process by control+c. (#380)
  • Two new functions, vectorspace_orthonormalization and vectorspace_dimensionality were added to mlxtend.math to use the Gram-Schmidt process to convert a set of linearly independent vectors into a set of orthonormal basis vectors, and to compute the dimensionality of a vectorspace, respectively. (#382)
  • mlxtend.frequent_patterns.apriori now supports pandas SparseDataFrames to generate frequent itemsets. (#404 via Daniel Morales)
  • The plot_confusion_matrix function now has the ability to show normalized confusion matrix coefficients in addition to or instead of absolute confusion matrix coefficients with or without a colorbar. The text display method has been changed so that the full range of the colormap is used. The default size is also now set based on the number of classes.
  • Added support for merging the meta features with the original input features in StackingRegressor (via use_features_in_secondary) like it is already supported in the other Stacking classes. (#418)
  • Added a support_only to the association_rules function, which allow constructing association rules (based on the support metric only) for cropped input DataFrames that don't contain a complete set of antecedent and consequent support values. (#421)
Changes
  • Itemsets generated with apriori are now frozensets (#393 by William Laney and #394)
  • Now raises an error if a input DataFrame to apriori contains non 0, 1, True, False values. #419)
Bug Fixes
  • Allow mlxtend estimators to be cloned via scikit-learn's clone function. (#374)
  • Fixes bug to allow the correct use of refit=False in StackingRegressor and StackingCVRegressor (#384 and (#385) by selay01)
  • Allow StackingClassifier to work with sparse matrices when use_features_in_secondary=True (#408 by Floris Hoogenbook)
  • Allow StackingCVRegressor to work with sparse matrices when use_features_in_secondary=True (#416)
  • Allow StackingCVClassifier to work with sparse matrices when use_features_in_secondary=True (#417)

v0.12.0

6 years ago
Downloads
New Features
  • A new feature_importance_permuation function to compute the feature importance in classifiers and regressors via the permutation importance method (#358)
  • The fit method of the ExhaustiveFeatureSelector now optionally accepts **fit_params for the estimator that is used for the feature selection. (#354 by Zach Griffith)
  • The fit method of the SequentialFeatureSelector now optionally accepts **fit_params for the estimator that is used for the feature selection. (#350 by Zach Griffith)
Changes
  • Replaced plot_decision_regions colors by a colorblind-friendly palette and adds contour lines for decision regions. (#348)
  • All stacking estimators now raise NonFittedErrors if any method for inference is called prior to fitting the estimator. (#353)
  • Renamed the refit parameter of both the StackingClassifier and StackingCVClassifier to use_clones to be more explicit and less misleading. (#368)
Bug Fixes
  • Various changes in the documentation and documentation tools to fix formatting issues (#363)
  • Fixes a bug where the StackingCVClassifier's meta features were not stored in the original order when shuffle=True (#370)
  • Many documentation improvements, including links to the User Guides in the API docs (#371)

v0.11.0

6 years ago
New Features
  • New function implementing the resampled paired t-test procedure (paired_ttest_resampled) to compare the performance of two models (also called k-hold-out paired t-test). (#323)
  • New function implementing the k-fold paired t-test procedure (paired_ttest_kfold_cv) to compare the performance of two models (also called k-hold-out paired t-test). (#324)
  • New function implementing the 5x2cv paired t-test procedure (paired_ttest_5x2cv) proposed by Dieterrich (1998) to compare the performance of two models. (#325)
  • A refit parameter was added to stacking classes (similar to the refit parameter in the EnsembleVoteClassifier), to support classifiers and regressors that follow the scikit-learn API but are not compatible with scikit-learn's clone function. (#325)
  • The ColumnSelector now has a drop_axis argument to use it in pipelines with CountVectorizers. (#333)
Changes
  • Raises an informative error message if predict or predict_meta_features is called prior to calling the fit method in StackingRegressor and StackingCVRegressor. (#315)
  • The plot_decision_regions function now automatically determines the optimal setting based on the feature dimensions and supports anti-aliasing. The old res parameter has been deprecated. (#309 by Guillaume Poirier-Morency)
  • Apriori code is faster due to optimization in onehot transformation and the amount of candidates generated by the apriori algorithm. (#327 by Jakub Smid)
  • The OnehotTransactions class (which is typically often used in combination with the apriori function for association rule mining) is now more memory efficient as it uses boolean arrays instead of integer arrays. In addition, the OnehotTransactions class can be now be provided with sparse argument to generate sparse representations of the onehot matrix to further improve memory efficiency. (#328 by Jakub Smid)
  • The OneHotTransactions has been deprecated and replaced by the TransactionEncoder. (#332
  • The plot_decision_regions function now has three new parameters, scatter_kwargs, contourf_kwargs, and scatter_highlight_kwargs, that can be used to modify the plotting style. (#342 by James Bourbeau)
Bug Fixes
  • Fixed issue when class labels were provided to the EnsembleVoteClassifier when refit was set to false. (#322)
  • Allow arrays with 16-bit and 32-bit precision in plot_decision_regions function. (#337)
  • Fixed bug that raised an indexing error if the number of items was <= 1 when computing association rules using the conviction metric. (#340)

v0.10.0

6 years ago
New Features
  • New store_train_meta_features parameter for fit in StackingCVRegressor. if True, train meta-features are stored in self.train_meta_features_. New pred_meta_features method for StackingCVRegressor. People can get test meta-features using this method. (#294 via takashioya)
  • The new store_train_meta_features attribute and pred_meta_features method for the StackingCVRegressor were also added to the StackingRegressor, StackingClassifier, and StackingCVClassifier (#299 & #300)
  • New function (evaluate.mcnemar_tables) for creating multiple 2x2 contigency from model predictions arrays that can be used in multiple McNemar (post-hoc) tests or Cochran's Q or F tests, etc. (#307)
  • New function (evaluate.cochrans_q) for performing Cochran's Q test to compare the accuracy of multiple classifiers. (#310)
Changes
Bug Fixes
  • Improved numerical stability for p-values computed via the the exact McNemar test (#306)
  • nose is not required to use the library (#302)

v.0.9.1

6 years ago

Version 0.9.1 (2017-11-19)

Downloads
New Features
  • Added mlxtend.evaluate.bootstrap_point632_score to evaluate the performance of estimators using the .632 bootstrap. (#283)
  • New max_len parameter for the frequent itemset generation via the apriori function to allow for early stopping. (#270)
Changes
  • All feature index tuples in SequentialFeatureSelector or now in sorted order. (#262)
  • The SequentialFeatureSelector now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994). Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases. (#262)
  • utils.Counter now accepts a name variable to help distinguish between multiple counters, time precision can be set with the 'precision' kwarg and the new attribute end_time holds the time the last iteration completed. (#278 via Mathew Savage)
Bug Fixes
  • Fixed an deprecation error that occured with McNemar test when using SciPy 1.0. (#283)

v0.9.0

6 years ago
New Features
  • Added evaluate.permutation_test, a permutation test for hypothesis testing (or A/B testing) to test if two samples come from the same distribution. Or in other words, a procedure to test the null hypothesis that that two groups are not significantly different (e.g., a treatment and a control group). (#250)
  • Added 'leverage' and 'conviction as evaluation metrics to the frequent_patterns.association_rules function. (#246 & #247)
  • Added a loadings_ attribute to PrincipalComponentAnalysis to compute the factor loadings of the features on the principal components. (#251)
  • Allow grid search over classifiers/regressors in ensemble and stacking estimators. (#259)
  • New make_multiplexer_dataset function that creates a dataset generated by a n-bit Boolean multiplexer for evaluating supervised learning algorithms. (#263)
  • Added a new BootstrapOutOfBag class, an implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. (#265)
  • The parameters for StackingClassifier, StackingCVClassifier, StackingRegressor, StackingCVRegressor, and EnsembleVoteClassifier can now be tuned using scikit-learn's GridSearchCV (#254 via James Bourbeau)
Changes
  • The 'support' column returned by frequent_patterns.association_rules was changed to compute the support of "antecedant union consequent", and new antecedant support' and 'consequent support' column were added to avoid ambiguity. (#245)
  • Allow the OnehotTransactions to be cloned via scikit-learn's clone function, which is required by e.g., scikit-learn's FeatureUnion or GridSearchCV (via Iaroslav Shcherbatyi). (#249)
Bug Fixes
  • Fix issues with self._init_time parameter in _IterativeModel subclasses. (#256)
  • Fix imprecision bug that occurred in plot_ecdf when run on Python 2.7. (264)
  • The vectors from SVD in PrincipalComponentAnalysis are no being scaled so that the eigenvalues via solver='eigen' and solver='svd' now store eigenvalues that have the same magnitudes. (#251)