Visual analysis and diagnostic tools to facilitate machine learning model selection.
Deployed: Wednesday, November 14, 2018 Contributors: @rebeccabilbro, @bbengfort, @zjpoh, @Kautumn06, @ndanielsen, @drwaterman, @lwgray, @pdamodaran, @Juan0001, @abatula, @peterespinosa, @jlinGG, @rlshuhart, @archaeocharlie, @dschoenleber, @black-tea, @iguk1987, @mohfadhil, @lacanlale, @agodbehere, @sivu1, @gokriznastic
Major Changes:
- Target module added for visualizing the dependent variable in supervised models.
- Added a prototype for a missing values visualizer to the contrib module.
- BalancedBinningReference visualizer for thresholding unbalanced data (undocumented).
- CVScores visualizer to instrument cross-validation.
- FeatureCorrelation visualizer to compare the relationship between a single independent variable and the target.
- ICDM visualizer, intercluster distance mapping using projections similar to those used in pyLDAVis.
- PrecisionRecallCurve visualizer showing the relationship of precision and recall in a threshold-based classifier.
- Enhanced FeatureImportance for multi-target and multi-coefficient models (e.g. probabilistic models); it now allows stacked bar charts.
- Adds option to plot PDF to ResidualsPlot histogram.
- Adds document boundaries option to DispersionPlot and uses colored markers to depict class.
- Added alpha parameter for opacity to the scatter plot visualizer.
- Modify KElbowVisualizer to accept a list of k values.
- ROCAUC bugfix to allow binary classifiers that only have a decision function.
- TSNE bugfix so that title and size params are respected.
- ConfusionMatrix bugfix to correct percentage displays adding to 100.
- ResidualsPlot bugfix to ensure specified colors are used in both the histogram and the scatterplot.
- Fixed unicode decode error on Py2 compatible Windows using Hobbies corpus.
- Require matplotlib 1.5.1 or matplotlib 2.0 (matplotlib 3.0 not supported yet).
- Yellowbrick now depends on SciPy 1.0 and scikit-learn 0.20.
- Deprecated percent and sample_weight arguments to ConfusionMatrix fit method.
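To make the precision and recall relationship mentioned above concrete, here is a minimal sketch in plain Python that computes both values at a few decision thresholds. The function name precision_recall_at and the toy labels and scores are hypothetical illustrations; they are not part of Yellowbrick, whose visualizer works with a fitted scikit-learn classifier.

```python
def precision_recall_at(threshold, y_true, y_score):
    """Return (precision, recall) when scores >= threshold count as positive."""
    predicted = [score >= threshold for score in y_score]
    tp = sum(1 for p, t in zip(predicted, y_true) if p and t)
    fp = sum(1 for p, t in zip(predicted, y_true) if p and not t)
    fn = sum(1 for p, t in zip(predicted, y_true) if not p and t)
    # Conventions for empty denominators (assumed here): precision 1.0, recall 0.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (hypothetical): true binary labels and classifier scores
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9]

# One (threshold, precision, recall) triple per threshold
curve = [(t, *precision_recall_at(t, y_true, y_score)) for t in (0.3, 0.5, 0.7)]
for threshold, precision, recall in curve:
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold on this toy data trades recall for precision, which is the trade-off a precision-recall curve displays.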
Minor Changes:
- Removed hardcoding of SilhouetteVisualizer axes dimensions.
- Audit classifiers to ensure they conform to score API.
- Fix for Manifold fit_transform bug.
- Fixed Manifold import bug.
- Started reworking datasets API for easier loading of examples.
- Added Timer utility for keeping track of fit times.
- Added slides to documentation for teachers teaching ML/Yellowbrick.
- Added an FAQ to the documentation.
- Manual legend drawing utility.
- New examples notebooks for Regression and Clustering.
- Example of interactive classification visualization using ipywidgets.
- Example of using Yellowbrick with PyTorch.
- Repairs to ROCAUC tests and binary/multiclass ROCAUC construction.
- Rename tests/random.py to tests/rand.py to prevent NumPy errors.
- Improves ROCAUC, KElbowVisualizer, and SilhouetteVisualizer documentation.
- Fixed visual display bug in JointPlotVisualizer.
- Fixed image in JointPlotVisualizer documentation.
- Clear figure option to poof.
- Fix color plotting error in residuals plot quick method.
- Fixed bugs in KElbowVisualizer, FeatureImportance, Index, and Datasets documentation.
- Use LGTM for code quality analysis (replacing Landscape).
- Updated contributing docs for better PR workflow.
- Submitted JOSS paper.
Deployed: Thursday, July 12, 2018 Contributors: @bbengfort, @ndanielsen, @rebeccabilbro, @lwgray, @RaulPL, @Kautumn06, @ariley1472, @ralle123, @thekylesaurus, @lumega, @pdamodaran, @chrisfs, @mitevpi, @sayali-sonawane
Major Changes:
- Added Support to ClassificationReport - @ariley1472
- We have an updated Image Gallery - @ralle123
- Improved performance of ParallelCoordinates Visualizer @thekylesaurus
- Added Alpha Transparency to RadViz Visualizer @lumega
- CVScores Visualizer - @pdamodaran
- Added fast and alpha parameters to ParallelCoordinates visualizer @bbengfort
- Make support an optional parameter for ClassificationReport @lwgray
- Bug Fix for Usage of multidimensional arrays in FeatureImportance visualizer @rebeccabilbro
- Deprecate ScatterVisualizer to contrib @bbengfort
- Implements histogram alongside ResidualsPlot @bbengfort
- Adds biplot to the PCADecomposition visualizer @RaulPL
- Adds Datasaurus Dataset to show importance of visualizing data @lwgray
- Add DispersionPlot Plot @lwgray
Minor Changes:
- Fix grammar in tutorial.rst - @chrisfs
- Added Note to tutorial indicating subtle differences when working in Jupyter notebook - @chrisfs
- Update Issue template @bbengfort
- Added Test to check for NLTK postag data availability - @sayali-sonawane
- Clarify quick start documentation @mitevpi
- Deprecated DecisionBoundary
- Threshold Visualization aliases deprecated
Deployed: Thursday, May 17, 2018 Contributors: @bbengfort, @ndanielsen, @rebeccabilbro, @lwgray, @ianozsvald, @jtpio, @bharaniabhishek123, @RaulPL, @tabishsada, @Kautumn06, @NealHumphrey
Changes:
- model_selection module with LearningCurve and ValidationCurve visualizers.
- RFECV (recursive feature elimination) visualizer with cross-validation visualizes how removing the least performing features improves the overall model.
- VisualizerGrid is an implementation of the MultipleVisualizer that creates axes for each visualizer using plt.subplots, laying the visualizers out as a grid.
- yellowbrick.datasets to load example datasets.
- StatsModelsWrapper was added to yellowbrick.contrib.statsmodels, allowing users to use StatsModels estimators with visualizers.
- ClassificationReport documentation now includes more details about how to interpret each of the metrics and compare the reports against each other.
- ThreshViz is now defined as DiscriminationThreshold and implements a few more discrimination features such as F1 score, maximizing arguments and annotations.
- distortion_score now handles sparse matrices.
- is_probabilistic type checker; the type checking tests were converted to pytest.
- contrib module; the DecisionBoundaries visualizer has been moved to it until further work is completed.
Bug Fixes:
- RandomVisualizer for testing, added to the VisualizerGrid test cases.
- tests.test_classifier.test_class_prediction_error.py updated to remove hardcoded data.
Deprecation Warnings:
- ScatterPlotVisualizer is being moved to contrib in 0.8
- DecisionBoundaryVisualizer is being moved to contrib in 0.8
- ThreshViz is renamed to DiscriminationThreshold.
NOTE: These deprecation warnings originally mentioned deprecation in 0.7, but their life was extended by an additional version.
Deployed: Saturday, March 17, 2018 Contributors: @bbengfort, @ndanielsen, @rebeccabilbro, @lwgray, @Kautumn06, @georgerichardson, @pbs929, @Aylr, @gary-mayfield, @jkeung
- FeatureImportances Visualizer enables the user to visualize the most informative (relative and absolute) features in their model, plotting a bar graph of feature_importances_ or coef_ attributes.
- ExplainedVariance Visualizer produces a plot of the explained variance resulting from a dimensionality reduction to help identify the best tradeoff between number of dimensions and amount of information retained from the data.
- GridSearchVisualizer creates a color plot showing the best grid search scores across two parameters.
- ClassPredictionError Visualizer is a heatmap implementation of the class balance visualizer, which provides a way to quickly understand how successfully your classifier is predicting the correct classes.
- ThresholdVisualizer allows the user to visualize the bounds of precision, recall and queue rate at different thresholds for binary targets after a given number of trials.
- MultiFeatureVisualizer helper class to provide base functionality for getting the names of features for use in plot annotation.
- JointPlot, AlphaPlot, FreqDist, RadViz, ElbowPlot, SilhouettePlot, ConfusionMatrix, Rank1D, and Rank2D.
- TSNEVisualizer Value Error when no classes are specified.
- RadViz! This visualizer has also been updated to ensure there's a visualization even when there are missing values.
- RocAuc to correctly check the number of classes.
- np.copy instead of np.tolist to avoid NumPy deprecation warning.
- DataVisualizer updated to remove np.nan values and warn the user that nans are not plotted.
- ClassificationReport no longer has lines that run through the numbers, is more grid-like.
- ScatterPlotVisualizer is being moved to contrib in 0.7
- DecisionBoundaryVisualizer is being moved to contrib in 0.7
Deployed: Wednesday, August 9, 2017 Contributors: @bbengfort, @rebeccabilbro, @ndanielsen, @cjmorale, @JimStearns206, @pbs929, @jkeung
Update to the deployment docs and package on both Anaconda and PyPI.
This release is an intermediate version bump in anticipation of the PyCon 2017 sprints.
The primary goals of this version were to (1) update the Yellowbrick dependencies, (2) enhance the Yellowbrick documentation to help orient new users and contributors, and (3) make several small additions and upgrades (e.g. pulling the Yellowbrick utils into a standalone module).
We have updated the Scikit-Learn and SciPy dependencies from version 0.17.1 or later to 0.18 or later. This primarily entails changing the import "from sklearn.cross_validation import train_test_split" to "from sklearn.model_selection import train_test_split".
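For code that still has to run against both older and newer Scikit-Learn versions, one common pattern is a guarded import. The sketch below is only an illustration of that pattern and is not taken from the Yellowbrick source; the final fallback to None is an assumption made so that the snippet also runs where scikit-learn is not installed.

```python
try:
    # Location of train_test_split from Scikit-Learn 0.18 onward
    from sklearn.model_selection import train_test_split
except ImportError:
    try:
        # Location used before 0.18
        from sklearn.cross_validation import train_test_split
    except ImportError:
        # Assumption for this sketch: scikit-learn is not installed at all
        train_test_split = None

if train_test_split is not None:
    # Split ten toy samples into training and test portions
    X = [[i] for i in range(10)]
    y = [i % 2 for i in range(10)]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
```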
The updates to the documentation include new Quickstart and Installation guides as well as updates to the Contributors documentation, which is modeled on the Scikit-Learn contributing documentation.
This version also included upgrades to the KMeans visualizer, which now supports not only silhouette_score but also distortion_score and calinski_harabaz_score. The distortion_score computes the distortion of all samples as the sum of the squared distances between each observation and its closest centroid. This is the metric that K-Means attempts to minimize as it is fitting the model. The calinski_harabaz_score is defined as the ratio of the between-cluster dispersion to the within-cluster dispersion.
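The two scores can be sketched in plain Python as follows. This is an illustrative reimplementation under stated assumptions, not Yellowbrick's or Scikit-Learn's code: the helper names are hypothetical, each point is assumed to be labeled with its closest centroid (as after K-Means converges), and the Calinski-Harabasz ratio uses the usual normalization by k - 1 and n - k.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_by_label(X, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for point, label in zip(X, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def distortion(X, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for members in group_by_label(X, labels).values():
        center = centroid(members)
        total += sum(sq_dist(p, center) for p in members)
    return total


def calinski_harabasz(X, labels):
    """Between-cluster dispersion over within-cluster dispersion, normalized.

    Requires at least two clusters and a nonzero within-cluster dispersion.
    """
    clusters = group_by_label(X, labels)
    n, k = len(X), len(clusters)
    overall = centroid(X)
    between = sum(len(m) * sq_dist(centroid(m), overall) for m in clusters.values())
    within = distortion(X, labels)
    return (between / (k - 1)) / (within / (n - k))


# Toy data (hypothetical): two well-separated clusters of two points each
X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]

d = distortion(X, labels)
ch = calinski_harabasz(X, labels)
print(d, ch)
```

Lower distortion means tighter clusters, while a higher Calinski-Harabasz value means clusters that are both compact and well separated.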
Finally, this release includes a prototype of the VisualPipeline, which extends Scikit-Learn's Pipeline class, allowing multiple Visualizers to be chained or sequenced together.
Deployed: Monday, May 22, 2017 Contributors: @bbengfort, @rebeccabilbro, @ndanielsen
- Replaced sklearn.cross_validation with model_selection.
- Added distortion_score and calinski_harabaz_score computations and visualizations to KMeans visualizer.
- Replaced the self.ax property on all of the individual draw methods with a new property on the Visualizer class that ensures all visualizers automatically have axes.
This release is the culmination of the Spring 2017 DDL Research Labs that focused on developing Yellowbrick as a community effort guided by a sprint/agile workflow. We added several more visualizers, did a lot of user testing and bug fixes, updated the documentation, and generally discovered how best to make Yellowbrick a friendly project to contribute to.
Notable in this release is the inclusion of two new feature visualizers that use a few simple dimensions to visualize features against the target. The JointPlotVisualizer graphs a scatter plot of two dimensions in the data set and plots a best fit line across it. The ScatterVisualizer also uses two features, but colors the graph by the target variable, adding a third dimension to the visualization.
This release also adds support for clustering visualizations, namely the elbow method for selecting K with the KElbowVisualizer and a visualization of cluster size and density using the SilhouetteVisualizer. The release also adds support for regularization analysis using the AlphaSelection visualizer. Both the text and classification modules were also improved with the inclusion of the PosTagVisualizer and the ConfusionMatrix visualizer respectively.
This release also added an Anaconda repository and distribution so that users can conda install yellowbrick. Even more notable, we got yellowbrick stickers! We've also updated the documentation to make it more friendly and a bit more visual, fixing the API rendering errors. All in all, this was a big release with a lot of contributions and we thank everyone that participated in the lab!
Deployed: Thursday, May 4, 2017 Contributors: @bbengfort, @rebeccabilbro, @ndanielsen, @mattandahalfew, @pdamodaran, @NealHumphrey, @jkeung, @balavenkatesan, @pbwitt, @morganmendis, @tuulihill
- PosTagVisualizer
- AlphaSelection
- ConfusionMatrix
- KElbowVisualizer
- SilhouetteVisualizer
- JointPlotVisualizer
- ScatterVisualizer
Intermediate sprint to demonstrate prototype implementations of text visualizers for NLP models. Primary contributions were the FreqDistVisualizer and the TSNEVisualizer.
The TSNEVisualizer displays a projection of a vectorized corpus in two dimensions using TSNE, a nonlinear dimensionality reduction method that is particularly well suited to embedding in two or three dimensions for visualization as a scatter plot. TSNE is widely used in text analysis to show clusters or groups of documents or utterances and their relative proximities.
The FreqDistVisualizer implements a frequency distribution plot that tells us the frequency of each vocabulary item in the text. In general, it could count any kind of observable event. It is a distribution because it tells us how the total number of word tokens in the text is distributed across the vocabulary items.
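The idea of a frequency distribution can be sketched with the standard library alone. The snippet below is a minimal illustration using collections.Counter on a made-up sentence with naive whitespace tokenization; it does not use the FreqDistVisualizer itself, which plots such counts for a vectorized corpus.

```python
from collections import Counter

# Toy text (hypothetical) split into word tokens on whitespace
text = "the quick brown fox jumps over the lazy dog the end"
tokens = text.split()

# Count how often each vocabulary item occurs
freq = Counter(tokens)

# The counts over the vocabulary add up to the total number of tokens,
# which is what makes this a distribution of tokens across the vocabulary
total_tokens = sum(freq.values())

# Most frequent vocabulary items first
top = freq.most_common(3)
print(top, total_tokens)
```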
Deployed: Wednesday, February 22, 2017 Contributors: @rebeccabilbro, @bbengfort
- TextVisualizer
Hardened the Yellowbrick API to elevate the idea of a Visualizer to a first principle. This included reconciling shifts in the development of the preliminary versions to the new API, formalizing Visualizer methods like draw() and finalize(), and adding utilities that revolve around Scikit-Learn. To that end we also performed administrative tasks like refreshing the documentation and preparing the repository for more and varied open source contributions.
Deployed: Friday, January 20, 2017 Contributors: @bbengfort, @rebeccabilbro, @StampedPassp0rt