AutoGluon Versions

Fast and Accurate ML in 3 Lines of Code

v0.2.0

3 years ago

v0.2.0 introduces numerous optimizations that, compared to v0.1.0, reduce average Tabular inference time by 4x and average disk usage by 10x. It also brings a refactored ImagePredictor API that better aligns with the other tasks, plus a 20x inference speedup in Vision tasks. This release contains 42 commits from 9 contributors.

This release is non-breaking when upgrading from v0.1.0, with four exceptions:

  1. ImagePredictor.predict and ImagePredictor.predict_proba have different output formats.
  2. TabularPredictor.evaluate and TabularPredictor.evaluate_predictions have different output formats.
  3. Custom dictionary inputs to TabularPredictor.fit's hyperparameter_tune_kwargs argument now have a different format.
  4. Models trained in v0.1.0 should only be loaded with v0.1.0. Loading models trained in different versions of AutoGluon is not supported.

See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.1.0...v0.2.0

Thanks to the 9 contributors that contributed to the v0.2.0 release!

Special thanks to the 3 first-time contributors! @taesup-aws, @ValerioPerrone, @lukemorrill

Full Contributor List (ordered by # of commits):

@Innixma, @zhreshold, @gradientsky, @jwmueller, @mseeger, @sxjscience, @taesup-aws, @ValerioPerrone, @lukemorrill

Major Changes

Tabular

  • Reduced overall inference time on best_quality preset by 4x (and 2x on others). @innixma, @gradientsky
  • Reduced overall disk usage on best_quality preset by 10x. @innixma
  • Reduced training time and inference time of K-Nearest-Neighbor models by 250x, and reduced disk usage by 10x via:
    • Efficient out-of-fold implementation (10x training & inference speedup, 10x reduced disk usage) on best_quality preset. @innixma (#1022)
    • [Experimental] Integration of the scikit-learn-intelex package (25x training & inference speedup). @innixma (#1049)
      • This is currently not installed by default. Try it via pip install autogluon.tabular[all,skex] or pip install "scikit-learn-intelex<2021.3". Once installed, AutoGluon will automatically use it.
  • Reduced training time, inference time, and disk usage of RandomForest and ExtraTrees models by 10x via efficient out-of-fold implementation. @innixma (#1066, #1082)
  • Reduced training time by 30% and inference time by 75% on the FastAI neural network model. @gradientsky (#977)
  • Added quantile as a new problem_type to support quantile regression problems (see the sketch after this list). @taesup-aws, @jwmueller (#1005, #1040)
  • [Experimental] Added GPU accelerated RandomForest, K-Nearest-Neighbors and Linear models via integration with NVIDIA RAPIDS. @innixma (#995, #997, #1000)
    • This is not enabled by default. Try it out by first installing RAPIDS and then installing AutoGluon.
      • Currently, the models need to be explicitly passed to the .fit hyperparameters argument. Refer to the Kaggle kernel below for an example, or check out the official RAPIDS AutoGluon example.
    • See how to use AutoGluon + RAPIDS to place in the top 1% of the Otto Kaggle competition with an interactive Kaggle kernel!
  • [Experimental] Added an option to specify early stopping rounds for the LightGBM, CatBoost, and XGBoost models via a new model parameter ag.early_stop. @innixma (#1037)
    • Try it out via hyperparameters={'XGB': {'ag.early_stop': 500}} (see the sketch after this list).
    • The API for this may change in future releases as we optimize how early stopping is used in AutoGluon.
  • [Experimental] Added adaptive early stopping to LightGBM. This attempts to choose when to stop training the model more intelligently than a fixed early stopping rounds value. @innixma (#1042)
  • Re-ordered model training priority to perform better when time_limit is small. For time_limit=3600 on datasets with over 100,000 rows, v0.2.0 has a 65% win-rate over v0.1.0. @innixma (#1059, #1084)
  • Adjusted time allocation to stack layers when performing multi-layer stacking to allow for longer training on earlier layers. @innixma (#1075)
  • Updated CatBoost to v0.25. @innixma (#1064)
  • Added extra_metrics argument to .leaderboard (see the sketch after this list). @innixma (#1058)
  • Added feature group importance support to .feature_importance. @innixma (#989)
    • Users can now get the combined importance of a group of features (see the sketch after this list).
    • predictor.feature_importance(test_data, features=['A', 'B', 'C', ('AB', ['A', 'B'])])
  • [BREAKING] Refactored .evaluate and .evaluate_predictions to be easier to use and to share the same code logic. @innixma (#1080)
    • The output type has changed and the sign of the metric score has been flipped in some circumstances.
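
The following is a minimal sketch of the new quantile problem_type. It assumes the standard TabularPredictor API; file and column names are illustrative, and the quantile_levels argument shown here is how recent versions expose the target quantiles (check the docs for your installed version).

    # Minimal sketch: quantile regression via the new problem_type.
    # File and column names ('train.csv', 'test.csv', 'target') are illustrative.
    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset('train.csv')
    predictor = TabularPredictor(
        label='target',
        problem_type='quantile',
        quantile_levels=[0.1, 0.5, 0.9],  # quantiles to predict
    ).fit(train_data)

    # Predictions contain one column per requested quantile level.
    quantile_preds = predictor.predict(TabularDataset('test.csv'))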
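
Below is a minimal sketch of the experimental ag.early_stop parameter, using the hyperparameters format shown above. The stopping values and dataset are illustrative, and, as noted, this API may change in future releases.

    # Minimal sketch: per-model early stopping rounds via the experimental ag.early_stop parameter.
    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset('train.csv')  # illustrative dataset
    predictor = TabularPredictor(label='target').fit(
        train_data,
        hyperparameters={
            'XGB': {'ag.early_stop': 500},  # XGBoost: stop after 500 rounds without improvement
            'GBM': {'ag.early_stop': 200},  # LightGBM
            'CAT': {'ag.early_stop': 100},  # CatBoost
        },
    )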
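
And a minimal sketch of the new post-fit analysis options: extra_metrics in .leaderboard, feature group importance, and the refactored .evaluate. The predictor path, metric names, and column names are illustrative; see the docs for the exact output format of .evaluate in v0.2.0.

    # Minimal sketch: post-fit analysis with a trained TabularPredictor.
    from autogluon.tabular import TabularDataset, TabularPredictor

    predictor = TabularPredictor.load('ag_models/')  # illustrative path to a fit predictor
    test_data = TabularDataset('test.csv')

    # Score every trained model on extra metrics alongside the eval metric.
    predictor.leaderboard(test_data, extra_metrics=['accuracy', 'f1', 'roc_auc'])

    # Importance of individual features plus the combined group 'AB' of features A and B.
    predictor.feature_importance(test_data, features=['A', 'B', 'C', ('AB', ['A', 'B'])])

    # Refactored evaluation entry point (output format changed in v0.2.0).
    predictor.evaluate(test_data)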

Vision

  • Reduced inference time by 20x via various optimizations in inference batching. @zhreshold
  • Fixed a problem where models trained on GPU could not be loaded on CPU-only machines. @zhreshold
  • Improved model fitting performance by up to 10% for ObjectDetector when presets is empty. @zhreshold
  • [BREAKING] Refactored predict and predict_proba methods in ImagePredictor to have the same output formats as TabularPredictor and TextPredictor. @zhreshold (#1044)
    • This change is BREAKING. Users upgrading from v0.1.0 who relied on the old predict and predict_proba outputs should update their code to the new formats.
  • Added improved support for CSV and pandas DataFrame input to ImagePredictor (see the sketch after this list). @zhreshold (#1010)
  • Added early stopping strategies that significantly improve training efficiency. @zhreshold (#1039)
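
A minimal sketch of DataFrame input to ImagePredictor follows. The toy two-row DataFrame and the 'image'/'label' column names are illustrative; see the ImagePredictor docs for the expected schema.

    # Minimal sketch: fitting ImagePredictor from a pandas DataFrame of image paths and labels.
    import pandas as pd
    from autogluon.vision import ImagePredictor

    train_df = pd.DataFrame({
        'image': ['data/img_0001.jpg', 'data/img_0002.jpg'],  # paths to image files (illustrative)
        'label': ['cat', 'dog'],
    })

    predictor = ImagePredictor().fit(train_df)
    # predict / predict_proba now return outputs in the same format as TabularPredictor and TextPredictor.
    predictions = predictor.predict(train_df)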

General

  • [Experimental] Added a new hyperparameter tuning method: constrained Bayesian optimization. @ValerioPerrone (#1034)
  • General HPO code improvement / cleanup. @mseeger, @gradientsky (#971, #1002, #1050)
  • Fixed ENAS issue when passing in custom datasets. @lukemorrill (#1015)
  • Fixed incorrect dependency link between autogluon.mxnet and autogluon.extra causing crash on import. @innixma (#1032)
  • Various minor updates and fixes. @innixma, @jwmueller, @zhreshold, @sxjscience (#990, #996, #998, #1007, #1035, #1052, #1055, #1057, #1072, #1081, #1088)

v0.1.0

3 years ago

v0.1.0 is our largest release yet, containing 173 commits from 20 contributors over the course of 5 months.

This release is API breaking from past releases, as AutoGluon is now a namespace package. Please refer to our documentation for using v0.1.0. New GitHub issues based on versions earlier than v0.1.0 will not be addressed, and we recommend that all users upgrade to v0.1.0 as soon as possible.

See the full commit change-log here: https://github.com/awslabs/autogluon/compare/v0.0.15...v0.1.0

Try it out yourself in 5 minutes with our Colab Tutorial.

Special thanks to the 20 contributors that contributed to the v0.1.0 release! Contributor List:

@innixma, @gradientsky, @sxjscience, @jwmueller, @zhreshold, @mseeger, @daikikatsuragawa, @Chudbrochil, @adrienatallah, @jonashaag, @songqiang, @larroy, @sackoh, @muhyun, @rschmucker, @aaronkl, @kaixinbaba, @sflender, @jojo19893, @mak-454

Major Changes

General

  • MacOS is now fully supported.
  • Windows is now experimentally supported. Installation instructions for Windows are still in progress.
  • Python 3.8 is now supported.
  • Overhauled API. APIs between TabularPredictor, TextPredictor, and ImagePredictor are now much more consistent. @innixma, @sxjscience, @zhreshold, @jwmueller, @gradientsky
  • Updated AutoGluon to a namespace package; individual modules can now be installed separately to improve flexibility. For example, to install only the HPO-related functionality, you can get a minimal install via pip install autogluon.core. For a full list of available submodules, see this link. @gradientsky (#694)
  • Significantly improved robustness of HPO scheduling to avoid errors for users. @mseeger, @gradientsky, @rschmucker, @innixma (#713, #735, #750, #754, #824, #920, #924)
  • mxnet is no longer a required dependency in AutoGluon. @mseeger (#726)
  • Various dependency version upgrades.

Tabular

  • Major API refactor. @innixma (#768, #855, #869)
  • Multimodal Tabular + Text support (Tutorial). Now Tabular can train a multi-modal Tabular + Text transformer model alongside its standard models, and achieve state-of-the-art results on multi-modal tabular + text datasets with 3 lines of code. @sxjscience, @Innixma (#740, #752, #756, #770, #776, #794, #802, #848, #852, #867, #869, #871, #877)
  • GPU support for LightGBM, CatBoost, XGBoost, MXNet neural network, and FastAI neural network models. Specify ag_args_fit={'num_gpus': 1} in TabularPredictor.fit() to enable (see the sketch after this list). @innixma (#896)
  • sample_weight support. Tabular can now handle user-defined sample weights for imbalanced datasets. @jwmueller (#942, #962)
  • Multi-label prediction support (Tutorial). Tabular can now predict across multiple label columns. @jwmueller (#953)
  • Added student model ensembling in model distillation. @innixma (#937)
  • Generally improved accuracy and robustness due to a variety of internal improvements and the addition of new models. (v0.1.0 gets a better score on over 70% of datasets in benchmarking compared to v0.0.15!)
  • New model: XGBoost. @sackoh (#691)
  • New model: FastAI Tabular Neural Network. @gradientsky (#742, #748, #826, #839, #842)
  • New model: TextPredictorModel (Multi-modal transformer) (Requires GPU). @sxjscience (#770)
  • New experimental model: TabTransformer, a tabular transformer model (paper). @Chudbrochil (#723)
  • New experimental model: FastText. @songqiang (#580)
  • View all available models in our documentation: https://auto.gluon.ai/stable/api/autogluon.tabular.models.html
  • New advanced functionality: Extract out-of-fold predictions from a fit TabularPredictor (docs). @innixma (#779)
  • Greatly optimized and expanded upon feature importance calculation functionality. Now predictor.feature_importance() returns confidence bounds on importance values. @innixma (#803)
  • New experimental functionality: predictor.fit_extra() enables the fitting of additional models on top of an already fit TabularPredictor object (docs). @innixma (#768)
  • Per-model HPO support. You can now specify hyperparameter_tune_kwargs for an individual model via 'ag_args': {'hyperparameter_tune_kwargs': hpo_args} in that model's hyperparameters (see the sketch after this list). @innixma (#883)
  • Sped up preprocessing runtimes by 100x+ on large (10M+ row) datasets by subsampling data during feature duplicate resolution. @Innixma (#950)
  • Added SHAP notebook tutorials. @jwmueller (#720)
  • Heavily optimized CatBoost inference speed during online-inference. @innixma (#724)
  • KNN models now respect time_limit. @innixma (#845)
  • Added stack ensemble visualization method. @muhyun (#786)
  • Added NLP token prefiltering logic for ngram generation. @sflender (#907)
  • Added initial support for compression of model files to reduce disk usage. @adrienatallah (#940, #944)
  • Numerous bug fixes. @innixma, @jwmueller, @gradientsky (many...)
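
As a minimal sketch of GPU-enabled training via ag_args_fit (assuming a CUDA-capable machine and GPU builds of the relevant model libraries; dataset and column names are illustrative):

    # Minimal sketch: train the GPU-capable models on a single GPU via ag_args_fit.
    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset('train.csv')  # illustrative dataset
    predictor = TabularPredictor(label='target').fit(
        train_data,
        ag_args_fit={'num_gpus': 1},  # applies to LightGBM, CatBoost, XGBoost, and the neural network models
    )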
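
And a hypothetical sketch of per-model HPO via 'ag_args'. The search space and the hyperparameter_tune_kwargs dictionary shown here are illustrative, and (as noted in the v0.2.0 release above) the accepted dictionary format changed after v0.1.0, so consult the documentation for your installed version.

    # Hypothetical sketch: HPO enabled for LightGBM only, via the model's 'ag_args'.
    import autogluon.core as ag
    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset('train.csv')  # illustrative dataset

    gbm_options = {
        'num_leaves': ag.space.Int(lower=16, upper=96, default=36),  # illustrative search space
        'ag_args': {
            # Per-model HPO settings; the exact dictionary format depends on your AutoGluon version.
            'hyperparameter_tune_kwargs': {'num_trials': 5, 'scheduler': 'local', 'searcher': 'auto'},
        },
    }

    predictor = TabularPredictor(label='target').fit(
        train_data,
        hyperparameters={'GBM': gbm_options},  # only LightGBM is trained here, with HPO enabled
    )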

Text

  • Major API refactor. @sxjscience (#876, #936, #972, #975)
  • Support multi-GPU inference. @sxjscience (#873)
  • Greatly improved user time_limit adherence. @innixma (#877)
  • Fixed bug in model deserialization. @jojo19893 (#708)
  • Numerous bug fixes. @sxjscience (#836, #847, #850, #861, #865, #963, #980)

Vision

  • Major API refactor. @zhreshold (#733, #828, #882, #930, #946)
  • Greatly improved user time_limit adherence. @zhreshold

v0.0.15

3 years ago

Changes

  • Restricted gluoncv install version to <0.9.0 to fix install issues related to namespace collisions (#811).

v0.0.14

3 years ago

Changes

Tabular

  • Complete overhaul of feature generation, with major improvements to flexibility, speed, memory usage, and stability @Innixma (#584, #661).
  • Revamped tabular tutorials @jwmueller (#636).
  • Added fastai neural network tabular model (not used by default: requires Torch) @gradientsky (#627).
  • Added LightGBM Extra Trees (LightGBM_XT) model @Innixma (#681).
  • Updated model training priority for multiclass problems, moving neural networks ahead of tree models @Innixma (#676).
  • Added .persist_models(), .unpersist_models() methods to TabularPredictor @Innixma (#640).
  • Improved neural network training time @jwmueller (#598).
  • Added example for chunked inference @daveharmon (#634).
  • Improved memory stability on large datasets @Innixma (#644).
  • Reduced maximum memory usage of predictor.leaderboard() @Innixma (#648).
  • Updated LightGBM to v3.x, resulting in ~2x speedup in most cases @Innixma (#662).
  • Updated CatBoost to v0.24.x @Innixma (#664).
  • Updated scikit-learn to <0.24 (from <0.23) @Innixma (#671).
  • Updated pandas version to >=1.0 (from <1.0) @Innixma (#670).
  • Added GPU support for CatBoost @Innixma (#682).
  • Code cleanup @Innixma (#645, #665, #677, #680, #689).
  • Bug Fixes @Innixma, @gradientsky, @jwmueller (#643, #666, #678, #688).

Text

  • Bug Fixes @sxjscience (#651, #653).

General

  • Upgraded to mxnet 1.7 (from 1.6) @sxjscience (#650).
  • Updated all absolute imports to relative imports @Innixma (#637).
  • Documentation Improvements @aaronkl, @rdimaio, @jwmueller (#638, #639, #679).
  • Code cleanup @tirkarthi (#660).
  • Bug Fixes @Innixma, @aaronkl (#674, #686).

v0.0.13

3 years ago

Changes

Tabular

  • Added model distillation @jwmueller (#547).
  • Added FAISS KNN model @brc7 (#557).
  • Refactored Feature Generation (Part 1) @Innixma (#578).
  • Added extra_info argument to predictor.leaderboard @Innixma (#605).
  • Optimized out-of-fold feature memory usage by 50% @Innixma (#588).
  • Added confusion matrix to predictor.evaluate_predictions() output @alan-aipe (#571).
  • Improved output directory generation robustness @songqiang (#620).
  • Improved stability on large datasets by reducing maximum memory usage ratio of RF, XT, and KNN models @Innixma (#630).

Text

  • Added TextPrediction Task @sxjscience (#556).

General

  • Added mxnet 1.7 support @sxjscience (#546).
  • Numerous bug fixes @Innixma, @jwmueller, @sxjscience, @zhreshold, @yongzhengqi (#559, #568, #577, #590, #592, #597, #600, #604, #621, #625, #629).
  • Documentation improvements @jwmueller, @sxjscience, @songqiang, @Bharat123rox (#554, #561, #585, #609, #628, #631).

v0.0.12

3 years ago

Changes

General

  • Removed gluonnlp from the required dependencies; it can now be installed as an optional dependency to enable the text module (#512).
  • Documentation improvements (#503, #529, #549).

Tabular

  • Added custom model support (#551).
  • Added support for passing test data without the label column as the tuning_data argument to TabularPrediction.fit(), improving data preprocessing and final predictive accuracy on the test data (#551).
  • Fixed a major defect introduced in 0.0.11 that caused the Tabular neural network model to crash during training when categorical features with many possible values were present (#542).
  • Disabled usage of text ngram features in KNN models to dramatically improve inference speed on NLP problems (#531).
  • Added fit_weighted_ensemble() function to the TabularPredictor class. Users can now train additional weighted ensembles post-fit using any subset of the existing trained models (see the sketch after this list) (#550).
  • Added AG_args_fit argument to enable advanced model training control such as per-model time limit and memory usage (#531).
  • Added excluded_model_types argument to TabularPrediction.fit() to enable simplified removal of model types without editing the hyperparameters argument (also shown in the sketch after this list) (#543).
  • Added a version check when loading a predictor; a warning is logged if the predictor was trained on a different version of AutoGluon (#536).
  • Improved support for GPU on CatBoost (#527).
  • Moved CatBoost to lazy import to enable running Tabular without installing CatBoost (#534).
  • Added support for training models with no features, in order to get a best guess prediction based only on the average label value (#537).
  • Major refactor of internal feature_types_metadata object and AutoFeatureGenerator (#548).
  • Major refactor of internal variable names (#551).
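
A minimal sketch of the new fit_weighted_ensemble() and excluded_model_types options follows. For clarity it is written against the current TabularPredictor API; the 0.0.x TabularPrediction task API is analogous, so adapt the names to your installed version. Dataset, column, and model-type names are illustrative.

    # Minimal sketch (current TabularPredictor API; the legacy 0.0.x task API is analogous).
    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset('train.csv')
    predictor = TabularPredictor(label='target').fit(
        train_data,
        excluded_model_types=['KNN'],  # skip a model type without editing the hyperparameters argument
    )

    # Train an additional weighted ensemble over the already-trained models post-fit.
    predictor.fit_weighted_ensemble()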

Core

  • Minor scheduler cleanup (#523, #540).

v0.0.11

3 years ago

Changes

General

  • Added bayesopt and bayesopt_hyperband schedulers (#501, #507)
  • Updated minimum sklearn version from 0.20 to 0.22 (#521)

Tabular

  • Optimized memory utilization for text features (#513)
  • Optimized memory utilization for tabular neural network (#518)
  • Optimized training speed of LightGBM by ~100%-200% on most datasets (#511)
  • Optimized training speed of CatBoost by ~100% on regression datasets (#514)
  • Added return_original_features argument to transform_features, plus bug fixes (#517)
  • Improved tabular neural network training stability on log loss metric (#481)
  • Numerous fixes and code cleanup (#510, #502, #505, #516)

v0.0.10

3 years ago

Changes

General

  • Removed unnecessary thread workers upon importing autogluon (#494, #495)
  • Suppressed excessive logging of distributed thread workers (#496)
  • Capped gluoncv version to 0.x (#484)
  • Unified scheduler creation (#470)

Tabular

  • Refactored hyperparameter argument, added options for different models per stack layer (#489)
  • Optimized CatBoost training time when many features are present (#489)
  • Enabled automatic casting of input data to the expected feature dtypes during inference (#463)
  • Added feature importance for original features (#479)
  • Fixed root_mean_squared_error metric (#464)
  • Fixed pac_score metric (#483)
  • Various Fixes (#465, #472, #474, #489)

v0.0.9

4 years ago

Changes

General

  • Limited sklearn version to <0.23 to resolve import failure in skopt. (#460)
  • Limited catboost version to <0.24. (#460)

v0.0.8

4 years ago

Changes

General

  • Fixed the broken PyPI build from 0.0.7 that failed to import. (#457)