An experimental Python package that reimplements AutoGBT using LightGBM and Optuna.
AutoGBT is an automatically tuned machine learning classifier that won first prize at the NeurIPS'18 AutoML Challenge. This implementation differs from the original AutoGBT in several respects.
```
$ pip install git+https://github.com/pfnet-research/autogbt-alt.git
```

or

```
$ pip install git+ssh://git@github.com/pfnet-research/autogbt-alt.git
```
```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

from autogbt import AutoGBTClassifier

# Hold out 10% of the data for validation.
X, y = load_breast_cancer(return_X_y=True)
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.1)

model = AutoGBTClassifier()
model.fit(train_X, train_y)
print('valid AUC: %.3f' % roc_auc_score(valid_y, model.predict(valid_X)))
print('CV AUC: %.3f' % model.best_score)
```
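The quickstart scores predictions with ROC AUC. As a self-contained illustration of the metric itself (plain Python, not part of autogbt), AUC is the probability that a randomly chosen positive example is ranked above a randomly chosen negative one, counting ties as half:

```python
def auc(y_true, scores):
    # Rank-based AUC: fraction of (positive, negative) pairs in which
    # the positive example receives the higher score; ties count 0.5.
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This pairwise definition is equivalent to the area under the ROC curve that `roc_auc_score` computes.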
```python
from autogbt import Preprocessor

preprocessor = Preprocessor(train_frac=0.5, test_frac=0.5)
train_X, valid_X, train_y = preprocessor.transform(train_X, valid_X, train_y)
```
```python
from autogbt import TrainDataSampler

sampler = TrainDataSampler(train_frac=0.5, valid_frac=0.5)
model = AutoGBTClassifier(sampler=sampler)
model.fit(train_X, train_y)
model.predict(test_X)
```
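Both helpers above are parameterized by fractions of the data to keep. As a rough sketch of what a `train_frac`-style parameter means (illustrative only; `sample_frac` is a hypothetical helper, and autogbt's samplers implement their own logic):

```python
import random

def sample_frac(rows, frac, seed=0):
    # Keep a uniformly random fraction of the rows.
    # (Hypothetical helper; not an autogbt API.)
    rng = random.Random(seed)
    k = int(len(rows) * frac)
    return rng.sample(rows, k)

rows = list(range(1000))
subset = sample_frac(rows, 0.5)
print(len(subset))  # 500
```

Subsampling like this trades a small amount of accuracy for much shorter tuning time, which is the trade-off explored in the benchmark below.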
Please see the benchmark directory for details.
Default values are used for all hyperparameters of AutoGBT, XGBoost, and LightGBM.
| model | duration [s] | CV AUC |
|---|---|---|
| AutoGBT | 6515.254±340.231 | 0.900±0.001 |
| XGBoost | 78.561±7.265 | 0.872±0.000 |
| LightGBM | 34.000±2.285 | 0.891±0.000 |
| model | duration [s] | CV AUC |
|---|---|---|
| AutoGBT | 359.834±29.188 | 0.832±0.002 |
| XGBoost | 2.558±0.661 | 0.749±0.002 |
| LightGBM | 1.789±0.165 | 0.834±0.002 |
| model | duration [s] | CV AUC |
|---|---|---|
| AutoGBT | 20322.601±676.702 | 0.744±0.000 |
| XGBoost | OoM | OoM |
| LightGBM | OoM | OoM |

OoM: the run failed with an out-of-memory error.
| model | duration [s] | CV AUC |
|---|---|---|
| AutoGBT | 372.090±32.857 | 0.925±0.001 |
| XGBoost | 2.683±0.204 | 0.912±0.001 |
| LightGBM | 2.406±0.236 | 0.927±0.001 |
Performance with various `train_frac` and `n_trials` parameters
```
$ ./test.sh
```
Jobin Wilson, Amit Kumar Meher, Bivin Vinodkumar Bindu, Manoj Sharma, Vishakha Pareek, Santanu Chaudhury, and Brejesh Lall. AutoGBT: Automatically Optimized Gradient Boosting Trees for Classifying Large Volume High Cardinality Data Streams under Concept-Drift. 2018. https://github.com/flytxtds/AutoGBT