Adaptive and automatic gradient tree boosting computations
aGTBoost is a lightning-fast gradient boosting library designed to avoid manual tuning and cross-validation by utilizing an information-theoretic approach. This makes the algorithm adaptive to the dataset at hand; it is completely automatic, with minimal worries of overfitting. Consequently, the speed-ups relative to state-of-the-art implementations are in the thousands, while the mathematical and technical knowledge required of the user is minimized.
Note: Currently for academic purposes: implementing and testing new innovations w.r.t. information-theoretic choices of GTB complexity. See below for the to-do research list.
R: Finally on CRAN! Install the stable version with
install.packages("agtboost")
or install the development version from GitHub
devtools::install_github("Blunde1/agtboost/R-package")
Users experiencing errors after warnings during installation may be helped by the following command prior to installation:
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true")
agtboost essentially has two functions: a train function, gbt.train, and a predict function, predict.
From the code below it should be clear how to train an aGTBoost model using a design matrix x and a response vector y. Write ?gbt.train in the console for detailed documentation.
library(agtboost)
# -- Load data --
data(caravan.train, package = "agtboost")
data(caravan.test, package = "agtboost")
train <- caravan.train
test <- caravan.test
# -- Model building --
mod <- gbt.train(train$y, train$x, loss_function = "logloss", verbose=10)
# -- Predictions --
prob <- predict(mod, test$x) # Score after logistic transformation: Probabilities
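The predicted probabilities can be scored against the held-out responses with the binary log-loss. The following is a minimal sketch in plain R (not an agtboost function); the clipping constant eps is an illustrative choice to guard against log(0).
# -- Evaluate test log-loss (sketch, plain R) --
eps <- 1e-15  # illustrative guard against log(0)
p <- pmin(pmax(prob, eps), 1 - eps)
test_logloss <- -mean(test$y * log(p) + (1 - test$y) * log(1 - p))
test_logloss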
agtboost also contains functions for model inspection and validation.
- gbt.importance generates a typical feature importance plot. Techniques like inserting noise features are redundant due to computations w.r.t. approximate generalization (test) loss.
- gbt.convergence computes the loss over the path of boosting iterations. Check visually for convergence on test loss (see the sketch after the code block below).
- gbt.ksval transforms observations to standard uniformly distributed random variables, if the model is specified correctly. It performs a formal Kolmogorov-Smirnov test and plots the transformed observations for visual inspection.
# -- Feature importance --
gbt.importance(feature_names=colnames(caravan.train$x), object=mod)
# -- Model validation --
gbt.ksval(object=mod, y=caravan.test$y, x=caravan.test$x)
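To inspect convergence on test loss, a minimal sketch using gbt.convergence is given below. It assumes the mod object and caravan.test data from the examples above, and that gbt.convergence returns the losses over boosting iterations as a numeric vector (see ?gbt.convergence for the exact return value).
# -- Convergence on test loss (sketch; assumes mod and caravan.test from above) --
conv <- gbt.convergence(object=mod, y=caravan.test$y, x=caravan.test$x)
plot(conv, type="l", xlab="Boosting iteration", ylab="Test loss")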
The functions gbt.ksval and gbt.importance create, respectively, a Kolmogorov-Smirnov validation plot and a feature importance plot.
Furthermore, an aGTBoost model is (see example code)
Any help on the following subjects is especially welcome:
Please note that the priority is to work on and push the above-mentioned scheduled updates. Patience is a virtue. :)