GPBoost Versions

Combining tree-boosting with Gaussian process and mixed effects models

1 month ago

Support space-time (‘matern_space_time’) and anisotropic ARD (‘matern_ard’, ‘gaussian_ard’) covariance functions
support ‘negative_binomial’ likelihood
support FITC aka modified predictive process approximation (‘fitc’) and full scale approximation with tapering (‘full_scale_tapering’) with ‘cholesky’ decomposition and ‘iterative’ methods
add optimizer_cov option 'lbfgs', and make this the default for (generalized) linear effects models
faster prediction for multiple grouped random effects and non-Gaussian likelihoods
allow for duplicate locations / coordinates for Vecchia approximation for non-Gaussian likelihoods
support Vecchia approximation for space-time and ARD covariance functions with correlation-based neighbor selection
support offset in GLMMs
add safeguard against too large step sizes for linear regression coefficients
change default initial values for (i) (marginal) variance and error variance to var(y)/2 for Gaussian likelihoods and (ii) range parameters such that the effective range is half the average distance
add backtracking line search for mode finding in Laplace approximation
add option ‘reuse_learning_rates_gp_model’ for GPBoost algorithm -> faster learning
add option ‘line_search_step_length’ for GPBoost algorithm. This corresponds to the optimal choice of boosting learning rate as in e.g. Friedman (2001)
support optimizer_coef = ‘wls’ when optimizer_cov = ‘lbfgs’ for Gaussian likelihood, make this the default
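The ‘line_search_step_length’ entry above refers to choosing the boosting step length by a line search, as in Friedman (2001). As a rough illustration only, the sketch below works out that step length for plain squared-error loss, where the minimizer has a closed form. The function name and the toy data are hypothetical, and GPBoost's own line search works on its own objective, which this sketch does not reproduce.

```python
def line_search_step_length(y, current_pred, tree_pred):
    """Step length rho minimizing sum((y - (F + rho * h)) ** 2).

    Squared-error special case only (an assumption made for illustration).
    F is the current ensemble prediction, h the newly fitted tree's output.
    """
    residuals = [yi - fi for yi, fi in zip(y, current_pred)]
    numerator = sum(r * h for r, h in zip(residuals, tree_pred))
    denominator = sum(h * h for h in tree_pred)
    # Guard against a tree that predicts zero everywhere.
    return numerator / denominator if denominator > 0 else 0.0


# Toy data: the new tree points in the right direction but is too short.
y = [1.0, 2.0, 3.0, 4.0]
current_pred = [0.5, 1.5, 2.5, 3.5]
tree_pred = [0.25, 0.25, 0.25, 0.25]

rho = line_search_step_length(y, current_pred, tree_pred)
updated_pred = [f + rho * h for f, h in zip(current_pred, tree_pred)]
```

With a fixed learning rate the update would be `F + learning_rate * h`; the line search replaces that constant with the value of `rho` that minimizes the loss along the direction `h`.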

7 months ago

support iterative methods for Vecchia-Laplace approximation (non-Gaussian data and gp_approx="vecchia")
faster model construction and prediction for compactly supported covariance functions
add metric 'test_neg_log_likelihood'
change handling of 'objective' parameter for GPBoost algorithm: only ‘likelihood’ in ‘GPModel()’ needs to be set
change API for parameters ‘vecchia_pred_type’ and ‘num_neighbors_pred’

1 year ago

faster gradient calculation for:
1. Multiple / multilevel grouped random effects for non-Gaussian likelihoods
2. GPs with Vecchia approximation for non-Gaussian likelihoods
3. GPs with compactly supported covariance functions / tapering
enable estimation of shape parameter in gamma likelihood
predict_training_data_random_effects: enable for Vecchia approximation and enable calculation of variances
change API for Vecchia approximation and tapering
correction in nearest neighbor search for Vecchia approximation
show GPModel parameters on original and not transformed scale when trace = true
change initial intercept for bernoulli_probit, gamma, and poisson likelihood
change default value for ‘delta_rel_conv’ to 1e-8 for nelder_mead
avoid unrealistically large learning rates for gradient descent

1 year ago

cap too large gradient descent steps on log-scale for covariance parameters
GLMMs: reset small learning rates for covariance parameters and regression parameters if the other parameters change
add gaussian_neg_log_likelihood as validation metric
add function ‘get_nested_categories’ for nested grouped random effects
prediction: remove nugget variance from predictive (co)variances when predict_response = false for Gaussian likelihoods
set default value for predict_response to true in prediction function of GPModel
NAs and Infs are not allowed in the label
correct prediction when the Vecchia approximation is used for non-Gaussian likelihoods
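Two entries in this release concern Gaussian predictions: the nugget (error) variance is only part of the predictive variance when predicting the response, and a Gaussian negative log-likelihood is available as a validation metric. The sketch below shows how the two relate for independent Gaussian predictions. The helper names are hypothetical, and averaging over observations is an assumption; the source does not say whether the library sums or averages.

```python
import math


def response_variance(latent_var, error_var):
    """Predictive variance of the response: latent variance plus nugget."""
    return [v + error_var for v in latent_var]


def gaussian_neg_log_likelihood(y, pred_mean, pred_var):
    """Average negative log-density of y under N(pred_mean, pred_var).

    Assumes independent (pointwise) predictive distributions.
    """
    total = 0.0
    for yi, mean, var in zip(y, pred_mean, pred_var):
        total += 0.5 * (math.log(2.0 * math.pi * var) + (yi - mean) ** 2 / var)
    return total / len(y)


# Latent predictive variances (no nugget) and an assumed error variance.
latent_var = [0.5, 1.0]
error_var = 0.25
resp_var = response_variance(latent_var, error_var)

# Score held-out responses with the response-scale variances.
y_test = [0.2, -0.4]
pred_mean = [0.0, 0.0]
nll = gaussian_neg_log_likelihood(y_test, pred_mean, resp_var)
```

Scoring observed responses with the latent variances instead would understate the uncertainty, since observations include the error term.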

1 year ago

Reduce memory usage for Vecchia approximation
[R-package] add function for creating interaction partial dependence plots
Add function ‘predict_training_data_random_effects’ for predicting (=‘estimating’) training data random effects
[R-package][python-package] predict function: rename ‘raw_score’ argument to ‘pred_latent’ and unify handling of Gaussian and non-Gaussian data
(G)LMMs: better initialization of intercept, change internal scaling of covariates, change default value of ‘lr_coef’ to 0.1
Add ‘adam’ as optimizer option
allow for grouped random coefficients without random intercept effects
[R-package][python-package] nicer summary function

2 years ago

make predictions faster and more memory efficient when having multiple grouped random effects
set “nelder_mead” as automatic fallback option if problems in optimization occur
(generalized) linear mixed effects models: scale covariate data for linear predictor internally for optimization using gradient descent
add “bfgs” as optimizer option

2 years ago

add Grabit model / Tobit objective function
support calculation of approximate standard deviations of fixed effects coefficients in GLMMs
[R package] added function for creating partial dependence plots (gpb.plot.partial.dependence)
[R package] use R’s internal .Call function, correct function registration, use R’s internal error function, use R standard routines to access data in C++, move more finalizer logic into C++ side, fix PROTECT/UNPROTECT issues, limit exported symbols in DLL
[Python package] Fix bug in scikit-learn wrapper for classification
change initialization and convergence check of the mode finding algorithm for the Laplace approximation for non-Gaussian data
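For readers unfamiliar with the Tobit objective mentioned in this release, here is a minimal sketch of the standard Tobit negative log-likelihood for one observation censored from below and above. It is the textbook form, written with the standard library only; the function names and argument order are hypothetical and are not taken from the GPBoost code.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_logpdf(z):
    """Log-density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def tobit_neg_log_likelihood(y, f, sigma, lower, upper):
    """Negative log-likelihood of one observation under a Tobit model.

    The latent response is N(f, sigma^2); y is observed clamped to
    [lower, upper]. No safeguards against log(0) for extreme inputs.
    """
    if y <= lower:
        # Censored from below: only P(latent <= lower) is known.
        return -math.log(norm_cdf((lower - f) / sigma))
    if y >= upper:
        # Censored from above: only P(latent >= upper) is known.
        return -math.log(1.0 - norm_cdf((upper - f) / sigma))
    # Uncensored: ordinary Gaussian density.
    return -(norm_logpdf((y - f) / sigma) - math.log(sigma))


nll_lower = tobit_neg_log_likelihood(0.0, 0.0, 1.0, 0.0, 10.0)
nll_inner = tobit_neg_log_likelihood(1.0, 1.0, 2.0, 0.0, 10.0)
nll_upper = tobit_neg_log_likelihood(10.0, 10.0, 1.0, 0.0, 10.0)
```

In a boosting setting, `f` would be the tree ensemble's prediction and the gradient of this quantity with respect to `f` would drive each boosting step.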

3 years ago

add support for Wendland covariance function and covariance tapering
add Nelder-Mead as covariance parameter optimizer option
change calculation of gradient for GPBoost algorithm and use permutations for Cholesky factors for non-Gaussian data
use permutations for Cholesky factors for Gaussian data when having sparse matrices
make “gradient_descent” the default optimizer option also for Gaussian data
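Covariance tapering, mentioned in this release, multiplies a covariance function by a compactly supported correlation function so that distant pairs get exactly zero covariance and the covariance matrix becomes sparse. The sketch below uses one common member of the Wendland family and an exponential covariance. Which Wendland function and parameterization the library uses is not stated here, so the specific formula and all names are assumptions made for illustration.

```python
import math


def wendland_taper(dist, taper_range):
    """One common Wendland function: (1 - r)^4 * (1 + 4r) for r < 1, else 0."""
    r = dist / taper_range
    if r >= 1.0:
        return 0.0
    return (1.0 - r) ** 4 * (1.0 + 4.0 * r)


def exponential_cov(dist, variance, range_par):
    """Exponential covariance: variance * exp(-dist / range_par)."""
    return variance * math.exp(-dist / range_par)


def tapered_cov(dist, variance, range_par, taper_range):
    """Tapered covariance: base covariance times the compact taper."""
    return exponential_cov(dist, variance, range_par) * wendland_taper(dist, taper_range)


# Beyond the taper range the covariance is exactly zero.
near = tapered_cov(0.5, 2.0, 1.0, 1.5)
far = tapered_cov(2.0, 2.0, 1.0, 1.5)
```

The exact zeros are what permit sparse matrix storage and sparse Cholesky factorizations for large numbers of locations.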

3 years ago

add function in R and Python packages that allows for choosing tuning parameters using deterministic or random grid search
faster training and prediction for grouped random effects models for non-Gaussian data when there is only one grouping variable
faster training and prediction for Gaussian process models for non-Gaussian data when there are duplicate locations
faster prediction for grouped random effects models for Gaussian data when there is only one grouping variable
support pandas DataFrame and Series in Python package
fix bug in initialization of score for the GPBoost algorithm for non-Gaussian data
add lightweight option for saving booster models with gp_models by not saving the raw data (this is the new default)
update eigen to newest version (commit b271110788827f77192d38acac536eb6fb617a0d)
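The difference between deterministic and random grid search, mentioned in the first entry of this release, can be sketched generically as follows. This is not the package's tuning function: the helper names, the `num_try` argument, and the toy scoring function are hypothetical, and a real run would score each candidate by cross-validation instead.

```python
import itertools
import random


def deterministic_grid(param_grid):
    """All combinations of the candidate values, in a fixed order."""
    keys = sorted(param_grid)
    value_lists = [param_grid[k] for k in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


def random_grid(param_grid, num_try, seed=0):
    """A random subset of the full grid, drawn without replacement."""
    combos = deterministic_grid(param_grid)
    rng = random.Random(seed)
    return rng.sample(combos, min(num_try, len(combos)))


def grid_search(param_grid, score_fn, num_try=None, seed=0):
    """Return the candidate with the lowest score (lower is better)."""
    if num_try is None:
        candidates = deterministic_grid(param_grid)
    else:
        candidates = random_grid(param_grid, num_try, seed)
    return min(candidates, key=score_fn)


def toy_score(params):
    """Stand-in for a cross-validated loss; minimized at (0.1, 3)."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["max_depth"] - 3) ** 2


param_grid = {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [1, 3]}
all_candidates = deterministic_grid(param_grid)
some_candidates = random_grid(param_grid, num_try=3, seed=1)
best = grid_search(param_grid, toy_score)
```

Random search trades completeness for cost: it evaluates only `num_try` combinations, which matters when each evaluation involves fitting a model several times.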

3 years ago

update LightGBM part to version 3.1.1.99 (git commit 42d1633aebe124821cff42c728a42551db715168)
add support for scikit-learn wrapper interface for GPBoost
change initialization of score (=tree ensemble) for non-Gaussian data for GPBoost algorithm
add support for saving and loading models from file in R and Python packages