GPBoost Versions

Combining tree-boosting with Gaussian process and mixed effects models

v1.4.0

1 month ago
  • Support space-time (‘matern_space_time’) and anisotropic ARD (‘matern_ard’, ‘gaussian_ard’) covariance functions
  • support ‘negative_binomial’ likelihood
  • support FITC aka modified predictive process approximation (‘fitc’) and full scale approximation with tapering (‘full_scale_tapering’) with ‘cholesky’ decomposition and ‘iterative’ methods
  • add optimizer_cov option 'lbfgs', and make this the default for (generalized) linear effects models
  • faster prediction for multiple grouped random effects and non-Gaussian likelihoods
  • allow for duplicate locations / coordinates for Vecchia approximation for non-Gaussian likelihoods
  • support Vecchia approximation for space-time and ARD covariance functions with correlation-based neighbor selection
  • support offset in GLMMs
  • add safeguard against too large step sizes for linear regression coefficients
  • change default initial values for (i) (marginal) variance and error variance to var(y)/2 for Gaussian likelihoods and (ii) range parameters such that the effective range is half the average distance
  • add backtracking line search for mode finding in Laplace approximation
  • add option ‘reuse_learning_rates_gp_model’ for GPBoost algorithm -> faster learning
  • add option ‘line_search_step_length’ for GPBoost algorithm. This corresponds to the optimal choice of boosting learning rate as in e.g. Friedman (2001)
  • support optimizer_coef = ‘wls’ when optimizer_cov = ‘lbfgs’ for Gaussian likelihood, and make this the default
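
To illustrate the "backtracking line search for mode finding in Laplace approximation" item above, here is a minimal sketch on a toy problem. It is not GPBoost's implementation: the one-observation Poisson model, the function names `log_posterior` and `find_mode`, and the simple step-halving rule are all assumptions made for illustration.

```python
import math

def log_posterior(b, y, sigma2):
    """Unnormalized log posterior of a latent effect b for a single Poisson
    count y with log link and a N(0, sigma2) prior (toy model, assumed)."""
    return y * b - math.exp(b) - b * b / (2.0 * sigma2)

def find_mode(y, sigma2, b0=0.0, tol=1e-10, max_iter=100):
    """Newton's method with backtracking: the Newton step is halved until
    the objective no longer decreases."""
    b = b0
    for _ in range(max_iter):
        grad = y - math.exp(b) - b / sigma2
        hess = -math.exp(b) - 1.0 / sigma2   # always negative: concave objective
        step = -grad / hess                  # full Newton step
        f_old = log_posterior(b, y, sigma2)
        t = 1.0
        # Backtracking: shrink the step length while the objective gets worse.
        while log_posterior(b + t * step, y, sigma2) < f_old and t > 1e-12:
            t *= 0.5
        b_new = b + t * step
        if abs(b_new - b) < tol:
            return b_new
        b = b_new
    return b

mode = find_mode(y=3, sigma2=1.0)
```

At the returned mode, the gradient `y - exp(b) - b / sigma2` is numerically zero. The safeguard matters mostly for poor starting values, where a full Newton step can overshoot.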

v1.2.5

7 months ago
  • support iterative methods for Vecchia-Laplace approximation (non-Gaussian data and gp_approx=”vecchia”)
  • faster model construction and prediction for compactly supported covariance functions
  • add metric 'test_neg_log_likelihood'
  • change handling of 'objective' parameter for GPBoost algorithm: only ‘likelihood’ in ‘GPModel()’ needs to be set
  • change API for parameters ‘vecchia_pred_type’ and ‘num_neighbors_pred’

v1.0.1

1 year ago
  • faster gradient calculation for
    1. Multiple / multilevel grouped random effects for non-Gaussian likelihoods
    2. GPs with Vecchia approximation for non-Gaussian likelihoods
    3. GPs with compactly supported covariance functions / tapering
  • enable estimation of shape parameter in gamma likelihood
  • predict_training_data_random_effects: enable for Vecchia approximation and enable calculation of variances
  • change API for Vecchia approximation and tapering
  • correction in nearest neighbor search for Vecchia approximation
  • show GPModel parameters on original and not transformed scale when trace = true
  • change initial intercept for bernoulli_probit, gamma, and poisson likelihood
  • change default value for ‘delta_rel_conv’ to 1e-8 for nelder_mead
  • avoid unrealistically large learning rates for gradient descent

v0.8.0

1 year ago
  • cap too large gradient descent steps on log-scale for covariance parameters
  • GLMMs: reset small learning rates for covariance parameters and regression parameters if the other parameters change
  • add gaussian_neg_log_likelihood as validation metric
  • add function ‘get_nested_categories’ for nested grouped random effects
  • prediction: remove nugget variance from predictive (co)variances when predict_response = false for Gaussian likelihoods
  • set default value for predict_response to true in prediction function of GPModel
  • NA and Inf values are not allowed in the label
  • correct prediction when using the Vecchia approximation for non-Gaussian likelihoods
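
The idea behind categories for nested grouped random effects can be sketched as follows. With nested grouping, inner labels are often reused across outer groups (for example, "class 1" exists in every school), so each (outer, inner) pair needs its own category. The helper `nested_categories` below is hypothetical and written in plain Python for illustration; it is not the library's ‘get_nested_categories’ function, whose exact signature and return type should be checked in the package documentation.

```python
def nested_categories(outer, inner):
    """Map each (outer, inner) pair to a unique integer id,
    numbered in order of first appearance."""
    if len(outer) != len(inner):
        raise ValueError("outer and inner must have the same length")
    ids = {}
    result = []
    for pair in zip(outer, inner):
        if pair not in ids:
            ids[pair] = len(ids)   # next unused id
        result.append(ids[pair])
    return result

schools = ["A", "A", "A", "B", "B", "B"]
classes = [1, 2, 1, 1, 2, 2]        # class labels restart within each school
nested = nested_categories(schools, classes)
# nested == [0, 1, 0, 2, 3, 3]
```

Class 1 in school A and class 1 in school B receive different ids, so a random effect on `nested` is a class-within-school effect rather than an effect shared by all classes labelled 1.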

v0.7.7

1 year ago
  • Reduce memory usage for Vecchia approximation
  • [R-package] add function for creating interaction partial dependence plots
  • Add function ‘predict_training_data_random_effects’ for predicting (=‘estimating’) training data random effects
  • [R-package][python-package] predict function: rename ‘raw_score’ argument to ‘pred_latent’ and unify handling of Gaussian and non-Gaussian data
  • (G)LMMs: better initialization of intercept, change internal scaling of covariates, change default value of ‘lr_coef’ to 0.1
  • Add ‘adam’ as optimizer option
  • allow for grouped random coefficients without random intercept effects
  • [R-package][python-package] nicer summary function

v0.7.1

2 years ago
  • make predictions faster and more memory efficient when having multiple grouped random effects
  • set “nelder_mead” as automatic fallback option if problems in optimization occur
  • (generalized) linear mixed effects models: scale covariate data for linear predictor internally for optimization using gradient descent
  • add “bfgs” as optimizer option

v0.6.7

2 years ago
  • add Grabit model / Tobit objective function
  • support calculation of approximate standard deviations of fixed effects coefficients in GLMMs
  • [R package] added function for creating partial dependence plots (gpb.plot.partial.dependence)
  • [R package] use R’s internal .Call function, correct function registration, use R’s internal error function, use R standard routines to access data in C++, move more finalizer logic into C++ side, fix PROTECT/UNPROTECT issues, limit exported symbols in DLL
  • [Python package] Fix bug in scikit-learn wrapper for classification
  • change initialization and checking of convergence criterion in the mode finding algorithm for the Laplace approximation for non-Gaussian data

v0.6.0

3 years ago
  • add support for Wendland covariance function and covariance tapering
  • add Nelder-Mead as covariance parameter optimizer option
  • change calculation of gradient for GPBoost algorithm and use permutations for Cholesky factors for non-Gaussian data
  • use permutations for Cholesky factors for Gaussian data when having sparse matrices
  • make “gradient_descent” the default optimizer option also for Gaussian data

v0.5.0

3 years ago
  • add function in R and Python packages that allows for choosing tuning parameters using deterministic or random grid search
  • faster training and prediction for grouped random effects models for non-Gaussian data when there is only one grouping variable
  • faster training and prediction for Gaussian process models for non-Gaussian data when there are duplicate locations
  • faster prediction for grouped random effects models for Gaussian data when there is only one grouping variable
  • support pandas DataFrame and Series in Python package
  • fix bug in initialization of score for the GPBoost algorithm for non-Gaussian data
  • add lightweight option for saving booster models with gp_models by not saving the raw data (this is the new default)
  • update eigen to newest version (commit b271110788827f77192d38acac536eb6fb617a0d)
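
The difference between deterministic and random grid search mentioned above can be sketched in a few lines. This is a generic illustration, not the package's tuning function: the helper `candidate_params`, its `num_try_random` and `seed` arguments, and the example grid values are assumptions chosen for this sketch.

```python
import itertools
import random

def candidate_params(param_grid, num_try_random=None, seed=0):
    """Return the parameter combinations to evaluate.

    With num_try_random=None, every combination in the grid is returned
    (deterministic grid search). Otherwise a random subset of that size
    is drawn without replacement (random grid search)."""
    names = sorted(param_grid)
    full_grid = [dict(zip(names, values))
                 for values in itertools.product(*(param_grid[n] for n in names))]
    if num_try_random is None or num_try_random >= len(full_grid):
        return full_grid
    return random.Random(seed).sample(full_grid, num_try_random)

grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "max_depth": [1, 3, 5],
    "min_data_in_leaf": [10, 100],
}
all_combos = candidate_params(grid)                       # 3 * 3 * 2 = 18 candidates
random_subset = candidate_params(grid, num_try_random=5)  # 5 of the 18
```

Each candidate would then be scored, for example by cross-validation, and the best-scoring combination kept. Random search trades completeness for a fixed evaluation budget, which helps when the full grid is large.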

v0.4.0

3 years ago
  • update LightGBM part to version 3.1.1.99 (git commit 42d1633aebe124821cff42c728a42551db715168)
  • add support for scikit-learn wrapper interface for GPBoost
  • change initialization of score (=tree ensemble) for non-Gaussian data for GPBoost algorithm
  • add support for saving and loading models from file in R and Python packages