sl3 Versions

💪 🤔 Modern Super Learning with Machine Learning Pipelines

v1.4.4

2 years ago

v1.4.4 ("Blitzen") is a major release, featuring numerous updates and bugfixes (totaling 400+ commits spread across ~8 months), including

  • Updates to Lrnr_nnls to support binary outcomes, including support for convexity of the resultant model fit and warnings on prediction quality.
  • Changes to Lrnr_cv_selector to support improved computation of the CV-risk, averaging the risk strictly across validation/holdout sets.
  • Update Lrnr_sl by adding a new private slot .cv_risk to store the risk estimates, using this to avoid unnecessary re-computation in the print method (the .cv_risk slot is populated on the first print call, and only ever re-printed thereafter).
  • Fix Lrnr_screener_importance's pairing of (a) covariates returned by the importance function with (b) covariates as they are defined in the task. This issue only arose when discrete covariates were automatically one-hot encoded upon task initiation (i.e., when colnames(task$X) != task$nodes$covariates).
  • Enhanced functionality in sl3 task's add_interactions method to support interactions that involve factors. This method is most commonly used by Lrnr_define_interactions, which is intended for use with another learner (e.g., Lrnr_glmnet or Lrnr_glm) in a Pipeline.
  • Modified the default Lrnr_gam formula (used when the user does not specify one) so that it no longer applies mgcv's default k=10 degrees of freedom to each smooth s term when fewer than k=10 degrees of freedom are available. This bypasses an mgcv::gam error, and tends to be relevant only for small n.
  • Incorporated a min_screen argument in Lrnr_screener_coefs, which tries to ensure that at least min_screen covariates are selected. If this argument is specified and the learner argument in Lrnr_screener_coefs is a Lrnr_glmnet, then lambda is increased until min_screen covariates are selected and a warning is produced. If min_screen is specified and the learner argument in Lrnr_screener_coefs is not a Lrnr_glmnet, then it will error.
  • Added formula parameter and process_formula function to the base learner, Lrnr_base, whose methods carry over to all other learners. When a formula is supplied as a learner parameter, the process_formula function constructs a design matrix by supplying the formula to model.matrix. This implementation allows a formula to be supplied to all learners, even those without native formula support. The formula should be an object of class "formula", or a character string that can be coerced to that class.
  • Added custom_ROCR_risk, a factory function for performance-based risks for binary outcomes using ROCR performance measures. It supports cutoff-dependent and scalar ROCR performance measures. The risk is defined as 1 - performance, and is transformed back to the performance measure in the cv_risk and importance functions. This change prompted the renaming of the arguments loss_fun and loss_function to eval_fun and eval_function, respectively, since the evaluation of predictions relative to the observations can be either a risk or a loss function. This argument name change impacted the following: Lrnr_solnp, Lrnr_optim, Lrnr_cv_selector, cv_risk, importance, and CV_Lrnr_sl.
  • Incorporated stratified cross-validation when folds are not supplied to the sl3_Task and the outcome is a discrete (i.e., binary or categorical) variable.
  • Added to the importance method the option to evaluate importance over covariate_groups, by removing/permuting all covariates in the same group together.
  • Added Lrnr_ga as another metalearner.
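The formula handling described for Lrnr_base above rests on base R's model.matrix. The following is a minimal sketch of that idea only, not sl3's process_formula: the helper name build_design and the toy data are assumptions made for illustration.

```r
# Hypothetical helper (not part of sl3): expand a formula into a numeric
# design matrix that a learner without native formula support could consume.
build_design <- function(formula, data) {
  # Accept either a formula object or a character string coercible to one.
  f <- stats::as.formula(formula)
  stats::model.matrix(f, data = data)
}

dat <- data.frame(
  x1 = c(1, 2, 3, 4),
  x2 = factor(c("a", "b", "a", "b"))
)

# Main effects plus the x1:x2 interaction; the factor x2 is dummy-coded.
X <- build_design("~ x1 * x2", dat)
print(colnames(X))
```

Because the design matrix is built before fitting, interactions and factor expansions are available even to learners that only accept a numeric matrix.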

See the NEWS file for complete details.
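To illustrate what stratified cross-validation for a discrete outcome means in practice, here is a small base R sketch. It does not reproduce how sl3 builds its folds internally; the helper stratified_fold_ids and the simulated outcome are hypothetical.

```r
# Hypothetical helper (not part of sl3): assign V fold labels so that each
# outcome class is spread as evenly as possible across the folds.
stratified_fold_ids <- function(y, V = 5) {
  fold_id <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    # Shuffle within the class, then deal observations out round-robin.
    idx <- idx[sample.int(length(idx))]
    fold_id[idx] <- rep_len(seq_len(V), length(idx))
  }
  fold_id
}

set.seed(1)
y <- rep(c(0, 1), times = c(40, 10))  # imbalanced binary outcome
folds <- stratified_fold_ids(y, V = 5)

# Cross-tabulate fold membership against the outcome.
tab <- table(fold = folds, outcome = y)
print(tab)
```

With 40 controls and 10 cases split over 5 folds, every validation fold ends up with the same 4:1 class ratio as the full data, which is the property that unstratified random folds cannot guarantee for rare outcomes.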

v1.4.2

3 years ago

v1.4.2 of sl3 is a major release, featuring numerous updates to core functionality and improvements to available learners, including

  • Updates to variable importance functionality, including calculation of risk ratio and risk differences under covariate deletion or permutation.
  • Addition of an importance_plot to summarize variable importance findings.
  • Addition of new methods reparameterize and retrain to Lrnr_base, which allow, respectively, modification of the covariate set while training on a conserved task, and prediction on a new task using previously trained learners.
  • Updates to variable importance functionality, including use of risk ratios.
  • Change Lrnr_hal9001 and Lrnr_glmnet to respect observation-level IDs.
  • Removal of Remotes and deprecation of Lrnr_rfcde and Lrnr_condensier.
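The risk ratio and risk difference under covariate permutation mentioned above can be sketched in base R as follows. This is a simplified illustration and not sl3's importance function: it uses a single lm fit and in-sample risk, and the helper permutation_importance and the simulated data are assumptions.

```r
set.seed(42)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 + rnorm(n, sd = 0.5)  # x2 is pure noise by construction

fit <- lm(y ~ x1 + x2, data = dat)
mse <- function(pred, obs) mean((pred - obs)^2)
baseline_risk <- mse(predict(fit, dat), dat$y)

# Hypothetical helper: permute one covariate, re-predict, and compare the
# resulting risk with the baseline as both a ratio and a difference.
permutation_importance <- function(covariate) {
  permuted <- dat
  permuted[[covariate]] <- sample(permuted[[covariate]])
  perm_risk <- mse(predict(fit, permuted), dat$y)
  c(ratio = perm_risk / baseline_risk,
    difference = perm_risk - baseline_risk)
}

imp <- sapply(c("x1", "x2"), permutation_importance)
print(round(imp, 2))
```

A ratio well above 1 (or a large positive difference) indicates that predictions degrade when the covariate is scrambled; here that should hold for x1 and not for x2.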

v1.3.7

4 years ago

v1.3.7 of sl3 is a major release, which features updated functionality:

  • sampling methods for Monte Carlo integration and related procedures
  • a metalearner for the cross-validation selector (discrete super learner)
  • a learner for bounding, including support for bounded losses
  • resolution of a number of older issues (#264)
  • relaxation of checks inside Stack objects for time series learners
  • addition of a learner property table to README.Rmd
  • maintenance and documentation updates
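The idea behind a cross-validation selector (discrete super learner) is to pick the single candidate with the lowest cross-validated risk. The base R sketch below shows that selection rule under simple assumptions; the candidate formulas, the cv_risk helper defined here, and the simulated data are illustrative and are not sl3 objects.

```r
set.seed(7)
n <- 120
dat <- data.frame(x = runif(n, -2, 2))
dat$y <- dat$x^2 + rnorm(n, sd = 0.3)

# Candidate learners expressed as model formulas (illustrative only).
candidates <- list(
  mean_only = y ~ 1,
  linear    = y ~ x,
  quadratic = y ~ x + I(x^2)
)

V <- 5
fold_id <- sample(rep_len(seq_len(V), n))

# Cross-validated mean squared error: fit on the training folds, evaluate on
# the held-out fold, then average the risk across the V validation sets.
cv_risk <- function(formula) {
  fold_risks <- sapply(seq_len(V), function(v) {
    train <- dat[fold_id != v, ]
    valid <- dat[fold_id == v, ]
    fit <- lm(formula, data = train)
    mean((predict(fit, newdata = valid) - valid$y)^2)
  })
  mean(fold_risks)
}

risks <- sapply(candidates, cv_risk)
selected <- names(which.min(risks))
print(risks)
print(selected)
```

Since the simulated outcome is quadratic in x, the quadratic candidate should attain the smallest cross-validated risk and be the one selected.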

v1.3.5

4 years ago

v1.3.5 of sl3 is a minor release, featuring several important updates to the core software:

  • New screening methods and convex combination in Lrnr_nnls by @rachaelvphillips
  • Overhaul of data preprocessing by @Zyx0Wu
  • Bug fixes by @jeremyrcoyle, including covariate subsetting and better handling of NAs
  • Package and documentation cleanup, continuous integration and testing fixes, reproducibility updates (including new versioning and DOI minting) by @nhejazi

v1.3.0

4 years ago

v1.3.0 of sl3 represents a set of major updates to the core software. A non-exhaustive list of the included changes:

  • fixing incorrect handling of missingness in the automatic imputation procedure
  • addition of new standard learners, including from the gam and caret packages
  • addition of custom learners for conditional density estimation, including semiparametric methods based on conditional mean and conditional mean/variance estimation as well as generalized functionality for density estimation via a pooled hazards approach

v1.2.0

4 years ago

v1.2.0 of sl3 represents a set of major updates to the core software. A non-exhaustive list of the included changes:

  • default metalearners based on task outcome types
  • handling of imputation internally in task objects
  • addition of new learners, including from the gbm, earth, and polspline packages
  • fixing errors in existing learners (e.g., subtle parallelization in xgboost and ranger)
  • support for multivariate outcomes and (default) revere-style cross-validation
  • support for cross-validated super learner and variable importance

v1.1.0

5 years ago

v1.1.0 of the sl3 R package marks a full-featured and stable release of the project. Numerous learners are included and many bugs have been fixed relative to earlier versions (especially v1.0.0) of the software.