sl3 Versions

💪 🤔 Modern Super Learning with Machine Learning Pipelines

v1.4.4

2 years ago

v1.4.4 ("Blitzen") is a major release, featuring numerous updates and bug fixes (totaling 400+ commits spread across ~8 months), including:

Updates to Lrnr_nnls to support binary outcomes, including support for convexity of the resultant model fit and warnings on prediction quality.
Changes to Lrnr_cv_selector to support improved computation of the CV-risk, averaging the risk strictly across validation/holdout sets.
Update Lrnr_sl by adding a new private slot .cv_risk to store the risk estimates, using this to avoid unnecessary re-computation in the print method (the .cv_risk slot is populated on the first print call, and only ever re-printed thereafter).
Fix Lrnr_screener_importance's pairing of (a) covariates returned by the importance function with (b) covariates as they are defined in the task. This issue only arose when discrete covariates were automatically one-hot encoded upon task initiation (i.e., when colnames(task$X) != task$nodes$covariates).
Enhanced functionality in sl3 task's add_interactions method to support interactions that involve factors. This method is most commonly used by Lrnr_define_interactions, which is intended for use with another learner (e.g., Lrnr_glmnet or Lrnr_glm) in a Pipeline.
Modified Lrnr_gam formula (if not specified by user) to not use mgcv's default k=10 degrees of freedom for each smooth s term when there are fewer than k=10 degrees of freedom. This bypasses an mgcv::gam error, and tends to be relevant only for small n.
Incorporated a min_screen argument in Lrnr_screener_coefs, which tries to ensure that at least min_screen covariates are selected. If this argument is specified and the learner argument in Lrnr_screener_coefs is a Lrnr_glmnet, then lambda is increased until min_screen number of covariates are selected and a warning is produced. If min_screen is specified and the learner argument in Lrnr_screener_coefs is not a Lrnr_glmnet, then it will error.
Added formula parameter and process_formula function to the base learner, Lrnr_base, whose methods carry over to all other learners. When a formula is supplied as a learner parameter, the process_formula function constructs a design matrix by supplying the formula to model.matrix. This implementation allows formula to be supplied to all learners, even those without native formula support. The formula should be an object of class "formula", or a character string that can be coerced to that class.
Added custom_ROCR_risk, a factory function for performance-based risks for binary outcomes with ROCR performance measures. It supports cutoff-dependent and scalar ROCR performance measures. The risk is defined as 1 - performance, and is transformed back to the performance measure in the cv_risk and importance functions. This change prompted the renaming of the arguments loss_fun and loss_function to eval_fun and eval_function, respectively, since the evaluation of predictions relative to the observations can be either a risk or a loss function. This argument name change impacted the following: Lrnr_solnp, Lrnr_optim, Lrnr_cv_selector, cv_risk, importance, and CV_Lrnr_sl.
Incorporated stratified cross-validation when folds are not supplied to the sl3_Task and the outcome is a discrete (i.e., binary or categorical) variable.
Added to the importance method the option to evaluate importance over covariate_groups, by removing/permuting all covariates in the same group together.
Added Lrnr_ga as another metalearner.
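
The "risk is defined as 1 - performance" convention described for custom_ROCR_risk above can be illustrated with a minimal Python sketch. This is not sl3 or ROCR code: the helper names auc, performance_to_risk, and risk_to_performance are hypothetical, and AUC is used only as one example of a scalar performance measure.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def performance_to_risk(performance):
    """Turn a 'higher is better' measure into a 'lower is better' risk."""
    return 1.0 - performance


def risk_to_performance(risk):
    """Undo the transformation so results can be reported as performance."""
    return 1.0 - risk


# Hypothetical predicted probabilities and binary outcomes.
scores = [0.9, 0.8, 0.35, 0.4, 0.1]
labels = [1, 1, 1, 0, 0]

perf = auc(scores, labels)         # 5 of the 6 positive/negative pairs are ordered correctly
risk = performance_to_risk(perf)   # the quantity an optimizer would minimize
reported = risk_to_performance(risk)
```

Minimizing the risk is then equivalent to maximizing the performance measure, which is why a single eval_fun argument can cover both losses and risks.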

See the NEWS file for complete details.
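
As an illustration of the stratified cross-validation item in the v1.4.4 notes above, the following Python sketch assigns observations to validation folds so that each fold receives a near-equal share of every outcome class. It is a simplified stand-in, not the fold construction sl3 uses; the function name stratified_folds is hypothetical, and shuffling within classes is omitted to keep the example deterministic.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Return a list of validation-index lists, balanced by label."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    offset = 0  # keeps dealing where the previous class stopped
    for label in sorted(by_label):
        for idx in by_label[label]:
            folds[offset % n_folds].append(idx)
            offset += 1
    return folds


# Hypothetical binary outcome: 8 controls followed by 4 cases.
labels = [0] * 8 + [1] * 4
folds = stratified_folds(labels, n_folds=4)
cases_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

With unstratified folds, a rare class can be absent from some validation sets, which makes fold-specific risk estimates unstable; dealing observations class by class avoids that.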

v1.4.2

3 years ago

v1.4.2 of sl3 is a major release, featuring numerous updates to core functionality and improvements to available learners, including:

Updates to variable importance functionality, including calculation of risk ratio and risk differences under covariate deletion or permutation.
Addition of an importance_plot to summarize variable importance findings.
Addition of new methods reparameterize and retrain to Lrnr_base, which allow modification of the covariate set while training on a conserved task and prediction on a new task using previously trained learners, respectively.
Updates to variable importance functionality, including use of risk ratios.
Change Lrnr_hal9001 and Lrnr_glmnet to respect observation-level IDs.
Removal of Remotes and deprecation of Lrnr_rfcde and Lrnr_condensier.
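
The risk ratio and risk difference under covariate permutation mentioned in the list above can be sketched in Python as follows. This is a conceptual illustration only, not the sl3 importance function: the names mse and permutation_importance, the fixed predictor, and the toy data are all assumptions made for the example.

```python
import random


def mse(preds, y):
    """Mean squared error, used here as the risk."""
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def permutation_importance(predict, X, y, column, seed=0):
    """Compare the risk before and after permuting one covariate column."""
    rng = random.Random(seed)
    baseline = mse([predict(row) for row in X], y)

    shuffled = [row[column] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
    permuted = mse([predict(row) for row in X_perm], y)

    return {"ratio": permuted / baseline, "difference": permuted - baseline}


# Toy data: the outcome depends on the first covariate only.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 9.0], [6.0, 2.0]]
y = [2.1, 3.9, 6.1, 7.9, 10.1, 11.9]


def predict(row):
    """Stand-in for a previously trained learner."""
    return 2.0 * row[0]


relevant = permutation_importance(predict, X, y, column=0)
irrelevant = permutation_importance(predict, X, y, column=1)
```

A covariate the predictor ignores yields a ratio of 1 and a difference of 0, while permuting an informative covariate can only leave the risk unchanged or increase it in this toy setup.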

v1.3.7

4 years ago

v1.3.7 of sl3 is a major release, which features updated functionality:

sampling methods for Monte Carlo integration and related procedures
a metalearner for the cross-validation selector (discrete super learner)
a learner for bounding, including support for bounded losses
resolution of a number of older issues (#264)
relaxation of checks inside Stack objects for time series learners
addition of a learner property table to README.Rmd
maintenance and documentation updates
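
The idea behind the cross-validation selector (discrete super learner) listed above is to pick the single candidate learner with the lowest cross-validated risk. A minimal Python sketch follows, assuming squared-error risk and intercept-only candidates; cv_risks and discrete_super_learner are hypothetical names, not sl3 functions.

```python
from statistics import mean


def cv_risks(learners, y, n_folds):
    """Average each learner's validation-set MSE across folds."""
    folds = [list(range(k, len(y), n_folds)) for k in range(n_folds)]
    risks = {}
    for name, fit in learners.items():
        fold_risks = []
        for valid_idx in folds:
            held_out = set(valid_idx)
            train = [y[i] for i in range(len(y)) if i not in held_out]
            prediction = fit(train)  # each toy learner predicts one constant
            fold_risks.append(mean((y[i] - prediction) ** 2 for i in valid_idx))
        risks[name] = mean(fold_risks)
    return risks


def discrete_super_learner(risks):
    """Select the candidate with the smallest cross-validated risk."""
    return min(risks, key=risks.get)


# Two hypothetical candidates: the training mean and a constant zero.
learners = {"mean": mean, "zero": lambda train: 0.0}
y = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]

risks = cv_risks(learners, y, n_folds=3)
selected = discrete_super_learner(risks)
```

Each learner is fit on the training portion of a fold and scored only on the held-out portion, so the selection is based on out-of-sample performance.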

v1.3.5

4 years ago

v1.3.5 of sl3 is a minor release, featuring several important updates to the core software:

New screening methods and convex combination in Lrnr_nnls by @rachaelvphillips
Overhaul of data preprocessing by @Zyx0Wu
Bug fixes by @jeremyrcoyle, including covariate subsetting and better handling of NAs
Package and documentation cleanup, continuous integration and testing fixes, reproducibility updates (including new versioning and DOI minting) by @nhejazi
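
For readers unfamiliar with the convex combination mentioned for Lrnr_nnls above, here is a small Python sketch. It assumes the convex combination is obtained by rescaling non-negative metalearner coefficients so that they sum to one; the function names and numbers are hypothetical, and the non-negative least squares fit itself is not shown.

```python
def convex_weights(coefs):
    """Rescale non-negative coefficients so they sum to one."""
    if any(c < 0 for c in coefs):
        raise ValueError("coefficients must be non-negative")
    total = sum(coefs)
    if total == 0:
        raise ValueError("at least one coefficient must be positive")
    return [c / total for c in coefs]


def ensemble_prediction(weights, learner_predictions):
    """Weighted average of the candidate learners' predictions."""
    return sum(w * p for w, p in zip(weights, learner_predictions))


# Hypothetical non-negative coefficients for three candidate learners.
weights = convex_weights([1.5, 0.0, 0.5])
combined = ensemble_prediction(weights, [10.0, 99.0, 20.0])
```

Because the weights are non-negative and sum to one, the ensemble prediction always lies between the smallest and largest prediction of the learners that receive positive weight.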

v1.3.0

4 years ago

v1.3.0 of sl3 represents a set of major updates to the core software. A non-exhaustive list of the included changes:

fixing incorrect handling of missingness in the automatic imputation procedure
addition of new standard learners, including from the gam and caret packages
addition of custom learners for conditional density estimation, including semiparametric methods based on conditional mean and conditional mean/variance estimation as well as generalized functionality for density estimation via a pooled hazards approach
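
The pooled hazards approach to density estimation mentioned above is commonly described as discretizing the outcome into bins, estimating the hazard of falling in each bin given that no earlier bin was reached, and converting those hazards into a density. The Python sketch below shows only that final conversion, under the assumption of a piecewise-constant density; hazards_to_density and the example values are hypothetical and do not reproduce sl3's implementation.

```python
def hazards_to_density(hazards, bin_edges):
    """Convert discrete bin hazards into a piecewise-constant density."""
    if len(bin_edges) != len(hazards) + 1:
        raise ValueError("need exactly one more bin edge than hazards")
    densities = []
    survival = 1.0  # probability of not having fallen in any earlier bin
    for hazard, left, right in zip(hazards, bin_edges, bin_edges[1:]):
        mass = survival * hazard          # probability of landing in this bin
        densities.append(mass / (right - left))
        survival *= 1.0 - hazard
    return densities


# Hypothetical hazards for three unit-width bins; the last hazard is 1
# because an observation that reaches the final bin must fall in it.
hazards = [0.5, 0.5, 1.0]
bin_edges = [0.0, 1.0, 2.0, 3.0]
density = hazards_to_density(hazards, bin_edges)
total_mass = sum(d * (r - l) for d, l, r in zip(density, bin_edges, bin_edges[1:]))
```

In a full estimator the hazards would come from a binary-outcome learner fit on a pooled, repeated-measures version of the data; here they are supplied directly to keep the example short.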

v1.2.0

4 years ago

v1.2.0 of sl3 represents a set of major updates to the core software. A non-exhaustive list of the included changes:

default metalearners based on task outcome types
handling of imputation internally in task objects
addition of new learners, including from the gbm, earth, polspline packages
fixing errors in existing learners (e.g., subtle parallelization in xgboost and ranger)
support for multivariate outcomes and (default) revere-style cross-validation
support for cross-validated super learner and variable importance

v1.1.0

5 years ago

v1.1.0 of the sl3 R package marks a full-featured and stable release of the project. Numerous learners are included and many bugs have been fixed relative to earlier versions (especially v1.0.0) of the software.