SuperLearner guide: fitting models, ensembling, prediction, hyperparameters, parallelization, timing, feature selection, etc.

SuperLearner Guide

A guide to using SuperLearner for prediction. This is now included as a vignette in the SuperLearner package.

Note: this tutorial is a bit out of date; some supplemental methods are now in my ck37r package.

SuperLearner Intro

  • Installing
  • Background
  • Create dataset
  • Review available models
  • Fit single models
  • Fit ensemble
  • Predict on new dataset
  • Customize a model setting
  • External cross-validation
  • Test multiple hyperparameter settings
  • Parallelize across CPUs
  • Distribution of ensemble weights
  • Feature selection (screening)
  • Optimize for AUC
  • XGBoost hyperparameter exploration


(To be created)

  • create.Learner() custom environments
  • SL.caret wrapper
  • Custom learner wrapper
  • Custom screener
  • Library analysis - cumulative
  • Library analysis - individual algorithms
  • Recombine SuperLearner


(To be created)

  • Parallelize across computers (SLURM)
  • Repeated cross-validation
  • Data-adaptive V-selection for cross-validation
  • Multi-level meta-learning



Campus Groups:

Courses at Berkeley:

  • Stat 154 - Statistical Learning
  • CS 189 / CS 289A - Machine Learning
  • PH 252D - Causal Inference
  • PH 295 - Big Data
  • PH 295 - Targeted Learning for Biomedical Big Data
  • INFO - TBD

Coursera and other online platforms also offer many relevant classes.


LeDell, E., Petersen, M. L., & van der Laan, M. J. (2015). Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Electronic Journal of Statistics.

Polley, E. C., & van der Laan, M. J. (2010). Super learner in prediction. U.C. Berkeley Division of Biostatistics Working Paper Series, Paper 226.

van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical Applications in Genetics and Molecular Biology, 6(1).

van der Laan, M. J., & Rose, S. (2011). Targeted learning: Causal inference for observational and experimental data. Springer Science & Business Media.

Open Source Agenda is not affiliated with "Superlearner Guide" Project. README Source: ck37/superlearner-guide