Stacked Ensembles use a process called stacking (also known as Super Learning or Stacked Regression) to combine multiple base learners. Unlike bagging (DRF) and boosting (GBM), stacking combines a set of strong, diverse learners. The goal is to find the optimal weighted combination of base learners by training a second-level metalearner on their cross-validated predictions. H2O-3 supports regression, binary classification, and multiclass classification with Stacked Ensembles.
MOJO Support: Stacked Ensembles support importing and exporting MOJOs.
How Stacking Works
Set up base learners
Train a diverse set of cross-validated base models (e.g., GBM, XGBoost, DRF, GLM, Deep Learning). All base models must use the same number of cross-validation folds and have keep_cross_validation_predictions=True.
Build the level-one data
The cross-validated out-of-fold predictions from each base model are assembled into an N × L matrix (N rows, L base models). This “level-one” data holds, for each training row, the prediction each base model made when that row was held out of the data used to fit it.
Train the metalearner
A metalearner algorithm (by default, a non-negative GLM) is trained on the level-one data against the true response. The metalearner learns the optimal weight for each base model.
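The three steps above can be sketched in plain Python without H2O. This is a toy illustration, not H2O's implementation: the two base learners (a mean predictor and a one-variable least-squares line), the modulo fold assignment, and the projected gradient descent used to keep the weights non-negative are all assumptions chosen to keep the example self-contained. It also omits the intercept that a GLM metalearner would normally fit.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Toy base learner: always predict the mean of the training response."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Toy base learner: one-variable ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def level_one_matrix(xs, ys, learners, nfolds):
    """Step 2: build the N x L matrix of out-of-fold predictions."""
    n = len(xs)
    Z = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(nfolds):
        train = [i for i in range(n) if i % nfolds != fold]     # rows used for fitting
        held_out = [i for i in range(n) if i % nfolds == fold]  # rows to predict
        for j, fit in enumerate(learners):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in held_out:
                Z[i][j] = model(xs[i])  # this model never saw row i
    return Z

def fit_nonnegative_weights(Z, ys, steps=3000):
    """Step 3: least squares with weights clipped at zero (projected gradient descent)."""
    n, n_learners = len(Z), len(Z[0])
    lr = n / sum(z * z for row in Z for z in row)  # conservative step size
    w = [0.0] * n_learners
    for _ in range(steps):
        resid = [sum(wj * zj for wj, zj in zip(w, row)) - y for row, y in zip(Z, ys)]
        grad = [sum(r * row[j] for r, row in zip(resid, Z)) / n for j in range(n_learners)]
        w = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad)]
    return w

# Step 1 happens inside level_one_matrix: every learner is fit once per fold.
xs = list(range(12))
ys = [2 * x + 1 for x in xs]
Z = level_one_matrix(xs, ys, [fit_mean, fit_line], nfolds=3)
weights = fit_nonnegative_weights(Z, ys)
print(len(Z), "rows x", len(Z[0]), "base learners")
print("metalearner weights:", [round(w, 3) for w in weights])
```

On this toy data the line learner reproduces the response exactly, so the fitted weights end up close to 0 for the mean predictor and close to 1 for the line.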
Building Base Learners
Before training a Stacked Ensemble, you need cross-validated base models. The requirements are:
- All base models must use the same number of folds (nfolds >= 2) or the same fold_column.
- All base models must have keep_cross_validation_predictions=True.
- All base models must be trained on the same training_frame.
Python
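A minimal sketch of base models that meet these requirements, assuming a GBM and a DRF as the two learners. The helper name train_base_models, the ntrees values, and the choice of fold_assignment="Modulo" with a shared seed (used here so every model gets the same fold layout) are assumptions of this example, not requirements stated above.

```python
def train_base_models(train, x, y, nfolds=5, seed=1):
    """Train cross-validated base models that can later be stacked.

    `train` is expected to be an H2OFrame, `x` a list of predictor
    column names, and `y` the response column name.
    """
    # Imported inside the function so that defining this helper
    # does not itself require h2o to be loaded.
    from h2o.estimators import (
        H2OGradientBoostingEstimator,
        H2ORandomForestEstimator,
    )

    # Settings that every base model shares.
    shared = dict(
        nfolds=nfolds,                            # same number of folds
        fold_assignment="Modulo",                 # same fold layout (assumption)
        keep_cross_validation_predictions=True,   # needed for level-one data
        seed=seed,
    )
    gbm = H2OGradientBoostingEstimator(ntrees=50, **shared)
    drf = H2ORandomForestEstimator(ntrees=50, **shared)

    # Every model is trained on the same training_frame.
    for model in (gbm, drf):
        model.train(x=x, y=y, training_frame=train)
    return [gbm, drf]

# Example usage (needs a running H2O cluster; "train.csv" and the
# "response" column are hypothetical):
#   import h2o
#   h2o.init()
#   train = h2o.import_file("train.csv")
#   x = [c for c in train.columns if c != "response"]
#   base_models = train_base_models(train, x, "response")
```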
Training the Stacked Ensemble
Python
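A minimal sketch of training the ensemble on top of already trained base models. The helper name train_stacked_ensemble is an assumption of this example, and the base models are assumed to have been cross-validated with identical folds and keep_cross_validation_predictions=True.

```python
def train_stacked_ensemble(base_models, train, x, y, metalearner_algorithm="AUTO"):
    """Train a Stacked Ensemble from a list of cross-validated base models.

    `base_models` may hold trained H2O model objects or model IDs;
    `train` is the same H2OFrame the base models were trained on.
    """
    # Imported inside the function so that defining this helper
    # does not itself require h2o to be loaded.
    from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

    ensemble = H2OStackedEnsembleEstimator(
        base_models=base_models,
        metalearner_algorithm=metalearner_algorithm,  # "AUTO" = non-negative GLM
    )
    # The metalearner is fit on the base models' cross-validated predictions.
    ensemble.train(x=x, y=y, training_frame=train)
    return ensemble

# Example usage (needs a running H2O cluster; `base_models`, `train`, `test`,
# `x`, and the "response" column are hypothetical and must already exist):
#   ensemble = train_stacked_ensemble(base_models, train, x, "response")
#   perf = ensemble.model_performance(test)
#   predictions = ensemble.predict(test)
```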
Metalearner Options
metalearner_algorithm
Algorithm used to combine base model predictions:
- "AUTO" (default): non-negative GLM with standardization off; uses lambda_search if a validation frame is present
- "glm": GLM with default parameters
- "gbm": GBM with default parameters
- "drf": Distributed Random Forest
- "deeplearning": Deep Learning
- "naivebayes": Naïve Bayes
- "xgboost": XGBoost (if available)
metalearner_nfolds
Number of cross-validation folds for the metalearner itself. 0 disables metalearner CV.
blending_frame
If provided, triggers blending mode: the base model predictions on this holdout frame are used as metalearner training data instead of cross-validation predictions. Faster than cross-validated stacking, but requires a separate holdout frame.
keep_levelone_frame
Retain the level-one data frame (base model CV predictions assembled into a matrix) for inspection.
base_models
List of trained H2O model objects or model IDs. All models must be cross-validated with the same folds and have keep_cross_validation_predictions=True (cross-validation is not required in blending mode).
Blending Mode
Blending (holdout stacking) is an alternative to cross-validation-based stacking. You provide a separate blending_frame that the base models score on; those predictions become the metalearner training data.
Python
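A minimal sketch of blending mode, assuming a GBM and a DRF as base models. The helper name train_blended_ensemble and the ntrees values are assumptions of this example. The base models are trained without cross-validation here, on the assumption that blending takes its metalearner training data from the holdout frame rather than from cross-validation predictions.

```python
def train_blended_ensemble(train, blend, x, y, seed=1):
    """Train base models on `train`, then fit the metalearner on `blend`.

    `train` and `blend` are expected to be disjoint H2OFrames with the
    same columns; `x` is a list of predictors and `y` the response name.
    """
    # Imported inside the function so that defining this helper
    # does not itself require h2o to be loaded.
    from h2o.estimators import (
        H2OGradientBoostingEstimator,
        H2ORandomForestEstimator,
    )
    from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

    # No nfolds here: the holdout frame replaces cross-validation predictions.
    gbm = H2OGradientBoostingEstimator(ntrees=50, seed=seed)
    drf = H2ORandomForestEstimator(ntrees=50, seed=seed)
    for model in (gbm, drf):
        model.train(x=x, y=y, training_frame=train)

    ensemble = H2OStackedEnsembleEstimator(base_models=[gbm, drf], seed=seed)
    # Passing blending_frame switches the ensemble to blending mode.
    ensemble.train(x=x, y=y, training_frame=train, blending_frame=blend)
    return ensemble

# Example usage (needs a running H2O cluster; "data.csv" and the
# "response" column are hypothetical):
#   import h2o
#   h2o.init()
#   data = h2o.import_file("data.csv")
#   train, blend = data.split_frame(ratios=[0.8], seed=1)
#   x = [c for c in data.columns if c != "response"]
#   ensemble = train_blended_ensemble(train, blend, x, "response")
```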