H2O-3 is an open-source, in-memory, distributed, fast, and scalable machine learning and predictive analytics platform. It lets you build machine learning models on big data and makes it easy to put those models into production in an enterprise environment.
H2O-3 is licensed under the Apache License, Version 2.0. Source code, issue tracking, and community discussion are available on GitHub.
What is H2O-3?
H2O-3 is an in-memory platform for distributed, scalable machine learning. Its core code is written in Java. A distributed key-value store is used to access and reference data, models, and objects across all nodes and machines. Algorithms are implemented on top of H2O-3’s distributed map-reduce framework and use the Java fork/join framework for multi-threading. Data is read in parallel, distributed across the cluster, and stored in memory in a columnar compressed format. H2O-3’s data parser has built-in intelligence to guess the schema of incoming datasets and supports data ingest from multiple sources in various formats.
Supported algorithms
H2O-3 includes production-ready implementations of the following algorithms:
- AdaBoost — boosting ensemble for classification
- AutoML — fully automatic model training and selection
- Cox Proportional Hazards (CoxPH) — survival analysis
- Decision Tree — single decision tree learner
- Deep Learning — multi-layer neural networks
- Distributed Random Forest (DRF) — tree-based ensemble
- Distributed Uplift Random Forest — treatment effect estimation
- Extended Isolation Forest — anomaly detection
- Generalized Additive Models (GAM) — flexible semi-parametric models
- Generalized Linear Model (GLM) — linear, logistic, and Poisson regression
- Generalized Low Rank Models (GLRM) — matrix factorization
- Gradient Boosting Machine (GBM) — tree boosting for regression and classification
- Isolation Forest — anomaly detection via random partitioning
- Isotonic Regression — monotone regression
- K-Means Clustering — unsupervised partitioning
- Naïve Bayes Classifier — probabilistic classification
- Principal Component Analysis (PCA) — dimensionality reduction
- RuleFit — interpretable rule-based model
- Stacked Ensembles — meta-learner combining base models
- Support Vector Machine (PSVM) — kernel-based classification
- Target Encoding — categorical feature preprocessing
- Word2Vec — word embedding from text
- XGBoost — optimized gradient boosting
Multi-language support
H2O-3 exposes a consistent API across multiple languages and interfaces. All client libraries communicate with the H2O-3 backend through the REST API.
Python
Install via pip install h2o. Full-featured client with estimators, frames, and AutoML.
R
Install via install.packages("h2o"). Mirrors the Python API with idiomatic R conventions.
Flow UI
Browser-based notebook interface, available at http://localhost:54321 when the cluster is running.
REST API
JSON over HTTP. All capabilities of H2O-3 are accessible from any language or tool.
Java and Scala users can access H2O-3 through the REST API or by embedding H2O-3 as a Maven artifact in their projects.
Architecture overview
Cluster model
An H2O-3 cluster is a set of JVM processes (nodes) that work together as a single distributed system. Nodes communicate peer-to-peer; there is no designated master node for data distribution.
- Cluster formation: New H2O-3 nodes join during launch using multicast or flatfile-based discovery. Once a job starts, the cluster locks and prevents new members from joining.
- In-memory storage: Data is stored across all nodes in a columnar compressed format. Each column (Vec) is split into contiguous subsets (Chunks) distributed across the cluster.
- Distributed computation: MRTask (Map/Reduce) moves computation to the data rather than moving data to the computation. Partial results are reduced up a tree back to the node that initiated the task.
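As a rough illustration of the storage and computation model above, the following single-process Python sketch splits one column into contiguous chunks, gives each chunk a home node, runs a map function on each chunk "where it lives", and combines the partial results with a pairwise tree reduction. It is a toy under stated assumptions, not H2O-3's Java implementation: the names HashRing, split_into_chunks, tree_reduce, and mr_task, the chunk size, the chunk-key format, and the SHA-256 hash ring used for placement (standing in for the consistent hashing described in the DKV section of this page) are all hypothetical.

```python
import hashlib
from bisect import bisect_right


class HashRing:
    """Toy consistent-hash ring mapping a key to a (hypothetical) home node."""

    def __init__(self, nodes, replicas=64):
        # Several virtual points per node spread keys more evenly across nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")

    def home(self, key):
        # First ring point clockwise from the key's hash, wrapping past the end.
        idx = bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


def split_into_chunks(column, chunk_size):
    """Split one column (a 'Vec') into contiguous row ranges ('Chunks')."""
    return [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]


def tree_reduce(values, reduce_fn):
    """Combine partial results pairwise, level by level, as in a reduction tree."""
    level = list(values)
    while len(level) > 1:
        paired = [reduce_fn(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired last element moves up one level unchanged
            paired.append(level[-1])
        level = paired
    return level[0]


def mr_task(chunks, ring, vec_key, map_fn, reduce_fn):
    """Map each chunk on its home node, reduce per node, then reduce across nodes."""
    partials_by_node = {}
    for idx, chunk in enumerate(chunks):
        node = ring.home(f"{vec_key}/chunk{idx}")  # hypothetical chunk-key format
        partials_by_node.setdefault(node, []).append(map_fn(chunk))
    per_node = [tree_reduce(parts, reduce_fn) for parts in partials_by_node.values()]
    return tree_reduce(per_node, reduce_fn), partials_by_node


ring = HashRing(["node-a", "node-b", "node-c"])
column = list(range(1, 101))            # a 100-row numeric column
chunks = split_into_chunks(column, 16)  # 6 chunks of 16 rows plus 1 chunk of 4 rows
total, placement = mr_task(chunks, ring, "my_vec", map_fn=sum, reduce_fn=lambda a, b: a + b)
print(total)  # 5050
```

Everything here runs in one process, so the sketch mirrors only the data flow the bullets describe (map next to the data, reduce up a tree); it does not model how H2O-3 ships tasks between JVMs or uses fork/join threads within a node.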
Distributed key-value store (DKV)
Every object (frames, models, chunks) has a home node determined by consistent hashing of its key. The DKV is used to locate and access all distributed objects.
REST API
H2O-3’s REST API allows access to all capabilities from an external program or script through JSON over HTTP. The REST API is used by:
- The Flow web UI
- The R binding (H2O-R)
- The Python binding (H2O-Python)
- Any custom integration
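Because every capability is exposed as JSON over HTTP, a custom integration needs nothing beyond an HTTP client. The Python sketch below uses only the standard library: it builds a GET request against the default local address (port 54321, as in the Flow UI entry) and sends it only if you call fetch_json while a cluster is running. The helper names build_get and fetch_json are hypothetical, and the /3/Cloud path is used here as an example cluster-status endpoint; treat the path as an assumption to check against the REST API reference for your H2O-3 version.

```python
import json
import urllib.request

# Default local address of an H2O-3 node (assumption: cluster started with default port).
H2O_URL = "http://localhost:54321"


def build_get(path, base_url=H2O_URL):
    """Build (but do not send) a GET request that asks for a JSON response."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Accept": "application/json"},
        method="GET",
    )


def fetch_json(request, timeout=5.0):
    """Send the request and decode the JSON body; requires a running cluster."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_get("/3/Cloud")  # example endpoint path (assumption, see lead-in)
print(req.full_url)      # http://localhost:54321/3/Cloud
print(req.get_method())  # GET

# Uncomment once an H2O-3 cluster is running locally:
# status = fetch_json(req)
# print(sorted(status.keys()))
```

Keeping request construction separate from sending makes the request easy to inspect or test offline, and the same pattern applies to any other endpoint path.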
Key sections
Quickstart
Train your first model in Python or R in under 5 minutes.
Installation
Install H2O-3 via pip, conda, or CRAN, or download the standalone jar.
Algorithm reference
Detailed documentation for every supported algorithm.
AutoML
Automatically train and rank hundreds of models with a single call.