Multinomial classification and regression for continuous targets are implemented experimentally and may produce less reliable results than the primary binomial mode.
## How FTRL Works
FTRL-Proximal is an online learning algorithm: it updates model weights incrementally as data arrives, making it memory-efficient for very large datasets. It employs a hashing trick to vectorize features:

- Boolean and integer values are hashed with an identity function.
- Float values are hashed by trimming mantissa bits (controlled by `mantissa_nbits`) and interpreting the result as a 64-bit unsigned integer.
- Strings are hashed with the 64-bit Murmur2 function.
- The final hash is combined with the hashed feature name and taken modulo `nbins`.
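The per-type rules above can be sketched in plain Python. This is illustrative only: the function names `hash_value` and `feature_bin` are hypothetical, and Python's built-in `hash` stands in for the 64-bit Murmur2 function that datatable actually uses.

```python
import struct

NBINS = 1_000_000  # corresponds to the `nbins` hyperparameter

def hash_value(value, mantissa_nbits=10):
    """Illustrative stand-in for datatable's per-type hashing rules."""
    if isinstance(value, (bool, int)):
        # Booleans and integers: identity function
        return int(value) & 0xFFFFFFFFFFFFFFFF
    if isinstance(value, float):
        # Floats: reinterpret the 64 bits of the double, then trim the
        # mantissa so that only `mantissa_nbits` of its 52 bits are kept
        bits = struct.unpack("<Q", struct.pack("<d", value))[0]
        return bits >> (52 - mantissa_nbits)
    # Strings: datatable uses 64-bit Murmur2; hash() is only a stand-in
    return hash(value) & 0xFFFFFFFFFFFFFFFF

def feature_bin(name, value, nbins=NBINS):
    # Combine the value hash with the hashed feature name, modulo nbins
    return (hash_value(value) + hash_value(name)) % nbins
```

Trimming mantissa bits makes nearby float values collide on purpose, which acts as a coarse binning of continuous features.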
## Creating an FTRL Model
The `Ftrl` class lives in `datatable.models`:
## Hyperparameters
| Parameter | Default | Description |
|---|---|---|
| `alpha` | `0.005` | Learning rate (α in the FTRL-Proximal algorithm). Must be positive. |
| `beta` | `1.0` | β in the FTRL-Proximal algorithm. Must be non-negative. |
| `lambda1` | `0.0` | L1 regularization parameter. Must be non-negative. |
| `lambda2` | `0.0` | L2 regularization parameter. Must be non-negative. |
| `nbins` | `1_000_000` | Number of hash bins for the hashing trick. Larger values reduce hash collisions. |
| `mantissa_nbits` | `10` | Number of mantissa bits used when hashing floats (0–52). |
| `nepochs` | `1` | Number of training epochs. Accepts fractional values. |
| `interactions` | `None` | Feature interaction pairs/groups to add as additional features. |
| `model_type` | `"auto"` | `"auto"`, `"binomial"`, `"multinomial"`, or `"regression"`. |
| `negative_class` | `False` | Whether to create a "negative" class for multinomial classification. |
| `double_precision` | `False` | Use `float64` internally (doubles the memory footprint). |
## Training
Use `.fit()` to train the model. `X_train` must be a datatable Frame of shape `(nrows, ncols)` and `y_train` a Frame of shape `(nrows, 1)`. Supported column types for `X_train`: bool, int, real, str.
## Early Stopping
Pass a validation set to enable early stopping. Training halts when the relative validation error fails to improve by `validation_error` within `nepochs_validation` epochs.
## Predicting
`predict()` returns a Frame of shape `(X_test.nrows, nlabels)` with predicted probabilities for each label. The test frame must have the same number of columns as the training frame.
## Feature Importances
After training, per-feature weight contributions are accumulated. Access them through the model's `feature_importances` property, which returns a two-column frame of feature names and their importances.

## Feature Interactions
You can add synthetic cross-features by specifying `interactions`, a list of column-name groups. Each group becomes a single hashed interaction feature.
For example, `interactions = [["C0", "C1", "C3"], ["C2", "C5"]]` creates two interaction features, `C0:C1:C3` and `C2:C5`. Interactions must be set before calling `.fit()` and cannot be changed once the model is trained.
## Resetting the Model
Reset learned weights while keeping the current hyperparameters by calling the model's `.reset()` method.

## Complete Binary Classification Example
## When to Use FTRL
### Good fit
- Very large or streaming datasets that don’t fit in memory
- High-dimensional sparse feature spaces (e.g., click-through rate prediction)
- Scenarios where online/incremental learning is required
- Binary classification tasks
### Consider alternatives
- Small datasets where batch methods converge faster
- Multinomial or regression tasks (experimental support only)
- When interpretable linear coefficients are needed (use LinearModel)