rfx train

Overview

The rfx train command executes a training stage and registers the resulting artifact in the run registry. It integrates with the rfx workflow system to track training runs, configurations, and outputs.

Usage

rfx train [OPTIONS]

Options

--data (string, default: None)

Path to the training data directory or dataset. This can be a local LeRobot dataset directory or a reference to data stored elsewhere.

--config (string, default: None)

Path to the training configuration file (YAML or JSON). The config file specifies hyperparameters, model architecture, and training settings.

--input (string, repeatable, default: [])

Additional input references. Repeat the flag to pass several inputs to the training stage.

Example: --input path/to/pretrained.pth --input path/to/normalization.json

--output (string, repeatable, default: [])

Additional output references. Repeat the flag to specify where to save training artifacts beyond the default policy checkpoint.

Example: --output checkpoints/ --output logs/
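To make the "repeatable" semantics concrete, here is a minimal sketch using Python's argparse. It is not the rfx source; the parser below is a hypothetical stand-in that only mirrors the four documented options and shows how each repetition of a flag ends up as one list entry.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not rfx's own code)."""
    parser = argparse.ArgumentParser(prog="rfx train")
    parser.add_argument("--data", default=None)
    parser.add_argument("--config", default=None)
    # action="append" collects every occurrence of the flag into a list.
    parser.add_argument("--input", action="append", default=[])
    parser.add_argument("--output", action="append", default=[])
    return parser


args = build_parser().parse_args([
    "--data", "datasets/my-demos",
    "--input", "path/to/pretrained.pth",
    "--input", "path/to/normalization.json",
])

print(args.input)   # both --input values, in the order given
print(args.output)  # empty list, because --output was never passed
```

Under this reading, omitting a repeatable flag yields the documented default of an empty list.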

Examples

Basic training

Train a policy from a local dataset:

rfx train --data datasets/my-demos --config configs/train.yaml

Training with additional inputs

Use a pretrained model as a starting point:

rfx train \
  --data datasets/my-demos \
  --config configs/train.yaml \
  --input runs/pretrained-base/policy

Specifying custom outputs

Save checkpoints to a custom location:

rfx train \
  --data datasets/my-demos \
  --config configs/train.yaml \
  --output checkpoints/experiment-1/
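When launching training from a script, it can help to assemble the command programmatically. The sketch below builds an argument list using only the flags documented on this page; build_train_command is a hypothetical helper name, not part of rfx.

```python
import shlex
from typing import List, Optional, Sequence


def build_train_command(
    data: Optional[str] = None,
    config: Optional[str] = None,
    inputs: Sequence[str] = (),
    outputs: Sequence[str] = (),
) -> List[str]:
    """Assemble an `rfx train` argument list (hypothetical helper)."""
    cmd = ["rfx", "train"]
    if data is not None:
        cmd += ["--data", data]
    if config is not None:
        cmd += ["--config", config]
    # Repeatable flags: emit one flag per reference.
    for ref in inputs:
        cmd += ["--input", ref]
    for ref in outputs:
        cmd += ["--output", ref]
    return cmd


cmd = build_train_command(
    data="datasets/my-demos",
    config="configs/train.yaml",
    inputs=["runs/pretrained-base/policy"],
)
command_line = shlex.join(cmd)
print(command_line)
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a machine where rfx is installed; passing a list rather than a single string avoids shell-quoting problems with unusual paths.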

Training Workflow

The train command:

1. Generates a unique run ID: creates a timestamped identifier for this training run
2. Snapshots the config: captures the complete training configuration for reproducibility
3. Executes the training stage: runs the training script defined in your workflow
4. Registers the run: records metadata, config, inputs, outputs, and artifacts in the run registry
5. Reports results: prints run ID, status, and artifact locations
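The first two steps can be sketched in a few lines of Python. This is an illustration under assumptions, not the rfx implementation: the ID format is inferred from the sample run ID train-20240311-123456 shown on this page, and make_run_id and snapshot_config are hypothetical names.

```python
import copy
from datetime import datetime


def make_run_id(stage: str, now: datetime) -> str:
    """Build a timestamped ID such as 'train-20240311-123456' (format assumed)."""
    return f"{stage}-{now:%Y%m%d-%H%M%S}"


def snapshot_config(config: dict) -> dict:
    """Return an independent deep copy so later edits cannot alter the record."""
    return copy.deepcopy(config)


config = {"training": {"batch_size": 64}}
run_id = make_run_id("train", datetime(2024, 3, 11, 12, 34, 56))
frozen = snapshot_config(config)

# Changing the live config after the snapshot leaves the snapshot untouched.
config["training"]["batch_size"] = 32

print(run_id)                            # train-20240311-123456
print(frozen["training"]["batch_size"])  # 64
```

Taking a deep copy, rather than keeping a reference, is what makes the snapshot useful for reproducibility: the recorded values reflect the configuration at launch time.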

Output

The command prints the training run details:

[rfx] train run_id=train-20240311-123456 status=succeeded
[rfx] artifact: runs/train-20240311-123456/policy
[rfx] artifact: runs/train-20240311-123456/checkpoints
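Scripts that chain stages often need the run ID and artifact paths from this output. The following sketch parses the lines shown above with regular expressions. It assumes the output format stays exactly as in the sample; parse_train_output is a hypothetical helper.

```python
import re

# Patterns derived from the sample output on this page (format assumed stable).
RUN_LINE = re.compile(r"^\[rfx\] train run_id=(?P<run_id>\S+) status=(?P<status>\S+)$")
ARTIFACT_LINE = re.compile(r"^\[rfx\] artifact: (?P<path>.+)$")


def parse_train_output(text: str) -> dict:
    """Extract run ID, status, and artifact paths from `rfx train` output."""
    result = {"run_id": None, "status": None, "artifacts": []}
    for raw in text.splitlines():
        line = raw.strip()
        run_match = RUN_LINE.match(line)
        if run_match:
            result["run_id"] = run_match.group("run_id")
            result["status"] = run_match.group("status")
            continue
        artifact_match = ARTIFACT_LINE.match(line)
        if artifact_match:
            result["artifacts"].append(artifact_match.group("path"))
    return result


sample = """\
[rfx] train run_id=train-20240311-123456 status=succeeded
[rfx] artifact: runs/train-20240311-123456/policy
[rfx] artifact: runs/train-20240311-123456/checkpoints
"""

summary = parse_train_output(sample)
print(summary["run_id"], summary["status"])
print(summary["artifacts"])
```

Lines that match neither pattern are ignored, so extra log output around these lines does not break the parser.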

Configuration File Format

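A training configuration file can specify the model architecture, training hyperparameters, data settings, and hardware options. For example:

configs/train.yaml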

# Model architecture
model:
  type: "mlp"
  hidden_dim: 256
  num_layers: 3

# Training hyperparameters
training:
  learning_rate: 3e-4
  batch_size: 64
  num_epochs: 100
  
# Data settings
data:
  train_split: 0.9
  shuffle: true
  
# Hardware
device: "cuda"
num_workers: 4
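Because --config accepts YAML or JSON, the same settings can be written as JSON. The sketch below mirrors the YAML example as a Python dict and serializes it with the standard json module; it assumes the JSON form uses the same key layout as the YAML form, which this page does not state explicitly.

```python
import json

# Same keys and values as the YAML example above.
config = {
    "model": {"type": "mlp", "hidden_dim": 256, "num_layers": 3},
    "training": {"learning_rate": 3e-4, "batch_size": 64, "num_epochs": 100},
    "data": {"train_split": 0.9, "shuffle": True},
    "device": "cuda",
    "num_workers": 4,
}

json_text = json.dumps(config, indent=2)
print(json_text)

# Round-trip check: parsing the JSON text gives back identical values.
restored = json.loads(json_text)
```

One caveat that depends on which YAML loader is in use: PyYAML's default resolver reads 3e-4 as a string, while 3.0e-4 is read as a float. JSON has no such ambiguity, so the learning rate always round-trips as a number.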

Run Registry

After training, query your runs:

# List all training runs
rfx runs list --stage train

# Show details of a specific run
rfx runs show train-20240311-123456

The registry tracks:

Run ID and timestamp
Training configuration (for reproducibility)
Input data and model references
Output artifacts and checkpoints
Training metrics and logs
Success/failure status
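As a mental model, a registry entry can be pictured as a record with one field per tracked item. The dataclass below is purely hypothetical: the field names and the on-disk format of the actual rfx registry are not documented here, and metrics are left empty rather than invented.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """Hypothetical shape of one registry entry (not the real rfx schema)."""
    run_id: str
    timestamp: str
    status: str
    config: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)


record = RunRecord(
    run_id="train-20240311-123456",
    timestamp="2024-03-11T12:34:56",
    status="succeeded",
    config={"training": {"batch_size": 64}},
    inputs=["datasets/my-demos"],
    outputs=["runs/train-20240311-123456/policy"],
)

# asdict() converts the record into plain dicts and lists for serialization.
record_json = json.dumps(asdict(record), indent=2)
print(record_json)
```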

Integration with Workflows

The train command integrates with the rfx workflow system. You can define custom training stages in your workflow configuration that handle:

Different model architectures
Various training algorithms (behavioral cloning, RL, etc.)
Multi-stage training pipelines
Distributed training
Hyperparameter optimization

See the Train Policy workflow guide for detailed examples.

Troubleshooting

Missing data directory

[rfx] Train failed: FileNotFoundError: Dataset not found at 'datasets/my-demos'

Ensure the dataset path exists and contains valid LeRobot data. If you have not collected any demonstrations yet, use rfx record to create a dataset first.
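A quick pre-flight check can catch a wrong path before a training run starts. The sketch below only verifies that the directory exists and is not empty; it does not validate the LeRobot format, and check_dataset_dir is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def check_dataset_dir(path: str) -> Path:
    """Raise early if the dataset directory is missing or empty."""
    dataset = Path(path)
    if not dataset.is_dir():
        raise FileNotFoundError(f"Dataset not found at '{path}'")
    if not any(dataset.iterdir()):
        raise ValueError(f"Dataset directory '{path}' is empty")
    return dataset


# Demo 1: a temporary directory with one file passes the check.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "placeholder.txt").write_text("demo")
    existing_ok = check_dataset_dir(tmp).is_dir()

# Demo 2: a path that does not exist raises FileNotFoundError.
try:
    check_dataset_dir("datasets/does-not-exist")
    missing_error = None
except FileNotFoundError as err:
    missing_error = str(err)

print(existing_ok)
print(missing_error)
```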

Invalid configuration

[rfx] Train failed: ValueError: Invalid config key 'model.typpo'

Check your YAML/JSON syntax and ensure all config keys are valid for your training workflow.
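Typos like the one above can be spotted before training by comparing config keys against a list of known keys. This is a sketch under assumptions: the allow-list is taken from the example config on this page, while the keys your workflow actually accepts may differ, and find_unknown_keys is a hypothetical helper.

```python
from difflib import get_close_matches

# Hypothetical allow-list based on the example configs/train.yaml.
ALLOWED_KEYS = {
    "model.type", "model.hidden_dim", "model.num_layers",
    "training.learning_rate", "training.batch_size", "training.num_epochs",
    "data.train_split", "data.shuffle",
    "device", "num_workers",
}


def flatten_keys(config: dict, prefix: str = "") -> list:
    """Turn nested dict keys into dotted paths such as 'model.type'."""
    keys = []
    for name, value in config.items():
        dotted = f"{prefix}{name}"
        if isinstance(value, dict):
            keys.extend(flatten_keys(value, prefix=f"{dotted}."))
        else:
            keys.append(dotted)
    return keys


def find_unknown_keys(config: dict) -> dict:
    """Map each unknown dotted key to its closest known key, if any."""
    problems = {}
    for key in flatten_keys(config):
        if key not in ALLOWED_KEYS:
            problems[key] = get_close_matches(key, sorted(ALLOWED_KEYS), n=1)
    return problems


bad_config = {"model": {"typpo": "mlp", "hidden_dim": 256}}
problems = find_unknown_keys(bad_config)
print(problems)
```

An empty result means every key in the config is on the allow-list.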

Out of memory

If training fails with CUDA out of memory errors:

Reduce batch_size in your config
Decrease model size (hidden_dim, num_layers)
Use gradient accumulation
Enable mixed precision training
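Gradient accumulation trades memory for extra steps: several small micro-batches are processed, and their gradients are averaged before one optimizer update. The pure-Python sketch below shows the arithmetic on a toy one-parameter model. Whether and how your rfx training workflow exposes accumulation is not covered on this page, so the function names here are illustrative only.

```python
import math


def accumulation_steps(target_batch_size: int, micro_batch_size: int) -> int:
    """Number of micro-batches needed to reach the target batch size."""
    if target_batch_size <= 0 or micro_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return math.ceil(target_batch_size / micro_batch_size)


def grad_mse(w: float, xs: list, ys: list) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


# Keeping an effective batch of 64 while only fitting 16 samples in memory.
steps = accumulation_steps(64, 16)

# Toy check: averaging equal-sized micro-batch gradients matches the
# full-batch gradient.
xs, ys, w = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], 0.5
full_grad = grad_mse(w, xs, ys)
micro_grads = [grad_mse(w, xs[i:i + 2], ys[i:i + 2]) for i in (0, 2)]
accumulated_grad = sum(micro_grads) / len(micro_grads)

print(steps)
print(full_grad, accumulated_grad)
```

The equivalence holds exactly only when all micro-batches have the same size; with a ragged final micro-batch, the gradients would need to be weighted by sample count.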
