Overview
The rfx train command executes a training stage and registers the resulting artifact in the run registry. It integrates with the rfx workflow system to track training runs, configurations, and outputs.
Usage
Options
Path to training data directory or dataset. This can be a local LeRobot dataset directory or a reference to data stored elsewhere.
Path to training configuration file (YAML or JSON). The config file specifies hyperparameters, model architecture, and training settings.
Additional input references (repeatable). Use this flag multiple times to specify additional inputs for the training stage.
Example:
--input path/to/pretrained.pth --input path/to/normalization.json
Additional output references (repeatable). Specify where to save additional training artifacts beyond the default policy checkpoint.
Example:
--output checkpoints/ --output logs/
Examples
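When training is launched from a script, each additional reference needs its own occurrence of the flag. The following is a minimal sketch of assembling such an argument list. `build_train_args` is a hypothetical helper and not part of rfx; it uses only the two repeatable flags shown above and deliberately leaves out the data and config options.

```python
from typing import List, Sequence


def build_train_args(
    inputs: Sequence[str] = (),
    outputs: Sequence[str] = (),
) -> List[str]:
    """Build an argv list for `rfx train` with repeated --input/--output flags.

    Hypothetical helper: the data and config options are omitted on purpose.
    """
    args = ["rfx", "train"]
    for ref in inputs:
        args += ["--input", ref]    # one flag occurrence per input reference
    for ref in outputs:
        args += ["--output", ref]   # one flag occurrence per output reference
    return args


if __name__ == "__main__":
    print(build_train_args(
        inputs=["path/to/pretrained.pth", "path/to/normalization.json"],
        outputs=["checkpoints/", "logs/"],
    ))
```

The returned list can be extended with the remaining options and passed to `subprocess.run`, which avoids shell quoting problems with paths that contain spaces.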
Basic training
Train a policy from a local dataset.
Training with additional inputs
Use a pretrained model as a starting point.
Specifying custom outputs
Save checkpoints to a custom location.
Training Workflow
The train command:
- Generates a unique run ID: creates a timestamped identifier for this training run
- Snapshots the config: captures the complete training configuration for reproducibility
- Executes the training stage: runs the training script defined in your workflow
- Registers the run: records metadata, config, inputs, outputs, and artifacts in the run registry
- Reports results: prints run ID, status, and artifact locations
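The steps above can be sketched in a few lines of Python. This is an illustration of the flow under stated assumptions, not rfx's implementation: the in-memory `REGISTRY`, the run ID format, and the `run_training` function are all hypothetical.

```python
import copy
import uuid
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical in-memory stand-in for the run registry.
REGISTRY: Dict[str, Dict[str, Any]] = {}


def make_run_id() -> str:
    """Timestamped ID with a random suffix (the format is an assumption)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"train-{stamp}-{uuid.uuid4().hex[:8]}"


def run_training(
    config: Dict[str, Any],
    stage: Callable[[Dict[str, Any]], List[str]],
    inputs: Sequence[str] = (),
) -> Dict[str, Any]:
    run_id = make_run_id()                 # 1. generate a unique run ID
    snapshot = copy.deepcopy(config)       # 2. snapshot the config
    error = None
    try:
        # 3. execute the stage on its own copy so the snapshot stays intact
        artifacts = stage(copy.deepcopy(snapshot))
        status = "succeeded"
    except Exception as exc:               # record the failure instead of raising
        artifacts, status, error = [], "failed", str(exc)
    record = {
        "run_id": run_id,
        "config": snapshot,
        "inputs": list(inputs),
        "artifacts": artifacts,
        "status": status,
        "error": error,
    }
    REGISTRY[run_id] = record              # 4. register the run
    print(f"run={run_id} status={status} artifacts={artifacts}")  # 5. report
    return record
```

Snapshotting with a deep copy means later edits to the caller's config object cannot change what the registry recorded for the run.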
Output
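The command prints the training run details: run ID, status, and artifact locations.
Configuration File Format
Training configuration files, such as configs/train.yaml, can specify hyperparameters, model architecture, and training settings.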
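Since the config may be YAML or JSON, a JSON file can be written and checked with the Python standard library alone. The sketch below is illustrative only: the keys `batch_size`, `hidden_dim`, and `num_layers` are the names mentioned under Troubleshooting, while the values, the flat layout, and the `load_train_config` helper are assumptions rather than the schema rfx expects.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

# Illustrative values only; the real schema depends on your training stage.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "batch_size": 32,
    "hidden_dim": 256,
    "num_layers": 4,
}


def load_train_config(path: Path) -> Dict[str, Any]:
    """Load a JSON training config and check that it is a mapping."""
    path = Path(path)
    if path.suffix.lower() != ".json":
        raise ValueError(f"this sketch only handles JSON, got {path.suffix!r}")
    with path.open() as f:
        config = json.load(f)
    if not isinstance(config, dict):
        raise ValueError("top level of the config must be a mapping")
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg_path = Path(tmp) / "train.json"
        cfg_path.write_text(json.dumps(EXAMPLE_CONFIG, indent=2))
        print(load_train_config(cfg_path))
```

Parsing the file before launching a run surfaces syntax errors early, which is cheaper than discovering them after the training stage has started.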
Run Registry
After training, you can query your runs in the registry. Each run records:
- Run ID and timestamp
- Training configuration (for reproducibility)
- Input data and model references
- Output artifacts and checkpoints
- Training metrics and logs
- Success/failure status
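To make the shape of a registry entry concrete, the fields listed above can be modeled as a small record type. This is a hypothetical sketch: `RunRecord`, its field names, and `successful_runs` are assumptions for illustration and do not reflect how rfx stores or queries runs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class RunRecord:
    """Hypothetical record mirroring the fields listed above."""
    run_id: str
    timestamp: str                      # ISO 8601, so string order is time order
    status: str                         # "succeeded" or "failed"
    config: Dict[str, Any] = field(default_factory=dict)
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    metrics: Dict[str, float] = field(default_factory=dict)


def successful_runs(records: Iterable[RunRecord]) -> List[RunRecord]:
    """Return succeeded runs, newest first."""
    ok = [r for r in records if r.status == "succeeded"]
    return sorted(ok, key=lambda r: r.timestamp, reverse=True)


if __name__ == "__main__":
    runs = [
        RunRecord("run-a", "2024-01-01T10:00:00Z", "succeeded"),
        RunRecord("run-b", "2024-01-02T10:00:00Z", "failed"),
        RunRecord("run-c", "2024-01-03T10:00:00Z", "succeeded"),
    ]
    print([r.run_id for r in successful_runs(runs)])
```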
Integration with Workflows
The train command integrates with the rfx workflow system. You can define custom training stages in your workflow configuration that handle:
- Different model architectures
- Various training algorithms (behavioral cloning, RL, etc.)
- Multi-stage training pipelines
- Distributed training
- Hyperparameter optimization
Troubleshooting
Missing data directory
If the data directory is missing, run rfx record to collect demonstrations first.
Invalid configuration
Out of memory
If training fails with CUDA out of memory errors:
- Reduce batch_size in your config
- Decrease model size (hidden_dim, num_layers)
- Use gradient accumulation
- Enable mixed precision training
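Gradient accumulation trades memory for extra steps: a smaller micro-batch is processed several times before each optimizer update, so the effective batch size stays close to the original. The arithmetic can be sketched as follows; `accumulation_plan` is a hypothetical helper, and how accumulation is actually enabled depends on your training script.

```python
from typing import Tuple


def accumulation_plan(target_batch_size: int, micro_batch_size: int) -> Tuple[int, int]:
    """Return (accumulation_steps, effective_batch_size).

    The step count is rounded up, so the effective batch size may slightly
    exceed the target when the micro-batch does not divide it evenly.
    """
    if target_batch_size <= 0 or micro_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    steps = -(-target_batch_size // micro_batch_size)  # ceiling division
    return steps, steps * micro_batch_size


if __name__ == "__main__":
    # A target batch of 64 that only fits in memory 16 samples at a time.
    print(accumulation_plan(64, 16))
```

For a target of 64 and a micro-batch of 16, the helper returns 4 accumulation steps with an effective batch size of 64.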
See Also
- Record command - Collect demonstration data
- Deploy command - Run trained policies on hardware
- Train Policy workflow - Detailed training guide
- Hub Integration - Push models to HuggingFace Hub
