TrainingConfig
The TrainingConfig class defines parameters for training Vision-Language-Action models.
Output configuration
Directory where model checkpoints, logs, and outputs are saved.
Optional name for the experiment. Used for organizing outputs and tracking.
Basic training parameters
Total number of training steps to run. This overrides num_epochs.
Total effective batch size across all GPUs and accumulation steps.
Per-device batch size. If None, calculated from global_batch_size.
Number of forward passes to accumulate before performing a backward/update step.
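These three parameters are linked by a simple product: the global batch size equals the per-device batch size times the number of GPUs times the number of accumulation steps. A minimal sketch of the derivation (the helper name is hypothetical, not part of the config class):

```python
def per_device_batch_size(global_batch_size: int, num_gpus: int,
                          grad_accum_steps: int) -> int:
    """Derive the per-device batch size when only the global size is given.

    Hypothetical helper: illustrates the relationship, not the library's API.
    """
    divisor = num_gpus * grad_accum_steps
    if global_batch_size % divisor != 0:
        raise ValueError(
            "global_batch_size must be divisible by num_gpus * grad_accum_steps"
        )
    return global_batch_size // divisor

# Example: a global batch of 256 on 8 GPUs with 4 accumulation steps
# means each device processes 8 samples per forward pass.
```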
Optimization parameters
Initial learning rate for the optimizer.
Learning rate scheduler type (e.g., cosine, linear, constant).
Weight decay coefficient for the optimizer (L2 regularization).
Proportion of total training steps used for learning rate warm-up.
Number of warm-up steps. Overrides warmup_ratio if set.
Maximum gradient norm for gradient clipping.
Optimizer choice. Options include:
- adamw_torch: Standard AdamW from PyTorch
- adamw_torch_fused: Fused AdamW (faster)
- paged_adamw_32bit: Paged AdamW 32-bit (requires bitsandbytes)
- paged_adamw_8bit: Paged AdamW 8-bit (requires bitsandbytes)
- adafactor: Adafactor optimizer
Path to a checkpoint to resume training from.
Mixed precision training
Enable TF32 mode for NVIDIA Ampere GPUs and later.
Enable FP16 mixed precision training.
Enable BF16 mixed precision training.
Use BF16 for evaluation.
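Since FP16 and BF16 training are mutually exclusive, a config loader would typically resolve the two flags into a single training dtype. A minimal sketch (the function name is hypothetical):

```python
def resolve_precision(fp16: bool, bf16: bool) -> str:
    """Pick the training dtype from the two mutually exclusive flags.

    Hypothetical helper; real trainers usually perform a similar check.
    """
    if fp16 and bf16:
        raise ValueError("Enable at most one of fp16 and bf16")
    if bf16:
        return "bfloat16"
    if fp16:
        return "float16"
    return "float32"
```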
Logging and checkpointing
Frequency (in training steps) at which to log training metrics.
Frequency (in training steps) at which to save checkpoints.
Maximum number of checkpoints to keep before older ones are deleted.
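Checkpoint rotation amounts to sorting saved checkpoints by step and deleting all but the newest N. A minimal sketch (the helper name is hypothetical):

```python
def checkpoints_to_delete(step_numbers: list[int], keep: int) -> list[int]:
    """Given checkpoint step numbers on disk, return the steps whose
    directories should be removed so only the `keep` most recent survive.

    Hypothetical helper illustrating the retention policy, not the trainer's API.
    """
    ordered = sorted(step_numbers)
    return ordered[:-keep] if len(ordered) > keep else []

# Example: with checkpoints at steps 100, 200, 300, 400 and a limit of 2,
# the checkpoints at steps 100 and 200 are deleted.
```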
Controls whether the VL model and processor are saved in callbacks.
Checkpoint uploading
Enable automatic checkpoint uploading.
Upload checkpoints every N steps.
Number of most recent checkpoints to keep uploaded.
Maximum number of concurrent checkpoint uploads.
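Bounding concurrent uploads is typically done with a semaphore. A self-contained sketch (the class and method names are hypothetical, not the trainer's API):

```python
import threading

class UploadLimiter:
    """Cap the number of in-flight checkpoint uploads.

    Hypothetical sketch: a semaphore blocks new uploads once
    `max_concurrent` are already running.
    """

    def __init__(self, max_concurrent: int):
        self._sem = threading.Semaphore(max_concurrent)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of simultaneous uploads observed

    def upload(self, path: str, do_upload) -> None:
        with self._sem:  # blocks while max_concurrent uploads are in flight
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                do_upload(path)
            finally:
                with self._lock:
                    self._active -= 1
```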
Evaluation parameters
Evaluation strategy: no, steps, or epoch.
Frequency (in steps) at which to run evaluation.
Ratio of data to use for evaluation split.
Batch size for evaluation.
Name of the metric to use for saving best checkpoints.
Whether higher values of the eval metric are better.
DeepSpeed configuration
ZeRO optimization stage (1, 2, or 3).
Enable gradient checkpointing to reduce memory usage.
Transformers loading parameters
Trust remote code when loading models from Hugging Face Hub.
Only use local files (no downloads from Hugging Face Hub).
Directory for caching Hugging Face models.
Access token for Hugging Face Hub (for private models).
DDP configuration
Use DistributedDataParallel instead of DeepSpeed.
DDP bucket capacity in MB for gradient communication.
Hardware configuration
Number of GPUs to use for training.
Number of parallel worker processes for data loading.
Data handling
Whether to remove unused columns from the dataset.
Experiment tracking
Enable Weights & Biases (wandb) logging.
Wandb project name for tracking experiments.
Performance profiling
Enable PyTorch profiler for performance analysis.
Fault tolerance
Maximum number of retries in training for fault tolerance.
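A fault-tolerant launcher can be sketched as a retry loop around the training entry point (names are hypothetical; a real implementation may restart worker processes rather than call a function in-process):

```python
def run_with_retries(train_fn, max_retries: int):
    """Re-invoke training after a failure, up to `max_retries` extra attempts.

    Hypothetical sketch of the retry policy described above.
    """
    attempts = 0
    while True:
        try:
            return train_fn()
        except RuntimeError:
            attempts += 1
            if attempts > max_retries:
                raise  # give up once the retry budget is exhausted
```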
Testing
For testing: assert that loss is less than this value.
Reinforcement learning
Add reinforcement learning callback during training.
Open-loop evaluation
Enable open-loop evaluation on saved checkpoints.
List of trajectory IDs to evaluate.
Number of steps to evaluate per trajectory.
List of action indices to plot. If None, plots all indices.
FinetuneConfig
The FinetuneConfig class is a simplified configuration designed specifically for single-node fine-tuning. See launch_finetune.py for detailed parameter descriptions.
Key differences from TrainingConfig
- Focused on single-node training scenarios
- Includes embodiment-specific parameters
- Provides granular control over which model components to tune
- Includes data augmentation parameters
- Simplified parameter set compared to the full TrainingConfig