The `launch_finetune.py` script provides a streamlined interface for fine-tuning pretrained GR00T models on your own datasets. It handles model loading, data configuration, and distributed training setup.
## Usage
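A typical invocation looks like the following. Note that this page does not list the exact parameter names, so the flag spellings below are illustrative assumptions; check `python launch_finetune.py --help` for the actual names.

```shell
# Illustrative invocation; flag names are assumptions, verify with --help.
python launch_finetune.py \
    --base-model-path <pretrained-checkpoint-or-hub-id> \
    --dataset-path <dataset-root-dir> \
    --embodiment-tag <embodiment-id> \
    --output-dir <output-dir>
```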
## Parameters
### Data and model paths
- Path to the pretrained base model checkpoint (e.g., a Hugging Face model hub ID or a local directory).
- Path to the dataset root directory containing trajectory data for fine-tuning.
- Identifier specifying which embodiment (robot configuration) this fine-tuning run targets.
- Path to a Python file defining the modality configuration for the given embodiment. If `None`, uses the pre-registered modality config in `gr00t/configs/data/embodiment_configs.py`.

### Model tuning flags
- If `True`, fine-tune the language model (LLM) backbone during training.
- If `True`, fine-tune the visual encoder (e.g., ViT or CNN backbone).
- If `True`, fine-tune the multimodal projector layers that map vision/language features to a shared space.
- If `True`, fine-tune the diffusion-based action decoder (if present in the model).
- Dropout probability applied to state inputs for regularization during training.
### Data augmentation
- Maximum rotation angle (in degrees) for random rotation augmentation of input images.
- Parameters for color jitter augmentation on images. Expected keys are `brightness` (float), `contrast` (float), `saturation` (float), and `hue` (float), e.g. `{"brightness": 0.4, "contrast": 0.4, "saturation": 0.4, "hue": 0.1}`. If `None`, applies the default color jitter augmentation from the pretrained model.

### Training configuration
- Total effective batch size across all GPUs and accumulation steps.
- Number of parallel worker processes used for data loading.
- Initial learning rate for the optimizer.
- Number of forward passes to accumulate before performing a backward/update step.
- Directory where model checkpoints, logs, and outputs are saved.
- Frequency (in training steps) at which to save checkpoints.
- Maximum number of checkpoints to keep before older ones are deleted.
- Number of GPUs available for distributed or single-node training.
- If `True`, log metrics and artifacts to Weights & Biases (wandb). The project is `finetune-gr00t-n1d6`; you need to log in to wandb to view the logs.
- Total number of training steps to run before stopping.
- Weight decay coefficient for the optimizer (L2 regularization).
- Proportion of total training steps used for learning rate warm-up.
- Size of the shard to use for the dataset during preloading.
- Sampling rate for the episodes.
- Number of shards to use for the dataset. Reduce this number if VRAM is limited.
## Examples
### Basic fine-tuning
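A minimal run that fine-tunes with default settings. The flag names and the dataset/embodiment values here are hypothetical placeholders, since this page does not specify the exact parameter spellings:

```shell
# Basic fine-tuning with defaults; flag names are illustrative assumptions.
python launch_finetune.py \
    --base-model-path <pretrained-checkpoint-or-hub-id> \
    --dataset-path ./datasets/my_task \
    --embodiment-tag my_robot \
    --output-dir ./output/basic_run
```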
### Fine-tuning with data augmentation
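A sketch of enabling the augmentation parameters described above. The color jitter dictionary matches the example values given in the Parameters section; the flag names and the rotation value are assumptions:

```shell
# Enable random rotation and custom color jitter; flag names are illustrative.
python launch_finetune.py \
    --base-model-path <pretrained-checkpoint-or-hub-id> \
    --dataset-path ./datasets/my_task \
    --embodiment-tag my_robot \
    --output-dir ./output/augmented_run \
    --max-rotation-degrees 10 \
    --color-jitter-params '{"brightness": 0.4, "contrast": 0.4, "saturation": 0.4, "hue": 0.1}'
```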
### Fine-tuning with custom learning parameters
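A sketch of overriding the training-configuration parameters (batch size, learning rate, weight decay, warm-up, total steps). Both the flag names and the specific values are illustrative, not recommended settings:

```shell
# Custom optimizer/schedule settings; flag names and values are illustrative.
python launch_finetune.py \
    --base-model-path <pretrained-checkpoint-or-hub-id> \
    --dataset-path ./datasets/my_task \
    --embodiment-tag my_robot \
    --output-dir ./output/custom_lr_run \
    --global-batch-size 64 \
    --learning-rate 1e-4 \
    --weight-decay 0.01 \
    --warmup-ratio 0.05 \
    --max-steps 20000
```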
### Fine-tuning with all model components
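A sketch of turning on all four tuning flags described under "Model tuning flags" (LLM backbone, visual encoder, projector, and diffusion action decoder). The exact flag spellings are assumptions:

```shell
# Tune every model component; flag names are illustrative assumptions.
python launch_finetune.py \
    --base-model-path <pretrained-checkpoint-or-hub-id> \
    --dataset-path ./datasets/my_task \
    --embodiment-tag my_robot \
    --output-dir ./output/full_tune_run \
    --tune-llm \
    --tune-visual \
    --tune-projector \
    --tune-diffusion-model
```

Tuning all components maximizes capacity to adapt but also increases VRAM usage and the risk of overfitting on small datasets.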
## Environment variables
- `LOGURU_LEVEL`: Controls logging verbosity (default: `INFO`)
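For example, to get more detailed logs for a run (standard Loguru levels such as `DEBUG` apply):

```shell
# Raise logging verbosity from the default INFO to DEBUG for this run.
LOGURU_LEVEL=DEBUG python launch_finetune.py --help
```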
## Notes
The script automatically sets up the model with these configurations:

- Model: `nvidia/Eagle-Block2A-2B-v2`
- Optimizer: `adamw_torch`
- Wandb project: `finetune-gr00t-n1d6`
- Relative action mode enabled
- Eagle collator enabled
- If a custom modality config is provided, it will be loaded from the specified path
- Download cache is disabled by default for datasets