The launch_finetune.py script provides a streamlined interface for fine-tuning pretrained GR00T models on your own datasets. It handles model loading, data configuration, and distributed training setup.

Usage

python -m gr00t.experiment.launch_finetune \
  --base-model-path <path-to-checkpoint> \
  --dataset-path <path-to-dataset> \
  --embodiment-tag <embodiment> \
  --output-dir ./outputs

Parameters

Data and model paths

base-model-path
str
required
Path to the pretrained base model checkpoint (e.g., Hugging Face model hub or local directory).
dataset-path
str
required
Path to the dataset root directory containing trajectory data for fine-tuning.
embodiment-tag
EmbodimentTag
required
Identifier specifying which embodiment (robot configuration) this fine-tuning run targets.
modality-config-path
str | None
default:"None"
Path to a Python file defining the modality configuration for the given embodiment. If None, uses the pre-registered modality config in gr00t/configs/data/embodiment_configs.py.
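A custom modality config file is a small Python module describing which data streams the embodiment provides. The exact schema is defined by the configs in gr00t/configs/data/embodiment_configs.py; the sketch below is hypothetical (the key names `video`, `state`, and `action` are assumptions, not confirmed by this page) and shows only the general shape such a file might take.

```python
# Hypothetical sketch of a modality config file. The real schema lives in
# gr00t/configs/data/embodiment_configs.py and may use different key names.
MODALITY_CONFIG = {
    "video": {"cameras": ["front"]},            # camera streams (assumed)
    "state": {"keys": ["joint_positions"]},     # proprioceptive inputs (assumed)
    "action": {"keys": ["joint_velocities"]},   # action targets (assumed)
}
```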

Model tuning flags

tune-llm
bool
default:"False"
If True, fine-tune the language model (LLM) backbone during training.
tune-visual
bool
default:"False"
If True, fine-tune the visual encoder (e.g., ViT or CNN backbone).
tune-projector
bool
default:"True"
If True, fine-tune the multimodal projector layers that map vision/language features to a shared space.
tune-diffusion-model
bool
default:"True"
If True, fine-tune the diffusion-based action decoder (if present in the model).
state-dropout-prob
float
default:"0.0"
Dropout probability applied to state inputs for regularization during training.
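State dropout randomly suppresses the proprioceptive state input so the policy does not over-rely on it. The helper below is an illustrative sketch of that idea (the function name and the all-zeros masking strategy are assumptions; the script's internal implementation may differ).

```python
import random

def apply_state_dropout(state, prob, rng=random):
    """Zero out the state input with probability `prob`.

    Illustrative sketch of state-input dropout; not the script's
    actual implementation.
    """
    if rng.random() < prob:
        return [0.0] * len(state)
    return state
```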

Data augmentation

random-rotation-angle
int | None
default:"None"
Maximum rotation angle (in degrees) for random rotation augmentation of input images.
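A max-angle parameter of this kind is typically used to draw a per-sample angle uniformly from the symmetric interval around zero. A minimal sketch of that sampling, assuming uniform sampling in [-max, +max] (the page does not specify the distribution):

```python
import random

def sample_rotation_angle(max_angle_deg, rng=random):
    """Draw a rotation angle uniformly from [-max_angle_deg, +max_angle_deg].

    Assumes a symmetric uniform distribution; the script may sample
    differently.
    """
    return rng.uniform(-max_angle_deg, max_angle_deg)
```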
color-jitter-params
dict[str, float] | None
default:"None"
Parameters for color jitter augmentation on images. Expected keys include:
  • brightness: float
  • contrast: float
  • saturation: float
  • hue: float
Example: {"brightness": 0.4, "contrast": 0.4, "saturation": 0.4, "hue": 0.1}. If None, applies the default color jitter augmentation from the pretrained model.
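On the command line this dict is passed as a JSON string (see the augmentation example on this page). A sketch of how such a string could be parsed and validated; the function name and validation logic here are illustrative, not the script's own:

```python
import json

ALLOWED_KEYS = {"brightness", "contrast", "saturation", "hue"}

def parse_color_jitter(arg):
    """Parse a --color-jitter-params JSON string into a dict of floats.

    Illustrative helper; the script's own parsing/validation may differ.
    """
    params = json.loads(arg)
    unknown = set(params) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown color jitter keys: {sorted(unknown)}")
    return {k: float(v) for k, v in params.items()}
```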

Training configuration

global-batch-size
int
default:"64"
Total effective batch size across all GPUs and accumulation steps.
dataloader-num-workers
int
default:"2"
Number of parallel worker processes used for data loading.
learning-rate
float
default:"1e-4"
Initial learning rate for optimizer.
gradient-accumulation-steps
int
default:"1"
Number of micro-batches over which gradients are accumulated before each optimizer update step.
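The three batching flags are related by a simple identity: the per-GPU micro-batch size equals the global batch size divided by (number of GPUs × accumulation steps). The helper below just demonstrates that arithmetic (the function name is illustrative; the script computes this internally):

```python
def per_gpu_batch_size(global_batch_size, num_gpus, grad_accum_steps):
    """Per-GPU micro-batch size implied by the three batching flags.

    Illustrative arithmetic only; the script derives this itself.
    """
    micro, rem = divmod(global_batch_size, num_gpus * grad_accum_steps)
    if rem:
        raise ValueError("global batch size must be divisible by "
                         "num_gpus * gradient_accumulation_steps")
    return micro
```

For example, the default global batch size of 64 on 4 GPUs with 2 accumulation steps yields a micro-batch of 8 per GPU per forward pass.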
output-dir
str
default:"./outputs"
Directory where model checkpoints, logs, and outputs are saved.
save-steps
int
default:"1000"
Frequency (in training steps) at which to save checkpoints.
save-total-limit
int
default:"5"
Maximum number of checkpoints to keep before older ones are deleted.
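With save-steps 1000 and save-total-limit 5, the run keeps a rolling window of the five most recent checkpoints. A sketch of that rotation policy, assuming newest-first retention (the function name is hypothetical):

```python
def prune_checkpoints(checkpoint_steps, save_total_limit):
    """Split saved checkpoint steps into (keep, delete) lists,
    retaining only the newest `save_total_limit` checkpoints.

    Sketch of save-total-limit behavior; not the script's own code.
    """
    ordered = sorted(checkpoint_steps)
    keep = ordered[-save_total_limit:]
    delete = ordered[:-save_total_limit]
    return keep, delete
```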
num-gpus
int
default:"1"
Number of GPUs available for distributed or single-node training.
use-wandb
bool
default:"False"
If True, log metrics and artifacts to Weights & Biases (wandb). The project name is finetune-gr00t-n1d6; you must be logged in to wandb to view the logs.
max-steps
int
default:"10000"
Total number of training steps to run before stopping.
weight-decay
float
default:"1e-5"
Weight decay coefficient for optimizer (L2 regularization).
warmup-ratio
float
default:"0.05"
Proportion of total training steps used for learning rate warm-up.
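With the defaults (max-steps 10000, warmup-ratio 0.05), the learning rate ramps up over the first 500 steps. The sketch below assumes linear warm-up followed by a constant rate; the script's actual scheduler (including any post-warmup decay) is not specified on this page and may differ.

```python
def warmup_lr(step, base_lr=1e-4, max_steps=10000, warmup_ratio=0.05):
    """Learning rate under linear warm-up, then constant.

    Assumes a linear ramp; the script's scheduler may decay after warm-up.
    """
    warmup_steps = int(warmup_ratio * max_steps)  # 500 with the defaults
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr
```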
shard-size
int
default:"1024"
Size of each dataset shard used during preloading.
episode-sampling-rate
float
default:"0.1"
Sampling rate applied to episodes when constructing the training dataset.
num-shards-per-epoch
int
default:"100000"
Number of dataset shards to use per epoch. Reduce this value if VRAM is limited.

Examples

Basic fine-tuning

python -m gr00t.experiment.launch_finetune \
  --base-model-path nvidia/Eagle-Block2A-2B-v2 \
  --dataset-path /data/my_robot_dataset \
  --embodiment-tag FRANKA_PANDA \
  --num-gpus 1

Fine-tuning with data augmentation

python -m gr00t.experiment.launch_finetune \
  --base-model-path ./checkpoints/base_model \
  --dataset-path /data/my_robot_dataset \
  --embodiment-tag UR5 \
  --random-rotation-angle 15 \
  --color-jitter-params '{"brightness": 0.4, "contrast": 0.4, "saturation": 0.4, "hue": 0.1}' \
  --num-gpus 4

Fine-tuning with custom learning parameters

python -m gr00t.experiment.launch_finetune \
  --base-model-path nvidia/Eagle-Block2A-2B-v2 \
  --dataset-path /data/my_robot_dataset \
  --embodiment-tag FRANKA_PANDA \
  --learning-rate 5e-5 \
  --global-batch-size 128 \
  --max-steps 20000 \
  --save-steps 500 \
  --use-wandb

Fine-tuning with all model components

python -m gr00t.experiment.launch_finetune \
  --base-model-path nvidia/Eagle-Block2A-2B-v2 \
  --dataset-path /data/my_robot_dataset \
  --embodiment-tag FRANKA_PANDA \
  --tune-llm \
  --tune-visual \
  --tune-projector \
  --tune-diffusion-model \
  --num-gpus 8

Environment variables

  • LOGURU_LEVEL: Controls logging verbosity (default: INFO)

Notes

  • The script automatically sets up the model with these configurations:
    • Model: nvidia/Eagle-Block2A-2B-v2
    • Optimizer: adamw_torch
    • Wandb project: finetune-gr00t-n1d6
    • Relative action mode enabled
    • Eagle collator enabled
  • If a custom modality config is provided, it will be loaded from the specified path
  • Download cache is disabled by default for datasets
