This page provides links to all available GR00T model checkpoints, including base models for finetuning and pre-finetuned models for specific robot platforms and benchmarks.

Base models

We provide pre-trained base vision-language-action (VLA) model checkpoints. These checkpoints are pre-trained on 10k+ hours of robot data and serve as starting points for finetuning on downstream tasks.

GR00T N1.5

Base GR00T N1.5 model (3B parameters) - Use the n1.5-release branch

GR00T N1.6

Base GR00T N1.6 model (3B parameters) - Latest version with improved performance

Model comparison

| Model | Use case | Description | Checkpoint path | Branch |
|---|---|---|---|---|
| GR00T N1.5 | Finetuning | Base GR00T N1.5 model (3B parameters) | `nvidia/GR00T-N1.5-3B` | `n1.5-release` |
| GR00T N1.6 | Finetuning | Base GR00T N1.6 model (3B parameters) | `nvidia/GR00T-N1.6-3B` | `main` |
GR00T N1.6 represents a significant upgrade over N1.5, with improvements in both model architecture and training data leading to better performance across many benchmarks.

Finetuned models

We provide finetuned checkpoints for various robot platforms and benchmarks. These models are finetuned from the base models above and can be used directly for evaluation or as starting points for further finetuning.

Available finetuned checkpoints

Bridge dataset

Fine-tuned for WidowX robot on manipulation tasks

Fractal dataset

Fine-tuned for Google robot on manipulation tasks

BEHAVIOR-1K

Fine-tuned for Galaxea R1 Pro robot on loco-manipulation tasks

Unitree G1

Fine-tuned for Unitree G1 loco-manipulation pick-and-place tasks

DROID

Fine-tuned for DROID robot on manipulation tasks

Finetuned model details

| Model | Base model | Description | Checkpoint path | Example |
|---|---|---|---|---|
| GR00T-N1.6-bridge | `nvidia/GR00T-N1.6-3B` | Fine-tuned on Bridge dataset for WidowX robot on manipulation tasks | `nvidia/GR00T-N1.6-bridge` | SimplerEnv |
| GR00T-N1.6-fractal | `nvidia/GR00T-N1.6-3B` | Fine-tuned on Fractal dataset for Google robot on manipulation tasks | `nvidia/GR00T-N1.6-fractal` | SimplerEnv |
| GR00T-N1.6-BEHAVIOR1k | `nvidia/GR00T-N1.6-3B` | Fine-tuned on BEHAVIOR-1K for Galaxea R1 Pro robot on loco-manipulation tasks | `nvidia/GR00T-N1.6-BEHAVIOR1k` | BEHAVIOR |
| GR00T-N1.6-G1-PnPAppleToPlate | `nvidia/GR00T-N1.6-3B` | Fine-tuned for Unitree G1 loco-manipulation pick-and-place tasks | `nvidia/GR00T-N1.6-G1-PnPAppleToPlate` | G1 LocoManipulation |
| GR00T-N1.6-DROID | `nvidia/GR00T-N1.6-DROID` | Fine-tuned for DROID robot on manipulation tasks | `nvidia/GR00T-N1.6-DROID` | DROID |

Using model checkpoints

All model checkpoints are hosted on Hugging Face and can be downloaded automatically when you run inference or finetuning scripts.

Quick start with base model

```shell
# Start the policy server with a base model
uv run python gr00t/eval/run_gr00t_server.py \
  --embodiment-tag GR1 \
  --model-path nvidia/GR00T-N1.6-3B
```

Using a finetuned model

```shell
# Use a finetuned checkpoint for evaluation
uv run python scripts/deployment/standalone_inference_script.py \
  --model-path nvidia/GR00T-N1.6-bridge \
  --dataset-path demo_data/widowx.dataset \
  --embodiment-tag OXE_WIDOWX \
  --traj-ids 0 1 2
```
When you specify a Hugging Face model path like nvidia/GR00T-N1.6-3B, the model will be automatically downloaded and cached on your system. You can also use a local path to a checkpoint directory.
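To see where a cached checkpoint ends up on disk, the sketch below reproduces the default Hugging Face cache layout (the `hub/models--<org>--<name>` naming is the `huggingface_hub` convention; this helper itself is illustrative, not part of the GR00T codebase):

```python
import os

def hf_cache_dir() -> str:
    """Default Hugging Face cache root; override with the HF_HOME env var."""
    return os.environ.get(
        "HF_HOME",
        os.path.join(os.path.expanduser("~"), ".cache", "huggingface"),
    )

def cached_repo_dir(repo_id: str) -> str:
    """A repo id like 'nvidia/GR00T-N1.6-3B' is cached under hub/models--<org>--<name>."""
    return os.path.join(hf_cache_dir(), "hub", "models--" + repo_id.replace("/", "--"))

print(cached_repo_dir("nvidia/GR00T-N1.6-3B"))
# e.g. /home/user/.cache/huggingface/hub/models--nvidia--GR00T-N1.6-3B
```

Pointing `--model-path` at a directory like this (or any local checkpoint directory) skips the download entirely.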

Inference performance

GR00T-N1.6-3B inference timing (4 denoising steps, single view):

| Device | Mode | Data processing | Backbone | Action head | End-to-end | Frequency |
|---|---|---|---|---|---|---|
| RTX 5090 | torch.compile | 2 ms | 18 ms | 16 ms | 37 ms | 27.3 Hz |
| H100 | torch.compile | 4 ms | 23 ms | 11 ms | 38 ms | 26.3 Hz |
| RTX 4090 | torch.compile | 2 ms | 25 ms | 17 ms | 44 ms | 22.8 Hz |
| Thor | torch.compile | 5 ms | 39 ms | 61 ms | 105 ms | 9.5 Hz |
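The frequency column is simply the reciprocal of end-to-end latency, which is handy when budgeting for your own control loop:

```python
def control_frequency_hz(end_to_end_ms: float) -> float:
    """Maximum control-loop rate for a given end-to-end inference latency."""
    return 1000.0 / end_to_end_ms

# H100 row: 38 ms end-to-end -> ~26.3 Hz
print(round(control_frequency_hz(38), 1))
```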
For faster inference with TensorRT optimization, see the deployment guide.

What’s new in GR00T N1.6

GR00T N1.6 is a significant upgrade over GR00T N1.5, with improvements in both model architecture and training data that yield better performance across many benchmarks.

Model and data improvements

Architectural changes:
  • Base VLM: Uses an internal NVIDIA Cosmos-Reason-2B VLM variant that supports flexible resolution and can encode images in their native aspect ratio without padding
  • Larger DiT: Uses a 2x larger DiT (32 layers vs. 16 in N1.5)
  • Simplified architecture: Removes N1.5’s post-VLM 4-layer transformer adapter and instead unfreezes the top 4 layers of the VLM during pretraining
  • State-relative actions: Predicts state-relative action chunks for most embodiments rather than absolute joint angles or EEF positions
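To make the state-relative idea concrete, here is a minimal sketch (conventions vary across embodiments, and this is not the exact GR00T parameterization): the model predicts a chunk of offsets from the robot state at the start of the chunk, which the controller converts back to absolute targets at execution time.

```python
import numpy as np

# Current robot state, e.g. three joint angles in radians (illustrative values)
current_state = np.array([0.10, -0.25, 0.40])

# Hypothetical model output: a chunk of per-step offsets relative to current_state
relative_chunk = np.array([
    [0.01, 0.00, -0.02],
    [0.02, 0.01, -0.03],
    [0.03, 0.01, -0.05],
])

# Convert state-relative actions back to absolute targets for the controller
absolute_targets = current_state + relative_chunk
print(absolute_targets[-1])  # last step: [0.13, -0.24, 0.35]
```

Predicting offsets rather than absolute joint angles or EEF positions keeps the action distribution centered near zero, which is generally easier for a diffusion-style action head to model.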
Beyond the N1.5 data mixture, the N1.6 pretraining data additionally includes several thousand hours of teleoperated data from:
  • Bimanual YAM arms
  • AGIBot Genie1
  • Simulated Galaxea R1 Pro on the BEHAVIOR suite
  • Whole-body locomanipulation with Unitree G1
Other code-level improvements:
  • Faster dataloader with sharded dataloader support
  • RTC and async policy wrapper for inference
  • Simplified data processing pipeline
  • Flexible training configuration
