Base models
We provide pre-trained base VLA model checkpoints. These checkpoints have been pre-trained on 10k+ hours of robot data and can be used for finetuning on downstream tasks.

GR00T N1.5
Base GR00T N1.5 model (3B parameters) - Use the n1.5-release branch
GR00T N1.6
Base GR00T N1.6 model (3B parameters) - Latest version with improved performance; use the main branch
Model comparison
| Model | Use case | Description | Checkpoint path | Branch |
|---|---|---|---|---|
| GR00T N1.5 | Finetuning | Base GR00T N1.5 model (3B parameters) | nvidia/GR00T-N1.5-3B | n1.5-release |
| GR00T N1.6 | Finetuning | Base GR00T N1.6 model (3B parameters) | nvidia/GR00T-N1.6-3B | main |
GR00T N1.6 represents a significant upgrade over N1.5, with improvements in both model architecture and training data leading to better performance across many benchmarks.
Finetuned models
We provide finetuned checkpoints for various robot platforms and benchmarks. These models are finetuned from the base models above and can be used directly for evaluation or as starting points for further finetuning.

Available finetuned checkpoints
Bridge dataset
Fine-tuned for WidowX robot on manipulation tasks
Fractal dataset
Fine-tuned for Google robot on manipulation tasks
BEHAVIOR-1K
Fine-tuned for Galaxea R1 Pro robot on loco-manipulation tasks
Unitree G1
Fine-tuned for Unitree G1 loco-manipulation pick-and-place tasks
DROID
Fine-tuned for DROID robot on manipulation tasks
Finetuned model details
| Model | Base model | Description | Checkpoint path | Example |
|---|---|---|---|---|
| GR00T-N1.6-bridge | nvidia/GR00T-N1.6-3B | Fine-tuned on Bridge dataset for WidowX robot on manipulation tasks | nvidia/GR00T-N1.6-bridge | SimplerEnv |
| GR00T-N1.6-fractal | nvidia/GR00T-N1.6-3B | Fine-tuned on Fractal dataset for Google robot on manipulation tasks | nvidia/GR00T-N1.6-fractal | SimplerEnv |
| GR00T-N1.6-BEHAVIOR1k | nvidia/GR00T-N1.6-3B | Fine-tuned on BEHAVIOR-1K for Galaxea R1 Pro robot on loco-manipulation tasks | nvidia/GR00T-N1.6-BEHAVIOR1k | BEHAVIOR |
| GR00T-N1.6-G1-PnPAppleToPlate | nvidia/GR00T-N1.6-3B | Fine-tuned for Unitree G1 loco-manipulation pick-and-place tasks | nvidia/GR00T-N1.6-G1-PnPAppleToPlate | G1 LocoManipulation |
| GR00T-N1.6-DROID | nvidia/GR00T-N1.6-3B | Fine-tuned for DROID robot on manipulation tasks | nvidia/GR00T-N1.6-DROID | DROID |
Using model checkpoints
All model checkpoints are hosted on Hugging Face and are downloaded automatically when you run inference or finetuning scripts.

Quick start with base model
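For example, you can pre-fetch a base checkpoint with `huggingface_hub`'s `snapshot_download`. The helper below is an illustrative sketch, not an official GR00T utility; the repo ids come from the comparison table above, and `huggingface_hub` must be installed:

```python
# Hugging Face checkpoint paths for the base models (from the table above).
BASE_CHECKPOINTS = {
    "GR00T-N1.5": "nvidia/GR00T-N1.5-3B",
    "GR00T-N1.6": "nvidia/GR00T-N1.6-3B",
}

def fetch_base_checkpoint(model: str) -> str:
    """Download (or reuse the cached) snapshot; returns the local path."""
    # Lazy import: requires `pip install huggingface_hub`; downloads ~3B params
    # on first call, then reuses the local Hugging Face cache.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=BASE_CHECKPOINTS[model])
```

The same repo ids can be passed directly to the inference and finetuning scripts, which handle the download automatically.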
Using a finetuned model
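The same pattern works for the finetuned checkpoints. In this sketch the task keys are illustrative shorthand (not official identifiers), while the repo ids are taken from the finetuned model table above:

```python
# Finetuned checkpoint paths (repo ids from the table above); the task keys
# are illustrative shorthand, not official identifiers.
FINETUNED_CHECKPOINTS = {
    "bridge": "nvidia/GR00T-N1.6-bridge",
    "fractal": "nvidia/GR00T-N1.6-fractal",
    "behavior-1k": "nvidia/GR00T-N1.6-BEHAVIOR1k",
    "g1-pnp-apple": "nvidia/GR00T-N1.6-G1-PnPAppleToPlate",
    "droid": "nvidia/GR00T-N1.6-DROID",
}

def resolve_checkpoint(task: str) -> str:
    """Map a task shorthand to its Hugging Face repo id."""
    try:
        return FINETUNED_CHECKPOINTS[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; choose from {sorted(FINETUNED_CHECKPOINTS)}"
        )
```

Passing the resolved repo id as the model path to an evaluation script triggers the same automatic download as for the base models.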
Inference performance
GR00T-N1.6-3B inference timing (4 denoising steps, single view):

| Device | Mode | Data processing | Backbone | Action head | End-to-end | Frequency |
|---|---|---|---|---|---|---|
| RTX 5090 | torch.compile | 2 ms | 18 ms | 16 ms | 37 ms | 27.3 Hz |
| H100 | torch.compile | 4 ms | 23 ms | 11 ms | 38 ms | 26.3 Hz |
| RTX 4090 | torch.compile | 2 ms | 25 ms | 17 ms | 44 ms | 22.8 Hz |
| Thor | torch.compile | 5 ms | 39 ms | 61 ms | 105 ms | 9.5 Hz |

For faster inference with TensorRT optimization, see the deployment guide.
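The frequency column is roughly 1000 divided by the end-to-end latency in milliseconds. A generic timing harness along these lines (a sketch, not part of the GR00T codebase) can be used to reproduce comparable numbers for your own policy callable:

```python
import time

def benchmark(fn, n_warmup=3, n_iters=10):
    """Return (mean end-to-end latency in ms, control frequency in Hz) for fn()."""
    for _ in range(n_warmup):      # warm up caches / compiled kernels first
        fn()
    t0 = time.perf_counter()
    for _ in range(n_iters):
        fn()
    ms = (time.perf_counter() - t0) / n_iters * 1e3
    return ms, 1e3 / ms
```

Warmup iterations matter especially with torch.compile, since the first calls pay one-time compilation cost.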
What’s new in GR00T N1.6
GR00T N1.6 represents a significant upgrade over GR00T N1.5, with improvements in both model architecture and data that lead to better performance in many aspects.

Model and data improvements
Architectural changes:
- Base VLM: Uses an internal NVIDIA Cosmos-Reason-2B VLM variant that supports flexible resolution and can encode images in their native aspect ratio without padding
- Larger DiT: Uses 2x larger DiT (32 layers vs 16 layers in N1.5)
- Simplified architecture: Removes N1.5’s post-VLM 4-layer transformer adapter and instead unfreezes the top 4 layers of the VLM during pretraining
- State-relative actions: Predicts state-relative action chunks for most embodiments rather than absolute joint angles or EEF positions
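To make the state-relative formulation concrete, here is a small illustrative sketch (hypothetical shapes and values, not the actual GR00T interface): the model emits a chunk of deltas relative to the current proprioceptive state, and absolute targets are recovered by adding the state back.

```python
import numpy as np

# Current proprioceptive state (example: 3 joint positions).
state = np.array([0.10, -0.25, 0.40])

# Predicted state-relative action chunk of horizon H=3 (hypothetical values).
deltas = np.array([[0.01, 0.00, -0.02],
                   [0.02, 0.01, -0.03],
                   [0.03, 0.01, -0.05]])

# Absolute targets per step: broadcast-add the state to every row of the chunk.
targets = state + deltas
```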
New embodiments in the training data:
- Bimanual YAM arms
- AGIBot Genie1
- Simulated Galaxea R1 Pro on the BEHAVIOR suite
- Whole-body locomanipulation with Unitree G1
Tooling and infrastructure improvements:
- Faster dataloader with sharded dataloader support
- RTC and async policy wrapper for inference
- Simplified data processing pipeline
- Flexible training configuration