Answers to frequently asked questions about NVIDIA Isaac GR00T.

General

NVIDIA Isaac GR00T N1.6 is an open vision-language-action (VLA) model for generalized humanoid robot skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments. GR00T N1.6 is trained on a diverse mixture of robot data, including bimanual and semi-humanoid data and an expansive humanoid dataset. It is adaptable through post-training to specific embodiments, tasks, and environments.
GR00T N1.6 is intended for researchers and professionals in robotics. This repository provides tools to:
  • Leverage a pre-trained foundation model for robot control
  • Fine-tune on small, custom datasets
  • Adapt the model to specific robotics tasks with minimal data
  • Deploy the model for inference
The focus is on enabling customization of robot behaviors through finetuning.
GR00T N1.6 represents a significant upgrade over N1.5.
Architectural changes:
  • Uses NVIDIA Cosmos-Reason-2B VLM variant with flexible resolution support
  • 2x larger DiT (32 layers vs 16 layers)
  • Simplified architecture removing the post-VLM transformer adapter
  • Predicts state-relative action chunks instead of absolute positions
Additional training data:
  • Bimanual YAM arms
  • AGIBot Genie1
  • Simulated Galaxea R1 Pro on BEHAVIOR suite
  • Unitree G1 whole-body locomanipulation
Code improvements:
  • Faster dataloader with sharded support
  • Simplified data processing pipeline
  • Flexible training configuration
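The state-relative action prediction mentioned above can be sketched as follows. This is an illustration of the general idea, not GR00T's internal implementation; the array shapes and values are placeholders.

```python
import numpy as np

def to_relative(actions: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Express an absolute action chunk (T, D) relative to the current state (D,)."""
    return actions - state  # broadcasts (T, D) - (D,)

def to_absolute(relative_chunk: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Recover absolute targets from a state-relative chunk at execution time."""
    return relative_chunk + state

state = np.array([0.1, 0.2, 0.3], dtype=np.float32)        # current joint positions (D,)
actions = np.array([[0.10, 0.25, 0.30],
                    [0.15, 0.30, 0.35]], dtype=np.float32)  # absolute targets (T, D)

rel = to_relative(actions, state)                           # what the model predicts
assert np.allclose(to_absolute(rel, state), actions)        # execution recovers targets
```

Predicting offsets from the current state rather than absolute positions tends to transfer better across embodiments with different rest poses.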
Yes. To use GR00T N1.5, check out the n1.5-release branch:
git checkout n1.5-release

Installation and setup

Minimum requirements:
  • NVIDIA GPU with CUDA support (CUDA 12.4 recommended, 11.8 also works)
  • Python 3.10
  • uv v0.8.4+ for dependency management
Recommended hardware for finetuning:
  • A single H100 or L40 node for optimal performance
  • RTX PRO 6000 Blackwell Server Edition or DGX B300 for production use
Deployment:
  • Jetson AGX Thor Developer Kit for edge deployment
See the hardware recommendation guide for detailed specifications.
Yes, GR00T relies on submodules for certain dependencies. Include them when cloning:
git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T
If you’ve already cloned without submodules, initialize them separately:
git submodule update --init --recursive
Use Docker if:
  • You want to avoid system-level dependency conflicts
  • You need a reproducible environment
  • You’re deploying to multiple machines
Use local installation if:
  • You need direct access to GPU drivers
  • You’re developing and iterating quickly
  • You prefer managing dependencies with uv
Both options are fully supported. See the Docker setup guide for containerized setup.
CUDA 12.4 is recommended and officially tested. However, CUDA 11.8 has also been verified to work.
  • For CUDA 11.8: Install flash-attn==2.8.2
  • For CUDA 12.8 (RTX 5090): Use flash-attn==2.8.0.post2 with pytorch-cu128
Make sure to install a compatible version of flash-attn manually if using a different CUDA version.

Training and finetuning

GR00T is designed to work with small, custom datasets. The exact amount depends on:
  • Task complexity
  • Similarity to pre-training data
  • Desired performance level
In practice, successful finetuning has been achieved with datasets ranging from hundreds to thousands of demonstrations. The model’s pre-training on 10k+ hours of robot data enables efficient transfer learning.
For optimal results, maximize your batch size based on available hardware and train for a few thousand steps.
General recommendations:
  • H100: Batch size 32-64
  • L40: Batch size 16-32
  • RTX 4090: Batch size 8-16
If you encounter OOM errors, reduce:
  • --global-batch-size
  • --dataloader-num-workers
  • --num-shards-per-epoch
  • --shard-size
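As a sanity check when tuning these flags, keep in mind the usual relationship between the global batch size and what each GPU holds per step. The helper below illustrates the common convention; GR00T's exact accumulation logic may differ.

```python
def per_gpu_micro_batch(global_batch_size: int, num_gpus: int, grad_accum_steps: int) -> int:
    """Per-GPU, per-step batch under the common convention:
    global = per_gpu * num_gpus * grad_accum_steps."""
    per_gpu, remainder = divmod(global_batch_size, num_gpus * grad_accum_steps)
    if remainder:
        raise ValueError("global batch size must divide evenly across GPUs and accumulation steps")
    return per_gpu

# Halving --global-batch-size halves the per-step memory footprint on each GPU:
assert per_gpu_micro_batch(64, num_gpus=4, grad_accum_steps=2) == 8
assert per_gpu_micro_batch(32, num_gpus=4, grad_accum_steps=2) == 4
```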
Training variance is expected. In our experiments, we have observed performance differences as large as 5-6% between runs with the same configuration, seed, and dropout settings.
This variance may be attributed to:
  • Non-deterministic operations in image augmentations
  • Stochastic components in the training pipeline
  • Hardware differences
Recommendations:
  • Run multiple training runs and select the best checkpoint
  • Use validation metrics to track performance
  • Keep this variance in mind when comparing to reported benchmarks
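A minimal way to act on these recommendations is to track a validation metric per checkpoint across runs and keep the best one. The metric name and dictionary structure below are hypothetical, not a GR00T API.

```python
def best_checkpoint(val_mse_by_ckpt: dict[str, float]) -> str:
    """Return the checkpoint path with the lowest validation MSE."""
    return min(val_mse_by_ckpt, key=val_mse_by_ckpt.get)

runs = {
    "run1/ckpt-3000": 0.042,
    "run1/ckpt-4000": 0.038,
    "run2/ckpt-4000": 0.044,  # same config, different run: ~5-6% variance is normal
}
assert best_checkpoint(runs) == "run1/ckpt-4000"
```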
Yes! Follow the finetuning guide for new embodiments.
Prerequisites:
  • Demonstration data in GR00T-flavored LeRobot v2 format
  • Modality configuration file specifying cameras, states, and actions
Steps:
  1. Convert your data to LeRobot format
  2. Create a modality config JSON
  3. Run the finetuning script with --embodiment-tag NEW_EMBODIMENT
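Step 2's modality config can be written from Python as plain JSON. The field names below (video, state, action and their sub-keys) are illustrative placeholders, not the exact schema GR00T expects; see the finetuning guide for the real specification.

```python
import json

# Hypothetical modality config for a new embodiment; key names are
# placeholders, not the exact GR00T schema.
modality_config = {
    "video": {"front_camera": {"original_key": "observation.images.front"}},
    "state": {"joint_positions": {"start": 0, "end": 7}},
    "action": {"joint_positions": {"start": 0, "end": 7}},
}

with open("modality.json", "w") as f:
    json.dump(modality_config, f, indent=2)
```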
GR00T provides several pre-registered embodiment tags with ready-to-use configurations:
  • LIBERO_PANDA
  • OXE_GOOGLE
  • OXE_WIDOWX
  • UNITREE_G1
  • BEHAVIOR_R1_PRO
  • GR1
These can be used directly without creating custom modality configs.

Inference and deployment

GR00T-N1.6-3B inference timing (4 denoising steps, single view):

| Device | End-to-end latency | Frequency |
|--------|--------------------|-----------|
| RTX 5090 | 37 ms | 27.3 Hz |
| H100 | 38 ms | 26.3 Hz |
| RTX 4090 | 44 ms | 22.8 Hz |
| Jetson AGX Thor | 105 ms | 9.5 Hz |

For faster inference, use TensorRT optimization.
Use server-client architecture when:
  • Running policy on a separate GPU server from robot controller
  • Need to isolate dependencies between policy and robot code
  • Deploying to multiple robots with centralized inference
Use local policy when:
  • Policy and robot controller run on the same machine
  • Need lowest possible latency
  • Simple single-robot setup
The server-client architecture uses ZeroMQ for efficient communication and is recommended for most production deployments.
Use ReplayPolicy to replay ground-truth actions from your dataset:
# Start server with ReplayPolicy
uv run python gr00t/eval/run_gr00t_server.py \
  --dataset-path /path/to/lerobot_dataset \
  --embodiment-tag NEW_EMBODIMENT \
  --execution-horizon 8
If replaying achieves high (often 100%) success rates, your environment is set up correctly. Low success rates indicate:
  • Environment reset state mismatch
  • Observation preprocessing issues
  • Action space mismatches
Yes, GR00T can be deployed on Jetson AGX Thor for edge inference. The Thor platform provides:
  • Blackwell GPU with 2560 CUDA cores
  • 14-core Arm Neoverse-V3AE CPU
  • 128GB LP5 memory
Expect inference at approximately 9.5 Hz (105ms latency) for GR00T-N1.6-3B with torch.compile.
The policy expects observations as a nested dictionary with three modalities:
observation = {
    "video": {
        "camera_name": np.ndarray,  # Shape: (B, T, H, W, 3), dtype: uint8
    },
    "state": {
        "state_name": np.ndarray,   # Shape: (B, T, D), dtype: float32
    },
    "language": {
        "task": [[str]],            # Shape: (B, 1)
    }
}
Use policy.get_modality_config() to query expected camera names, state keys, and temporal horizons.
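A dummy batch matching this schema might look like the following. The camera name "front", state key "joint_pos", and dimensions are placeholders; query policy.get_modality_config() for the real ones.

```python
import numpy as np

B, T, H, W, D = 1, 1, 224, 224, 7  # batch, time, height, width, state dim (illustrative)

observation = {
    "video": {"front": np.zeros((B, T, H, W, 3), dtype=np.uint8)},
    "state": {"joint_pos": np.zeros((B, T, D), dtype=np.float32)},
    "language": {"task": [["pick up the cube"]]},  # nested list of strings, shape (B, 1)
}

assert observation["video"]["front"].shape == (B, T, H, W, 3)
assert observation["state"]["joint_pos"].dtype == np.float32
```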

Data and datasets

GR00T uses the GR00T-flavored LeRobot v2 format, which is compatible with Hugging Face LeRobot Dataset V2.
The format consists of:
  • Parquet files for episode metadata and timesteps
  • MP4 videos for camera observations
  • Numpy arrays for states and actions
See the data preparation guide for details.
Use the conversion scripts provided in scripts/lerobot_conversion/:
python scripts/lerobot_conversion/convert_v3_to_v2.py \
  --input-dir /path/to/v3 \
  --output-dir /path/to/v2
For custom data formats, refer to the data preparation guide for the schema specification.
GR00T N1.6 is pre-trained on 10k+ hours of diverse robot data including:
  • Bimanual manipulation datasets
  • Semi-humanoid robot demonstrations
  • Humanoid whole-body control data
  • Bimanual YAM arms
  • AGIBot Genie1
  • Simulated Galaxea R1 Pro on BEHAVIOR suite
  • Unitree G1 whole-body locomanipulation
Additionally, it builds on the N1.5 data mixture with several thousand additional hours.

Evaluation

Open-loop evaluation:
  • Offline assessment comparing predicted actions to ground truth
  • Fast, doesn’t require simulation environment
  • Good for quick validation of model training
  • Generates MSE plots and visualizations
Closed-loop evaluation:
  • Online assessment in simulation or real environment
  • Tests actual policy performance with visual feedback
  • Required for measuring task success rates
  • More realistic but slower
We recommend starting with open-loop, then moving to closed-loop for comprehensive assessment.
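Open-loop evaluation in the sense above reduces to comparing predicted action chunks against ground truth. Here is a self-contained sketch of the MSE computation, not GR00T's actual evaluation code:

```python
import numpy as np

def open_loop_mse(predicted: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean squared error between predicted and ground-truth action chunks of shape (T, D)."""
    return float(np.mean((predicted - ground_truth) ** 2))

gt = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=np.float32)
pred = gt + 0.1  # a policy that is uniformly off by 0.1 in every dimension
assert abs(open_loop_mse(pred, gt) - 0.01) < 1e-6
assert open_loop_mse(gt, gt) == 0.0
```

Because it never executes actions, this metric is fast but only a proxy; closed-loop success rates remain the ground truth for task performance.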
Zero-shot evaluation (no finetuning):
  • RoboCasa
  • RoboCasa GR1 Tabletop Tasks
Finetuned evaluation:
  • LIBERO
  • SimplerEnv (Bridge, Fractal)
  • BEHAVIOR-1K
  • G1 LocoManipulation
  • PointNav
  • SO-100
  • DROID
Each benchmark has its own setup guide in the examples/ directory.
To add a new benchmark:
  1. Register environment prefix in gr00t/eval/sim/env_utils.py:
    ENV_PREFIX_TO_EMBODIMENT_TAG = {
        ...
        "my_new_benchmark": EmbodimentTag.MY_ROBOT,
    }
    
  2. Add test cases in tests/gr00t/eval/sim/test_env_utils.py
  3. Create environment wrapper that implements the Policy API observation/action format
  4. Test with ReplayPolicy before running with trained models
See the evaluation guide for detailed instructions.
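Step 1's prefix registry is typically consumed by a lookup that resolves a full environment ID to its embodiment tag. The following is a self-contained sketch of that pattern; the dictionary contents, string tags, and helper function are illustrative, not the actual gr00t/eval/sim/env_utils.py code.

```python
# Illustrative prefix-to-tag registry in the style of ENV_PREFIX_TO_EMBODIMENT_TAG;
# real entries map to EmbodimentTag enum members, shown here as plain strings.
ENV_PREFIX_TO_TAG = {
    "libero": "LIBERO_PANDA",
    "my_new_benchmark": "MY_ROBOT",
}

def tag_for_env(env_id: str) -> str:
    """Resolve an environment ID (e.g. 'my_new_benchmark/TaskA') to its embodiment tag."""
    for prefix, tag in ENV_PREFIX_TO_TAG.items():
        if env_id.startswith(prefix):
            return tag
    raise KeyError(f"no embodiment tag registered for {env_id!r}")

assert tag_for_env("my_new_benchmark/TaskA") == "MY_ROBOT"
```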

Contributing

We welcome contributions! See the contributing guide for:
  • Reporting bugs
  • Suggesting features
  • Making pull requests
  • Code style guidelines
  • Testing requirements
To report a bug:
  1. Search existing issues to see if it’s already reported
  2. If not, open a new issue with:
    • Clear title and description
    • Steps to reproduce
    • Expected vs actual behavior
    • Error messages and stack traces
    • Environment information (CUDA version, GPU, OS, etc.)
GR00T uses ruff for code formatting and linting:
# Format code
ruff format .

# Fix linting issues
ruff check --fix .
All pull requests must pass the CI checks which include ruff formatting and linting.
