Overview
The Gr00tPolicy class handles end-to-end inference with GR00T models. It loads a pre-trained vision-language-action model, processes observations, and generates robot actions.
Class definition
gr00t/policy/gr00t_policy.py
Constructor
- The embodiment tag defining the robot/environment type (e.g., EmbodimentTag.GR1, EmbodimentTag.UNITREE_G1)
- Path to the pretrained model checkpoint directory or HuggingFace model ID (e.g., "nvidia/GR00T-N1.6-3B")
- Device to run the model on (e.g., 'cuda:0', 0, 'cpu')
- Whether to enforce strict input validation
Methods
get_action
Generate actions from observations.
Observation dictionary with the following structure:
- video: dict[str, np.ndarray[np.uint8, (B, T, H, W, C)]]
- state: dict[str, np.ndarray[np.float32, (B, T, D)]]
- language: dict[str, list[list[str]]]
Optional parameters (currently unused)
Dictionary of action arrays with shape (B, T, D) where:
- B: batch size
- T: action horizon
- D: action dimension
Additional information dictionary (currently empty)
get_modality_config
Get the modality configuration for the current embodiment.
Dictionary mapping modality names ("video", "state", "action", "language") to their configurations
reset
Reset the policy to its initial state.
Optional reset parameters
Information dictionary after reset (currently empty)
check_observation
Validate observation structure and types.
Observation to validate
This method raises AssertionError if validation fails. It checks:
- All required modalities are present (video, state, language)
- Data types match expectations (uint8 for video, float32 for state)
- Shapes match the modality configuration
- Temporal dimensions are consistent
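The checks listed above can be sketched in a few lines of NumPy. This is an illustrative re-creation, not the library's implementation: the function name check_observation_sketch, the expected argument (a hypothetical stand-in for the modality configuration), and the keys "front_camera", "joint_positions", and "task" are all assumptions. Reading "temporal dimensions are consistent" as "T matches the expected horizon for each key" is also an interpretation.

```python
import numpy as np

# Expected dtype per array modality, taken from the checklist above.
EXPECTED_DTYPES = {"video": np.uint8, "state": np.float32}


def check_observation_sketch(observation, expected):
    """Illustrative re-creation of the checks listed above (not library code).

    `expected` is a hypothetical stand-in for the modality configuration:
    it maps modality -> key -> per-sample shape, i.e. (T, H, W, C) for
    video and (T, D) for state.
    """
    # Check 1: all required modalities are present.
    for modality in ("video", "state", "language"):
        assert modality in observation, f"missing modality: {modality}"

    for modality, dtype in EXPECTED_DTYPES.items():
        for key, sample_shape in expected[modality].items():
            assert key in observation[modality], f"missing {modality} key: {key}"
            array = observation[modality][key]
            # Check 2: dtype matches (uint8 for video, float32 for state).
            assert array.dtype == dtype, f"{modality}/{key}: dtype {array.dtype}"
            # Check 3: everything after (B, T) matches the expected shape.
            assert array.ndim == len(sample_shape) + 1, f"{modality}/{key}: ndim {array.ndim}"
            assert array.shape[2:] == tuple(sample_shape[1:]), f"{modality}/{key}: shape {array.shape}"
            # Check 4: the temporal dimension T matches the expected horizon.
            assert array.shape[1] == sample_shape[0], f"{modality}/{key}: T={array.shape[1]}"

    # Language entries are nested lists of strings: list[list[str]].
    for key, batch in observation["language"].items():
        assert all(isinstance(text, str) for steps in batch for text in steps), key


# Hypothetical keys and sizes, chosen only for illustration.
expected = {
    "video": {"front_camera": (1, 224, 224, 3)},
    "state": {"joint_positions": (1, 7)},
}
observation = {
    "video": {"front_camera": np.zeros((2, 1, 224, 224, 3), dtype=np.uint8)},
    "state": {"joint_positions": np.zeros((2, 1, 7), dtype=np.float32)},
    "language": {"task": [["pick up the cup"], ["pick up the cup"]]},
}
check_observation_sketch(observation, expected)  # passes without raising
```

A float64 state array or an image missing its channel axis would trip the corresponding assert, which mirrors the AssertionError behavior described for check_observation.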
check_action
Validate action structure and types.
Action dictionary to validate
Usage example
Observation format
The policy expects observations in a nested dictionary format with the following dimensions:
- B: Batch size
- T: Temporal horizon (number of frames/timesteps)
- H, W: Image height and width
- C: Number of channels (must be 3 for RGB)
- D: State dimension
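As a concrete illustration of these dimensions, the sketch below wraps a single camera frame and a single state vector into the batched layout with B = 1 and T = 1. The helper make_observation and the keys "front_camera", "joint_positions", and "task" are hypothetical; the real keys depend on the embodiment's modality configuration. Treating the outer language list as the batch axis and the inner list as the time axis is also an assumption.

```python
import numpy as np


def make_observation(frame, joint_state, instruction,
                     video_key="front_camera",
                     state_key="joint_positions",
                     language_key="task"):
    """Wrap one frame and one state vector into the nested (B, T, ...) layout.

    All key names are hypothetical placeholders, not keys defined by GR00T.
    """
    frame = np.asarray(frame)
    # Video must be uint8 RGB with shape (H, W, C) and C == 3.
    assert frame.dtype == np.uint8, "video frames must be uint8"
    assert frame.ndim == 3 and frame.shape[-1] == 3, "expected an (H, W, 3) frame"

    # Prepend batch and time axes: (H, W, C) -> (1, 1, H, W, C).
    video = frame[np.newaxis, np.newaxis]
    # State is float32: (D,) -> (1, 1, D).
    state = np.asarray(joint_state, dtype=np.float32)[np.newaxis, np.newaxis]

    return {
        "video": {video_key: video},
        "state": {state_key: state},
        # list[list[str]]: assumed here to be one instruction per batch element.
        "language": {language_key: [[instruction]]},
    }


obs = make_observation(
    frame=np.zeros((224, 224, 3), dtype=np.uint8),
    joint_state=[0.0] * 7,
    instruction="pick up the cup",
)
print(obs["video"]["front_camera"].shape)    # (1, 1, 224, 224, 3)
print(obs["state"]["joint_positions"].shape)  # (1, 1, 7)
```

Rejecting non-uint8 frames, rather than casting them silently, keeps a float image in the range [0, 1] from being truncated to all zeros.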
Action format
Actions are returned in a similar nested format with the following dimensions:
- B: Batch size
- T: Action horizon (number of future timesteps)
- D: Action dimension
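Because each action array covers a horizon of T future timesteps, a control loop typically steps through the chunk one timestep at a time. The sketch below shows one way to do that, assuming the action dictionary maps each key directly to a (B, T, D) array and that all keys share the same horizon. The helper iter_action_steps, the keys "arm" and "gripper", and the sizes are hypothetical, and the dictionary is built by hand as a stand-in for a real policy output.

```python
import numpy as np


def iter_action_steps(action, batch_index=0):
    """Yield one {key: (D,) array} dict per timestep of the action horizon."""
    horizons = {array.shape[1] for array in action.values()}
    # Assumption: every action key uses the same horizon T.
    assert len(horizons) == 1, "action keys are assumed to share one horizon"
    for t in range(horizons.pop()):
        yield {key: array[batch_index, t] for key, array in action.items()}


# Hand-built stand-in for a policy output: B=1, T=16, D=7 and D=1.
action = {
    "arm": np.zeros((1, 16, 7), dtype=np.float32),
    "gripper": np.zeros((1, 16, 1), dtype=np.float32),
}

steps = list(iter_action_steps(action))
print(len(steps))             # 16
print(steps[0]["arm"].shape)  # (7,)
```

Each yielded dictionary holds the command for a single timestep, so it can be passed to whatever interface sends commands to the robot or simulator.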
Properties
The loaded GR00T model in evaluation mode with bfloat16 precision
The processor for input/output transformation
The embodiment tag for this policy
Modality configurations for the current embodiment
Collation function for batching observations
The language modality key (currently only one is supported)
See also
Gr00tSimPolicyWrapper
Wrapper for GR00T simulation environments
PolicyClient
Client for remote inference
Policy API guide
Complete guide to using the policy API
EmbodimentTag
Available embodiment tags