
Overview

The Gr00tPolicy class handles end-to-end inference with GR00T models. It loads a pre-trained vision-language-action model, processes observations, and generates robot actions.

Class definition

gr00t/policy/gr00t_policy.py
class Gr00tPolicy(BasePolicy):
    """Core policy class for Gr00t model inference.
    
    This policy handles the end-to-end inference pipeline:
    1. Validates input observations
    2. Processes observations with pretrained VLA processor
    3. Runs model inference
    4. Decodes and returns actions
    """

Constructor

• embodiment_tag (EmbodimentTag, required): The embodiment tag defining the robot/environment type (e.g., EmbodimentTag.GR1, EmbodimentTag.UNITREE_G1)
• model_path (str, required): Path to the pretrained model checkpoint directory or HuggingFace model ID (e.g., "nvidia/GR00T-N1.6-3B")
• device (int | str, required): Device to run the model on (e.g., 'cuda:0', 0, 'cpu')
• strict (bool, default: True): Whether to enforce strict input validation

Methods

get_action

Generate actions from observations.
def get_action(
    self, 
    observation: dict[str, Any], 
    options: dict[str, Any] | None = None
) -> tuple[dict[str, Any], dict[str, Any]]
Parameters:
• observation (dict[str, Any], required): Observation dictionary with the following structure:
  • video: dict[str, np.ndarray[np.uint8, (B, T, H, W, C)]]
  • state: dict[str, np.ndarray[np.float32, (B, T, D)]]
  • language: dict[str, list[list[str]]]
• options (dict[str, Any] | None): Optional parameters (currently unused)

Returns:
• actions (dict[str, np.ndarray]): Dictionary of action arrays with shape (B, T, D), where B is the batch size, T the action horizon, and D the action dimension
• info (dict[str, Any]): Additional information dictionary (currently empty)

get_modality_config

Get the modality configuration for the current embodiment.
def get_modality_config(self) -> dict[str, ModalityConfig]
Returns:
• modality_configs (dict[str, ModalityConfig]): Dictionary mapping modality names ("video", "state", "action", "language") to their configurations

reset

Reset the policy to its initial state.
def reset(self, options: dict[str, Any] | None = None) -> dict[str, Any]
Parameters:
• options (dict[str, Any] | None): Optional reset parameters

Returns:
• info (dict[str, Any]): Information dictionary after reset (currently empty)

check_observation

Validate observation structure and types.
def check_observation(self, observation: dict[str, Any]) -> None
Parameters:
• observation (dict[str, Any], required): Observation to validate
This method raises AssertionError if validation fails. It checks:
  • All required modalities are present (video, state, language)
  • Data types match expectations (uint8 for video, float32 for state)
  • Shapes match the modality configuration
  • Temporal dimensions are consistent
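The checks above can be sketched with plain NumPy. This is an illustrative, self-contained version, not the actual implementation; the EXPECTED table below is a hypothetical stand-in for the embodiment's modality configuration:

```python
import numpy as np

# Hypothetical stand-in for the embodiment's modality configuration.
EXPECTED = {
    "video": {"dtype": np.uint8, "ndim": 5},    # (B, T, H, W, C)
    "state": {"dtype": np.float32, "ndim": 3},  # (B, T, D)
}

def check_observation_sketch(observation: dict) -> None:
    """Illustrative version of the validation performed by check_observation."""
    # All required modalities must be present.
    for modality in ("video", "state", "language"):
        assert modality in observation, f"missing modality: {modality}"

    batch_sizes = set()
    for modality, spec in EXPECTED.items():
        for name, arr in observation[modality].items():
            # Data types must match expectations.
            assert arr.dtype == spec["dtype"], f"{name}: expected dtype {spec['dtype']}"
            # Shapes must match the modality configuration.
            assert arr.ndim == spec["ndim"], f"{name}: expected {spec['ndim']}D array"
            batch_sizes.add(arr.shape[0])

    # Leading (batch) dimensions must be consistent across arrays.
    assert len(batch_sizes) <= 1, "inconsistent batch sizes"

obs = {
    "video": {"head_camera": np.zeros((1, 1, 224, 224, 3), dtype=np.uint8)},
    "state": {"joint_positions": np.zeros((1, 1, 14), dtype=np.float32)},
    "language": {"task": [["pick up the apple"]]},
}
check_observation_sketch(obs)  # passes silently
```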

check_action

Validate action structure and types.
def check_action(self, action: dict[str, Any]) -> None
Parameters:
• action (dict[str, Any], required): Action dictionary to validate

Usage example

from gr00t.policy.gr00t_policy import Gr00tPolicy
from gr00t.data.embodiment_tags import EmbodimentTag
import numpy as np

# Initialize policy
policy = Gr00tPolicy(
    embodiment_tag=EmbodimentTag.GR1,
    model_path="nvidia/GR00T-N1.6-3B",
    device="cuda:0",
    strict=True
)

# Get modality configuration
modality_configs = policy.get_modality_config()
video_config = modality_configs["video"]
state_config = modality_configs["state"]

# Prepare observation
observation = {
    "video": {
        "head_camera": np.zeros((1, 1, 224, 224, 3), dtype=np.uint8),
    },
    "state": {
        "joint_positions": np.zeros((1, 1, 14), dtype=np.float32),
    },
    "language": {
        "task": [["pick up the apple"]]
    }
}

# Generate action
action, info = policy.get_action(observation)
print(f"Action shape: {action['joint_positions'].shape}")
# Output: Action shape: (1, 8, 14)

Observation format

The policy expects observations in the following nested dictionary format:
observation = {
    "video": {
        "camera_name": np.ndarray[np.uint8, (B, T, H, W, C)],
        # ... more cameras
    },
    "state": {
        "state_name": np.ndarray[np.float32, (B, T, D)],
        # ... more states
    },
    "language": {
        "task": [["instruction string"]],  # Shape: (B, T)
    }
}
Where:
  • B: Batch size
  • T: Temporal horizon (number of frames/timesteps)
  • H, W: Image height and width
  • C: Number of channels (must be 3 for RGB)
  • D: State dimension
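As a concrete illustration of this layout (the camera, state, and instruction names here are arbitrary), a batched two-camera observation with B=2 and T=1 would look like:

```python
import numpy as np

B, T = 2, 1  # batch of 2, single observed timestep

observation = {
    "video": {
        # Two cameras; each array is (B, T, H, W, C) with C=3 for RGB.
        "head_camera": np.zeros((B, T, 224, 224, 3), dtype=np.uint8),
        "wrist_camera": np.zeros((B, T, 224, 224, 3), dtype=np.uint8),
    },
    "state": {
        # One state stream; (B, T, D).
        "joint_positions": np.zeros((B, T, 14), dtype=np.float32),
    },
    "language": {
        # One instruction per batch element per timestep: nested lists of shape (B, T).
        "task": [["pick up the apple"], ["open the drawer"]],
    },
}

# The leading (B, T) dimensions must agree across all modalities.
assert observation["video"]["head_camera"].shape[:2] == (B, T)
assert observation["state"]["joint_positions"].shape[:2] == (B, T)
assert len(observation["language"]["task"]) == B
assert all(len(row) == T for row in observation["language"]["task"])
```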

Action format

Actions are returned in a similar nested format:
action = {
    "action_name": np.ndarray[np.float32, (B, T, D)],
    # ... more actions
}
Where:
  • B: Batch size
  • T: Action horizon (number of future timesteps)
  • D: Action dimension
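A common way to consume this chunked output is receding-horizon execution: run only the first few of the T predicted steps, then query the policy again. A minimal sketch using a dummy action dictionary (the key name, sizes, and EXEC_STEPS are illustrative, not prescribed by the API):

```python
import numpy as np

# Dummy action dictionary mimicking get_action output: (B, T, D).
action = {"joint_positions": np.zeros((1, 8, 14), dtype=np.float32)}

EXEC_STEPS = 4  # execute only the first 4 of the 8 predicted steps

chunk = action["joint_positions"][0]  # drop the batch dim -> (T, D)
for t in range(EXEC_STEPS):
    target = chunk[t]  # (D,) joint target for step t
    # send `target` to the robot controller here

print(chunk.shape)  # (8, 14)
```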

Properties

• model (Gr00tN1d6): The loaded GR00T model in evaluation mode with bfloat16 precision
• processor (BaseProcessor): The processor for input/output transformation
• embodiment_tag (EmbodimentTag): The embodiment tag for this policy
• modality_configs (dict[str, ModalityConfig]): Modality configurations for the current embodiment
• collate_fn (Callable): Collation function for batching observations
• language_key (str): The language modality key (currently only one language key is supported)

See also

• Gr00tSimPolicyWrapper: Wrapper for GR00T simulation environments
• PolicyClient: Client for remote inference
• Policy API guide: Complete guide to using the policy API
• EmbodimentTag: Available embodiment tags
