This page covers how to use the Gr00tPolicy class to load and run inference with your trained model. After training, you’ll use this API to integrate your model with evaluation environments.
Loading the policy
Initialize a policy by providing the embodiment tag, model checkpoint path, and device:
Parameters
- model_path: Path to your trained model checkpoint directory
- embodiment_tag: The embodiment tag you used during training (e.g., EmbodimentTag.NEW_EMBODIMENT)
- device: Device to run inference on ("cuda:0", "cpu", or an integer device index)
- strict: Whether to validate inputs/outputs (recommended during development; can be disabled in production)
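A minimal loading sketch follows. The import paths below are assumptions about where Gr00tPolicy and EmbodimentTag live in your GR00T installation; adjust them to match your version.

```python
def load_policy(model_path: str, device: str = "cuda:0"):
    """Load a Gr00tPolicy from a checkpoint directory (hypothetical helper)."""
    # Assumed import locations; check your installed GR00T package.
    from gr00t.model.policy import Gr00tPolicy
    from gr00t.data.embodiment_tags import EmbodimentTag

    policy = Gr00tPolicy(
        model_path=model_path,                        # checkpoint directory
        embodiment_tag=EmbodimentTag.NEW_EMBODIMENT,  # must match training
        device=device,                                # "cuda:0", "cpu", or int index
        strict=True,                                  # validate inputs/outputs
    )
    return policy
```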
Understanding the observation format
The policy expects observations as a nested dictionary with three modalities:
Dimensions
- B: Batch size (number of parallel environments)
- T: Temporal horizon (number of historical observations)
- H, W: Image height and width
- D: State dimension
- C: Number of channels (must be 3 for RGB)
Data type requirements
- Videos must be np.uint8 arrays with RGB pixel values in range [0, 255]
- States must be np.float32 arrays
- Language instructions are lists of lists of strings
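Putting the dimensions and dtype rules together, an observation might be built like this. The modality keys ("ego_view", "arm", "instruction") are illustrative placeholders; your model's actual keys come from its modality config.

```python
import numpy as np

B, T, H, W, D = 1, 1, 224, 224, 7  # batch, time, height, width, state dim

observation = {
    "video": {
        # (B, T, H, W, C) uint8, RGB values in [0, 255]
        "ego_view": np.zeros((B, T, H, W, 3), dtype=np.uint8),
    },
    "state": {
        # (B, T, D) float32
        "arm": np.zeros((B, T, D), dtype=np.float32),
    },
    "language": {
        # list of lists of strings, outer list indexed by batch
        "instruction": [["pick up the red cube"]],
    },
}
```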
The temporal horizon T is determined by your model’s training configuration. Different modalities may have different temporal horizons (query them via get_modality_config()).
Understanding the action format
The policy returns actions in a similar nested structure:
Dimensions
- B: Batch size (matches input batch size)
- T: Action horizon (number of future action steps to predict)
- D: Action dimension (e.g., 7 for arm joints, 1 for gripper)
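For illustration, here is what such an action dictionary could look like, and how per-part arrays combine into a full command for one timestep. The key names ("action.arm", "action.gripper") are assumptions.

```python
import numpy as np

B, T = 1, 16  # batch size, action horizon

# Hypothetical action dictionary returned by the policy
action = {
    "action.arm": np.zeros((B, T, 7), dtype=np.float32),      # 7 arm joints
    "action.gripper": np.zeros((B, T, 1), dtype=np.float32),  # 1 gripper DoF
}

# A full command for batch 0, timestep 0 concatenates the per-part arrays
step0 = np.concatenate([action["action.arm"][0, 0], action["action.gripper"][0, 0]])
# step0 now has one value per controlled DoF (7 + 1 = 8 here)
```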
Running inference
Use the get_action() method to compute actions from observations. It returns two values:
- action: Dictionary of action arrays
- info: Dictionary of additional information (currently empty, reserved for future use)
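A call might look like the following sketch; any object with the get_action() method described above will work here.

```python
def compute_action(policy, observation):
    """Run one inference step (sketch): unpack the (action, info) pair."""
    action, info = policy.get_action(observation)
    # `info` is currently empty, reserved for future use
    return action
```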
Querying modality configurations
To understand what observations your policy expects and what actions it produces, query the modality configuration. This is useful when:
- You’re unsure what observations your trained model expects
- You need to verify the temporal horizons for each modality
- You’re debugging observation/action format mismatches
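As a sketch, temporal horizons can be read off the configuration like this. It assumes each config entry carries a delta_indices list whose length equals the temporal horizon; that attribute name is an assumption, so check your version.

```python
def modality_horizons(policy):
    """Return {modality name: temporal horizon} for a loaded policy (sketch)."""
    return {
        name: len(cfg.delta_indices)  # assumed attribute on each config entry
        for name, cfg in policy.get_modality_config().items()
    }
```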
Resetting the policy
Reset the policy between episodes:
Currently, the policy is stateless, but calling reset() is good practice for future compatibility.
Adapting the policy to your environment
Most environments use different observation/action formats than the Policy API expects. You’ll typically need to write a policy wrapper that:
- Transforms observations: Convert your environment’s observation format to the Policy API format
- Calls the policy: Use policy.get_action() to compute actions
- Transforms actions: Convert the policy’s actions back to your environment’s format
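The three steps above can be sketched as a small wrapper class. The environment observation layout ("rgb", "joints", "task") and the modality keys are hypothetical; adapt both transforms to your setup.

```python
import numpy as np

class PolicyWrapper:
    """Sketch of an env-to-policy adapter (not the reference implementation)."""

    def __init__(self, policy):
        self.policy = policy

    def act(self, env_obs):
        # 1. Transform observations: add batch (B=1) and time (T=1) dims
        obs = {
            "video": {"ego_view": env_obs["rgb"][None, None]},
            "state": {"arm": env_obs["joints"][None, None].astype(np.float32)},
            "language": {"instruction": [[env_obs["task"]]]},
        }
        # 2. Call the policy
        action, _info = self.policy.get_action(obs)
        # 3. Transform actions: drop the batch dim, join parts per timestep
        return np.concatenate([a[0] for a in action.values()], axis=-1)
```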
Example workflow
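A minimal end-to-end loop might look like this. The env and policy objects are duck-typed; the env method names here are assumptions about your environment's API.

```python
def run_episode(env, policy, max_steps=200):
    """Reset, observe, act, step (sketch of a full inference workflow)."""
    policy.reset()            # stateless today, but good practice between episodes
    obs = env.reset()
    for _ in range(max_steps):
        action, _info = policy.get_action(obs)
        obs, done = env.step(action)  # hypothetical env API returning (obs, done)
        if done:
            break
```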
Server-client architecture for remote inference
For many use cases, especially when working with real robots or distributed systems, you may want to run the policy on a separate machine (e.g., a GPU server) and send observations/actions over the network.
Why use server-client architecture?
- Separate compute resources: Run policy inference on a GPU server while controlling the robot from a different machine
- Dependency isolation: Keep the policy’s dependencies on the server so they don’t conflict with the client’s software stack
Starting the policy server
Parameters
- --embodiment-tag: The embodiment tag for your robot (e.g., NEW_EMBODIMENT)
- --model-path: Path to your trained model checkpoint directory
- --device: Device to run inference on (cuda:0, cuda:1, cpu, etc.)
- --host: Host address (127.0.0.1 for local only, 0.0.0.0 to accept external connections)
- --port: Port number (default: 5555)
- --strict: Enable input/output validation (default: True)
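Putting the flags together, a launch command might look like the following. The script path is a placeholder; check your checkout for the actual inference-service entry point.

```shell
# Hypothetical entry point; substitute the real server script from your repo.
python scripts/inference_service.py \
    --embodiment-tag NEW_EMBODIMENT \
    --model-path /path/to/checkpoint \
    --device cuda:0 \
    --host 0.0.0.0 \
    --port 5555
```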
Using the policy client
On the client side, use PolicyClient to connect to the server:
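A connection sketch, assuming PolicyClient takes a host and port; the import path is a guess, so check where your GR00T version exposes it.

```python
def make_client(host: str = "127.0.0.1", port: int = 5555):
    """Connect to a remote policy server (sketch)."""
    from gr00t.eval.service import PolicyClient  # assumed module location
    # Same BasePolicy interface as Gr00tPolicy: get_action(), reset(), etc.
    return PolicyClient(host=host, port=port)
```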
PolicyClient implements the same BasePolicy interface, so it’s a drop-in replacement for Gr00tPolicy.
Common patterns
Batched inference
The policy supports batched inference for efficiency:
Single environment inference
For single environments, use a batch size of 1:
Action chunking
When the action horizon T > 1, you can use action chunking:
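The three patterns can be sketched together with a dummy policy; the modality keys and the DummyPolicy class are stand-ins for illustration only.

```python
import numpy as np

class DummyPolicy:
    """Stand-in that mimics the (action, info) return contract."""
    def get_action(self, obs):
        B = obs["state"]["arm"].shape[0]
        T = 16  # action horizon
        return {"action.arm": np.zeros((B, T, 7), np.float32)}, {}

policy = DummyPolicy()

# Batched inference: B parallel environments in a single call
obs = {"state": {"arm": np.zeros((4, 1, 7), np.float32)}}
action, _info = policy.get_action(obs)          # action arrays are (4, 16, 7)

# Single environment inference: same call with batch size 1
obs1 = {"state": {"arm": np.zeros((1, 1, 7), np.float32)}}
action1, _info = policy.get_action(obs1)

# Action chunking: execute all T predicted steps before re-querying the policy
chunk = action1["action.arm"][0]                # (T, 7)
for step in chunk:                              # each `step` is one (7,) command
    pass                                        # env.step(step) in a real loop
```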
Troubleshooting
- Enable strict mode during development: strict=True
- Print modality configs to understand expected formats
- Check the shapes of your observations before calling get_action()
- Use the reference wrapper (Gr00tSimPolicyWrapper) as a template
- Validate incrementally: Test with dummy observations before connecting to real environments