The evaluation script (gr00t/eval/rollout_policy.py) executes policies in simulation environments and measures task completion success rates.
Usage
Parameters
Name of the gymnasium environment to evaluate. Must be registered in one of:
- RoboCasa environments (prefix: robocasa_panda_omron/, gr1_unified/)
- SimplerEnv environments (prefix: simpler_env_google/, simpler_env_widowx/)
- LIBERO environments (prefix: libero_sim/)
- BEHAVIOR environments (prefix: sim_behavior_r1_pro/)
- GR00T LocoManip environments (prefix: gr00tlocomanip_g1_sim/)
Path to the model checkpoint directory. Required if not using policy-client-host. Example: checkpoints/checkpoint-5000

Host address of the policy server. Use this with policy-client-port instead of model-path to connect to a remote policy server.

Port number of the policy server. Required when using policy-client-host.

Number of episodes to run for evaluation.

Number of parallel environments to run simultaneously. Automatically uses AsyncVectorEnv for n-envs > 1.

Number of action steps to execute from each policy prediction. This is the execution horizon.

Maximum number of steps per episode before truncation.
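
The dependencies between these parameters can be sketched as a small validation step. This is a minimal illustration, not the script's actual argument handling: the helper names `resolve_policy_source` and `vector_env_kind` are hypothetical, and preferring the policy server when both a host and a checkpoint path are given is an assumption.

```python
from typing import Optional


def resolve_policy_source(
    model_path: Optional[str] = None,
    policy_client_host: Optional[str] = None,
    policy_client_port: Optional[int] = None,
) -> str:
    """Return "server" or "checkpoint" (hypothetical helper, not part of the script)."""
    if policy_client_host is not None:
        # A remote policy server needs both a host and a port.
        if policy_client_port is None:
            raise ValueError("policy-client-port is required when using policy-client-host")
        # Assumption: the server takes precedence if model-path is also set.
        return "server"
    if model_path is None:
        raise ValueError("model-path is required if not using policy-client-host")
    return "checkpoint"


def vector_env_kind(n_envs: int) -> str:
    """Name the vectorization mode implied by n-envs (labels are illustrative)."""
    if n_envs < 1:
        raise ValueError("n-envs must be at least 1")
    # The parameter description states AsyncVectorEnv is used for n-envs > 1.
    # What is used for a single environment is not stated; this label is a guess.
    return "AsyncVectorEnv" if n_envs > 1 else "single environment"
```

For example, `resolve_policy_source(model_path="checkpoints/checkpoint-5000")` selects the local checkpoint, while passing only a host without a port raises an error.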
Outputs
The script outputs:
- Success rate: Percentage of episodes that completed the task successfully
- Episode info: Additional metrics like task progress, episode lengths, and environment-specific scores
- Videos: Saved to /tmp/sim_eval_videos_{model_name}_ac{n_action_steps}_{uuid}/ (except for BEHAVIOR environments)
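
As a rough illustration of how these outputs relate to the inputs, the sketch below computes a success rate from per-episode outcomes and fills in the video directory pattern. The function names are hypothetical, and both the way `model_name` is derived and the exact uuid format are assumptions; only the directory pattern itself comes from this page.

```python
import uuid
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Percentage of episodes that completed the task successfully."""
    if not successes:
        raise ValueError("no episodes were evaluated")
    return 100.0 * sum(successes) / len(successes)


def video_dir(model_name: str, n_action_steps: int) -> str:
    """Fill in the documented directory pattern (uuid format is an assumption)."""
    run_id = uuid.uuid4().hex
    return f"/tmp/sim_eval_videos_{model_name}_ac{n_action_steps}_{run_id}/"


# Three successes out of four episodes.
print(success_rate([True, False, True, True]))  # 75.0
```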
Example output
Supported environments
RoboCasa (GR1 and Panda)
SimplerEnv (Google Robot and WidowX)
LIBERO (Panda manipulation)
BEHAVIOR (R1 Pro humanoid)
GR00T LocoManipulation (G1)
Using policy server
For distributed evaluation, start a policy server and connect to it by passing policy-client-host and policy-client-port instead of model-path.
Environment wrappers
The evaluation script automatically applies:
MultiStepWrapper
Executes multiple action steps from each policy prediction:
- video_delta_indices: Controls temporal stacking of video observations
- state_delta_indices: Controls temporal stacking of state observations
- n_action_steps: Number of actions to execute per inference
- max_episode_steps: Maximum steps before truncation
- terminate_on_success: Whether to end episode immediately on task success
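
The interaction between n_action_steps, max_episode_steps, and terminate_on_success can be sketched as follows. This is not the MultiStepWrapper implementation: `CountingEnv` and `run_chunk` are hypothetical stand-ins written for illustration, and observation stacking (video_delta_indices, state_delta_indices) is left out.

```python
from typing import List, Tuple


class CountingEnv:
    """Toy stand-in environment: reports success once `goal` steps have run."""

    def __init__(self, goal: int) -> None:
        self.goal = goal
        self.t = 0

    def step(self, action: float) -> Tuple[int, bool]:
        self.t += 1
        return self.t, self.t >= self.goal  # (observation, success)


def run_chunk(
    env: CountingEnv,
    actions: List[float],
    n_action_steps: int,
    steps_so_far: int,
    max_episode_steps: int,
    terminate_on_success: bool = True,
) -> Tuple[int, bool, bool]:
    """Execute up to n_action_steps actions from one policy prediction.

    Returns (steps_executed, success, truncated).
    """
    executed = 0
    success = False
    truncated = False
    # Only the first n_action_steps of the predicted chunk are executed.
    for action in actions[:n_action_steps]:
        _, step_success = env.step(action)
        executed += 1
        success = success or step_success
        if success and terminate_on_success:
            break  # end the episode immediately on task success
        if steps_so_far + executed >= max_episode_steps:
            truncated = True  # step budget exhausted
            break
    return executed, success, truncated


# A 16-action prediction, of which at most 8 actions are executed.
print(run_chunk(CountingEnv(goal=3), [0.0] * 16, 8, 0, 100))  # (3, True, False)
```

A smaller n_action_steps means the policy is queried more often per episode; a larger value executes more of each prediction open-loop before the next inference.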
VideoRecordingWrapper (optional)
Records videos of episodes:
- Videos saved to /tmp/sim_eval_videos_{model_name}_ac{n_action_steps}_{uuid}/
- Configurable FPS, codec, and quality settings
- Automatically disabled for BEHAVIOR environments
Embodiment detection
The script automatically determines the embodiment tag from the environment name prefix:

| Environment Prefix | Embodiment Tag |
|---|---|
| robocasa_panda_omron/ | ROBOCASA_PANDA_OMRON |
| gr1_unified/, gr1/ | GR1 |
| gr00tlocomanip_g1_sim/ | UNITREE_G1 |
| simpler_env_google/ | OXE_GOOGLE |
| simpler_env_widowx/ | OXE_WIDOWX |
| libero_sim/ | LIBERO_PANDA |
| sim_behavior_r1_pro/ | BEHAVIOR_R1_PRO |
For BEHAVIOR environments, video recording is automatically disabled to avoid conflicts with the simulator’s internal rendering.
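
The prefix-to-tag mapping can be expressed as a simple lookup. This is a minimal sketch that mirrors the documented prefixes and tags as plain strings; the function names `embodiment_tag_for` and `records_video` are hypothetical, the task names in the usage lines are placeholders, and the script itself may represent tags differently (for example as an enum).

```python
PREFIX_TO_EMBODIMENT = {
    "robocasa_panda_omron/": "ROBOCASA_PANDA_OMRON",
    "gr1_unified/": "GR1",
    "gr1/": "GR1",
    "gr00tlocomanip_g1_sim/": "UNITREE_G1",
    "simpler_env_google/": "OXE_GOOGLE",
    "simpler_env_widowx/": "OXE_WIDOWX",
    "libero_sim/": "LIBERO_PANDA",
    "sim_behavior_r1_pro/": "BEHAVIOR_R1_PRO",
}


def embodiment_tag_for(env_name: str) -> str:
    """Look up the embodiment tag from the environment name prefix."""
    for prefix, tag in PREFIX_TO_EMBODIMENT.items():
        if env_name.startswith(prefix):
            return tag
    raise ValueError(f"no known embodiment prefix in environment name: {env_name!r}")


def records_video(env_name: str) -> bool:
    """Video recording is disabled for BEHAVIOR environments."""
    return embodiment_tag_for(env_name) != "BEHAVIOR_R1_PRO"


# Placeholder task names, used only to exercise the lookup.
print(embodiment_tag_for("libero_sim/example_task"))      # LIBERO_PANDA
print(records_video("sim_behavior_r1_pro/example_task"))  # False
```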