Overview

This script loads a trained adaptive gait policy and runs it in the MuJoCo viewer with real-time visualization of gait parameter adaptations. It supports both trained policies and a baseline (pure gait controller without residuals) for comparison.

Key Features:
  • Real-time visualization in MuJoCo viewer
  • Live gait parameter monitoring in console
  • Trajectory recording to JSON
  • Baseline comparison mode
  • Pause/resume simulation with spacebar
  • Fullscreen support for presentations

Basic Usage

Evaluate Trained Policy

python3 play_adaptive_policy.py \
    --model runs/adaptive_gait_20260304_143022/final_model.zip \
    --normalize runs/adaptive_gait_20260304_143022/vec_normalize.pkl \
    --seconds 30 \
    --deterministic

Run Baseline (No Residuals)

python3 play_adaptive_policy.py --baseline --seconds 30

Command-Line Arguments

Model Loading

--model
str
default:"None"
Path to trained model file (.zip). Required unless --baseline is set. Example: runs/adaptive_gait_20260304_143022/final_model.zip
--normalize
str
default:"None"
Path to VecNormalize statistics file (.pkl). Highly recommended for policies trained with observation normalization. Without this file, the policy will receive unnormalized observations and likely perform poorly. Example: runs/adaptive_gait_20260304_143022/vec_normalize.pkl

Simulation Control

--seconds
float
default:"30.0"
Duration to run the simulation in seconds (wall-clock time). Example: --seconds 60 for 1 minute of playback
--deterministic
flag
Use deterministic actions (mean of policy distribution) instead of sampling. Recommended for evaluation to reduce stochasticity.
--no-reset
flag
Disable automatic environment reset on episode termination. Useful for visualization when you want to freeze on failure state.

Environment Configuration

--flat
flag
Use flat terrain (model/world.xml) instead of rough terrain (model/world_train.xml). Useful for testing policy on easier terrain.
--baseline
flag
Run baseline mode with zero actions (pure gait controller without residuals). Use this to compare against the learned policy. When set, --model is not required.

Visualization Options

--fullscreen
flag
Start viewer in fullscreen mode. Requires GLFW backend (MUJOCO_GL=glfw). Falls back to a maximized window if fullscreen is unavailable. Hides left and right UI panels for a cleaner presentation view.

Data Recording

--save-trajectory
str
default:"None"
Save trajectory data (time, x, y, z position) to a JSON file. Example: --save-trajectory outputs/trajectory.json
Output format:
{
  "mode": "adaptive",
  "duration": 30.5,
  "total_steps": 6100,
  "trajectory": [
    {"time": 0.0, "x": 0.0, "y": 0.0, "z": 0.08},
    {"time": 0.005, "x": 0.001, "y": 0.0, "z": 0.081},
    ...
  ]
}

Interactive Controls

Keyboard Shortcuts

Key       Action
Space     Pause/unpause simulation
ESC       Close viewer and exit
Standard MuJoCo viewer controls:
  • Mouse drag: Rotate camera
  • Right-click drag: Pan camera
  • Scroll wheel: Zoom in/out
  • Double-click: Select body for tracking

Console Output

Real-Time Gait Parameters

The script prints the current gait parameters once per second:
[t=5.0s] Gait params: step_h=0.0520m, step_l=0.0680m, cycle_t=0.850s, body_h=0.0495m
[t=6.0s] Gait params: step_h=0.0532m, step_l=0.0665m, cycle_t=0.870s, body_h=0.0490m
[t=7.0s] Gait params: step_h=0.0548m, step_l=0.0652m, cycle_t=0.890s, body_h=0.0485m
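If you redirect the console output to a file, these per-second lines can be parsed back into a time series for later plotting. A minimal sketch, assuming the log format matches the examples above exactly (the field names in the returned dicts are our own):

```python
import re

# Matches lines like:
# [t=5.0s] Gait params: step_h=0.0520m, step_l=0.0680m, cycle_t=0.850s, body_h=0.0495m
PATTERN = re.compile(
    r"\[t=(?P<t>[\d.]+)s\] Gait params: "
    r"step_h=(?P<step_h>[\d.]+)m, step_l=(?P<step_l>[\d.]+)m, "
    r"cycle_t=(?P<cycle_t>[\d.]+)s, body_h=(?P<body_h>[\d.]+)m"
)

def parse_gait_log(lines):
    """Return one dict per gait-parameter log line, with all fields as floats."""
    samples = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            samples.append({k: float(v) for k, v in m.groupdict().items()})
    return samples
```

Non-matching lines (episode banners, statistics) are simply skipped, so the whole console log can be fed in unfiltered.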

Episode Termination

When the robot tips over or reaches max steps:
============================================================
[t=12.3s] EPISODE TERMINATED after 2460 steps
============================================================
  Orientation (pre-step):
    Roll:   -65.32° (limit: ±60°)
    Pitch:   12.45° (limit: ±60°)
    Yaw:     45.67°
  Termination reason: ROBOT TIPPED OVER
    → Roll exceeded limit (-65.3° > 60°)
  Body position: x=3.245m, y=0.123m, z=0.045m
  Linear velocity: x=0.234m/s, y=-0.012m/s, z=-0.089m/s
  Active gait params:
    step_height=0.0548m, step_length=0.0652m
    cycle_time=0.890s, body_height=0.0485m
============================================================

Final Statistics

At the end of playback:
================================================================================
Playback complete!
================================================================================
Total steps: 6000
Duration:    30.0s

Final Robot State:
----------------------------------------
Position:        x=8.456m, y=0.234m, z=0.052m
Linear velocity: x=0.312m/s, y=0.001m/s, z=0.002m/s
Orientation:     roll=2.34°, pitch=-1.23°, yaw=0.56°

Gait Parameter Statistics:
----------------------------------------
step_height    : mean=0.0489, std=0.0052, min=0.0420, max=0.0580
step_length    : mean=0.0663, std=0.0031, min=0.0610, max=0.0720
cycle_time     : mean=0.8520, std=0.0420, min=0.7800, max=0.9500
body_height    : mean=0.0498, std=0.0012, min=0.0475, max=0.0520

Interpretation:
  - Large std indicates adaptive behavior (good for rough terrain)
  - Small std indicates policy relies mostly on base parameters
  - Compare with base gait params to see adaptation magnitude

Example Workflows

1. Quick Policy Test (30 seconds)

python3 play_adaptive_policy.py \
    --model runs/adaptive_gait_20260304_143022/final_model.zip \
    --normalize runs/adaptive_gait_20260304_143022/vec_normalize.pkl \
    --seconds 30 \
    --deterministic

2. Compare Policy vs. Baseline

# Run trained policy
python3 play_adaptive_policy.py \
    --model runs/adaptive_gait_20260304_143022/final_model.zip \
    --normalize runs/adaptive_gait_20260304_143022/vec_normalize.pkl \
    --seconds 60 \
    --deterministic \
    --save-trajectory policy_traj.json

# Run baseline
python3 play_adaptive_policy.py \
    --baseline \
    --seconds 60 \
    --save-trajectory baseline_traj.json

# Compare trajectories
python3 compare_trajectories.py policy_traj.json baseline_traj.json

3. Test on Flat Terrain

python3 play_adaptive_policy.py \
    --model runs/adaptive_gait_20260304_143022/final_model.zip \
    --normalize runs/adaptive_gait_20260304_143022/vec_normalize.pkl \
    --flat \
    --seconds 60 \
    --deterministic

4. Fullscreen Demo for Presentation

MUJOCO_GL=glfw python3 play_adaptive_policy.py \
    --model runs/adaptive_gait_20260304_143022/final_model.zip \
    --normalize runs/adaptive_gait_20260304_143022/vec_normalize.pkl \
    --fullscreen \
    --seconds 120 \
    --deterministic

5. Record Long Trajectory

python3 play_adaptive_policy.py \
    --model runs/adaptive_gait_20260304_143022/final_model.zip \
    --normalize runs/adaptive_gait_20260304_143022/vec_normalize.pkl \
    --seconds 300 \
    --deterministic \
    --save-trajectory long_run.json

Interpreting Gait Adaptation

Step Height (step_height)

  • Baseline: ~0.040m
  • Expected adaptation: Increases on rough terrain (0.045-0.060m) to clear obstacles
  • High std (>0.005m): Policy actively modulating step height ✓
  • Low std (<0.002m): Policy not using adaptive behavior ✗

Step Length (step_length)

  • Baseline: ~0.060m
  • Expected adaptation: Decreases on rough terrain for stability, increases on flat terrain for speed
  • Typical range: 0.050-0.075m

Cycle Time (cycle_time)

  • Baseline: ~0.800s
  • Expected adaptation: Increases (slower gait) on difficult terrain for stability
  • Typical range: 0.700-1.000s

Body Height (body_height)

  • Baseline: ~0.050m
  • Expected adaptation: Small adjustments for stability and obstacle clearance
  • Typical range: 0.045-0.055m
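The std thresholds from the step-height section can be turned into a quick classification helper for the printed statistics. A sketch; reusing the >0.005 m / <0.002 m cutoffs for parameters other than step height is an assumption:

```python
def classify_adaptation(std_m, high=0.005, low=0.002):
    """Classify gait-parameter variability using the step-height std thresholds."""
    if std_m > high:
        return "adaptive"   # policy actively modulating the parameter
    if std_m < low:
        return "static"     # policy relying mostly on base parameters
    return "moderate"
```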

Troubleshooting

Policy Performs Poorly

Problem: Robot tips over immediately or behaves erratically.
Solutions:
  1. Ensure --normalize points to the correct .pkl file from training
  2. Check that model and normalization stats are from the same training run
  3. Try --flat to test on easier terrain
  4. Compare with --baseline to verify environment setup

Viewer Not Opening

Problem: Script runs but no window appears.
Solutions:
  1. Check DISPLAY environment variable is set (Linux)
  2. Try different backend: MUJOCO_GL=glfw or MUJOCO_GL=osmesa
  3. Verify MuJoCo installation: python3 -c "import mujoco; print(mujoco.__version__)"

Fullscreen Not Working

Problem: --fullscreen falls back to windowed mode.
Solutions:
  1. Set backend explicitly: MUJOCO_GL=glfw python3 play_adaptive_policy.py ...
  2. Check if running in headless environment (fullscreen requires display)
  3. Use maximized window fallback (automatic)

Output Files

Trajectory JSON

When using --save-trajectory, the output file contains:
{
  "mode": "adaptive",           // "adaptive" or "baseline"
  "duration": 30.5,              // Wall-clock seconds
  "total_steps": 6100,           // Simulation steps
  "trajectory": [                // Position samples
    {
      "time": 0.0,               // Elapsed time (seconds)
      "x": 0.0,                  // X position (meters)
      "y": 0.0,                  // Y position (meters)
      "z": 0.08                  // Z position (meters)
    },
    // ... one entry per simulation step
  ]
}

Use Cases for Trajectory Data

  1. Distance traveled: Compute sqrt((x_final - x_start)^2 + (y_final - y_start)^2)
  2. Average speed: Distance / duration
  3. Path visualization: Plot (x, y) trajectory
  4. Stability analysis: Check Z position variance
  5. Policy comparison: Compare trajectories from different checkpoints
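Items 1, 2, and 4 above can be computed directly from the saved JSON. A minimal sketch (summarize_trajectory is a hypothetical helper, not part of the script):

```python
import json
import math

def summarize_trajectory(path):
    """Compute distance traveled, average speed, and z variance from a trajectory JSON."""
    with open(path) as f:
        data = json.load(f)
    traj = data["trajectory"]
    start, end = traj[0], traj[-1]
    # Straight-line displacement in the ground plane (item 1)
    distance = math.hypot(end["x"] - start["x"], end["y"] - start["y"])
    duration = data["duration"]
    # Z-position variance as a crude stability proxy (item 4)
    zs = [p["z"] for p in traj]
    z_mean = sum(zs) / len(zs)
    z_var = sum((z - z_mean) ** 2 for z in zs) / len(zs)
    return {
        "distance_m": distance,
        "avg_speed_mps": distance / duration if duration else 0.0,
        "z_variance": z_var,
    }
```

Note that this measures net displacement, not path length; a robot walking in circles would score near zero.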
