Overview
The tests/compare_baseline_adaptive.py script provides an automated performance comparison between:
- Step 1: Baseline gait on flat terrain (upper bound)
- Step 2: Baseline gait on rough terrain (performance degradation)
- Step 3: Adaptive RL policy on rough terrain (learned recovery)
Quick Start
Ensure Model is Trained
Run Comparison Script
Execute the three-simulation comparison, replacing 20260304_143022 with your actual training run timestamp.

Watch Simulations
Three MuJoCo viewer windows will open sequentially:
- Step 1: Baseline on flat terrain (smooth walking)
- Step 2: Baseline on rough terrain (struggling)
- Step 3: Adaptive policy on rough terrain (adapted walking)
Command-Line Options
- Model path: path to the trained PPO model (.zip file). Example: runs/adaptive_gait_20260304_143022/final_model.zip
- VecNormalize path: path to the VecNormalize statistics (.pkl file). Example: runs/adaptive_gait_20260304_143022/vec_normalize.pkl
- Duration (--seconds): duration of each simulation, in seconds.
  - Longer runs = more stable statistics
  - Shorter runs = faster iteration
  - Recommended: 15-20 seconds
- Output: output path for the comparison plot. Format: PNG image (150 DPI)
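Putting these options together, a full invocation might look like the sketch below. Only --seconds appears verbatim elsewhere on this page; --model, --vecnorm, and --output are assumed flag names, and the timestamp is this page's running example.

```shell
# Hypothetical invocation; --model, --vecnorm, and --output are assumed flag
# names. The fallback echo keeps the snippet from failing silently when the
# paths are wrong.
python tests/compare_baseline_adaptive.py \
  --model runs/adaptive_gait_20260304_143022/final_model.zip \
  --vecnorm runs/adaptive_gait_20260304_143022/vec_normalize.pkl \
  --seconds 15 \
  --output tests/comparison.png \
  || echo "comparison failed; verify the run timestamp and file paths"
```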
Understanding the Output
Three-Panel Plot
The script generates a side-by-side comparison:
- Step 1: Baseline (Flat)
- Step 2: Baseline (Rough)
- Step 3: Adaptive (Rough)
Left Panel - Reference Performance

Shows the baseline gait controller on ideal terrain:
- Smooth, linear progression
- Consistent forward velocity
- No obstacles or disturbances
Console Summary
After the simulations complete, the script prints a performance summary to the console.

Interpreting Results
Good Learning Outcome
Indicators:
- Step 2 (Baseline Rough) shows significant degradation: -40% to -80%
- Step 3 (Adaptive Rough) shows a large improvement over Step 2: +50% to +200%
- Step 3 approaches Step 1 performance: within 10-20% of the flat-terrain baseline

✅ Policy successfully learned terrain adaptation
Marginal Learning
Indicators:
- Step 3 shows a small improvement over Step 2: +10% to +30%
- Still significantly below Step 1 performance: -30% to -50%

⚠️ Policy learned some adaptation, but not enough

Solutions:
- Train longer (increase total_timesteps)
- Tune the reward function
- Increase network size
- Adjust hyperparameters
No Learning / Regression
Indicators:
- Step 3 similar to or worse than Step 2: -10% to +5%
- Far below Step 1 performance
- Training diverged or plateaued early

❌ Policy did not learn effectively

Likely Causes:
- Learning rate too high
- Observation space issues
- Reward function not aligned with the task
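The percentages above are typically derived from net forward displacement over a run. As a sketch of how to check a trajectory file by hand, the snippet below generates a synthetic sample (the top-level key names are assumptions; only the time/x/y/z record fields are documented on this page) and computes the net x displacement:

```shell
# Build a tiny synthetic trajectory file so the check is self-contained.
# Key names ("duration", "positions", ...) are assumptions; the time/x/y/z
# record fields match the trajectory format documented below.
cat > /tmp/sample_trajectory.json <<'EOF'
{"duration": 1.0, "num_positions": 2, "terrain": "rough", "mode": "adaptive",
 "positions": [{"time": 0.0, "x": 0.0, "y": 0.0, "z": 0.3},
               {"time": 1.0, "x": 0.8, "y": 0.1, "z": 0.3}]}
EOF
# Net forward displacement = final x minus initial x.
python3 -c "
import json
d = json.load(open('/tmp/sample_trajectory.json'))
p = d['positions']
print('net x displacement: %.2f m' % (p[-1]['x'] - p[0]['x']))
"
# prints: net x displacement: 0.80 m
```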
Saved Files
The script saves several files in the tests/ directory: a trajectory JSON file for each of the three simulations, and the comparison plot PNG.
Trajectory Data Format
Each JSON file contains:
- Total simulation time in seconds
- Number of recorded positions
- Terrain type: "flat" or "rough"
- Control mode: "baseline" or "adaptive"
- An array of position records:
  - time: simulation time (seconds)
  - x, y, z: robot body position (meters)
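Concretely, a file matching the fields above might look like this sketch (the key names are assumptions; only the field meanings and the time/x/y/z record layout come from this page):

```json
{
  "duration": 15.0,
  "num_positions": 3,
  "terrain": "rough",
  "mode": "adaptive",
  "positions": [
    {"time": 0.00, "x": 0.00, "y": 0.00, "z": 0.31},
    {"time": 0.05, "x": 0.01, "y": 0.00, "z": 0.31},
    {"time": 0.10, "x": 0.02, "y": 0.00, "z": 0.30}
  ]
}
```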
Advanced Usage
Comparing Multiple Checkpoints
Test different training checkpoints to see how performance evolved over training.

Longer Evaluation Runs

Run longer simulations for more stable statistics.

Batch Comparison Script

Compare all checkpoints automatically with a batch_compare.sh script.
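A sketch of such a script, assuming checkpoints are saved under runs/&lt;run&gt;/checkpoints/ as .zip files (that layout, and every flag except --seconds, is an assumption):

```shell
#!/usr/bin/env bash
# batch_compare.sh -- sketch of automatic checkpoint comparison.
# Assumes checkpoints live at runs/<run>/checkpoints/*.zip; flag names other
# than --seconds are assumptions.
for ckpt in runs/*/checkpoints/*.zip; do
  [ -e "$ckpt" ] || continue          # glob matched nothing; skip
  run_dir=${ckpt%/checkpoints/*}      # e.g. runs/adaptive_gait_20260304_143022
  name=$(basename "$ckpt" .zip)
  python tests/compare_baseline_adaptive.py \
    --model "$ckpt" \
    --vecnorm "$run_dir/vec_normalize.pkl" \
    --seconds 15 \
    --output "tests/comparison_$name.png"
done
```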
Troubleshooting
Simulation Crashes or Exits Early
Symptoms: the MuJoCo viewer closes before 17 seconds have elapsed.

Causes:
- Robot fell and the simulation terminated
- Episode length limit reached
- Model produces NaN values (diverged training)

Solutions:
- Check training metrics for divergence
- Try an earlier checkpoint
- Reduce the simulation duration (--seconds 10)
- Review model-loading errors in the console
FileNotFoundError for Model
Error: FileNotFoundError: [Errno 2] No such file or directory: 'runs/.../final_model.zip'

Solutions:
- List available runs
- Use the correct timestamp in the path
- Ensure training completed successfully
- Check for final_model.zip in the run directory
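For the first and last solutions, a hedged sketch (the directory layout and timestamp are the ones used in this page's examples):

```shell
# List available training runs, then check that the example run contains
# final_model.zip. Substitute your own timestamp.
ls runs/ 2>/dev/null || echo "no runs/ directory here"
ls runs/adaptive_gait_20260304_143022/final_model.zip 2>/dev/null \
  || echo "final_model.zip not found in that run directory"
```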
Plot Looks Strange
Symptoms: backwards motion, flat lines, or extreme values in the plot.

Interpretations:
- Backwards motion: robot is falling/flipping
- Flat line: robot stuck or not moving
- Extreme jumps: simulation instability

Debugging steps:
- Watch the MuJoCo viewer during runs
- Check the trajectory JSON files manually
- Run each simulation individually
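For the last step, the playback script covered under Implementation Details can run one simulation at a time. A hypothetical invocation (only the script name and --save-trajectory appear on this page; the other flags are assumptions):

```shell
# Hypothetical: run a single simulation and record its trajectory for manual
# inspection. Flag names other than --save-trajectory are assumptions.
python play_adaptive_policy.py \
  --model runs/adaptive_gait_20260304_143022/final_model.zip \
  --terrain rough \
  --save-trajectory tests/trajectory_adaptive_rough.json \
  || echo "playback failed; check the model path and console errors"
```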
Adaptive Worse Than Baseline
Symptoms: Step 3 shows negative improvement vs. Step 2.

This indicates training failed. Check:
- TensorBoard metrics:
  - Did reward increase?
  - Did episode length increase?
  - When did training plateau?
- Try an earlier checkpoint (the final model may have diverged)
- Retrain with adjusted hyperparameters:
  - Lower learning rate: 3e-4 → 1e-4
  - Increase entropy coefficient: 0.01 → 0.05
  - More environments: 84 → 128
Implementation Details
How It Works
The comparison script (tests/compare_baseline_adaptive.py) runs the three simulations in sequence, records each trajectory, and then generates the comparison plot and console summary.
Trajectory Recording
The play_adaptive_policy.py script records the trajectory when --save-trajectory is provided.
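For example, the per-step JSON files described under Saved Files could each be produced this way (a sketch; only the script name and --save-trajectory come from this page, while --mode, as a way to select the baseline or adaptive control mode, and --model are assumed):

```shell
# Hypothetical: record one trajectory per control mode. Only the script name
# and --save-trajectory come from this page; --mode and --model are assumed.
python play_adaptive_policy.py --mode baseline \
  --save-trajectory tests/trajectory_baseline_rough.json \
  || echo "baseline playback failed"
python play_adaptive_policy.py --mode adaptive \
  --model runs/adaptive_gait_20260304_143022/final_model.zip \
  --save-trajectory tests/trajectory_adaptive_rough.json \
  || echo "adaptive playback failed"
```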
Next Steps
Retrain with Tuning
Adjust hyperparameters based on comparison results
Deploy Best Model
Use the best-performing checkpoint in ROS2 setup