ResidualWalkEnv
A Gymnasium-compatible reinforcement learning environment for learning low-level residual corrections on top of a fixed diagonal Bézier gait controller. This environment focuses purely on foot position adjustments without modifying high-level gait parameters.
Overview
The ResidualWalkEnv provides a simpler learning problem than AdaptiveGaitEnv by keeping gait parameters (step height, cycle time, etc.) fixed. The policy learns only per-leg residual corrections: small 3D foot position offsets that compensate for terrain irregularities and improve locomotion quality.
Key Features
- 65-dimensional observation space (no gait parameter feedback)
- 12-dimensional action space (3D residual per leg)
- Fixed gait parameters for consistent base behavior
- Reward optimized for velocity tracking and contact quality
Class Definition
Constructor Parameters
- Model path: Path to the MuJoCo XML model file defining the robot and environment.
- Gait parameters: Fixed gait parameters for the base controller. If None, uses controller defaults. These parameters remain constant throughout training.
- residual_scale: Scaling factor for residual corrections. Actions in [-1, 1] are multiplied by this value to get foot position offsets in meters.
- max_episode_steps: Maximum number of simulation steps per episode before truncation.
- settle_steps: Number of steps to run with zero residuals at episode start. This allows the robot to settle into a stable gait on the terrain before learning begins.
- Seed: Random seed for reproducibility.
Spaces
Observation Space
Shape: (65,) - Box space with values in [-inf, inf]
The observation is a concatenation of the following components:
Robot body state in world frame:
- Position (3D): x, y, z coordinates in meters
- Orientation (4D): quaternion [w, x, y, z] (normalized)
- Linear velocity (3D): body velocity in m/s
- Angular velocity (3D): body angular velocity in rad/s
Joint positions and velocities for all 12 joints (3 per leg × 4 legs):
- Positions (12D): joint angles in radians
- Velocities (12D): joint angular velocities in rad/s
3D positions of each foot in body frame (3D per leg × 4 legs).
3D velocities of each foot in body frame (3D per leg × 4 legs).
Binary contact indicators for each leg (FL, FR, RL, RR):
- 1.0: foot in contact with ground
- 0.0: foot not in contact
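Assuming the components are concatenated in the order listed above, the 65-dimensional vector can be unpacked as in the sketch below. The exact index layout is an assumption inferred from the listed ordering, not taken from the environment source:

```python
import numpy as np

def unpack_observation(obs):
    """Split a 65-D observation into named components.

    Index layout assumes the concatenation order listed above
    (3+4+3+3 body state, 12+12 joints, 12+12 feet, 4 contacts = 65);
    verify against the actual environment implementation.
    """
    assert obs.shape == (65,)
    return {
        "body_pos": obs[0:3],        # world-frame position (m)
        "body_quat": obs[3:7],       # orientation quaternion [w, x, y, z]
        "body_lin_vel": obs[7:10],   # linear velocity (m/s)
        "body_ang_vel": obs[10:13],  # angular velocity (rad/s)
        "joint_pos": obs[13:25],     # 12 joint angles (rad)
        "joint_vel": obs[25:37],     # 12 joint velocities (rad/s)
        "foot_pos": obs[37:49],      # 4 feet x 3D, body frame (m)
        "foot_vel": obs[49:61],      # 4 feet x 3D, body frame (m/s)
        "contacts": obs[61:65],      # binary contact flags [FL, FR, RL, RR]
    }

parts = unpack_observation(np.zeros(65))
```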
Unlike AdaptiveGaitEnv, this observation does NOT include current gait parameters, since they are fixed.
Action Space
Shape: (12,) - Box space with values in [-1.0, 1.0]
Actions represent 3D foot position residual corrections:
3D foot position offsets for each leg:
- FL residual [0:3]: Front-Left leg (x, y, z)
- FR residual [3:6]: Front-Right leg (x, y, z)
- RL residual [6:9]: Rear-Left leg (x, y, z)
- RR residual [9:12]: Rear-Right leg (x, y, z)
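The clipping, scaling, and per-leg layout can be sketched as follows. The residual_scale value of 0.02 m is illustrative only, not the environment's default:

```python
import numpy as np

def action_to_residuals(action, residual_scale=0.02):
    """Map a normalized 12-D action to per-leg foot offsets in meters.

    residual_scale=0.02 is an illustrative value, not the env default.
    """
    action = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    offsets = (action * residual_scale).reshape(4, 3)  # rows: FL, FR, RL, RR
    return {leg: offsets[i] for i, leg in enumerate(["FL", "FR", "RL", "RR"])}

residuals = action_to_residuals(np.ones(12))
```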
Each action component is multiplied by residual_scale to get the actual offset in meters.
Methods
reset
Optional seed for episode randomization.
Additional options:
randomize (bool): If True, applies small random perturbations to initial position and orientation
(observation, info)
observation: Initial 65D observation vector
info: Empty dict (metadata for future use)
During reset, the environment runs settle_steps steps with zero residuals to allow the robot to stabilize on the terrain before the episode begins.
step
12D action vector in range [-1, 1].
(observation, reward, terminated, truncated, info)
65D observation of new state.
Scalar reward value (see Reward Function section).
True if episode ended due to failure condition (robot fell over).
True if episode reached max_episode_steps.
Additional information:
reward_components (dict): Breakdown of reward by component
body_height (float): Current body height in meters
set_terrain_scale
Terrain difficulty scale (future use).
Reward Function
The reward function is computed in _compute_reward() and focuses on velocity tracking and gait quality:
Forward Velocity Tracking
Contact Pattern Reward
Stability Penalty
Lateral Stability
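The per-component formulas are not reproduced here. As a rough illustration only, the four terms above are often combined as in the sketch below; all weights and functional forms are assumptions for exposition and do not reproduce _compute_reward() exactly:

```python
import numpy as np

def reward_sketch(forward_vel, target_vel, contacts, ang_vel, lateral_vel):
    """Illustrative reward combining the four components above.

    All weights and functional forms are assumptions, not the
    environment's actual reward implementation.
    """
    components = {
        # peaks at 1.0 when forward velocity matches the expected gait velocity
        "velocity": float(np.exp(-((forward_vel - target_vel) ** 2) / 0.01)),
        # small bonus proportional to feet in contact (contact-quality proxy)
        "contact": 0.1 * float(np.sum(contacts)) / 4.0,
        # penalize body rotation rates
        "stability": -0.05 * float(np.sum(np.square(ang_vel))),
        # penalize sideways drift
        "lateral": -0.5 * abs(float(lateral_vel)),
    }
    return sum(components.values()), components

total, parts = reward_sketch(0.133, 0.133, np.ones(4), np.zeros(3), 0.0)
```

Reporting the component dict alongside the scalar mirrors the reward_components breakdown exposed in the step info.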
Termination Conditions
Terminated (Failure)
Episode terminates early if:
- |roll| > π/3 (60°): robot tipped sideways
- |pitch| > π/3 (60°): robot tipped forward/backward
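The tilt check can be reproduced from the body quaternion. The roll/pitch extraction below uses the standard ZYX Euler convention; the environment's exact convention is an assumption:

```python
import math

def is_terminated(quat, limit=math.pi / 3):
    """Return True if roll or pitch exceeds the pi/3 (60 deg) limit.

    quat is [w, x, y, z]; ZYX Euler convention assumed.
    """
    w, x, y, z = quat
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    pitch = math.asin(max(-1.0, min(1.0, 2.0 * (w * y - x * z))))
    return abs(roll) > limit or abs(pitch) > limit

upright = [1.0, 0.0, 0.0, 0.0]                        # identity: no tilt
s = math.sin(math.pi / 4)
tipped = [math.cos(math.pi / 4), s, 0.0, 0.0]         # 90 deg roll about x
```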
Truncated (Timeout)
Episode truncates when step_count >= max_episode_steps.
Usage Examples
Basic Training Setup
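A minimal sketch of the environment loop. The module path and the model_path/seed argument names are assumptions; adjust them to your package layout:

```python
# Module path and constructor argument names are assumptions.
from residual_walk_env import ResidualWalkEnv

env = ResidualWalkEnv(model_path="quadruped.xml", seed=0)
obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # random 12-D residuals in [-1, 1]
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```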
With Custom Gait Parameters
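Fixed parameters can be supplied at construction. The dict keys below mirror the parameters named in this document (step length, step height, cycle time), but the exact key names and the gait_params argument are assumptions; check BezierGaitController for the real defaults:

```python
from residual_walk_env import ResidualWalkEnv  # import path assumed

# Key names are illustrative; verify against BezierGaitController defaults.
gait_params = {
    "step_length": 0.08,  # m
    "step_height": 0.04,  # m
    "cycle_time": 0.6,    # s
}
env = ResidualWalkEnv(model_path="quadruped.xml", gait_params=gait_params)
```

These values stay constant for the whole training run; only the residuals are learned.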
Training with PPO
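For example with Stable-Baselines3, one possible PPO implementation (this library choice is an assumption, not prescribed by this document):

```python
from stable_baselines3 import PPO

from residual_walk_env import ResidualWalkEnv  # import path assumed

env = ResidualWalkEnv(model_path="quadruped.xml")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("residual_walk_ppo")
```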
With Randomization
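The randomize reset option documented above applies small perturbations to the initial pose:

```python
from residual_walk_env import ResidualWalkEnv  # import path assumed

env = ResidualWalkEnv(model_path="quadruped.xml")
# Small random perturbations to initial position and orientation
obs, info = env.reset(seed=42, options={"randomize": True})
```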
Expected Velocity Calculation
The environment automatically computes the expected forward velocity from the gait parameters as step_length / cycle_time. For example, with step_length=0.08 m and cycle_time=0.6 s, the expected velocity is about 0.133 m/s.
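The calculation in code:

```python
step_length = 0.08  # m per gait cycle
cycle_time = 0.6    # s per gait cycle

# One step of step_length meters every cycle_time seconds
expected_velocity = step_length / cycle_time  # m/s
```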
Comparison with AdaptiveGaitEnv
| Feature | ResidualWalkEnv | AdaptiveGaitEnv |
|---|---|---|
| Observation dim | 65 | 69 |
| Action dim | 12 | 16 |
| Gait parameters | Fixed | Adaptive |
| Learning complexity | Lower | Higher |
| Terrain adaptation | Limited | Full |
| Use case | Baseline, simple terrains | Complex terrains, advanced control |
Related
AdaptiveGaitEnv
Extended environment with adaptive gait parameters
BezierGaitController
Base Bézier gait controller used in this environment