AdaptiveGaitEnv
A Gymnasium-compatible reinforcement learning environment that enables learning both high-level gait parameters and low-level residual corrections for quadruped locomotion on rough terrain.
Overview
The AdaptiveGaitEnv extends the residual learning approach by allowing the policy to dynamically adjust gait parameters during execution. The agent learns to modulate:
- High-level gait parameters: step height, step length, cycle time, and body height
- Low-level residual corrections: per-leg foot position offsets (3D per leg)
Key Features
- 69-dimensional observation space capturing full robot state
- 16-dimensional action space (4 parameter deltas + 12 residuals)
- Reward function balancing forward velocity, contact patterns, and stability
- Built on MuJoCo physics simulation
Class Definition
Constructor Parameters
Path to the MuJoCo XML model file defining the robot and environment.
Initial gait parameters for the controller. If None, uses controller defaults.
Scaling factor for residual corrections. Actions in [-1, 1] are multiplied by this value to get foot position offsets in meters.
Maximum number of simulation steps per episode before truncation.
Number of steps to run with zero residuals at episode start, allowing the robot to stabilize on terrain.
Random seed for reproducibility.
Spaces
Observation Space
Shape: (69,) - Box space with values in [-inf, inf]
The observation is a concatenation of the following components:
Robot body state in world frame:
- Position (3D): x, y, z coordinates in meters
- Orientation (4D): quaternion [w, x, y, z] (normalized)
- Linear velocity (3D): body velocity in m/s
- Angular velocity (3D): body angular velocity in rad/s
Joint positions and velocities for all 12 joints (3 per leg × 4 legs):
- Positions (12D): joint angles in radians
- Velocities (12D): joint angular velocities in rad/s
3D positions of each foot in body frame (3D per leg × 4 legs).
3D velocities of each foot in body frame (3D per leg × 4 legs).
Binary contact indicators for each leg (FL, FR, RL, RR):
- 1.0: foot in contact with ground
- 0.0: foot not in contact
Current gait parameters normalized to [-1, 1] range:
- step_height: normalized current step height
- step_length: normalized current stride length
- cycle_time: normalized current gait cycle duration
- body_height: normalized current target body height
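The component sizes listed above sum to 69 (13 body state + 24 joint + 12 foot position + 12 foot velocity + 4 contact + 4 gait parameter values). A minimal sketch of splitting an observation into named parts follows. It assumes the components are concatenated in exactly the order they are listed here; the names `OBS_LAYOUT` and `split_observation` are hypothetical helpers for illustration, not part of the environment.

```python
# Assumed layout: components concatenated in the order listed in this section.
# The names below are illustrative labels, not identifiers from the environment.
OBS_LAYOUT = [
    ("body_position", 3),
    ("body_orientation", 4),       # quaternion [w, x, y, z]
    ("body_linear_velocity", 3),
    ("body_angular_velocity", 3),
    ("joint_positions", 12),
    ("joint_velocities", 12),
    ("foot_positions", 12),        # 3D per leg, body frame
    ("foot_velocities", 12),       # 3D per leg, body frame
    ("foot_contacts", 4),          # FL, FR, RL, RR
    ("gait_params", 4),            # step_height, step_length, cycle_time, body_height
]

OBS_DIM = sum(size for _, size in OBS_LAYOUT)


def split_observation(obs):
    """Slice a flat 69D observation into a dict of named components."""
    if len(obs) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} values, got {len(obs)}")
    parts = {}
    start = 0
    for name, size in OBS_LAYOUT:
        parts[name] = obs[start:start + size]
        start += size
    return parts


# Works on any sliceable sequence (list or NumPy array).
parts = split_observation(list(range(69)))
```

If the actual concatenation order differs, only `OBS_LAYOUT` needs to change.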
Action Space
Shape: (16,) - Box space with values in [-1.0, 1.0]
Actions are structured as follows:
Deltas to apply to gait parameters (indices 0-3):
- d_step_height [0]: change in step height, scaled by 0.005m
- d_step_length [1]: change in step length, scaled by 0.005m
- d_cycle_time [2]: change in cycle time, scaled by 0.05s
- d_body_height [3]: change in body height, scaled by 0.003m
3D foot position offsets for each leg (indices 4-15):
- FL residual [4:7]: Front-Left leg (x, y, z)
- FR residual [7:10]: Front-Right leg (x, y, z)
- RL residual [10:13]: Rear-Left leg (x, y, z)
- RR residual [13:16]: Rear-Right leg (x, y, z)
Each residual is multiplied by residual_scale to get the offset in meters.
Methods
reset
Optional seed for episode randomization.
Additional options:
- randomize (bool): If True, applies small random perturbations to initial position and orientation
(observation, info)
- observation: Initial 69D observation vector
- info: Empty dict (metadata for future use)
step
16D action vector in range [-1, 1].
(observation, reward, terminated, truncated, info)
69D observation of new state.
Scalar reward value (see Reward Function section).
True if episode ended due to failure condition (robot fell over).
True if episode reached max_episode_steps.
Additional information:
- reward_components (dict): Breakdown of reward by component
- body_height (float): Current body height in meters
- gait_params (dict): Current gait parameter values
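The reset and step signatures described above follow the standard Gymnasium episode loop. The sketch below shows that loop against a stand-in object so it runs without MuJoCo. `StubEnv`, `zero_policy`, and `run_episode` are hypothetical names introduced for illustration; the stub only mimics the documented shapes and return tuples, and it never terminates on failure.

```python
def run_episode(env, policy, seed=None):
    """Roll out one episode; return (total_reward, steps, terminated, truncated)."""
    obs, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = policy(obs)  # 16D action in [-1, 1]
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
    return total_reward, steps, terminated, truncated


class StubEnv:
    """Hypothetical stand-in with the documented shapes; not the real environment."""

    def __init__(self, max_episode_steps=5):
        self.max_episode_steps = max_episode_steps
        self.step_count = 0

    def reset(self, seed=None, options=None):
        self.step_count = 0
        return [0.0] * 69, {}

    def step(self, action):
        assert len(action) == 16
        self.step_count += 1
        # The stub only models the timeout; it never reports a fall.
        truncated = self.step_count >= self.max_episode_steps
        return [0.0] * 69, 1.0, False, truncated, {"reward_components": {}}


def zero_policy(obs):
    """Placeholder policy: no parameter deltas and no residuals."""
    return [0.0] * 16


result = run_episode(StubEnv(max_episode_steps=5), zero_policy)
```

Because `run_episode` only relies on `reset` and `step`, the same function should work with an AdaptiveGaitEnv instance in place of the stub.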
Reward Function
The reward function is computed in _compute_reward() and consists of multiple components:
Forward Velocity Reward
Lateral Velocity Penalty
Contact Pattern Reward
Stability Penalty
Termination Conditions
Terminated (Failure)
Episode terminates early if:
- |roll| > π/3 (60°) - robot tipped sideways
- |pitch| > π/3 (60°) - robot tipped forward/backward
Truncated (Timeout)
Episode truncates when step_count >= max_episode_steps.
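The two conditions can be sketched as a single check. This is an illustration under stated assumptions: the source does not show how roll and pitch are extracted, so the code below uses the common ZYX (yaw-pitch-roll) Euler convention on a [w, x, y, z] quaternion. The helper names `roll_pitch_from_quat` and `episode_flags` are hypothetical.

```python
import math

MAX_TILT = math.pi / 3  # 60 degrees, the documented failure threshold


def roll_pitch_from_quat(quat):
    """Roll and pitch (radians) from a unit quaternion [w, x, y, z].

    Assumes the ZYX Euler convention; the environment may use another.
    """
    w, x, y, z = quat
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to guard against values slightly outside [-1, 1] from rounding.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    return roll, pitch


def episode_flags(quat, step_count, max_episode_steps):
    """Return (terminated, truncated) following the documented conditions."""
    roll, pitch = roll_pitch_from_quat(quat)
    terminated = abs(roll) > MAX_TILT or abs(pitch) > MAX_TILT
    truncated = step_count >= max_episode_steps
    return terminated, truncated


# Upright robot, mid-episode: neither flag is set.
upright = episode_flags([1.0, 0.0, 0.0, 0.0], step_count=10, max_episode_steps=1000)

# Pure roll of 70 degrees about the x axis exceeds the 60 degree limit.
half = math.radians(70.0) / 2.0
tipped = episode_flags([math.cos(half), math.sin(half), 0.0, 0.0], 10, 1000)
```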
Usage Examples
Basic Training Setup
With Custom Gait Parameters
With Randomization for Robustness
Parameter Adaptation Scales
The environment defines scaling factors for parameter deltas:
| Parameter | Scale | Range per Step |
|---|---|---|
| step_height | 0.005 | ±5mm |
| step_length | 0.005 | ±5mm |
| cycle_time | 0.05 | ±50ms |
| body_height | 0.003 | ±3mm |
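To make the scaling concrete, here is a minimal sketch of how a 16D action could be decoded using the scales in the table and the index layout from the Action Space section. It is an illustration, not the environment's code: `decode_action` is a hypothetical helper, the starting gait values and the `residual_scale` of 0.02 are made-up example numbers, and any clamping of the updated parameters to valid ranges is omitted because those ranges are not documented here.

```python
# Per-step delta scales from the table above (meters, or seconds for cycle_time).
PARAM_SCALES = {
    "step_height": 0.005,
    "step_length": 0.005,
    "cycle_time": 0.05,
    "body_height": 0.003,
}
PARAM_ORDER = ("step_height", "step_length", "cycle_time", "body_height")  # indices 0-3
LEG_ORDER = ("FL", "FR", "RL", "RR")                                       # indices 4-15


def decode_action(action, gait_params, residual_scale):
    """Split a 16D action into updated gait parameters and per-leg offsets."""
    if len(action) != 16:
        raise ValueError(f"expected 16 values, got {len(action)}")
    # Assumption: out-of-range actions are clipped to [-1, 1].
    clipped = [max(-1.0, min(1.0, float(a))) for a in action]

    new_params = {
        name: gait_params[name] + clipped[i] * PARAM_SCALES[name]
        for i, name in enumerate(PARAM_ORDER)
    }
    residuals = {
        leg: [residual_scale * v for v in clipped[4 + 3 * i: 7 + 3 * i]]
        for i, leg in enumerate(LEG_ORDER)
    }
    return new_params, residuals


# Example values are illustrative only.
start = {"step_height": 0.04, "step_length": 0.06, "cycle_time": 0.8, "body_height": 0.05}
action = [1.0, -1.0, 0.0, 1.0] + [0.5] * 12
params, residuals = decode_action(action, start, residual_scale=0.02)
```

With a full-scale action of 1.0, step_height moves by exactly one table row's worth (5 mm) per step, which is why the table lists the scale as the range per step.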
Related
ResidualWalkEnv
Simpler environment with fixed gait parameters
GaitController
Base gait control system