AdaptiveGaitEnv

A Gymnasium-compatible reinforcement learning environment that enables learning both high-level gait parameters and low-level residual corrections for quadruped locomotion on rough terrain.

Overview

The AdaptiveGaitEnv extends the residual learning approach by allowing the policy to dynamically adjust gait parameters during execution. The agent learns to modulate:
  1. High-level gait parameters: step height, step length, cycle time, and body height
  2. Low-level residual corrections: per-leg foot position offsets (3D per leg)
This dual-level control enables sophisticated terrain adaptation, such as taking higher steps over obstacles, adjusting stride length for stability, or modulating gait frequency based on ground conditions.
Key Features
  • 69-dimensional observation space capturing full robot state
  • 16-dimensional action space (4 parameter deltas + 12 residuals)
  • Reward function balancing forward velocity, contact patterns, and stability
  • Built on MuJoCo physics simulation

Class Definition

from envs.adaptive_gait_env import AdaptiveGaitEnv

env = AdaptiveGaitEnv(
    model_path="model/world_train.xml",
    gait_params=None,
    residual_scale=0.01,
    max_episode_steps=60000,
    settle_steps=0,
    seed=None
)

Constructor Parameters

model_path
str
default:"model/world_train.xml"
Path to the MuJoCo XML model file defining the robot and environment.
gait_params
GaitParameters | None
default:"None"
Initial gait parameters for the controller. If None, uses controller defaults.
residual_scale
float
default:"0.01"
Scaling factor for residual corrections. Actions in [-1, 1] are multiplied by this value to get foot position offsets in meters.
max_episode_steps
int
default:"60000"
Maximum number of simulation steps per episode before truncation.
settle_steps
int
default:"0"
Number of steps to run with zero residuals at episode start, allowing the robot to stabilize on terrain.
seed
int | None
default:"None"
Random seed for reproducibility.

Spaces

Observation Space

Shape: (69,), a Box space with values in [-inf, inf].

The observation is a concatenation of the following components:
body_state
float[13]
Robot body state in world frame:
  • Position (3D): x, y, z coordinates in meters
  • Orientation (4D): quaternion [w, x, y, z] (normalized)
  • Linear velocity (3D): body velocity in m/s
  • Angular velocity (3D): body angular velocity in rad/s
joint_states
float[24]
Joint positions and velocities for all 12 joints (3 per leg × 4 legs):
  • Positions (12D): joint angles in radians
  • Velocities (12D): joint angular velocities in rad/s
foot_positions
float[12]
3D positions of each foot in body frame (3D per leg × 4 legs).
foot_velocities
float[12]
3D velocities of each foot in body frame (3D per leg × 4 legs).
foot_contacts
float[4]
Binary contact indicators for each leg (FL, FR, RL, RR):
  • 1.0: foot in contact with ground
  • 0.0: foot not in contact
gait_parameters
float[4]
Current gait parameters normalized to [-1, 1] range:
  • step_height: normalized current step height
  • step_length: normalized current stride length
  • cycle_time: normalized current gait cycle duration
  • body_height: normalized current target body height
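The component sizes above imply fixed slice boundaries within the 69-D vector. A minimal sketch of unpacking an observation (the helper name and exact offsets are assumptions derived from the ordering and sizes listed above, not a documented API):

```python
import numpy as np

def split_observation(obs: np.ndarray) -> dict:
    """Slice a 69-D observation into its named components (hypothetical helper)."""
    assert obs.shape == (69,)
    return {
        "body_state":      obs[0:13],   # position(3) + quaternion(4) + lin. vel(3) + ang. vel(3)
        "joint_states":    obs[13:37],  # 12 joint angles + 12 joint velocities
        "foot_positions":  obs[37:49],  # 3D per leg x 4 legs, body frame
        "foot_velocities": obs[49:61],  # 3D per leg x 4 legs, body frame
        "foot_contacts":   obs[61:65],  # binary contact flags: FL, FR, RL, RR
        "gait_parameters": obs[65:69],  # normalized step_height, step_length, cycle_time, body_height
    }

parts = split_observation(np.zeros(69))
print({k: v.shape[0] for k, v in parts.items()})  # component sizes sum to 69
```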

Action Space

Shape: (16,), a Box space with values in [-1.0, 1.0].

Actions are structured as follows:
parameter_deltas
float[4]
Deltas to apply to gait parameters (indices 0-3):
  • d_step_height [0]: change in step height, scaled by 0.005m
  • d_step_length [1]: change in step length, scaled by 0.005m
  • d_cycle_time [2]: change in cycle time, scaled by 0.05s
  • d_body_height [3]: change in body height, scaled by 0.003m
Each action value in [-1, 1] is multiplied by its corresponding scale to produce the actual parameter change.
residual_corrections
float[12]
3D foot position offsets for each leg (indices 4-15):
  • FL residual [4:7]: Front-Left leg (x, y, z)
  • FR residual [7:10]: Front-Right leg (x, y, z)
  • RL residual [10:13]: Rear-Left leg (x, y, z)
  • RR residual [13:16]: Rear-Right leg (x, y, z)
Each value is multiplied by residual_scale to get offset in meters.
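Putting the two halves together, decoding a raw policy output might look like the following sketch (`decode_action` is an illustrative helper, not part of the environment API; the scales come from the Parameter Adaptation Scales section and the default `residual_scale`):

```python
import numpy as np

# Per-step scales for the four gait-parameter deltas
PARAM_SCALES = np.array([0.005, 0.005, 0.05, 0.003])  # step_height, step_length, cycle_time, body_height
RESIDUAL_SCALE = 0.01  # default residual_scale

def decode_action(action: np.ndarray):
    """Split a 16-D action into scaled parameter deltas and per-leg residuals (illustrative)."""
    action = np.clip(action, -1.0, 1.0)
    param_deltas = action[:4] * PARAM_SCALES                 # metres (seconds for cycle_time)
    residuals = (action[4:] * RESIDUAL_SCALE).reshape(4, 3)  # rows: FL, FR, RL, RR; cols: x, y, z
    return param_deltas, residuals

deltas, residuals = decode_action(np.ones(16))
print(deltas, residuals.shape)  # scaled deltas and a (4, 3) residual array
```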

Methods

reset

def reset(
    self,
    *,
    seed: int | None = None,
    options: dict | None = None
) -> tuple[np.ndarray, dict]
Resets the environment to its initial state.
seed
int | None
Optional seed for episode randomization.
options
dict | None
Additional options:
  • randomize (bool): If True, applies small random perturbations to initial position and orientation
Returns: (observation, info)
  • observation: Initial 69D observation vector
  • info: Empty dict (metadata for future use)

step

def step(
    self,
    action: np.ndarray
) -> tuple[np.ndarray, float, bool, bool, dict]
Executes one simulation step with the given action.
action
np.ndarray
16D action vector in range [-1, 1].
Returns: (observation, reward, terminated, truncated, info)
observation
np.ndarray
69D observation of new state.
reward
float
Scalar reward value (see Reward Function section).
terminated
bool
True if episode ended due to failure condition (robot fell over).
truncated
bool
True if episode reached max_episode_steps.
info
dict
Additional information:
  • reward_components (dict): Breakdown of reward by component
  • body_height (float): Current body height in meters
  • gait_params (dict): Current gait parameter values

Reward Function

The reward function is computed in _compute_reward() and consists of multiple components:

Forward Velocity Reward

reward = 2000.0 * forward_vel
Primary objective: maximize forward velocity along the x-axis. High weight encourages fast locomotion.

Lateral Velocity Penalty

penalty = -2.0 * abs(lateral_vel)
Penalizes sideways drift to maintain straight-line walking.

Contact Pattern Reward

# For each leg:
if is_swing and has_contact:      # Swing leg touching (BAD)
    contact_reward -= 0.5
elif is_swing and not has_contact:  # Swing leg in air (GOOD)
    contact_reward += 0.2
elif not is_swing and has_contact:  # Stance leg on ground (GOOD)
    contact_reward += 0.2
else:                               # Stance leg floating (BAD)
    contact_reward -= 0.5
Encourages proper gait execution by rewarding correct contact patterns (swing legs in air, stance legs on ground).
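Because a leg's phase and its contact state should agree, the four-way branch above collapses to a single match test. A vectorized sketch over the four legs (the helper name is illustrative):

```python
import numpy as np

def contact_pattern_reward(is_swing, has_contact):
    """Score each leg: +0.2 when phase and contact agree, -0.5 otherwise (illustrative)."""
    is_swing = np.asarray(is_swing, dtype=bool)
    has_contact = np.asarray(has_contact, dtype=bool)
    match = is_swing != has_contact  # swing leg in air, or stance leg on ground
    return float(np.where(match, 0.2, -0.5).sum())

# Trot phase: FL/RR swinging and airborne, FR/RL in stance and touching
print(contact_pattern_reward([1, 0, 0, 1], [0, 1, 1, 0]))  # all four legs match their phase
```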

Stability Penalty

tilt_penalty = -2.0 * (roll**2 + pitch**2 + yaw**2)
Penalizes body orientation deviations from upright to maintain stability.

Total Reward: the sum of all components above.
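Taken together, the components sum to the scalar reward returned by step(). A sketch with the weights copied from the formulas above (`compute_reward` is an illustrative stand-in for the internal `_compute_reward()`):

```python
def compute_reward(forward_vel, lateral_vel, contact_reward, roll, pitch, yaw):
    """Sum the four reward components (weights from the docs; illustrative stand-in)."""
    components = {
        "forward_velocity": 2000.0 * forward_vel,            # primary objective
        "lateral_penalty":  -2.0 * abs(lateral_vel),         # discourage sideways drift
        "contact_pattern":  contact_reward,                  # gait-phase / contact agreement
        "tilt_penalty":     -2.0 * (roll**2 + pitch**2 + yaw**2),  # stay upright
    }
    return sum(components.values()), components

reward, parts = compute_reward(0.001, 0.0, 0.8, 0.0, 0.0, 0.0)
print(reward)  # 2.8
```

The per-component dict mirrors the `reward_components` breakdown reported in step()'s info.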

Termination Conditions

Terminated (Failure)

Episode terminates early if:
  • |roll| > π/3 (60°) - robot tipped sideways
  • |pitch| > π/3 (60°) - robot tipped forward/backward

Truncated (Timeout)

Episode truncates when step_count >= max_episode_steps.
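Both outcomes reduce to a few comparisons; a sketch of the rules listed above (the helper name is illustrative):

```python
import math

def check_done(roll, pitch, step_count, max_episode_steps):
    """Evaluate the failure and timeout rules (illustrative helper)."""
    terminated = abs(roll) > math.pi / 3 or abs(pitch) > math.pi / 3  # tipped past 60 degrees
    truncated = step_count >= max_episode_steps                       # hit the step budget
    return terminated, truncated

print(check_done(roll=1.2, pitch=0.0, step_count=10, max_episode_steps=60000))  # (True, False)
```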

Usage Examples

Basic Training Setup

import numpy as np
from envs.adaptive_gait_env import AdaptiveGaitEnv

# Create environment
env = AdaptiveGaitEnv(
    model_path="model/world_train.xml",
    residual_scale=0.01,
    max_episode_steps=10000,
    settle_steps=100
)

# Reset and run episode
obs, info = env.reset(seed=42)
total_reward = 0.0

for _ in range(1000):
    # Random policy (replace with trained policy)
    action = env.action_space.sample()
    
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    
    if terminated or truncated:
        print(f"Episode finished with reward: {total_reward}")
        print(f"Reward breakdown: {info['reward_components']}")
        break

With Custom Gait Parameters

from gait_controller import GaitParameters
from envs.adaptive_gait_env import AdaptiveGaitEnv

# Define custom initial gait
custom_gait = GaitParameters(
    step_height=0.04,
    step_length=0.08,
    cycle_time=0.6,
    body_height=0.06
)

env = AdaptiveGaitEnv(
    gait_params=custom_gait,
    residual_scale=0.015
)

obs, info = env.reset()

With Randomization for Robustness

env = AdaptiveGaitEnv()

# Reset with randomized initial conditions
obs, info = env.reset(
    seed=123,
    options={"randomize": True}
)

Parameter Adaptation Scales

The environment defines scaling factors for parameter deltas:
| Parameter   | Scale | Range per Step |
| ----------- | ----- | -------------- |
| step_height | 0.005 | ±5 mm          |
| step_length | 0.005 | ±5 mm          |
| cycle_time  | 0.05  | ±50 ms         |
| body_height | 0.003 | ±3 mm          |
These scales control the rate at which the policy can adapt gait parameters. Values are tuned to allow meaningful adaptation while preventing unstable rapid changes.
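One way to picture the adaptation: each step, the policy's unit-range deltas are scaled and accumulated into the current parameters, which are presumably clamped to valid controller limits. A sketch under that assumption (the LOW/HIGH bounds here are invented for illustration; the real limits live in the gait controller):

```python
import numpy as np

# Per-step delta scales: step_height, step_length, cycle_time, body_height
SCALES = np.array([0.005, 0.005, 0.05, 0.003])
# Hypothetical valid ranges; the actual controller limits may differ.
LOW  = np.array([0.01, 0.02, 0.2, 0.04])
HIGH = np.array([0.08, 0.15, 1.0, 0.10])

def update_params(params: np.ndarray, deltas_unit: np.ndarray) -> np.ndarray:
    """Apply one step of unit-range parameter deltas, then clamp to (assumed) limits."""
    return np.clip(params + np.clip(deltas_unit, -1.0, 1.0) * SCALES, LOW, HIGH)

p = np.array([0.03, 0.06, 0.5, 0.06])
print(update_params(p, np.ones(4)))  # each parameter moves up by exactly one scale step
```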

See Also

  • ResidualWalkEnv: a simpler environment with fixed gait parameters
  • GaitController: the base gait control system
