The discrete-action variant of RoverEnvFused lives in ros2_ws/src/sb3/environments/discrete_rover_env_fused.py. It exposes the same 3-channel fused visual observation and shared-memory interface as the continuous RoverEnvFused, but replaces the continuous Box action space with a flat Discrete(60) space — the Cartesian product of 5 speed levels and 12 compass headings. This design is intended for algorithms such as DQN, Rainbow, and other value-based methods that require a finite set of actions, and can simplify exploration in early training by constraining the agent to a small, semantically meaningful action vocabulary.

Constructor

RoverEnvFused(   # discrete variant from discrete_rover_env_fused.py
    size=(96, 96),
    length=6000,
    scan_topic='/scan',
    imu_topic='/imu/data',
    cmd_vel_topic='/cmd_vel',
    world_n='inspect',
    connection_check_timeout=30,
    lidar_points=32,
    max_lidar_range=12.0,
    rl_obs_name='rl_observation',
)
The constructor signature and all parameters are identical to the continuous RoverEnvFused. The only difference introduced in __init__ is the action-space definition and the action-decoding tables; see the Action Space section below.
size (tuple, default: (96, 96))
Fused observation image dimensions (height, width) in pixels. Must match the dimensions produced by the Active Vision pipeline.

length (int, default: 6000)
Maximum episode steps. Episodes terminate when _step >= length, returning done=True.

scan_topic (str, default: '/scan')
ROS2 topic for sensor_msgs/LaserScan. No LIDAR subscriber is created; this parameter is present for API compatibility.

imu_topic (str, default: '/imu/data')
ROS2 topic for sensor_msgs/Imu. The subscriber is disabled; orientation is extracted from the /rover/pose_array quaternion instead.

cmd_vel_topic (str, default: '/cmd_vel')
ROS2 topic for publishing geometry_msgs/Twist velocity commands.

world_n (str, default: 'inspect')
Simulated world name: 'inspect', 'moon', or 'maze'. The string 'island' is internally remapped to 'moon' via if world_n == 'island': self.world_name = 'moon'.

connection_check_timeout (int, default: 30)
Seconds to wait for initial sensor data. _check_robot_connection is currently disabled; if re-enabled it returns False after the timeout elapses without detecting sensor activity.

lidar_points (int, default: 32)
Downsampled LIDAR resolution. Initialises the lidar_data buffer size; no LIDAR subscriber is created.

max_lidar_range (float, default: 12.0)
Maximum LIDAR range in metres used to size the lidar_data buffer.

rl_obs_name (str, default: 'rl_observation')
Name of the POSIX shared memory segment written by the Active Vision inference pipeline. Must exist before the environment is instantiated; the constructor calls exit(1) if the segment is not found.

Action Space

The discrete variant defines a flat Discrete space whose size is the product n_speeds × n_directions:
# Speed levels (m/s)
self.n_speeds = 5
self.speed_levels = np.array([-0.2, 0.0, 0.3, 0.6, 1.0], dtype=np.float32)

# Direction angles (radians) — 12 evenly-spaced headings covering [-π, π)
self.n_directions = 12
self.direction_angles = np.linspace(-np.pi, np.pi, 12, endpoint=False)

# Combined flat action space: 5 × 12 = 60 discrete actions
self.action_space = spaces.Discrete(self.n_speeds * self.n_directions)  # Discrete(60)

Speed Levels

Index   Speed (m/s)   Description
0       −0.2          Slow reverse
1       0.0           Stop
2       0.3           Slow forward
3       0.6           Medium forward
4       1.0           Fast forward

Direction Headings

Twelve angles are evenly distributed from −π to π (exclusive), spaced 30° apart:
Index   Angle (rad)    Approx. direction
0       −π (≈−3.14)    West
1       −2.62          WSW
2       −2.09          SSW
3       −1.57          South
4       −1.05          SSE
5       −0.52          ESE
6       0.00           East
7       0.52           ENE
8       1.05           NNE
9       1.57           North
10      2.09           NNW
11      2.62           WNW
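
The heading table above can be reproduced without NumPy. The following is a minimal sketch, assuming only the 12-way split of [-π, π) described in this section; the variable names are illustrative and it is not the environment's own code.

```python
import math

N_DIRECTIONS = 12

# Same values as np.linspace(-np.pi, np.pi, 12, endpoint=False):
# start at -pi and advance by 2*pi/12 (30 degrees) per index.
step = 2.0 * math.pi / N_DIRECTIONS
direction_angles = [-math.pi + i * step for i in range(N_DIRECTIONS)]

# Print index, angle in radians, and angle in degrees for each heading.
for idx, angle in enumerate(direction_angles):
    print(f"{idx:2d}  {angle:+.2f} rad  {math.degrees(angle):+7.1f} deg")
```

Because the endpoint is excluded, +π never appears as a separate entry; it is the same heading as index 0 (−π).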

Decoding an Action Integer

Inside step(), an integer action is decoded into a (speed_idx, direction_idx) pair:
action        = int(action)
speed_idx     = action // self.n_directions   # integer division by 12
direction_idx = action % self.n_directions    # remainder

speed           = float(self.speed_levels[speed_idx])
desired_heading = float(self.direction_angles[direction_idx])
The decoded desired_heading is an absolute heading in radians, not a relative offset as in the continuous variant. It is tracked by the same PID heading controller used in RoverEnvFused, whose angular-velocity output is clipped to ±7.0 rad/s.
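
To illustrate how an absolute heading could be turned into a clipped angular-velocity command, here is a rough sketch. It is a proportional-only stand-in for the PID controller, not the environment's implementation: the function names and the gain kp are hypothetical, and only the ±7.0 rad/s clip comes from the text above.

```python
import math

MAX_ANGULAR_VELOCITY = 7.0  # rad/s clip stated in this section


def wrap_angle(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def heading_command(desired_heading, current_yaw, kp=2.0):
    """Proportional-only stand-in for the PID controller (kp is a made-up gain)."""
    # Wrapping the error makes the rover turn the short way round.
    error = wrap_angle(desired_heading - current_yaw)
    command = kp * error
    # Clip the output to the +/-7.0 rad/s limit.
    return max(-MAX_ANGULAR_VELOCITY, min(MAX_ANGULAR_VELOCITY, command))
```

Wrapping matters near the ±π seam: with a desired heading of −3.0 rad and a current yaw of 3.0 rad, the wrapped error is about +0.28 rad rather than −6.0 rad.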

Example: Action Integer → Command

# Action 27 → speed_idx=2, direction_idx=3
action        = 27
speed_idx     = 27 // 12  # = 2  → 0.3 m/s (slow forward)
direction_idx = 27 % 12   # = 3  → -1.57 rad (South)
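
The same arithmetic can be wrapped in a pair of helpers for checking the mapping in both directions. This is a sketch only: decode_action and encode_action are hypothetical names, and the range check is an addition that the text does not describe for the environment.

```python
N_SPEEDS = 5
N_DIRECTIONS = 12


def decode_action(action):
    """Split a flat action index into (speed_idx, direction_idx)."""
    action = int(action)
    if not 0 <= action < N_SPEEDS * N_DIRECTIONS:
        raise ValueError(f"action {action} is outside Discrete(60)")
    # divmod returns (action // 12, action % 12) in one call.
    return divmod(action, N_DIRECTIONS)


def encode_action(speed_idx, direction_idx):
    """Inverse of decode_action (illustrative helper, not part of the source file)."""
    return speed_idx * N_DIRECTIONS + direction_idx


# Round-trip every action to confirm the two helpers are inverses.
assert decode_action(27) == (2, 3)
assert all(encode_action(*decode_action(a)) == a for a in range(60))
```

An inverse like this is handy when building demonstration datasets, where a (speed, heading) choice has to be stored as a single integer.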

Observation Space

The observation space is identical to the continuous RoverEnvFused:
spaces.Dict({
    'fused_image': spaces.Box(
        low=0.0,
        high=1.0,
        shape=(96, 96, 3),
        dtype=np.float32
    ),
    'pose': spaces.Box(
        low=np.array([-30.0, -30.0, -10.0]),
        high=np.array([ 30.0,  30.0,  10.0]),
        dtype=np.float32
    ),
    'imu': spaces.Box(
        low=np.array([-np.pi, -np.pi, -np.pi]),
        high=np.array([ np.pi,  np.pi,  np.pi]),
        dtype=np.float32
    ),
    'target': spaces.Box(
        low=np.array([0,    -np.pi]),
        high=np.array([100,  np.pi]),
        shape=(2,),
        dtype=np.float32
    ),
    'velocities': spaces.Box(
        low=np.array([-10.0, -10.0]),
        high=np.array([ 10.0,  10.0]),
        shape=(2,),
        dtype=np.float32
    ),
})
Key           Shape         Description
fused_image   (96, 96, 3)   3-channel fused image from shared memory: [grayscale, YOLO-heatmap, depth]
pose          (3,)          Ground-truth rover position [x, y, z] in metres
imu           (3,)          Orientation [pitch, roll, yaw] in radians
target        (2,)          [distance_m, relative_angle_rad] to the navigation goal
velocities    (2,)          [linear_velocity, angular_velocity] from wheel odometry
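
As an illustration of what the target entry represents, the sketch below derives a [distance, relative angle] pair from a planar rover position, its yaw, and a goal position. How the environment actually computes this value is not shown on this page, so the function name, the planar (x, y) simplification, and the wrapping convention are all assumptions.

```python
import math


def target_observation(rover_xy, rover_yaw, goal_xy):
    """Return [distance_m, relative_angle_rad] to the goal (illustrative only)."""
    dx = goal_xy[0] - rover_xy[0]
    dy = goal_xy[1] - rover_xy[1]
    distance = math.hypot(dx, dy)
    # Bearing to the goal in the world frame, then relative to the rover's yaw.
    bearing = math.atan2(dy, dx)
    relative = (bearing - rover_yaw + math.pi) % (2.0 * math.pi) - math.pi
    return [distance, relative]
```

With this convention, a relative angle of 0 means the goal is straight ahead, and the result stays inside the [-π, π] bounds declared for the target box.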

Stuck Detection

Like both RoverEnv variants, this environment tracks position_history. The discrete environment inherits the fused-environment thresholds:
Parameter         Value
stuck_window      5000 steps
stuck_threshold   0.0001 m
stuck_penalty     −25.0
Once position_history holds 5000 entries, step() computes the displacement between the oldest and newest position. If it is below 0.0001 m the episode terminates immediately with the stuck penalty.
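
The check described above can be sketched with a bounded deque. This is an approximation under stated assumptions: the StuckDetector class is hypothetical, and the displacement is measured in the (x, y) plane, which the text does not specify.

```python
import math
from collections import deque

STUCK_WINDOW = 5000       # steps
STUCK_THRESHOLD = 0.0001  # metres
STUCK_PENALTY = -25.0     # reward applied when the episode ends as stuck


class StuckDetector:
    """Tracks recent positions and flags a rover that has not moved."""

    def __init__(self, window=STUCK_WINDOW, threshold=STUCK_THRESHOLD):
        self.window = window
        self.threshold = threshold
        # maxlen drops the oldest entry automatically once the window is full.
        self.position_history = deque(maxlen=window)

    def update(self, x, y):
        """Record a position; return True once the rover counts as stuck."""
        self.position_history.append((x, y))
        if len(self.position_history) < self.window:
            return False
        x0, y0 = self.position_history[0]
        x1, y1 = self.position_history[-1]
        # Compare oldest and newest positions only, as described above.
        return math.hypot(x1 - x0, y1 - y0) < self.threshold
```

Comparing only the oldest and newest positions means a rover that drives a loop and returns to its starting point after exactly one window would also be flagged.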

Differences from Continuous RoverEnvFused

  • Action space type: Discrete(60) instead of Box([-0.6, -π], [1.0, π]). All downstream algorithms must handle integer actions rather than float arrays.
  • Action decoding: The continuous variant interprets action[1] as a relative heading offset added to the current yaw. The discrete variant maps the direction index to an absolute heading from the pre-defined 12-point compass table.
  • Reverse capability: The continuous environment allows reverse down to −0.6 m/s; the discrete environment provides a single reverse speed of −0.2 m/s.
  • Granularity: The continuous action space has infinite resolution; the discrete space has 5 speed levels and 12 direction bins (30° resolution). Fine-grained manoeuvring between compass points is not directly representable.
  • Action logging: The discrete step() prints the raw action integer, its type, and dtype at every step for debugging (print(f"Action type: ...")), a line not present in the continuous variant.
  • Reward function: Both variants share the same task_reward(observation) implementation; the fused-image heatmap penalty (heatmap_center × 0.1) applies in both.
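
The action-decoding difference in the list above can be made concrete with a small side-by-side sketch. Both functions are hypothetical simplifications; in particular, wrapping the continuous result back into [-π, π) is an assumption rather than something this page confirms.

```python
import math

N_DIRECTIONS = 12
DIRECTION_ANGLES = [
    -math.pi + i * (2.0 * math.pi / N_DIRECTIONS) for i in range(N_DIRECTIONS)
]


def wrap_angle(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def desired_heading_continuous(current_yaw, heading_offset):
    """Continuous variant: action[1] is an offset added to the current yaw."""
    return wrap_angle(current_yaw + heading_offset)


def desired_heading_discrete(direction_idx):
    """Discrete variant: the index selects an absolute heading from the table."""
    return DIRECTION_ANGLES[direction_idx]
```

The practical consequence is that a discrete action always means the same world-frame heading, whereas the same continuous action yields different headings depending on the rover's yaw at that step.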

When to Use

Choose the discrete environment when:
  • You are using a value-based algorithm such as DQN, Rainbow, C51, or any method that requires a finite action set.
  • You want to constrain exploration during early training by restricting the agent to a pre-defined action vocabulary rather than searching a 2D continuous space.
  • You are performing behavioural cloning from human demonstrations where speed/direction pairs map naturally to button presses or a joystick quantised to compass directions.
  • Your compute budget is tight and you want to benefit from action-space discretisation to reduce policy-gradient variance.
Choose the continuous environment (leo_rover_env_fused.py) when:
  • You are using an actor-critic method such as PPO, SAC, or TD3 that operates natively on continuous actions.
  • You need sub-30° directional precision for tight navigation corridors.
  • You want the rover to modulate speed smoothly across the full [-0.6, 1.0] range, for example when approaching a goal at low speed.
  • You are training with DreamerV3 or a world-model approach that benefits from a dense, smooth action manifold.
Both variants share the same observation space, reward function, shared-memory interface, and world-specific parameters. Switching between them requires only changing the imported class and the RL algorithm’s action-space handling.
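
For the behavioural-cloning case mentioned above, continuous demonstrations need to be snapped to the nearest discrete action. The following is one possible sketch, assuming the speed levels and 30° bins from this page; nearest_discrete_action is a hypothetical helper and nearest-neighbour rounding is only one of several reasonable quantisation choices.

```python
import math

SPEED_LEVELS = [-0.2, 0.0, 0.3, 0.6, 1.0]
N_DIRECTIONS = 12
STEP = 2.0 * math.pi / N_DIRECTIONS  # 30 degrees per direction bin


def nearest_discrete_action(speed, heading):
    """Map a continuous (speed, absolute heading) pair to a Discrete(60) index."""
    # Pick the speed level closest to the requested speed.
    speed_idx = min(
        range(len(SPEED_LEVELS)), key=lambda i: abs(SPEED_LEVELS[i] - speed)
    )
    # Shift so -pi lands on bin 0, round to the nearest bin, and wrap so that
    # headings just below +pi fall back onto bin 0 (the same heading as -pi).
    direction_idx = round((heading + math.pi) / STEP) % N_DIRECTIONS
    return speed_idx * N_DIRECTIONS + direction_idx
```

Note that the heading passed in must already be absolute; a relative offset from the continuous environment would first have to be added to the rover's yaw.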
