The discrete-action variant of
RoverEnvFused lives in ros2_ws/src/sb3/environments/discrete_rover_env_fused.py. It exposes the same 3-channel fused visual observation and shared-memory interface as the continuous RoverEnvFused, but replaces the continuous Box action space with a flat Discrete(60) space: the Cartesian product of 5 speed levels and 12 compass headings. This design is intended for value-based algorithms such as DQN and Rainbow, which require a finite set of actions, and it can simplify exploration in early training by constraining the agent to a small, semantically meaningful action vocabulary.
Constructor
The constructor takes the same parameters as the continuous RoverEnvFused. The only differences introduced in __init__ are the action-space definition and the action-decoding tables; see the Action Space section below.
- Fused observation image dimensions (height, width) in pixels. Must match the dimensions produced by the Active Vision pipeline.
- Maximum episode steps. Episodes terminate when _step >= length, returning done=True.
- ROS2 topic for sensor_msgs/LaserScan. No LIDAR subscriber is created; this parameter is present for API compatibility.
- ROS2 topic for sensor_msgs/Imu. The subscriber is disabled; orientation is extracted from the /rover/pose_array quaternion instead.
- ROS2 topic for publishing geometry_msgs/Twist velocity commands.
- Simulated world name: 'inspect', 'moon', or 'maze'. The string 'island' is internally remapped to 'moon' via if world_n == 'island': self.world_name = 'moon'.
- Seconds to wait for initial sensor data. _check_robot_connection is currently disabled; if re-enabled, it returns False after the timeout elapses without detecting sensor activity.
- Downsampled LIDAR resolution. Initialises the lidar_data buffer size; no LIDAR subscriber is created.
- Maximum LIDAR range in metres, used to size the lidar_data buffer.
- Name of the POSIX shared memory segment written by the Active Vision inference pipeline. Must exist before the environment is instantiated; the constructor calls exit(1) if the segment is not found.

Action Space
The discrete variant defines a flat Discrete space whose size is the product n_speeds × n_directions = 5 × 12 = 60.
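The composition of that space can be sketched in plain Python. This is a minimal illustration rather than the environment's code: SPEEDS, DIRECTIONS and ACTION_TABLE are hypothetical names, the values come from the tables in this section, and the enumeration order is an assumption. The real class would wrap the resulting count in a Discrete space.

```python
import itertools
import math

# Hypothetical names; values are taken from the tables in this section.
SPEEDS = [-0.2, 0.0, 0.3, 0.6, 1.0]  # m/s, 5 speed levels
# 12 headings from -pi (inclusive) to pi (exclusive), 30 degrees apart.
DIRECTIONS = [-math.pi + i * math.pi / 6 for i in range(12)]

# Cartesian product: every (speed, heading) pair becomes one discrete action.
# ASSUMPTION: the enumeration order is illustrative, not taken from the source.
ACTION_TABLE = list(itertools.product(SPEEDS, DIRECTIONS))

print(len(ACTION_TABLE))  # 60
```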
Speed Levels
| Index | Speed (m/s) | Description |
|---|---|---|
| 0 | −0.2 | Slow reverse |
| 1 | 0.0 | Stop |
| 2 | 0.3 | Slow forward |
| 3 | 0.6 | Medium forward |
| 4 | 1.0 | Fast forward |
Direction Headings
Twelve angles are evenly distributed from −π (inclusive) to π (exclusive), spaced 30° apart:
| Index | Angle (rad) | Approx. direction |
|---|---|---|
| 0 | −π (≈−3.14) | West |
| 1 | −2.62 | WSW |
| 2 | −2.09 | SSW |
| 3 | −1.57 | South |
| 4 | −1.05 | SSE |
| 5 | −0.52 | ESE |
| 6 | 0.00 | East |
| 7 | 0.52 | ENE |
| 8 | 1.05 | NNE |
| 9 | 1.57 | North |
| 10 | 2.09 | NNW |
| 11 | 2.62 | WNW |
Decoding an Action Integer
Inside step(), an integer action is decoded into a (speed_idx, direction_idx) pair, which selects a speed from the Speed Levels table and a desired_heading from the Direction Headings table.
desired_heading is an absolute heading in radians (not a relative offset as in the continuous variant). It is tracked by the same PID heading controller used in RoverEnvFused, whose angular velocity output is clipped to ±7.0 rad/s.
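The heading-tracking step can be pictured with the sketch below. It is an illustration under assumptions, not the environment's controller: it uses a proportional term only, the gain KP and both function names are hypothetical, and only the ±7.0 rad/s clip comes from the text above. The error is wrapped to [−π, π) so that the rover turns the short way round.

```python
import math

MAX_ANGULAR_VELOCITY = 7.0  # rad/s, clip limit stated in the text
KP = 4.0                    # hypothetical proportional gain (not from the source)

def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def angular_command(desired_heading: float, current_yaw: float) -> float:
    """Proportional-only stand-in for the PID heading controller."""
    error = wrap_angle(desired_heading - current_yaw)
    command = KP * error
    # Clip the output to the +/-7.0 rad/s limit.
    return max(-MAX_ANGULAR_VELOCITY, min(MAX_ANGULAR_VELOCITY, command))
```

Because desired_heading is absolute, the same action yields a different turn command depending on the rover's current yaw.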
Example: Action Integer → Command
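A worked decoding is sketched below. The layout of the flat integer is not stated on this page, so the sketch assumes speed is the major index (action // 12) and direction the minor index (action % 12). decode_action and the table names are hypothetical. Under a different layout the same integers would map to different commands.

```python
import math

SPEEDS = [-0.2, 0.0, 0.3, 0.6, 1.0]  # m/s, from the Speed Levels table
N_DIRECTIONS = 12
DIRECTIONS = [-math.pi + i * (2 * math.pi / N_DIRECTIONS) for i in range(N_DIRECTIONS)]

def decode_action(action: int) -> tuple[float, float]:
    """Map a flat action integer to (speed in m/s, desired_heading in rad).

    ASSUMPTION: speed-major layout, i.e. action = speed_idx * 12 + direction_idx.
    """
    if not 0 <= action < len(SPEEDS) * N_DIRECTIONS:
        raise ValueError(f"action {action} outside Discrete(60)")
    speed_idx, direction_idx = divmod(int(action), N_DIRECTIONS)
    return SPEEDS[speed_idx], DIRECTIONS[direction_idx]

for a in (0, 30, 59):
    speed, heading = decode_action(a)
    print(f"action {a:2d} -> speed {speed:+.1f} m/s, heading {heading:+.2f} rad")
```

Under this assumed layout, action 30 splits into speed index 2 (0.3 m/s, slow forward) and direction index 6 (0.00 rad, East), while action 59 splits into speed index 4 (1.0 m/s) and direction index 11 (2.62 rad, WNW).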
Observation Space
The observation space is identical to the continuous RoverEnvFused:
| Key | Shape | Description |
|---|---|---|
| fused_image | (96, 96, 3) | 3-channel fused image from shared memory: [grayscale, YOLO-heatmap, depth] |
| pose | (3,) | Ground-truth rover position [x, y, z] in metres |
| imu | (3,) | Orientation [pitch, roll, yaw] in radians |
| target | (2,) | [distance_m, relative_angle_rad] to the navigation goal |
| velocities | (2,) | [linear_velocity, angular_velocity] from wheel odometry |
Stuck Detection
Like both RoverEnv variants, this environment tracks position_history. The discrete environment inherits the fused-environment thresholds:
| Parameter | Value |
|---|---|
| stuck_window | 5000 steps |
| stuck_threshold | 0.0001 m |
| stuck_penalty | −25.0 |
Once position_history holds 5000 entries, step() computes the displacement between the oldest and newest position. If it is below 0.0001 m, the episode terminates immediately with the stuck penalty.
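The check described above can be sketched with a bounded deque. This is an illustration under assumptions, not the environment's code: StuckDetector and update are hypothetical names, positions are treated as (x, y, z) tuples, and Euclidean distance is assumed as the displacement measure. The window, threshold and penalty are the values from the table.

```python
import math
from collections import deque

STUCK_WINDOW = 5000        # steps
STUCK_THRESHOLD = 0.0001   # metres
STUCK_PENALTY = -25.0      # reward applied when the episode ends as stuck

class StuckDetector:
    """Hypothetical helper mirroring the position_history logic described above."""

    def __init__(self, window: int = STUCK_WINDOW, threshold: float = STUCK_THRESHOLD):
        self.threshold = threshold
        # maxlen makes the deque drop the oldest entry automatically.
        self.position_history = deque(maxlen=window)

    def update(self, position) -> bool:
        """Record a position; return True when the rover counts as stuck."""
        self.position_history.append(tuple(position))
        if len(self.position_history) < self.position_history.maxlen:
            return False  # window not yet full, so no verdict
        # ASSUMPTION: displacement is the Euclidean distance oldest -> newest.
        displacement = math.dist(self.position_history[0], self.position_history[-1])
        return displacement < self.threshold
```

In a step() loop, a True result would end the episode and apply STUCK_PENALTY as the reward. With a 5000-step window, the check cannot fire before step 5000 of an episode.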
Differences from Continuous RoverEnvFused
- Action space type: Discrete(60) instead of Box([-0.6, -π], [1.0, π]). All downstream algorithms must handle integer actions rather than float arrays.
- Action decoding: The continuous variant interprets action[1] as a relative heading offset added to the current yaw. The discrete variant maps the direction index to an absolute heading from the pre-defined 12-point compass table.
- Reverse capability: The continuous environment allows reverse down to −0.6 m/s; the discrete environment provides a single discrete reverse speed of −0.2 m/s.
- Granularity: The continuous action space has infinite resolution; the discrete space has 5 speed levels and 12 direction bins (30° resolution). Fine-grained manoeuvring between compass points is not directly representable.
- Action logging: The discrete step() prints the raw action integer, its type, and dtype at every step for debugging (print(f"Action type: ...")), a line not present in the continuous variant.
- Reward function: Both variants share the same task_reward(observation) implementation; the fused-image heatmap penalty (heatmap_center × 0.1) applies in both.
When to Use
Choose the discrete environment when:
- You are using a value-based algorithm such as DQN, Rainbow, C51, or any method that requires a finite action set.
- You want to constrain exploration during early training by restricting the agent to a pre-defined action vocabulary rather than searching a 2D continuous space.
- You are performing behavioural cloning from human demonstrations where speed/direction pairs map naturally to button presses or a joystick quantised to compass directions.
- Your compute budget is tight and you want to benefit from action-space discretisation to reduce policy-gradient variance.
Choose the continuous environment (leo_rover_env_fused.py) when:
- You are using an actor-critic method such as PPO, SAC, or TD3 that operates natively on continuous actions.
- You need sub-30° directional precision for tight navigation corridors.
- You want the rover to modulate speed smoothly across the full [-0.6, 1.0] range, for example when approaching a goal at low speed.
- You are training with DreamerV3 or a world-model approach that benefits from a dense, smooth action manifold.
Both variants share the same observation space, reward function, shared-memory interface, and world-specific parameters. Switching between them requires only changing the imported class and the RL algorithm’s action-space handling.