RoverEnvFused Vision-Fused Gymnasium Environment API

RoverEnvFused is a Gymnasium-compatible environment that replaces the raw LIDAR sensor stack of RoverEnv with a rich 3-channel fused image observation delivered via POSIX shared memory. Each 96×96 frame is composed of three co-registered visual channels: a grayscale camera view, a YOLO pedestrian-detection mask, and a monocular depth map. This design lets the environment consume pre-processed, inference-ready observations from a separate Active Vision pipeline process without the latency of inter-process ROS2 messaging. The continuous-action variant is implemented in ros2_ws/src/sb3/environments/leo_rover_env_fused.py.

Constructor

RoverEnvFused(
    size=(96, 96),
    length=6000,
    scan_topic='/scan',
    imu_topic='/imu/data',
    cmd_vel_topic='/cmd_vel',
    world_n='inspect',
    connection_check_timeout=30,
    lidar_points=32,
    max_lidar_range=12.0,
    rl_obs_name='rl_observation',
)

size

tuple

default:"(96, 96)"

Height and width of the fused observation image in pixels. Internally stored as rl_obs_height = 96 and rl_obs_width = 96. Must match the dimensions written by the Active Vision inference pipeline into shared memory.

length

int

default:"6000"

Maximum number of environment steps per episode — three times longer than RoverEnv’s default of 2000. The larger budget accommodates the richer observation space and slower per-step wall-clock time from image-based policies.

scan_topic

str

default:"'/scan'"

ROS2 topic name for sensor_msgs/LaserScan messages. No LIDAR subscriber is created in RoverEnvFused; this parameter is accepted for API compatibility but has no effect on the running environment.

imu_topic

str

default:"'/imu/data'"

ROS2 topic name for sensor_msgs/Imu messages. The IMU subscriber is disabled by default (commented out in source); orientation is instead extracted from the /rover/pose_array quaternion inside pose_array_callback.

cmd_vel_topic

str

default:"'/cmd_vel'"

ROS2 topic for publishing geometry_msgs/Twist velocity commands. The heading PID output and the agent’s requested speed are combined into a single Twist message per step() call.

world_n

str

default:"'inspect'"

Simulated world name. Accepted values: 'inspect', 'moon', 'maze'. The string 'island' is silently remapped to 'moon' at construction time via if world_n == 'island': self.world_name = 'moon'. The world name controls spawn ranges, goal ranges, and out-of-bounds thresholds identically to RoverEnv.

connection_check_timeout

int

default:"30"

Seconds to wait for initial sensor data when verifying the robot connection. The robot connection check is currently disabled in RoverEnvFused; this parameter is reserved for future use. If re-enabled, _check_robot_connection returns False after the timeout elapses.

lidar_points

int

default:"32"

Downsampled LIDAR resolution. The lidar_data buffer is initialised but no LIDAR subscriber is created; this parameter is present for API compatibility.

max_lidar_range

float

default:"12.0"

Maximum valid LIDAR range in metres. Used to size the lidar_data buffer.

rl_obs_name

str

default:"'rl_observation'"

Name of the POSIX shared memory segment written by the Active Vision inference pipeline. The environment attaches to this segment at construction time using multiprocessing.shared_memory.SharedMemory. If the segment does not exist the process calls exit(1) with a descriptive error.

Passing world_n='island' is equivalent to world_n='moon'. The mapping if world_n == 'island': self.world_name = 'moon' is applied unconditionally in __init__, so all internal paths, pose-service endpoints, and spawn ranges use 'moon'.

Observation Space

The observation is a gymnasium.spaces.Dict with five keys. The primary key is fused_image, which carries the 3-channel visual observation sourced from shared memory:

spaces.Dict({
    'fused_image': spaces.Box(
        low=0.0,
        high=1.0,
        shape=(96, 96, 3),   # (H, W, channels)
        dtype=np.float32
    ),
    'pose': spaces.Box(
        low=np.array([-30.0, -30.0, -10.0]),
        high=np.array([ 30.0,  30.0,  10.0]),
        dtype=np.float32
    ),
    'imu': spaces.Box(
        low=np.array([-np.pi, -np.pi, -np.pi]),
        high=np.array([ np.pi,  np.pi,  np.pi]),
        dtype=np.float32
    ),
    'target': spaces.Box(
        low=np.array([0,    -np.pi]),
        high=np.array([100,  np.pi]),
        shape=(2,),
        dtype=np.float32
    ),
    'velocities': spaces.Box(
        low=np.array([-10.0, -10.0]),
        high=np.array([ 10.0,  10.0]),
        shape=(2,),
        dtype=np.float32
    ),
})

Fused Image Channels

Channel Index	Content	Source
`[:, :, 0]`	Grayscale camera image	RGB camera → greyscale conversion
`[:, :, 1]`	Pedestrian detection mask (heatmap)	YOLO inference output
`[:, :, 2]`	Monocular depth map	Depth estimation model

All channel values are normalised to [0.0, 1.0] as float32 by the inference pipeline before being written to shared memory.

Auxiliary Observation Keys

Key	Shape	Description
`pose`	`(3,)`	Ground-truth rover position `[x, y, z]` from `/rover/pose_array`.
`imu`	`(3,)`	Orientation `[pitch, roll, yaw]` in radians extracted from the pose quaternion.
`target`	`(2,)`	`[distance_m, relative_angle_rad]` to the current navigation goal.
`velocities`	`(2,)`	`[linear_velocity, angular_velocity]` from `/odometry/wheels`.

Shared Memory Interface

RoverEnvFused reads its primary observation from a POSIX shared memory segment rather than a ROS2 topic. This approach eliminates serialisation overhead and allows the inference pipeline to run in a separate process (potentially on a GPU) at its own frame rate.

Memory Layout

Offset 0      : float64  — Unix timestamp of the most-recent frame (8 bytes)
Offset 8      : uint32   — Valid flag; 1 = frame is ready to read (4 bytes)
Offset 12     : float32[] — Flattened (H × W × 3) observation array
                            96 × 96 × 3 × 4 bytes = 110,592 bytes

Read Protocol

get_fused_observation() polls the shared memory in a busy-wait loop with 1 ms sleep intervals. It compares the current timestamp at offset 0 against the last-seen timestamp. When both the timestamp has changed and the valid flag equals 1, it reads the observation array, reshapes it to (96, 96, 3), and returns a copy.

def get_fused_observation(self) -> np.ndarray:
    """Block until a new frame arrives, then return the fused observation."""
    buf = self.rl_obs_shm.buf
    header_size = 12        # 8-byte timestamp + 4-byte valid flag
    data_size   = 96 * 96 * 3 * 4  # float32 elements

    while True:
        ts         = struct.unpack_from('<d', buf, 0)[0]
        valid_flag = struct.unpack_from('<L', buf, 8)[0]

        if ts != self._last_obs_ts and valid_flag == 1:
            self._last_obs_ts = ts
            raw = bytes(buf[header_size : header_size + data_size])
            obs = np.frombuffer(raw, dtype=np.float32).reshape(96, 96, 3)
            return obs.copy()

        time.sleep(0.001)

The Active Vision inference pipeline must be running and actively writing frames to the shared memory segment before RoverEnvFused is instantiated. If the named segment (rl_obs_name) does not exist the constructor calls exit(1). If the pipeline stalls mid-episode, get_fused_observation() will block indefinitely until a new valid frame appears.

Attaching at Construction

self.rl_obs_shm = shared_memory.SharedMemory(name=self.rl_obs_name)

The segment is opened in attach-only mode; RoverEnvFused never creates or destroys the segment. The resource-tracker entry is unregistered immediately after attach to suppress Python’s shutdown warning about inherited shared memory handles.

Action Space

The action space uses a continuous Box with a minimum linear speed of −0.6 m/s (limiting the rover’s reverse capability compared to RoverEnv’s −1.0):

spaces.Box(
    low=np.array([-0.6, -np.pi]),
    high=np.array([ 1.0,  np.pi]),
    dtype=np.float32
)

Index	Meaning	Range
`action[0]`	Target linear speed (m/s)	`[-0.6, 1.0]`
`action[1]`	Desired heading offset (rad)	`[-π, π]`

Methods

`step(action) -> tuple`

observation, reward, done, truncated, info = env.step(action)

Executes one environment step:

Checks for flip, stuck-in-collision, stuck, and out-of-bounds conditions with associated terminal penalties.
Decodes action into speed and relative_heading_command.
Computes absolute target heading (current_yaw + relative_heading_command) and calls heading_controller() to get angular velocity.
Publishes a Twist command.
Calls get_observation(), which blocks on get_fused_observation() until a fresh shared-memory frame is available.
Calls task_reward(observation) — passes the full observation dict so the reward can inspect the pedestrian heatmap channel.
Spins the ROS2 node once (timeout_sec=0.001) to flush callbacks.

info dict keys: steps, total_steps, reward.

`reset(seed=None, options=None) -> tuple`

observation, info = env.reset()

Resets the episode to a randomised initial state: stops the rover, teleports it via the Ignition Gazebo ign service CLI, randomises the goal position, spins the ROS2 node for 100 × 0.1 s to let the pose settle, sleeps 1 s, then returns the initial observation (which includes a fresh fused image read from shared memory).

`get_observation() -> dict`

Returns the current observation dictionary:

{
    'fused_image': self.get_fused_observation(),         # np.ndarray (96, 96, 3)
    'pose':        self.rover_position,                  # np.ndarray (3,)
    'imu':         [pitch, roll, yaw],                   # np.ndarray (3,)
    'target':      self.get_target_info(),               # np.ndarray (2,)
    'velocities':  [linear_velocity, angular_velocity],  # np.ndarray (2,)
}

`get_target_info() -> np.ndarray`

Computes and returns [distance, relative_angle] to the current goal. Distance is Euclidean in metres; angle is relative to the rover’s current yaw, normalised to [-π, π].

`task_reward(observation) -> float`

Computes the scalar reward. Unlike RoverEnv, this variant accepts the full observation dict so it can read the pedestrian heatmap channel (observation['fused_image'][:, :, 1]). Key constants:

Constant	Value
`success_distance`	0.5 m
`goal_reward`	100
`distance_delta_scale`	0.9
`heatmap_center_scale`	0.1
`stuck_penalty`	−25.0
`too_far_away_penilty`	−10.0

The heatmap penalty subtracts heatmap_center_scale × sum(center_4_columns_of_heatmap) from the reward each step, discouraging the rover from moving towards detected pedestrians.

`is_robot_flipped() -> Union[str, bool]`

Returns 'roll_left', 'roll_right', 'pitch_forward', or 'pitch_backward' when either absolute angle exceeds flip_threshold = 1.48 rad (~85°). Returns False otherwise.

`heading_controller(desired_heading, current_heading) -> float`

PID controller for yaw tracking. Identical gains to RoverEnv:

Parameter	Value
`Kp`	2.0
`Ki`	0.0
`Kd`	0.1
Max output	±7.0 rad/s

`too_far_away() -> bool`

Returns True when the rover exits the world-specific bounding box. Triggers episode termination with too_far_away_penilty = -10.0.

`close()`

env.close()

Calls self.rl_obs_shm.close() to detach from the shared memory segment (does not unlink it — the inference pipeline owns the segment lifetime).
Destroys the ROS2 node and calls rclpy.shutdown().

`pose_array_callback(msg)`

Subscriber callback for /rover/pose_array. Extracts the first pose, stores position in self.rover_position, and extracts roll/pitch/yaw from the quaternion via transforms3d.euler.quat2euler. This callback also serves as the orientation source since the IMU subscriber is disabled.

`odom_callback(msg)`

Subscriber callback for /odometry/wheels (nav_msgs/Odometry). Stores msg.twist.twist.linear.x and msg.twist.twist.angular.z in self.current_linear_velocity and self.current_angular_velocity.

Robot Connection Check

_check_robot_connection(timeout) is defined but disabled (commented out) in RoverEnvFused.__init__. When re-enabled it polls lidar_data for non-zero values and returns False after the timeout elapses without detecting sensor activity.

Stuck Detection

RoverEnvFused uses the same position_history mechanism as RoverEnv but with different thresholds:

Parameter	Value
`stuck_window`	5000 steps
`stuck_threshold`	0.0001 m
`stuck_penalty`	−25.0

Once position_history reaches 5000 entries, step() checks whether the displacement between the oldest and newest recorded position is less than 0.0001 m. If so, the episode terminates with the stuck penalty.

Usage with Active Vision Pipeline

The following snippet shows the recommended pattern for launching RoverEnvFused alongside DreamerV3 or a Stable-Baselines3 policy:

from environments.leo_rover_env_fused import RoverEnvFused
from stable_baselines3 import PPO

# 1. Start the Active Vision inference pipeline in a separate process FIRST.
#    It must create the shared memory segment named 'rl_observation'
#    and begin writing frames before the environment is instantiated.

# 2. Instantiate the environment — will attach to the shared memory segment.
env = RoverEnvFused(
    size=(96, 96),
    length=6000,
    world_n='moon',
    rl_obs_name='rl_observation',  # must match the pipeline's segment name
)

# 3. Train a policy that consumes the Dict observation.
model = PPO(
    policy='MultiInputPolicy',
    env=env,
    verbose=1,
)
model.learn(total_timesteps=1_000_000)

# 4. Clean up — detaches shared memory and shuts down ROS2.
env.close()

When using DreamerV3, configure the encoder to consume the Dict space directly. The fused_image key should be routed to the convolutional encoder; the scalar keys (pose, imu, target, velocities) should be processed by an MLP encoder.

Environment API

ROS 2 Nodes & Topics

Configuration

RoverEnvFused Vision-Fused Gymnasium Environment API

Constructor

Observation Space

Fused Image Channels

Auxiliary Observation Keys

Shared Memory Interface

Memory Layout

Read Protocol

Attaching at Construction

Action Space

Methods

`step(action) -> tuple`

`reset(seed=None, options=None) -> tuple`

`get_observation() -> dict`

`get_target_info() -> np.ndarray`

`task_reward(observation) -> float`

`is_robot_flipped() -> Union[str, bool]`

`heading_controller(desired_heading, current_heading) -> float`

`too_far_away() -> bool`

`close()`

`pose_array_callback(msg)`

`odom_callback(msg)`

Robot Connection Check

Stuck Detection

Usage with Active Vision Pipeline

Build docs developers (and LLMs) love

Environment API

ROS 2 Nodes & Topics

Configuration

Documentation Index

​Constructor

​Observation Space

​Fused Image Channels

​Auxiliary Observation Keys

​Shared Memory Interface

​Memory Layout

​Read Protocol

​Attaching at Construction

​Action Space

​Methods

​step(action) -> tuple

​reset(seed=None, options=None) -> tuple

​get_observation() -> dict

​get_target_info() -> np.ndarray

​task_reward(observation) -> float

​is_robot_flipped() -> Union[str, bool]

​heading_controller(desired_heading, current_heading) -> float

​too_far_away() -> bool

​close()

​pose_array_callback(msg)

​odom_callback(msg)

​Robot Connection Check

​Stuck Detection

​Usage with Active Vision Pipeline

Build docs developers (and LLMs) love

Constructor

Observation Space

Fused Image Channels

Auxiliary Observation Keys

Shared Memory Interface

Memory Layout

Read Protocol

Attaching at Construction

Action Space

Methods

`step(action) -> tuple`

`reset(seed=None, options=None) -> tuple`

`get_observation() -> dict`

`get_target_info() -> np.ndarray`

`task_reward(observation) -> float`

`is_robot_flipped() -> Union[str, bool]`

`heading_controller(desired_heading, current_heading) -> float`

`too_far_away() -> bool`

`close()`

`pose_array_callback(msg)`

`odom_callback(msg)`

Robot Connection Check

Stuck Detection

Usage with Active Vision Pipeline