Active Vision for Social Robot Navigation with DreamerV3
The AVSN pipeline combines DreamerV3, spatiotemporal attention, and fisheye camera perception for joint locomotion and gaze control in socially-aware off-road navigation.
Active Vision for Social Navigation (AVSN) is a joint reinforcement learning framework that teaches a rover where to look and where to drive within a single end-to-end policy. Instead of training separate perception and control modules, AVSN uses a DreamerV3 latent world model to solve the credit-assignment problem inherent in active sensing: the agent must discover that choosing a particular camera gaze direction now will produce observations that make future navigation decisions easier. The policy outputs three values simultaneously (linear velocity, angular velocity, and a discrete pan-window index), so locomotion and gaze are co-optimised from the start.
Paper: Active Vision for Social Navigation (Vice & Sukthankar, 2025)
The pipeline connects five components across three repositories. Data flows from the fisheye camera through perception and into the RL agent, with the gaze action feeding back to select the next camera window.
NVIDIA GPU with CUDA — required for YOLO inference, monocular depth estimation, and JAX-based DreamerV3 training
Python
Python 3.10+ in two separate virtual environments: one for the attention pipeline, one for DreamerV3
ROS 2 Workspace
Built with colcon build from ~/src/RoboTerrain/ros2_ws/. Source both setup.bash files in every ROS terminal.
The attention model and DreamerV3 have conflicting JAX version requirements and must run in separate Python environments. Each repository ships a requirements.txt. Create and activate the appropriate environment before running each component.
Launch the Leo Rover with fisheye camera in one of the supported worlds (inspect, construction, island):
# With GUI (development / debugging)
ros2 launch roverrobotics_gazebo Leo_rover_fisheye.launch.py
# Headless (recommended for training runs)
ros2 launch roverrobotics_gazebo Leo_rover_fisheye.launch.py headless:=true
Wait until Gazebo reports the world is running and the rover model is loaded before proceeding.
2
Terminal 2 — Fisheye Camera Bridge
Subscribes to the fisheye camera ROS topic, rectifies six 320×320 windows covering ~160° horizontal FOV, and writes them to a shared memory block:
cd ~/src/attention/inference
python fisheye_ros2_mem_share.py
Verify the node is receiving frames — it will print window dimensions on startup.
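To make the window layout concrete, the sketch below computes a pan angle for the centre of each of the six windows. It assumes the windows are evenly spaced, non-overlapping, and symmetric about the optical axis; the actual bridge may use overlapping windows or a different layout, and the function name is hypothetical.

```python
NUM_WINDOWS = 6             # from the text: six rectified windows
HORIZONTAL_FOV_DEG = 160.0  # from the text: roughly 160 degrees in total


def window_pan_centres(num_windows=NUM_WINDOWS, fov_deg=HORIZONTAL_FOV_DEG):
    """Return the pan angle in degrees at the centre of each window.

    Assumption: evenly spaced, non-overlapping windows that are
    symmetric about the optical axis (0 degrees).
    """
    step = fov_deg / num_windows   # angular width of one window
    start = -fov_deg / 2.0         # left edge of the covered field of view
    return [start + step * (i + 0.5) for i in range(num_windows)]


if __name__ == "__main__":
    for index, centre in enumerate(window_pan_centres()):
        print(f"window {index}: centre at {centre:+.1f} deg")
```

Under these assumptions each window spans about 26.7°, and the discrete gaze action is simply an index into this list.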
3
Terminal 3 — Attention + Perception Pipeline
Activate your attention Python environment, then run the fusion pipeline. This process runs YOLO pedestrian detection, the spatiotemporal attention model, and monocular depth estimation, then writes the fused 3-channel observation to shared memory for DreamerV3:
The --attention_mode argument points to the trained attention checkpoint (.pkl). The script reads fisheye frames from shared memory and writes fused observations back to a separate shared memory block named rl_observation.
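The hand-off through shared memory can be sketched with Python's standard `multiprocessing.shared_memory` module. This is a minimal illustration, not the repository's code: the 96×96×3 shape comes from the leorover config, while the `uint8` dtype, the helper names, and the absence of any header or locking are assumptions.

```python
import numpy as np
from multiprocessing import shared_memory

OBS_SHAPE = (96, 96, 3)  # observation shape stated for the leorover config
OBS_DTYPE = np.uint8     # assumption: the source does not state the dtype


def create_block(name):
    """Create a shared memory block large enough for one observation."""
    size = int(np.prod(OBS_SHAPE)) * np.dtype(OBS_DTYPE).itemsize
    return shared_memory.SharedMemory(name=name, create=True, size=size)


def write_observation(shm, obs):
    """Copy one observation into the block (writer side)."""
    view = np.ndarray(OBS_SHAPE, dtype=OBS_DTYPE, buffer=shm.buf)
    view[...] = obs


def read_observation(name):
    """Attach to an existing block and return a private copy (reader side)."""
    shm = shared_memory.SharedMemory(name=name)
    view = np.ndarray(OBS_SHAPE, dtype=OBS_DTYPE, buffer=shm.buf)
    obs = view.copy()
    del view     # drop the buffer export so close() does not raise BufferError
    shm.close()
    return obs
```

A writer would call `create_block("rl_observation")` once and `write_observation` per frame; the reader calls `read_observation("rl_observation")`. Without synchronisation a reader can see a partially written frame, so a real pipeline would likely add a sequence counter or lock.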
4
Terminal 4 — DreamerV3 RL Agent
DreamerV3 requires LD_LIBRARY_PATH to be filtered to ROS-only paths before launch. Without this filter, libstdc++ and other shared libraries installed in the Python environment clash with the ROS 2 shared libraries, causing segmentation faults or silent import failures.
Activate your DreamerV3 Python environment, filter the library path, then launch the agent:
The leorover config in dreamerv3/configs.yaml defines the observation shape (96×96×3), action heads (continuous locomotion + discrete gaze), and world-model hyperparameters.
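As an illustration of the two action heads, the sketch below turns a continuous locomotion output and a set of gaze scores into a single rover command. The names `RoverCommand` and `decode_action`, the velocity limits, and the greedy argmax over gaze scores are all assumptions made for this example; the real agent samples from its policy heads and the limits live in the environment wrapper.

```python
from dataclasses import dataclass

NUM_GAZE_WINDOWS = 6   # six fisheye windows, per the camera bridge step
MAX_LINEAR = 1.0       # hypothetical limit in m/s, not taken from the source
MAX_ANGULAR = 1.0      # hypothetical limit in rad/s, not taken from the source


@dataclass
class RoverCommand:
    linear: float      # forward velocity
    angular: float     # yaw rate
    gaze_index: int    # which fisheye window to observe next


def decode_action(locomotion, gaze_scores):
    """Map normalised policy outputs to a rover command.

    locomotion: two floats, nominally in [-1, 1] (linear, angular).
    gaze_scores: one score per window; the highest score wins (greedy choice).
    """
    if len(locomotion) != 2 or len(gaze_scores) != NUM_GAZE_WINDOWS:
        raise ValueError("unexpected action dimensions")

    def clip(value):
        return max(-1.0, min(1.0, float(value)))

    gaze_index = max(range(NUM_GAZE_WINDOWS), key=lambda i: gaze_scores[i])
    return RoverCommand(
        linear=clip(locomotion[0]) * MAX_LINEAR,
        angular=clip(locomotion[1]) * MAX_ANGULAR,
        gaze_index=gaze_index,
    )
```

The gaze index returned here is what feeds back to the camera bridge to select the window used for the next observation.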
5
Terminal 5 — Dynamic Human Actors
Spawn pedestrian actors with predefined trajectory files. The examples below are for the inspect world:
The spatiotemporal attention model (trajectory_model.py) is trained separately on datasets of RGB fisheye frames with YOLO-generated pedestrian occupancy masks. Produced checkpoints are then consumed by inference.py at runtime.
Confirm Gazebo is running and fisheye_ros2_mem_share.py is receiving on the camera topic. Check the ROS topic list:
ros2 topic list | grep camera
ros2 topic hz /camera/image_raw
If no camera topic appears, the Leo Rover fisheye launch may have failed. Re-run Terminal 1 and watch for launch errors.
YOLO ONNX errors in inference.py
Ensure onnxruntime-gpu is installed in the attention Python environment (not the DreamerV3 environment):
pip install onnxruntime-gpu
Also verify the YOLO .onnx file path passed to --yolo_model_path exists and is a valid ONNX model.
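Both checks can be scripted. The sketch below is one possible pre-flight check, run inside the attention environment; `check_yolo_model` is a hypothetical helper, not part of inference.py. It uses `onnxruntime.get_available_providers()` and `onnxruntime.InferenceSession`, and loads the model on the CPU provider purely to validate the file.

```python
from pathlib import Path


def check_yolo_model(path):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    model = Path(path).expanduser()
    if not model.is_file():
        problems.append(f"model file not found: {model}")

    try:
        import onnxruntime as ort
    except ImportError:
        problems.append("onnxruntime is not importable in this environment")
        return problems

    # onnxruntime-gpu exposes the CUDA provider; the CPU-only wheel does not.
    if "CUDAExecutionProvider" not in ort.get_available_providers():
        problems.append("CUDAExecutionProvider unavailable; is onnxruntime-gpu installed?")

    if model.is_file():
        try:
            # Loading on CPU is enough to confirm the file parses as ONNX.
            ort.InferenceSession(str(model), providers=["CPUExecutionProvider"])
        except Exception as exc:
            problems.append(f"file did not load as an ONNX model: {exc}")
    return problems


if __name__ == "__main__":
    import sys
    for problem in check_yolo_model(sys.argv[1] if len(sys.argv) > 1 else "model.onnx"):
        print("PROBLEM:", problem)
```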
DreamerV3 CUDA out-of-memory (OOM)
Set XLA_PYTHON_CLIENT_PREALLOCATE=false to prevent JAX from pre-allocating the entire GPU memory pool. This flag is already included in the Terminal 4 launch command above. If OOM persists, reduce the world-model batch size in dreamerv3/configs.yaml under the leorover config.
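If you start the agent from your own Python script rather than from a shell command, the same setting can be applied in code. JAX reads this variable when it initialises its GPU backend, so it has to be set before `import jax`; the snippet below is a minimal sketch of that ordering.

```python
import os

# Must run before `import jax`: JAX reads this when it initialises its backend.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

# import jax  # import JAX only after the variable is set

print("XLA_PYTHON_CLIENT_PREALLOCATE =", os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"])
```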
libstdc++ version conflicts when starting DreamerV3
The LD_LIBRARY_PATH filter in Terminal 4 resolves most Python-environment vs. ROS 2 system library clashes. If you still see GLIBCXX version errors, ensure the filter command ran successfully by printing $FILTERED_LD_LIBRARY_PATH before launching:
echo $FILTERED_LD_LIBRARY_PATH
# Should only show paths starting with /opt/ros/
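The same check can be done programmatically. The sketch below assumes the filter keeps only entries that start with `/opt/ros/`, as the expected output above suggests; the helper names are hypothetical, and `humble` in the example path is just an illustrative distribution name.

```python
import os

ROS_PREFIX = "/opt/ros/"


def filter_ros_paths(ld_library_path, prefix=ROS_PREFIX):
    """Keep only the colon-separated entries that start with the ROS prefix."""
    entries = [entry for entry in ld_library_path.split(":") if entry]
    return ":".join(entry for entry in entries if entry.startswith(prefix))


def non_ros_entries(ld_library_path, prefix=ROS_PREFIX):
    """List entries that would be dropped by the filter."""
    return [
        entry
        for entry in ld_library_path.split(":")
        if entry and not entry.startswith(prefix)
    ]


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("filtered:", filter_ros_paths(current))
    for entry in non_ros_entries(current):
        print("dropped: ", entry)
```

For example, `filter_ros_paths("/opt/ros/humble/lib:/home/user/venv/lib")` keeps only the first entry.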
Spawned actors not visible in Gazebo
Verify the --world_name flag passed to spawn.py exactly matches the name of the loaded Gazebo world. The world name is case-sensitive. You can confirm the active world name with:
Attention checkpoint fails to load in inference.py
Attention checkpoints are Flax .pkl files. The model architecture at load time must match the architecture used during training. Ensure the --embedding_dim and --num_heads values passed to inference.py match those used when the checkpoint was produced by run_efficient_training.py.
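When the hyperparameters used for training are not known, inspecting the parameter shapes stored in the checkpoint can help recover them. The sketch below assumes the .pkl file holds a nested mapping whose leaves expose a `.shape` attribute; the actual checkpoint layout is not documented here, so treat the helper names and the key format as illustrative. Only unpickle files you trust, and run this in the attention environment so any JAX array types can be loaded.

```python
import pickle
from collections.abc import Mapping


def leaf_shapes(tree, prefix=""):
    """Flatten a nested mapping into {"/path/to/leaf": shape}."""
    shapes = {}
    if isinstance(tree, Mapping):
        for key, value in tree.items():
            shapes.update(leaf_shapes(value, f"{prefix}/{key}"))
    else:
        # Leaves without a shape attribute are reported as scalars: ().
        shapes[prefix] = tuple(getattr(tree, "shape", ()))
    return shapes


def checkpoint_shapes(path):
    """Load a pickled checkpoint and return the shape of every leaf."""
    with open(path, "rb") as handle:
        return leaf_shapes(pickle.load(handle))


if __name__ == "__main__":
    import sys
    for name, shape in sorted(checkpoint_shapes(sys.argv[1]).items()):
        print(f"{name}: {shape}")
```

Dimensions that recur across the attention layers usually correspond to the embedding size and head count, which can then be compared with the values passed to `--embedding_dim` and `--num_heads`.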