sb3_SAC.py is the single entry point for both training and inference of rover navigation agents. In training mode it creates a Soft Actor-Critic (SAC) model with automatic entropy tuning, wraps the Gymnasium environment in a VecNormalize normalizer for stable learning, and saves periodic checkpoints alongside their matching normalization statistics. In predict mode the same script loads a saved checkpoint and normalization file, disables further updates, and runs the policy deterministically. All major options are controlled through CLI arguments described below.
Before launching sb3_SAC.py, the position bridge node (ign_ros2_Nav2_topics.py) must already be running so that the /rover/pose_array topic is available. See Inference → Running the Position Bridge for the exact command.
CLI Arguments
All arguments are parsed by argparse in parse_args().
| Argument | Choices / Type | Default | Required | Description |
|---|---|---|---|---|
| --mode | train, predict | train | Yes | Operating mode |
| --load | True, False | none | Yes | Whether to load an existing checkpoint |
| --world | inspect, maze, island, rubicon | inspect | No | Which Gazebo world to use |
| --vision | True, False | False | No | Use fused camera observation instead of LIDAR |
| --checkpoint_name | str (file path) | none | When --load True | Path to the .zip checkpoint file |
| --normalize_stats | str (file path) | none | When --load True | Path to the matching _normalize.pkl file |
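The table can be approximated with a small argparse definition. This is a minimal sketch, not the script's actual parse_args(): it assumes the True/False flags are parsed as string choices and that a missing path under --load True raises ValueError after parsing.

```python
import argparse


def parse_args(argv=None):
    """Sketch of the CLI described in the table (string booleans are an assumption)."""
    parser = argparse.ArgumentParser(description="SAC training and inference")
    parser.add_argument("--mode", choices=["train", "predict"],
                        default="train", required=True)
    parser.add_argument("--load", choices=["True", "False"], required=True)
    parser.add_argument("--world",
                        choices=["inspect", "maze", "island", "rubicon"],
                        default="inspect")
    parser.add_argument("--vision", choices=["True", "False"], default="False")
    parser.add_argument("--checkpoint_name", type=str)
    parser.add_argument("--normalize_stats", type=str)
    args = parser.parse_args(argv)

    # Both paths are mandatory as soon as an existing checkpoint is loaded.
    if args.load == "True" and not (args.checkpoint_name and args.normalize_stats):
        raise ValueError(
            "--load True requires both --checkpoint_name and --normalize_stats"
        )
    return args
```

Calling parse_args(["--mode", "train", "--load", "False"]) returns a namespace with world set to inspect, while passing --load True without the two paths fails before any further setup.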
--checkpoint_name and --normalize_stats are validated at runtime: if --load True is set and either path is missing, the script raises a ValueError before creating any ROS nodes.
Training from Scratch
A fresh training run in the inspection world needs only the two required flags, --mode train and --load False; --world defaults to inspect. The script will then:
- Create a timestamped DummyVecEnv wrapping a Monitor-wrapped environment
- Initialise a fresh VecNormalize wrapper
- Build the SAC model with the hyperparameters below
- Call model.learn(total_timesteps=8_000_000, ...)
SAC Hyperparameters
These values are hard-coded in sb3_SAC.py when creating a new model (--load False):
| Hyperparameter | Value | Notes |
|---|---|---|
| learning_rate | 3e-4 | Adam optimizer LR for actor, critic, and entropy |
| buffer_size | 300,000 | Experience replay buffer capacity |
| learning_starts | 50,000 | Steps of random exploration before gradient updates begin |
| batch_size | 512 | Mini-batch size for each gradient step |
| train_freq | 512 | Collect this many new steps before triggering an update |
| gradient_steps | 6 | Number of gradient updates per train_freq cycle |
| ent_coef | "auto_0.5" | SAC entropy coefficient; auto-tuned starting from α = 0.5 |
| total_timesteps | 8,000,000 | Total environment steps for a full training run |
| device | "cuda" | PyTorch device |
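To get a feel for what train_freq and gradient_steps imply together, the update budget can be estimated with plain arithmetic. This is a rough sketch assuming a single environment and ignoring how the library aligns the first update with learning_starts, so the count is approximate.

```python
def approx_gradient_updates(total_timesteps, learning_starts, train_freq, gradient_steps):
    """Estimate the number of gradient updates over a full run (approximate)."""
    # Updates only happen after the initial random-exploration phase.
    training_steps = max(total_timesteps - learning_starts, 0)
    # One cycle = train_freq collected steps followed by gradient_steps updates.
    cycles = training_steps // train_freq
    return cycles * gradient_steps


def updates_per_env_step(train_freq, gradient_steps):
    """Ratio of gradient updates to collected environment steps."""
    return gradient_steps / train_freq


total_updates = approx_gradient_updates(8_000_000, 50_000, 512, 6)
ratio = updates_per_env_step(512, 6)
samples_per_cycle = 6 * 512  # transitions drawn from the buffer per cycle
```

With the documented values, each cycle collects 512 new transitions and then samples 6 mini-batches of 512 from the replay buffer, so roughly one gradient update is performed per 85 environment steps.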
Checkpointing
Checkpoints are saved by the custom SaveVecNormalizeCallback, which extends SB3’s CheckpointCallback. Files are written to ./checkpoints/ (created automatically).
File naming convention: the timestamp (YYYYMMDD_HHMM) embedded in the prefix allows multiple training runs in the same directory without collision.
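A timestamp in the YYYYMMDD_HHMM format can be produced with strftime. The sketch below is illustrative only: run_prefix is a hypothetical helper, and the SAC_{world}_{timestamp} layout is an assumption, not the script's confirmed checkpoint prefix.

```python
from datetime import datetime


def run_prefix(world, now=None):
    """Build a run prefix carrying a YYYYMMDD_HHMM timestamp (hypothetical helper)."""
    now = now or datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M")
    # Assumed layout for illustration; the real prefix format may differ.
    return f"SAC_{world}_{timestamp}"
```

Because the timestamp has minute resolution, two runs started within the same minute for the same world would still produce the same prefix.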
Observation Normalization
The environment is wrapped in VecNormalize for online normalisation of observations and rewards:
| Parameter | Value | Purpose |
|---|---|---|
| clip_obs | 20.0 | Prevents LIDAR or pose outliers from dominating gradients |
| clip_reward | 100.0 | Aligns with the goal_reward = 100 scale |
| gamma | 0.99 | Discount factor used for reward normalisation |
In predict mode the statistics are frozen by setting env.training = False and env.norm_reward = False.
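The effect of the two clip values can be seen in a dependency-free sketch of the normalisation formulas that VecNormalize applies. The function names are illustrative, and the running mean and variance are passed in as plain numbers rather than tracked online.

```python
import math


def normalize_obs(obs, mean, var, clip_obs=20.0, epsilon=1e-8):
    """Standardise each observation component, then clip to [-clip_obs, clip_obs]."""
    normalized = []
    for o, m, v in zip(obs, mean, var):
        z = (o - m) / math.sqrt(v + epsilon)
        normalized.append(max(-clip_obs, min(clip_obs, z)))
    return normalized


def normalize_reward(reward, return_var, clip_reward=100.0, epsilon=1e-8):
    """Scale a reward by the std of the discounted return, then clip."""
    scaled = reward / math.sqrt(return_var + epsilon)
    return max(-clip_reward, min(clip_reward, scaled))
```

For example, an outlier reading of 1000.0 with running mean 0 and variance 1 is clipped to 20.0, so a single bad range value cannot dominate the policy input.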
TensorBoard Monitoring
Training metrics are logged to a timestamped subdirectory under tboard_logs/, at ./tboard_logs/SAC_{world}_{timestamp}/. Key scalars include:
- train/actor_loss, train/critic_loss, train/ent_coef
- rollout/ep_rew_mean, rollout/ep_len_mean
- time/total_timesteps, time/fps
Resuming Training
To continue training from a saved checkpoint, pass --load True together with --checkpoint_name and --normalize_stats. With --load True the script:
- Loads the VecNormalize statistics from --normalize_stats (preserving the running mean/variance)
- Loads the SAC model weights and replay buffer reference from --checkpoint_name
- Resets the entropy coefficient to 0.05 for fine-tuning
- Calls model.learn(..., reset_num_timesteps=False) so step counts continue from the checkpoint
Always pair a .zip model checkpoint with its matching _normalize.pkl file. Loading mismatched normalization statistics will cause the observation distribution seen by the policy to differ from what it was trained on, leading to degraded performance.