Evaluation Modes
AlpaSim supports two evaluation modes:
- In-Runtime Evaluation (Default)
- Separate Job Evaluation
By default, evaluation runs within the runtime after each rollout completes. This provides immediate feedback and is suitable for most use cases.
No additional configuration is needed; this is the default mode.
Metrics Computation
AlpaSim computes multiple categories of metrics to evaluate driving performance:
Safety Metrics
Binary metrics indicating pass (0) or fail (1):
Collision Metrics
- collision_at_fault: Driver caused a collision (front/lateral impact)
- collision_rear: Rear-end collision (not at fault)
- collision_front: Front collision detection
- collision_lateral: Side collision detection
- collision_any: Any collision occurred
These metrics are computed by analyzing vehicle trajectories and detecting overlaps between the ego vehicle and other agents.
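The overlap detection described above can be sketched with a separating-axis test on convex vehicle footprints. This is only an illustration under stated assumptions, not AlpaSim's implementation: the function names (`polygons_overlap`, `collision_any_flag`) and the per-timestep footprint inputs are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def _edge_normals(poly: Polygon) -> List[Point]:
    """Return one normal per edge; these are the candidate separating axes."""
    normals = []
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        normals.append((y1 - y2, x2 - x1))
    return normals


def _project(poly: Polygon, axis: Point) -> Tuple[float, float]:
    """Project all vertices onto the axis and return the covered interval."""
    dots = [x * axis[0] + y * axis[1] for x, y in poly]
    return min(dots), max(dots)


def polygons_overlap(a: Polygon, b: Polygon) -> bool:
    """Separating-axis test for two convex polygons (touching counts as overlap)."""
    for axis in _edge_normals(a) + _edge_normals(b):
        min_a, max_a = _project(a, axis)
        min_b, max_b = _project(b, axis)
        if max_a < min_b or max_b < min_a:
            return False  # found a gap along this axis, so no overlap
    return True


def collision_any_flag(ego_footprints: Sequence[Polygon],
                       agent_footprints: Sequence[Sequence[Polygon]]) -> int:
    """Return 1 (fail) if the ego overlaps any agent at any timestep, else 0 (pass)."""
    for ego, agents in zip(ego_footprints, agent_footprints):
        if any(polygons_overlap(ego, other) for other in agents):
            return 1
    return 0


# Hypothetical footprints: one ego rectangle and one agent per timestep.
ego = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
far_agent = [(5.0, 0.0), (7.0, 0.0), (7.0, 1.0), (5.0, 1.0)]
near_agent = [(1.0, 0.5), (3.0, 0.5), (3.0, 1.5), (1.0, 1.5)]

no_hit = collision_any_flag([ego, ego], [[far_agent], [far_agent]])
hit = collision_any_flag([ego, ego], [[far_agent], [near_agent]])
```

Classifying a detected overlap as front, lateral, or rear would additionally need the contact location relative to each vehicle's heading, which this sketch does not attempt.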
Road Compliance
- offroad: Vehicle drove off the designated road surface
- offroad_or_collision_at_fault: Combined metric for any critical safety violation
Computed using the vehicle polygon and road geometry from the map data.
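One way to picture the road compliance check is a point-in-polygon test on the vehicle corners against a drivable-area polygon. The sketch below assumes a single simple polygon for the road surface and treats the vehicle as off-road as soon as any corner leaves it; both choices, and all function names, are assumptions for illustration rather than AlpaSim's actual logic.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test: count edge crossings of a ray going in +x from pt."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def offroad_flag(vehicle_polygons: Sequence[Polygon], drivable_area: Polygon) -> int:
    """Return 1 (fail) if any vehicle corner leaves the drivable area at any timestep."""
    for poly in vehicle_polygons:
        if not all(point_in_polygon(corner, drivable_area) for corner in poly):
            return 1
    return 0


def offroad_or_collision_at_fault_flag(offroad: int, collision_at_fault: int) -> int:
    """Combined metric: fails if either underlying binary metric fails."""
    return int(bool(offroad) or bool(collision_at_fault))


# Hypothetical 100 m x 10 m road and two vehicle footprints.
road = [(0.0, 0.0), (100.0, 0.0), (100.0, 10.0), (0.0, 10.0)]
on_road = [(10.0, 2.0), (14.0, 2.0), (14.0, 4.0), (10.0, 4.0)]
off_road = [(10.0, 8.0), (14.0, 8.0), (14.0, 12.0), (10.0, 12.0)]

stays_on = offroad_flag([on_road, on_road], road)
leaves_road = offroad_flag([on_road, off_road], road)
combined = offroad_or_collision_at_fault_flag(leaves_road, 0)
```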
Performance Metrics
Continuous metrics measuring driving quality:
Trajectory Deviation
dist_to_gt_trajectory: Maximum distance from ground truth path (meters)
- Lower is better
- Indicates how closely the driver follows expected routes
- Aggregated using MAX over time (worst deviation during the drive)
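The MAX-over-time aggregation can be sketched as follows. The example assumes the per-timestep deviation is the distance from the driven position to the nearest point on the ground truth polyline; that distance definition and the function names are assumptions, and only the MAX aggregation comes from the description above.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dist_to_gt_trajectory(driven: Sequence[Point], ground_truth: Sequence[Point]) -> float:
    """Worst-case (MAX over time) distance in meters from the ground truth path."""
    per_step = [
        min(_point_segment_distance(p, a, b)
            for a, b in zip(ground_truth[:-1], ground_truth[1:]))
        for p in driven
    ]
    return max(per_step)


# Hypothetical straight ground truth path and a slightly wandering drive.
gt_path = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
driven_path = [(0.0, 0.5), (5.0, 1.5), (15.0, -0.25)]
worst_deviation = dist_to_gt_trajectory(driven_path, gt_path)
```

Because the aggregation is a maximum, a single large excursion dominates the metric even if the rest of the drive tracks the path closely.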
Progress Metrics
- progress: Absolute distance traveled along the route
- progress_rel: Relative progress compared to ground truth
- duration_frac_20s: Fraction of 20s drive completed before any failure
  - 1.0 = completed full 20s without issues
  - Less than 1.0 = failed early (collision, off-road, or excessive deviation)
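As a rough illustration of duration_frac_20s, the sketch below assumes the time of the first failure (collision, off-road, or excessive deviation) is already known in seconds, with `None` meaning no failure occurred. The function signature is hypothetical.

```python
from typing import Optional

DRIVE_DURATION_S = 20.0  # length of the evaluated drive, per the metric name


def duration_frac_20s(first_failure_time_s: Optional[float]) -> float:
    """Fraction of the 20 s drive completed before the first failure."""
    if first_failure_time_s is None:
        return 1.0  # no failure: the full drive was completed
    clipped = min(max(first_failure_time_s, 0.0), DRIVE_DURATION_S)
    return clipped / DRIVE_DURATION_S


clean_run = duration_frac_20s(None)
early_failure = duration_frac_20s(5.0)
```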
Plan Quality (MinADE)
Minimum Average Displacement Error (MinADE) at various time horizons. It measures how accurately the predicted trajectory matches the actual trajectory at different prediction horizons.
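For readers unfamiliar with the metric, MinADE in its usual form is the smallest average displacement error among a set of candidate predictions. The sketch below follows that standard definition; the horizons (expressed here as step counts) and the function names are illustrative assumptions, since the horizons AlpaSim reports are not listed here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def ade(predicted: Sequence[Point], actual: Sequence[Point], n_steps: int) -> float:
    """Average Euclidean displacement over the first n_steps waypoints."""
    errors = [
        math.hypot(px - ax, py - ay)
        for (px, py), (ax, ay) in zip(predicted[:n_steps], actual[:n_steps])
    ]
    return sum(errors) / len(errors)


def min_ade(candidates: Sequence[Sequence[Point]],
            actual: Sequence[Point], n_steps: int) -> float:
    """Minimum ADE across all candidate trajectories for one horizon."""
    return min(ade(candidate, actual, n_steps) for candidate in candidates)


# Hypothetical data: the actual path and two predicted candidates.
actual_path = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
candidate_a = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (4.0, 1.0)]  # constant 1 m offset
candidate_b = [(1.0, 0.0), (2.0, 0.0), (3.0, 2.0), (4.0, 2.0)]  # drifts after step 2

min_ade_short = min_ade([candidate_a, candidate_b], actual_path, n_steps=2)
min_ade_long = min_ade([candidate_a, candidate_b], actual_path, n_steps=4)
```

In this example candidate_b is perfect over the short horizon but no better than candidate_a over the long one, which is why the metric is reported per horizon.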
Plan Deviation
Measures deviation from the planned trajectory, tracking how well the vehicle follows its own planned path.
Distance Between Incidents
avg_dist_between_incidents: Average kilometers traveled per incident (collision or offroad)
- Higher is better
- Measures safety over distance
- Excludes rear-end collisions not caused by the driver
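A minimal sketch of this aggregation, assuming per-rollout records with a hypothetical `distance_km` field alongside the binary safety flags. It counts at most one incident per rollout and returns infinity when there are no incidents; both are assumptions made for the example, not documented AlpaSim behavior.

```python
import math
from typing import Dict, Sequence


def avg_dist_between_incidents(rollouts: Sequence[Dict[str, float]]) -> float:
    """Total kilometers driven divided by the number of at-fault incidents."""
    total_km = sum(r["distance_km"] for r in rollouts)
    # Rear-end collisions not caused by the driver are deliberately ignored.
    incidents = sum(1 for r in rollouts if r["collision_at_fault"] or r["offroad"])
    if incidents == 0:
        return math.inf  # assumption: no incidents means unbounded distance
    return total_km / incidents


# Hypothetical rollout summaries.
rollouts = [
    {"distance_km": 1.2, "collision_at_fault": 0, "offroad": 0, "collision_rear": 1},
    {"distance_km": 0.8, "collision_at_fault": 1, "offroad": 0, "collision_rear": 0},
    {"distance_km": 1.0, "collision_at_fault": 0, "offroad": 1, "collision_rear": 0},
]
km_per_incident = avg_dist_between_incidents(rollouts)
```

The first rollout contributes distance but no incident, because its only collision was a rear-end impact not caused by the driver.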
Safety Monitor
safety_monitor_triggered: Indicates if safety interventions were required
Video Generation
AlpaSim generates evaluation videos with multiple layout options:
Video Layouts
- DEFAULT Layout
- REASONING_OVERLAY Layout
- Both Layouts
The default layout provides a comprehensive debug view with three panels:
Components:
- BEV (Bird’s Eye View) map: Top-down view showing:
  - Road lanes and edges
  - Ego vehicle position
  - Traffic agents
  - Planned trajectories
  - Ground truth ghost vehicle
- Camera view: Front camera feed with optional trajectory overlays
- Metrics table: Real-time metric values
Video Configuration Options
Performance Analysis
AlpaSim automatically generates performance metrics and visualizations:
Metrics Plot
After each simulation, a comprehensive performance visualization is generated at {log_dir}/metrics/metrics_plot.png.
Metrics Plot Components
3x3 Grid Layout:
Row 1: RPC Performance
- RPC Duration histogram: Total time from call start to coroutine resumption
- RPC Blocking histogram: Event loop scheduler delay
- RPC Queue Depth histogram: Service saturation levels
- Rollout Duration histogram: Total time per rollout
- Step Duration histogram: Time per simulation step
- Service Configuration table: Replica counts and capacity
- CPU Utilization boxplots: Per-service CPU usage
- GPU Utilization boxplots: GPU compute usage
- GPU Memory boxplots: Memory usage with capacity line
- Async worker idle percentage: Runtime idle time
- Sim seconds per rollout: Wallclock time per simulation
Performance Metrics File
Raw performance data is stored in {log_dir}/metrics/metrics.prom in Prometheus format.
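Since the file uses the Prometheus text exposition format, it can be inspected with a few lines of Python. The sketch below is a deliberately simplified parser (it does not handle label values containing `}` and ignores timestamps), and the metric names in the sample are hypothetical, not the names AlpaSim writes.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

# metric_name, optional {labels}, then the sample value.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_prom_text(text: str) -> Dict[str, List[Tuple[str, float]]]:
    """Group samples by metric name as (raw_label_string, value) pairs."""
    samples: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        samples[name].append((labels or "", float(value)))
    return dict(samples)


def load_metrics(log_dir: str) -> Dict[str, List[Tuple[str, float]]]:
    """Read {log_dir}/metrics/metrics.prom and parse it."""
    return parse_prom_text((Path(log_dir) / "metrics" / "metrics.prom").read_text())


# Hypothetical file content, for illustration only.
SAMPLE = """\
# HELP rpc_duration_seconds Hypothetical example metric
# TYPE rpc_duration_seconds gauge
rpc_duration_seconds{service="driver"} 0.25
rpc_duration_seconds{service="physics"} 0.5
rollout_count 4
"""
parsed = parse_prom_text(SAMPLE)
```

For anything beyond quick inspection, a full Prometheus client library parser is a safer choice than this regular expression.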