How Alpamayo 1 generates future trajectories using diffusion-based action prediction
Alpamayo 1 predicts future vehicle trajectories by combining an action space representation with a diffusion-based decoder. This approach enables probabilistic trajectory generation that captures the multimodal nature of autonomous driving scenarios.
```python
# From alpamayo_r1/action_space/unicycle_accel_curvature.py:98-100
def get_action_space_dims(self) -> tuple[int, int]:
    """Get the dimensions of the action space."""
    return (self.n_waypoints, 2)
```
Parameters:
- Number of waypoints: 64
- Controls per waypoint: 2 (acceleration, curvature)
Each waypoint is parameterized by two control inputs:
| Control | Description | Units | Bounds | Normalization |
|---|---|---|---|---|
| Acceleration (a) | Longitudinal acceleration | m/s² | [-9.8, 9.8] | Mean=0.0, Std=1.0 |
| Curvature (κ) | Path curvature (1/radius) | m⁻¹ | [-0.2, 0.2] | Mean=0.0, Std=1.0 |
Curvature represents how sharply the vehicle turns. A curvature of 0.2 m⁻¹ corresponds to a turning radius of 5 meters (tight turn), while 0.01 m⁻¹ corresponds to 100 meters (gentle curve).
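To make the parameterization concrete, the two controls per waypoint can be rolled out into XY waypoints with a standard unicycle model. This is an illustrative NumPy sketch, not the repository's implementation; the function name, the initial speed `v0`, and the timestep `dt` are assumptions:

```python
import numpy as np

def rollout_unicycle(accel, curvature, v0=5.0, dt=0.1):
    """Integrate per-waypoint (acceleration, curvature) controls into XY
    positions with a unicycle model (illustrative sketch only)."""
    x, y, heading, v = 0.0, 0.0, 0.0, v0
    xy = []
    for a, k in zip(accel, curvature):
        v = v + a * dt                  # speed update from acceleration
        heading = heading + v * k * dt  # yaw rate = speed * curvature
        x = x + v * np.cos(heading) * dt
        y = y + v * np.sin(heading) * dt
        xy.append((x, y))
    return np.array(xy)

# 64 waypoints with zero acceleration and a constant gentle 0.01 1/m curve
traj = rollout_unicycle(np.zeros(64), np.full(64, 0.01))
print(traj.shape)  # (64, 2)
```

Note how the gentle 0.01 m⁻¹ curvature bends the path only slightly over 64 steps, consistent with its 100 m turning radius.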
The diffusion decoder performs iterative denoising:
```python
# From alpamayo_r1/diffusion/flow_matching.py:111-127
# Initialize with random noise
x = torch.randn(batch_size, *self.x_dims, device=device)
time_steps = torch.linspace(0.0, 1.0, inference_step + 1, device=device)
# Euler integration over timesteps
for i in range(inference_step):
    dt = time_steps[i + 1] - time_steps[i]
    t_start = time_steps[i]
    # Predict velocity field
    v = step_fn(x=x, t=t_start)
    # Update action
    x = x + dt * v
```
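The same Euler loop can be exercised end to end with a toy velocity field whose exact flow carries any starting point to a fixed target at t=1. This is a NumPy sketch (the repository uses torch); `step_fn` here stands in for the trained network:

```python
import numpy as np

def euler_sample(step_fn, x_dims, inference_step=50, batch_size=2, seed=0):
    """Euler integration of a velocity field over t in [0, 1], mirroring
    the flow-matching inference loop (NumPy sketch, not the repo's code)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch_size, *x_dims))  # start from noise
    time_steps = np.linspace(0.0, 1.0, inference_step + 1)
    for i in range(inference_step):
        dt = time_steps[i + 1] - time_steps[i]
        v = step_fn(x=x, t=time_steps[i])
        x = x + dt * v
    return x

# Toy field: its flow transports any x to `target` as t -> 1.
target = np.ones((64, 2))
def toy_field(x, t):
    return (target - x) / (1.0 - t)

out = euler_sample(toy_field, x_dims=(64, 2))
print(out.shape)  # (2, 64, 2); each sample ends at `target`
```

With a trained model, `step_fn` would instead be the expert transformer's velocity prediction conditioned on the scene and reasoning context.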
- Expert transformer: processes embeddings with CoC reasoning context
- Action output projection (`action_out_proj`): maps hidden states → velocity field
The expert model uses non-causal attention by default (expert_non_causal_attention=True), allowing each waypoint to attend to all other waypoints for better trajectory coherence.
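The difference between the two attention modes can be shown with a small mask-building sketch. This illustrates the effect of the flag only; the function name is hypothetical and this is not the repository's implementation:

```python
import numpy as np

def expert_attention_mask(n_waypoints, non_causal=True):
    """Boolean attention mask (True = may attend); illustrative sketch."""
    if non_causal:
        # Every waypoint attends to every other waypoint.
        return np.ones((n_waypoints, n_waypoints), dtype=bool)
    # Causal alternative: waypoint i attends only to waypoints 0..i.
    return np.tril(np.ones((n_waypoints, n_waypoints), dtype=bool))

full = expert_attention_mask(4)
causal = expert_attention_mask(4, non_causal=False)
print(int(full.sum()), int(causal.sum()))  # 16 10
```

In the non-causal case an early waypoint can attend to later ones, which is what lets the decoder keep the whole trajectory coherent rather than committing to each waypoint using only its predecessors.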
Alpamayo 1 supports generating multiple trajectory samples per input:
```python
# From alpamayo_r1/test_inference.py:56-63
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    num_traj_samples=6,  # Generate 6 different trajectories
    num_traj_sets=1,     # Number of independent sample sets
    # ...
)
# Output shapes:
# pred_xyz: [batch_size, num_traj_sets, num_traj_samples, 64, 3]
# pred_rot: [batch_size, num_traj_sets, num_traj_samples, 64, 3, 3]
```
Multi-sample benefits:
- Multimodality: capture different plausible futures (e.g., turn left vs. right)
- Uncertainty quantification: spread of samples indicates prediction confidence
- Best-of-N selection: choose the trajectory with minimum error or highest safety score
```python
# From alpamayo_r1/models/alpamayo_r1.py:150-151
n_samples_total = num_traj_samples * num_traj_sets
total_batch = B * n_samples_total
```
Each sample also gets its own Chain-of-Causation trace due to stochastic VLM generation, providing diverse reasoning explanations for different trajectory modes.
The evaluation metric is minADE (minimum Average Displacement Error), which scores only the best of the N sampled trajectories:

minADE = min_i (1/T) Σ_t ‖pred_i(t) − gt(t)‖

where:
- pred_i(t) = predicted XY position at time t for sample i
- gt(t) = ground truth XY position at time t
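Computing minADE from sampled trajectories is a few lines of NumPy. This sketch assumes the array layouts shown in the comments; it is not the repository's evaluation code:

```python
import numpy as np

def min_ade(pred, gt):
    """minADE: mean XY displacement of the best sample, plus its index.
    pred: [num_samples, T, 2] predicted XY waypoints
    gt:   [T, 2] ground-truth XY waypoints
    """
    per_step = np.linalg.norm(pred - gt[None], axis=-1)  # [N, T] distances
    ade = per_step.mean(axis=-1)                         # [N] per-sample ADE
    return ade.min(), int(ade.argmin())

gt = np.zeros((64, 2))
pred = np.stack([np.full((64, 2), 0.5),   # sample 0: offset by (0.5, 0.5)
                 np.ones((64, 2))])       # sample 1: offset by (1.0, 1.0)
best, idx = min_ade(pred, gt)
print(round(float(best), 4), idx)  # 0.7071 0
```

The returned index also supports best-of-N selection: the same argmin picks which of the sampled trajectories to execute or report.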
Nondeterminism: Due to stochastic sampling, diffusion inference, and hardware differences, minADE values will vary between runs even with the same random seed.