Alpamayo 1 predicts future vehicle trajectories by combining an action space representation with a diffusion-based decoder. This approach enables probabilistic trajectory generation that captures the multimodal nature of autonomous driving scenarios.

Overview

Trajectory prediction in Alpamayo 1 follows a three-stage process:
  1. Action parameterization: Represent trajectories as sequences of acceleration and curvature controls
  2. Diffusion sampling: Generate action sequences using flow matching diffusion
  3. Kinematic integration: Convert actions to XYZ waypoints using a unicycle motion model

Action Space Representation

Alpamayo 1 uses a Unicycle Kinematic Model with acceleration and curvature as control inputs, implemented in UnicycleAccelCurvatureActionSpace.

Action Dimensions

# From alpamayo_r1/action_space/unicycle_accel_curvature.py:98-100
def get_action_space_dims(self) -> tuple[int, int]:
    """Get the dimensions of the action space."""
    return (self.n_waypoints, 2)
Parameters:
  • Number of waypoints: 64
  • Controls per waypoint: 2 (acceleration, curvature)
  • Total action dimensions: (64, 2)
  • Temporal resolution: 10 Hz (dt = 0.1 seconds)
  • Prediction horizon: 6.4 seconds

Control Variables

Each waypoint is parameterized by two control inputs:
| Control | Description | Units | Bounds | Normalization |
| --- | --- | --- | --- | --- |
| Acceleration (a) | Longitudinal acceleration | m/s² | [-9.8, 9.8] | Mean=0.0, Std=1.0 |
| Curvature (κ) | Path curvature (1/radius) | m⁻¹ | [-0.2, 0.2] | Mean=0.0, Std=1.0 |
Curvature represents how sharply the vehicle turns. A curvature of 0.2 m⁻¹ corresponds to a turning radius of 5 meters (tight turn), while 0.01 m⁻¹ corresponds to 100 meters (gentle curve).
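As a sanity check on these numbers, the reciprocal relationship between curvature and turning radius can be computed directly (a standalone snippet, not repository code):

```python
def turning_radius(curvature: float) -> float:
    """Turning radius in meters for a given path curvature (1/m)."""
    if curvature == 0.0:
        return float("inf")  # zero curvature means driving straight
    return 1.0 / abs(curvature)

print(turning_radius(0.2))   # tight turn: 5.0 m
print(turning_radius(0.01))  # gentle curve: 100.0 m
```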
# From alpamayo_r1/action_space/unicycle_accel_curvature.py:39-96
class UnicycleAccelCurvatureActionSpace(ActionSpace):
    def __init__(
        self,
        accel_mean: float = 0.0,
        accel_std: float = 1.0,
        curvature_mean: float = 0.0,
        curvature_std: float = 1.0,
        accel_bounds: tuple[float, float] = (-9.8, 9.8),
        curvature_bounds: tuple[float, float] = (-0.2, 0.2),
        dt: float = 0.1,
        n_waypoints: int = 64,
        # Smoothness regularization parameters
        theta_lambda: float = 1e-6,
        v_lambda: float = 1e-6,
        a_lambda: float = 1e-4,
        kappa_lambda: float = 1e-4,
        # ...
    ):

Kinematic Motion Model

The unicycle model converts control sequences into spatial trajectories through forward integration.

State Variables

At each timestep t, the vehicle state consists of:
  • Position: (x_t, y_t) in meters
  • Heading: θ_t in radians
  • Velocity: v_t in m/s

Integration Equations

# From alpamayo_r1/action_space/unicycle_accel_curvature.py:331-380
# Velocity integration
velocity[t+1] = velocity[t] + acceleration[t] * dt

# Heading integration
theta[t+1] = theta[t] + curvature[t] * velocity[t] * dt + \
                        curvature[t] * acceleration[t] * (dt^2 / 2)

# Position integration (trapezoidal rule)
x[t+1] = x[t] + (velocity[t] * cos(theta[t]) + 
                  velocity[t+1] * cos(theta[t+1])) * dt / 2
                  
y[t+1] = y[t] + (velocity[t] * sin(theta[t]) + 
                  velocity[t+1] * sin(theta[t+1])) * dt / 2

Initial State Estimation

The model estimates the initial velocity v_0 from egomotion history:
# From alpamayo_r1/action_space/unicycle_accel_curvature.py:208-222
def estimate_t0_states(
    self, traj_history_xyz: torch.Tensor, traj_history_rot: torch.Tensor
) -> dict[str, torch.Tensor]:
    """Estimate the t0 states from the trajectory history."""
    full_xy = traj_history_xyz[..., :2]  # (..., N_hist, 2)
    dxy = full_xy[..., 1:, :] - full_xy[..., :-1, :]  # (..., N_hist-1, 2)
    theta = so3_to_yaw_torch(traj_history_rot)
    theta = unwrap_angle(theta)
    
    v = dxy_theta_to_v_without_v0(
        dxy=dxy, theta=theta, dt=self.dt, 
        v_lambda=self.v_lambda, v_ridge=self.v_ridge
    )  # (..., N+1)
    v_t0 = v[..., -1]
    return {"v": v_t0}
Initial velocity estimation uses Tikhonov regularization to ensure smooth velocity profiles consistent with the observed position changes.
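The repository's `dxy_theta_to_v_without_v0` helper is not shown here; as a rough illustration of the idea, a ridge-regularized least-squares fit of per-step speeds to the observed displacements might look like this (the exact objective and all names below are assumptions, not the actual implementation):

```python
import numpy as np

def estimate_speeds(dxy, theta, dt=0.1, smooth_lambda=1e-6, ridge=1e-4):
    """Fit per-step speeds v so that v[t]*dt*[cos(theta[t]), sin(theta[t])] ≈ dxy[t],
    with a smoothness penalty on speed differences and a ridge term on magnitude."""
    n = dxy.shape[0]
    # Data term: project each displacement onto its heading direction
    u = np.stack([np.cos(theta[:n]), np.sin(theta[:n])], axis=-1)  # (n, 2)
    b = (dxy * u).sum(-1) / dt  # per-step speed observations
    # Normal equations: ((1 + ridge) I + λ DᵀD) v = b, D = first-difference operator
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]
    A = np.eye(n) * (1.0 + ridge) + smooth_lambda * (D.T @ D)
    return np.linalg.solve(A, b)
```

For constant-speed motion the smoothness term vanishes, so the fitted speed is just the raw observation lightly shrunk by the ridge factor.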

Diffusion-Based Generation

Alpamayo 1 uses Flow Matching, a continuous normalizing flow technique, to generate action sequences.

Flow Matching Basics

Flow matching learns a velocity field v(x, t) that transforms:
  • Source distribution (t=0): Standard Gaussian noise x ~ N(0, I)
  • Target distribution (t=1): Valid driving action sequences
# From alpamayo_r1/diffusion/flow_matching.py:22-47
class FlowMatching(BaseDiffusion):
    """Flow Matching model.
    
    References:
    Flow Matching for Generative Modeling
        https://arxiv.org/pdf/2210.02747
    Guided Flows for Generative Modeling and Decision Making
        https://arxiv.org/pdf/2311.13443
    """
    
    def __init__(
        self,
        int_method: Literal["euler"] = "euler",
        num_inference_steps: int = 10,
        *args,
        **kwargs,
    ):

Sampling Process

The diffusion decoder performs iterative denoising:
# From alpamayo_r1/diffusion/flow_matching.py:111-127
# Initialize with random noise
x = torch.randn(batch_size, *self.x_dims, device=device)
time_steps = torch.linspace(0.0, 1.0, inference_step + 1, device=device)

# Euler integration over timesteps
for i in range(inference_step):
    dt = time_steps[i + 1] - time_steps[i]
    t_start = time_steps[i]
    
    # Predict velocity field
    v = step_fn(x=x, t=t_start)
    
    # Update action
    x = x + dt * v
Default parameters:
  • Integration method: Euler (first-order ODE solver)
  • Number of steps: 10
  • Timestep schedule: Linear from 0 to 1
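The same Euler loop works with any velocity-field predictor. In this toy example (not repository code), the predictor is the analytically known field that transports every point to a fixed target along straight lines, so ten Euler steps carry all noise samples onto the target:

```python
import torch

target = torch.tensor([2.0, -1.0])  # stand-in for a single "data" point

def step_fn(x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Exact flow-matching field for straight-line paths into `target`
    return (target - x) / (1.0 - t)

num_steps = 10
x = torch.randn(4, 2)  # 4 samples of 2-D "actions" from the source Gaussian
time_steps = torch.linspace(0.0, 1.0, num_steps + 1)
for i in range(num_steps):
    dt = time_steps[i + 1] - time_steps[i]
    x = x + dt * step_fn(x, time_steps[i])
# All samples now sit (numerically) on the target
```

In Alpamayo 1, `step_fn` is the learned expert model rather than a closed-form field, but the integration loop is identical.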

Expert Model as Denoiser

The expert model predicts the velocity field at each diffusion timestep:
# From alpamayo_r1/models/alpamayo_r1.py:254-284
def step_fn(x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # x: (B*, 64, 2) - current noisy action
    # t: (B*, 1, 1) - diffusion timestep [0, 1]
    
    b_star = x.shape[0]
    n_diffusion_tokens = self.action_space.get_action_space_dims()[0]  # 64
    
    # Project noisy action to expert token embeddings
    future_token_embeds = self.action_in_proj(x, t)
    # Shape: (b*, 64, hidden_size)
    
    # Run expert with cached CoC context
    expert_out_base = self.expert(
        inputs_embeds=future_token_embeds,
        position_ids=position_ids,
        past_key_values=prompt_cache,  # Cached VLM outputs
        attention_mask=attention_mask,
        use_cache=True,
        **forward_kwargs,
    )
    
    # Project to velocity field
    last_hidden = expert_out_base.last_hidden_state[:, -n_diffusion_tokens:]
    pred = self.action_out_proj(last_hidden).view(
        -1, *self.action_space.get_action_space_dims()
    )  # (b*, 64, 2) - velocity field
    
    return pred
Key components:
  1. Action input projection (action_in_proj): Maps (action, timestep) → token embeddings
  2. Expert transformer: Processes embeddings with CoC reasoning context
  3. Action output projection (action_out_proj): Maps hidden states → velocity field
The expert model uses non-causal attention by default (expert_non_causal_attention=True), allowing each waypoint to attend to all other waypoints for better trajectory coherence.

Multi-Sample Generation

Alpamayo 1 supports generating multiple trajectory samples per input:
# From alpamayo_r1/test_inference.py:56-63
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    num_traj_samples=6,  # Generate 6 different trajectories
    num_traj_sets=1,     # Number of independent sample sets
    # ...
)

# Output shapes:
# pred_xyz: [batch_size, num_traj_sets, num_traj_samples, 64, 3]
# pred_rot: [batch_size, num_traj_sets, num_traj_samples, 64, 3, 3]
Multi-sample benefits:
  • Multimodality: Capture different plausible futures (e.g., turn left vs. right)
  • Uncertainty quantification: Spread of samples indicates prediction confidence
  • Best-of-N selection: Choose trajectory with minimum error or highest safety score
# From alpamayo_r1/models/alpamayo_r1.py:150-151
n_samples_total = num_traj_samples * num_traj_sets
total_batch = B * n_samples_total
Each sample also gets its own Chain-of-Causation trace due to stochastic VLM generation, providing diverse reasoning explanations for different trajectory modes.

Output Format

Trajectory Representation

Final trajectories are represented in the ego vehicle frame:
# Output tensors
pred_xyz: torch.Tensor  # Shape: (B, num_sets, num_samples, 64, 3)
pred_rot: torch.Tensor  # Shape: (B, num_sets, num_samples, 64, 3, 3)
Coordinate system:
  • Origin: Current ego vehicle position (last history waypoint)
  • X-axis: Forward direction
  • Y-axis: Left direction
  • Z-axis: Up direction (typically constant for ground vehicles)
  • Rotations: SO(3) rotation matrices (not quaternions)

Temporal Resolution

| Property | Value |
| --- | --- |
| Waypoints | 64 |
| Frequency | 10 Hz |
| Time interval | 0.1 seconds |
| Horizon | 6.4 seconds |
| Waypoint times | [0.1, 0.2, 0.3, …, 6.4] seconds |
The first waypoint (t=0.1s) represents the predicted position 100ms in the future, not the current position.
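Building the timestamp vector is a one-liner (plain NumPy, illustrative only):

```python
import numpy as np

dt, n_waypoints = 0.1, 64
# Times start at dt, not 0: waypoint 0 is 100 ms into the future
waypoint_times = dt * np.arange(1, n_waypoints + 1)  # [0.1, 0.2, ..., 6.4]
```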

Trajectory Smoothness & Regularization

The action space enforces trajectory smoothness through Tikhonov regularization:
# From alpamayo_r1/action_space/unicycle_accel_curvature.py:49-96
# Smoothness parameters
theta_lambda: float = 1e-6   # Heading smoothness
theta_ridge: float = 1e-8    # Ridge regularization for theta
v_lambda: float = 1e-6       # Velocity smoothness
v_ridge: float = 1e-4        # Ridge regularization for velocity
a_lambda: float = 1e-4       # Acceleration smoothness
a_ridge: float = 1e-4        # Ridge regularization for acceleration
kappa_lambda: float = 1e-4   # Curvature smoothness
kappa_ridge: float = 1e-4    # Ridge regularization for curvature
Regularization effects:
  • 1st-order smoothness: Penalizes rapid changes in control values
  • 2nd-order smoothness: Penalizes changes in the rate of change (jerk)
  • Ridge regularization: Prevents numerical instabilities
This ensures physically plausible trajectories that avoid jerky or discontinuous motion.
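As a toy illustration of how first- and second-order smoothness terms behave (not the repository's actual objective), consider penalties on finite differences of a control sequence:

```python
import numpy as np

def smoothness_penalty(u, lambda1=1e-4, lambda2=1e-4):
    """Penalize rapid changes (1st difference) and jerk-like terms (2nd difference)."""
    d1 = np.diff(u)        # rate of change between consecutive controls
    d2 = np.diff(u, n=2)   # change in the rate of change
    return lambda1 * np.sum(d1**2) + lambda2 * np.sum(d2**2)

# A constant control sequence incurs zero penalty; a jittery one does not
print(smoothness_penalty(np.full(64, 0.5)))        # 0.0
print(smoothness_penalty(np.array([0.0, 1.0] * 32)) > 0)  # True
```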

Evaluation Metrics

The standard metric for trajectory prediction is minimum Average Displacement Error (minADE):
# From alpamayo_r1/test_inference.py:68-72
gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()
pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)

# Compute L2 distance for each sample, averaged over waypoints
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)
min_ade = diff.min()  # Minimum over all samples
minADE definition:
minADE = min_{i ∈ samples} [ (1/T) Σ_{t=1}^{T} ||pred_i(t) - gt(t)||_2 ]
Where:
  • T = 64 waypoints
  • pred_i(t) = predicted XY position at time t for sample i
  • gt(t) = ground truth XY position at time t
Nondeterminism: Due to stochastic sampling, diffusion inference, and hardware differences, minADE values will vary between runs even with the same random seed.
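A self-contained version of the metric on toy arrays (shapes chosen for illustration) can be written as:

```python
import numpy as np

def min_ade(pred_xy, gt_xy):
    """pred_xy: (num_samples, T, 2); gt_xy: (T, 2). Returns scalar minADE."""
    # Per-waypoint L2 distance, averaged over the horizon, minimized over samples
    ade = np.linalg.norm(pred_xy - gt_xy[None], axis=-1).mean(axis=-1)
    return ade.min()

gt = np.zeros((64, 2))
preds = np.stack([np.full((64, 2), 1.0), np.full((64, 2), 0.5)])
print(min_ade(preds, gt))  # sqrt(0.5) ≈ 0.7071, from the closer sample
```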

Customizing Trajectory Generation

You can control trajectory generation through diffusion parameters:
pred_xyz, pred_rot = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    num_traj_samples=10,       # More samples = better coverage
    num_traj_sets=1,
    
    # Diffusion-specific kwargs
    diffusion_kwargs={
        'inference_step': 20,  # More steps = higher quality (slower)
        'int_method': 'euler', # Integration method
    },
)
Trade-offs:
  • More samples: Better multimodal coverage, higher compute cost
  • More diffusion steps: Smoother trajectories, slower inference
  • Fewer diffusion steps: Faster inference, potentially lower quality

Limitations & Future Directions

Current Limitations

The v1.0 release does not include:
  • Route conditioning: Trajectories are not guided by navigation goals
  • Multi-agent interaction modeling: No explicit modeling of other vehicles’ reactions
  • Uncertainty calibration: Predicted distributions may not be well-calibrated
From README (lines 94-96):
| Route/navigation conditioning | Explicit navigation or route inputs | ❌ Not in this release |

Research Opportunities

  1. Guided sampling: Incorporate safety constraints or navigation goals into diffusion
  2. Uncertainty quantification: Calibrate predicted distributions for reliable confidence estimates
  3. Multi-agent forecasting: Extend to predict joint futures of multiple vehicles
  4. Longer horizons: Scale to 10+ second predictions for highway scenarios

Architecture

Understand the full model architecture

Inputs & Outputs

Detailed I/O specifications
