This guide walks you through running inference with Alpamayo 1 to generate trajectory predictions and Chain-of-Causation reasoning traces.
Prerequisites
Before running inference, ensure you have:
Completed the installation process
At least 24 GB GPU VRAM (e.g., RTX 3090, RTX 4090, A5000, H100)
Authenticated with HuggingFace and gained access to the model and dataset
GPUs with less than 24 GB VRAM will likely encounter CUDA out-of-memory errors during inference.
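A back-of-the-envelope estimate shows why roughly 24 GB is the practical floor. The constants below are assumptions (about 10 billion parameters stored at 2 bytes each in bfloat16); activations and the KV cache add overhead on top of the weights:

```python
# Rough VRAM estimate for a ~10B-parameter model in bfloat16.
# Weights alone need ~2 bytes per parameter; inference-time activations
# and KV cache push the practical requirement to ~24 GB.
num_params = 10e9
bytes_per_param = 2  # bfloat16
weights_gib = num_params * bytes_per_param / 1024**3
print(f"Weights alone: ~{weights_gib:.1f} GiB")
```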
Quick Start with Test Script
The fastest way to test inference is using the provided test script:
Navigate to the repository
Run the test inference script
python src/alpamayo_r1/test_inference.py
This script will download example data (relatively small) and model weights (22 GB). Download time depends on your network bandwidth: approximately 2.5 minutes on a 100 MB/s connection.
Review the output
The script will output:
Chain-of-Causation reasoning traces
Minimum Average Displacement Error (minADE) in meters
A note about output variance due to trajectory sampling
Understanding the Inference Pipeline
The inference pipeline consists of several key steps:
1. Load the Dataset
Load a clip from the PhysicalAI-AV dataset:
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
# Example clip ID
clip_id = "030c760c-ae38-49aa-9ad8-f5650a545d26"
print(f"Loading dataset for clip_id: {clip_id}...")
data = load_physical_aiavdataset(clip_id, t0_us=5_100_000)
print("Dataset loaded.")
The load_physical_aiavdataset function returns a dictionary containing:
image_frames: Multi-camera video frames (N_cameras, num_frames, 3, H, W)
ego_history_xyz: Historical ego trajectory positions
ego_history_rot: Historical ego trajectory rotations
ego_future_xyz: Ground truth future trajectory positions
ego_future_rot: Ground truth future trajectory rotations
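The fields above can be pictured with a small stand-in dictionary. The concrete sizes here (4 cameras, 8 frames, 480x832 images, 16 history steps) are illustrative assumptions; only the field names and the (N_cameras, num_frames, 3, H, W) layout of image_frames come from the description above:

```python
import numpy as np

# Illustrative stand-in for the dictionary returned by
# load_physical_aiavdataset; the exact sizes are assumptions.
data = {
    "image_frames": np.zeros((4, 8, 3, 480, 832), dtype=np.uint8),
    "ego_history_xyz": np.zeros((1, 16, 3)),
    "ego_history_rot": np.zeros((1, 16, 3, 3)),
    "ego_future_xyz": np.zeros((1, 1, 64, 3)),
    "ego_future_rot": np.zeros((1, 1, 64, 3, 3)),
}
for key, value in data.items():
    print(f"{key}: shape={value.shape}")
```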
2. Construct the Input Message
Construct the input message using the image frames:
from alpamayo_r1 import helper
messages = helper.create_message(data["image_frames"].flatten(0, 1))
The message format includes:
System prompt: “You are a driving assistant that generates safe and accurate actions.”
User input: Multi-camera images and trajectory history
Assistant response prompt: Starts with Chain-of-Causation reasoning
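As a rough mental model, the message list might look like the sketch below. The exact schema is an assumption based on common chat-template formats; only the three roles and the system-prompt text come from the description above:

```python
# Hypothetical sketch of the message list helper.create_message builds.
# The schema here is an assumption, not the library's actual format.
messages = [
    {"role": "system",
     "content": "You are a driving assistant that generates safe and accurate actions."},
    {"role": "user",
     "content": [{"type": "image"}] * 4},  # one placeholder per camera frame
    {"role": "assistant",
     "content": "Chain-of-Causation reasoning starts here..."},
]
print([m["role"] for m in messages])
```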
3. Load Model and Processor
Load the Alpamayo R1 model and create the data processor:
import torch
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B",
    dtype=torch.bfloat16,
).to("cuda")
processor = helper.get_processor(model.tokenizer)
The model uses torch.bfloat16 precision for efficient inference on modern GPUs.
4. Prepare Model Inputs
Tokenize and prepare the inputs for the model:
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=False,
    continue_final_message=True,
    return_dict=True,
    return_tensors="pt",
)
model_inputs = {
    "tokenized_data": inputs,
    "ego_history_xyz": data["ego_history_xyz"],
    "ego_history_rot": data["ego_history_rot"],
}
model_inputs = helper.to_device(model_inputs, "cuda")
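A helper like to_device typically just walks nested containers and moves every tensor-like leaf to the target device. This is a minimal sketch of that behavior, assuming (not confirmed from the source) that the actual helper works the same way:

```python
# Minimal sketch of a nested-dict device mover; an assumption about
# what helper.to_device does, not its actual implementation.
def to_device(obj, device):
    if isinstance(obj, dict):
        return {k: to_device(v, device) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_device(v, device) for v in obj)
    if hasattr(obj, "to"):  # torch.Tensor and tokenizer outputs support .to()
        return obj.to(device)
    return obj
```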
5. Run Inference
Generate trajectory predictions with Chain-of-Causation reasoning:
torch.cuda.manual_seed_all(42)
with torch.autocast("cuda", dtype=torch.bfloat16):
    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
        data=model_inputs,
        top_p=0.98,
        temperature=0.6,
        num_traj_samples=1,  # Increase for more trajectories
        max_generation_length=256,
        return_extra=True,
    )

# View Chain-of-Causation reasoning
print("Chain-of-Causation (per trajectory):\n", extra["cot"][0])
Inference Parameters Explained
top_p: Nucleus sampling parameter (0.98 for diverse outputs)
temperature: Sampling temperature (0.6 for balanced randomness)
num_traj_samples: Number of trajectory samples to generate
max_generation_length: Maximum tokens for reasoning generation
return_extra: Return additional outputs like CoC traces
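To build intuition for how top_p and temperature interact, here is a toy nucleus-sampling filter on a 4-token distribution. This is a generic illustration of the technique, not the model's internal sampler:

```python
import numpy as np

# Toy nucleus (top-p) filter: keep the smallest set of most-probable
# tokens whose cumulative probability stays within top_p, then renormalize.
def nucleus_filter(probs, top_p):
    order = np.argsort(probs)[::-1]      # indices, most probable first
    cumulative = np.cumsum(probs[order])
    keep = cumulative <= top_p
    keep[0] = True                       # always keep the top token
    mask = np.zeros_like(probs, dtype=bool)
    mask[order[keep]] = True
    filtered = np.where(mask, probs, 0.0)
    return filtered / filtered.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
temperature = 0.6                        # lower temperature sharpens the distribution
probs = np.exp(logits / temperature)
probs /= probs.sum()
print("softmax probs:", np.round(probs, 3))
print("after top_p=0.98:", np.round(nucleus_filter(probs, 0.98), 3))
```

Lower temperatures concentrate probability mass on the top tokens, so fewer candidates survive the top_p cutoff; higher values of both parameters yield more diverse reasoning and trajectories.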
6. Evaluate Results
Compute the minimum Average Displacement Error (minADE):
import numpy as np
gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()
pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)
min_ade = diff.min()
print("minADE:", min_ade, "meters")
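The metric itself needs no GPU: minADE is the smallest per-sample average waypoint distance to the ground truth. The same computation on synthetic numpy data (shapes and noise scale here are purely illustrative):

```python
import numpy as np

# Standalone minADE illustration: 3 sampled trajectories vs. one
# ground-truth path, 64 waypoints each, in 2D (x, y).
rng = np.random.default_rng(0)
gt_xy = np.cumsum(rng.normal(size=(2, 64)), axis=1)              # (2, 64)
pred_xy = gt_xy[None] + rng.normal(scale=0.5, size=(3, 2, 64))   # (3, 2, 64)

# Per-sample ADE: mean Euclidean distance over waypoints; minADE is the best.
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)
min_ade = diff.min()
print("per-sample ADE:", np.round(diff, 3))
print("minADE:", round(min_ade, 3), "meters")
```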
Generating Multiple Trajectories
To generate multiple trajectory samples and reasoning traces, increase the num_traj_samples parameter:
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    top_p=0.98,
    temperature=0.6,
    num_traj_samples=5,  # Generate 5 different trajectories
    max_generation_length=256,
    return_extra=True,
)

# Each trajectory has its own CoC reasoning
for i, cot in enumerate(extra["cot"][0]):
    print(f"\nTrajectory {i + 1} reasoning:\n", cot)
Increasing num_traj_samples requires more GPU memory. If you encounter OOM errors, reduce this value or see the troubleshooting guide.
Understanding the Output
The model returns three outputs:
Predicted Trajectories (pred_xyz, pred_rot)
pred_xyz: Shape [batch_size, num_traj_sets, num_traj_samples, 64, 3]
64 waypoints at 10 Hz (6.4 second horizon)
XYZ coordinates in the ego frame at t0
pred_rot: Shape [batch_size, num_traj_sets, num_traj_samples, 64, 3, 3]
Rotation matrices for each waypoint
extra["cot"]: Chain-of-Causation reasoning traces
Shape: [batch_size, num_traj_sets, num_traj_samples]
Text descriptions of the driving reasoning process
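The waypoint timing follows directly from those shapes: 64 waypoints at 10 Hz span a 6.4-second horizon. A small sketch of the timestamps relative to t0 (assuming the first waypoint lies one step, 0.1 s, after t0, consistent with the stated horizon):

```python
import numpy as np

# 64 waypoints at 10 Hz -> 0.1 s spacing, 6.4 s horizon.
num_waypoints, rate_hz = 64, 10.0
timestamps = (np.arange(num_waypoints) + 1) / rate_hz  # seconds after t0
print(f"step: {timestamps[1] - timestamps[0]:.1f} s, "
      f"horizon: {timestamps[-1]:.1f} s")
```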
Complete Example
Here’s the complete inference script from src/alpamayo_r1/test_inference.py:16-78:
import torch
import numpy as np
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
from alpamayo_r1 import helper
# Example clip ID
clip_id = "030c760c-ae38-49aa-9ad8-f5650a545d26"
print(f"Loading dataset for clip_id: {clip_id}...")
data = load_physical_aiavdataset(clip_id, t0_us=5_100_000)
print("Dataset loaded.")

messages = helper.create_message(data["image_frames"].flatten(0, 1))

model = AlpamayoR1.from_pretrained("nvidia/Alpamayo-R1-10B", dtype=torch.bfloat16).to("cuda")
processor = helper.get_processor(model.tokenizer)

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=False,
    continue_final_message=True,
    return_dict=True,
    return_tensors="pt",
)
model_inputs = {
    "tokenized_data": inputs,
    "ego_history_xyz": data["ego_history_xyz"],
    "ego_history_rot": data["ego_history_rot"],
}
model_inputs = helper.to_device(model_inputs, "cuda")

torch.cuda.manual_seed_all(42)
with torch.autocast("cuda", dtype=torch.bfloat16):
    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
        data=model_inputs,
        top_p=0.98,
        temperature=0.6,
        num_traj_samples=1,
        max_generation_length=256,
        return_extra=True,
    )

print("Chain-of-Causation (per trajectory):\n", extra["cot"][0])

gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()
pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)
min_ade = diff.min()
print("minADE:", min_ade, "meters")
Next Steps
Notebook Tutorial: Explore interactive visualization and analysis
Troubleshooting: Resolve common issues and errors