This guide walks you through running inference with Alpamayo-R1 to generate trajectory predictions and Chain-of-Causation reasoning traces.

Prerequisites

Before running inference, ensure you have:
  • Completed the installation process
  • At least 24 GB GPU VRAM (e.g., RTX 3090, RTX 4090, A5000, H100)
  • Authenticated with HuggingFace and gained access to the model and dataset
GPUs with less than 24 GB VRAM will likely encounter CUDA out-of-memory errors during inference.
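As a rough sanity check on the 24 GB requirement, a sketch of the arithmetic (assuming ~10 billion parameters, per the "10B" in the model name; actual usage also depends on activations, the KV cache, and image features):

```python
# Back-of-envelope VRAM estimate for the model weights alone.
# Assumption: ~10e9 parameters at 2 bytes each in bfloat16.
num_params = 10e9
bytes_per_param = 2  # bfloat16

weights_gb = num_params * bytes_per_param / 1e9
print(f"Approximate weight memory: {weights_gb:.0f} GB")
# Activations, the KV cache, and vision features consume the remaining
# headroom, which is why 24 GB cards sit close to the limit.
```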

Quick Start with Test Script

The fastest way to test inference is using the provided test script:
1. Navigate to the repository:

cd alpamayo

2. Run the test inference script:

python src/alpamayo_r1/test_inference.py

This script downloads example data (relatively small) and model weights (22 GB). Download time depends on your network bandwidth; at 100 MB/s, the weights alone take roughly four minutes.

3. Review the output. The script will print:
  • Chain-of-Causation reasoning traces
  • Minimum Average Displacement Error (minADE) in meters
  • A note about output variance due to trajectory sampling

Understanding the Inference Pipeline

The inference pipeline consists of several key steps:

1. Load the Dataset

Load a clip from the PhysicalAI-AV dataset:
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset

# Example clip ID
clip_id = "030c760c-ae38-49aa-9ad8-f5650a545d26"
print(f"Loading dataset for clip_id: {clip_id}...")
data = load_physical_aiavdataset(clip_id, t0_us=5_100_000)
print("Dataset loaded.")
The load_physical_aiavdataset function returns a dictionary containing:
  • image_frames: Multi-camera video frames (N_cameras, num_frames, 3, H, W)
  • ego_history_xyz: Historical ego trajectory positions
  • ego_history_rot: Historical ego trajectory rotations
  • ego_future_xyz: Ground truth future trajectory positions
  • ego_future_rot: Ground truth future trajectory rotations
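To make the tensor layout concrete, here is a sketch with dummy NumPy arrays. The axis order matches the documentation above, but the sizes (4 cameras, 8 frames, 640×360 images, 16 history steps) are illustrative assumptions, not the dataset's actual values:

```python
import numpy as np

# Dummy stand-ins with the documented axis order; sizes are illustrative only.
n_cameras, n_frames, h, w = 4, 8, 360, 640
image_frames = np.zeros((n_cameras, n_frames, 3, h, w), dtype=np.float32)
ego_history_xyz = np.zeros((1, 1, 16, 3))  # historical positions (length assumed)
ego_future_xyz = np.zeros((1, 1, 64, 3))   # ground-truth future positions

# Step 2 merges the camera and frame axes before building the message;
# NumPy's reshape(-1, ...) mirrors torch's flatten(0, 1) here.
flat = image_frames.reshape(-1, 3, h, w)
print(flat.shape)  # (32, 3, 360, 640)
```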

2. Create Message Format

Construct the input message using the image frames:
from alpamayo_r1 import helper

messages = helper.create_message(data["image_frames"].flatten(0, 1))
The message format includes:
  • System prompt: “You are a driving assistant that generates safe and accurate actions.”
  • User input: Multi-camera images and trajectory history
  • Assistant response prompt: Starts with Chain-of-Causation reasoning
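The exact schema is defined by helper.create_message, but conceptually it follows the familiar chat-template layout. A hypothetical sketch (field names and content placeholders are illustrative, not the library's actual output):

```python
# Illustrative chat-style message structure; the real helper.create_message
# output may differ in field names and in how images are embedded.
messages = [
    {"role": "system",
     "content": "You are a driving assistant that generates safe and accurate actions."},
    {"role": "user",
     "content": [{"type": "video", "video": "<multi-camera frames>"},
                 {"type": "text", "text": "<trajectory history>"}]},
    # continue_final_message=True (used in step 4) means generation
    # continues this partial assistant turn rather than starting a new one:
    {"role": "assistant", "content": "<Chain-of-Causation reasoning prefix>"},
]
roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant']
```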

3. Load Model and Processor

Load the Alpamayo R1 model and create the data processor:
import torch
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1

model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B", 
    dtype=torch.bfloat16
).to("cuda")

processor = helper.get_processor(model.tokenizer)
The model uses torch.bfloat16 precision for efficient inference on modern GPUs.

4. Prepare Model Inputs

Tokenize and prepare the inputs for the model:
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=False,
    continue_final_message=True,
    return_dict=True,
    return_tensors="pt",
)

model_inputs = {
    "tokenized_data": inputs,
    "ego_history_xyz": data["ego_history_xyz"],
    "ego_history_rot": data["ego_history_rot"],
}

model_inputs = helper.to_device(model_inputs, "cuda")

5. Run Inference

Generate trajectory predictions with Chain-of-Causation reasoning:
torch.cuda.manual_seed_all(42)
with torch.autocast("cuda", dtype=torch.bfloat16):
    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
        data=model_inputs,
        top_p=0.98,
        temperature=0.6,
        num_traj_samples=1,  # Increase for more trajectories
        max_generation_length=256,
        return_extra=True,
    )

# View Chain-of-Causation reasoning
print("Chain-of-Causation (per trajectory):\n", extra["cot"][0])
  • top_p: Nucleus sampling parameter (0.98 for diverse outputs)
  • temperature: Sampling temperature (0.6 for balanced randomness)
  • num_traj_samples: Number of trajectory samples to generate
  • max_generation_length: Maximum tokens for reasoning generation
  • return_extra: Return additional outputs like CoC traces
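To build intuition for how top_p and temperature interact, here is a minimal NumPy sketch of temperature scaling followed by nucleus (top-p) truncation on toy logits. This is a standalone illustration of the sampling technique, not the model's actual sampler:

```python
import numpy as np

def nucleus_filter(logits, top_p=0.98, temperature=0.6):
    """Return sampling probabilities after temperature scaling and top-p truncation."""
    scaled = logits / temperature            # temperature < 1 sharpens the distribution
    probs = np.exp(scaled - scaled.max())    # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # most to least likely
    cumulative = np.cumsum(probs[order])
    # Keep the smallest set of tokens whose cumulative probability covers top_p:
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()         # renormalize over the kept tokens

p = nucleus_filter(np.array([2.0, 1.0, 0.1, -3.0]))
print(p.round(3))  # the lowest-probability token is zeroed out
```

Lower temperature concentrates probability on the top tokens; a high top_p like 0.98 then prunes only the extreme tail, which is why this combination yields diverse but still plausible rollouts.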

6. Evaluate Results

Compute the minimum Average Displacement Error (minADE):
import numpy as np

gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()
pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)
min_ade = diff.min()

print("minADE:", min_ade, "meters")
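The broadcasting in the snippet above can be checked on synthetic data: if one sampled trajectory exactly matches the ground truth, minADE must come out to 0.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 64                                     # waypoints per trajectory
gt_xy = rng.normal(size=(2, T))            # ground truth, shape (2, 64)

# Three candidate samples, shape (num_samples, 2, 64):
pred_xy = np.stack([gt_xy + 1.0,           # +1 m in x and y -> ADE = sqrt(2)
                    gt_xy,                 # perfect match   -> ADE = 0
                    gt_xy - 0.5])          # -0.5 m each     -> ADE = sqrt(0.5)

diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)  # per-sample ADE
print("per-sample ADE:", diff.round(3))
print("minADE:", diff.min())  # 0.0
```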

Generating Multiple Trajectories

To generate multiple trajectory samples and reasoning traces, increase the num_traj_samples parameter:
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    top_p=0.98,
    temperature=0.6,
    num_traj_samples=5,  # Generate 5 different trajectories
    max_generation_length=256,
    return_extra=True,
)

# Each trajectory has its own CoC reasoning
for i, cot in enumerate(extra["cot"][0]):
    print(f"\nTrajectory {i+1} reasoning:\n", cot)
Increasing num_traj_samples requires more GPU memory. If you encounter OOM errors, reduce this value or see the troubleshooting guide.

Output Format

The model returns three outputs:

Predicted Trajectories (pred_xyz, pred_rot)

  • pred_xyz: Shape [batch_size, num_traj_sets, num_traj_samples, 64, 3]
    • 64 waypoints at 10 Hz (6.4 second horizon)
    • XYZ coordinates in the ego frame at t0
  • pred_rot: Shape [batch_size, num_traj_sets, num_traj_samples, 64, 3, 3]
    • Rotation matrices for each waypoint
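Given 64 waypoints at 10 Hz, each waypoint index maps to a timestamp relative to t0. A small sketch (assuming the first waypoint lies 0.1 s after t0, which the docs above do not state explicitly):

```python
import numpy as np

# 64 waypoints at 10 Hz. Assumption (not stated above): the first waypoint
# is 0.1 s after t0, so the horizon ends exactly at 6.4 s.
timestamps_s = (np.arange(64) + 1) / 10.0
print(timestamps_s[0], timestamps_s[-1])  # 0.1 6.4
```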

Extra Outputs (extra)

  • extra["cot"]: Chain-of-Causation reasoning traces
    • Shape: [batch_size, num_traj_sets, num_traj_samples]
    • Text descriptions of the driving reasoning process

Complete Example

Here’s the complete inference script from src/alpamayo_r1/test_inference.py:16-78:
import torch
import numpy as np

from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
from alpamayo_r1 import helper

# Example clip ID
clip_id = "030c760c-ae38-49aa-9ad8-f5650a545d26"
print(f"Loading dataset for clip_id: {clip_id}...")
data = load_physical_aiavdataset(clip_id, t0_us=5_100_000)
print("Dataset loaded.")
messages = helper.create_message(data["image_frames"].flatten(0, 1))

model = AlpamayoR1.from_pretrained("nvidia/Alpamayo-R1-10B", dtype=torch.bfloat16).to("cuda")
processor = helper.get_processor(model.tokenizer)

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=False,
    continue_final_message=True,
    return_dict=True,
    return_tensors="pt",
)
model_inputs = {
    "tokenized_data": inputs,
    "ego_history_xyz": data["ego_history_xyz"],
    "ego_history_rot": data["ego_history_rot"],
}

model_inputs = helper.to_device(model_inputs, "cuda")

torch.cuda.manual_seed_all(42)
with torch.autocast("cuda", dtype=torch.bfloat16):
    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
        data=model_inputs,
        top_p=0.98,
        temperature=0.6,
        num_traj_samples=1,
        max_generation_length=256,
        return_extra=True,
    )

print("Chain-of-Causation (per trajectory):\n", extra["cot"][0])

gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()
pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)
min_ade = diff.min()
print("minADE:", min_ade, "meters")

Next Steps

Notebook Tutorial

Explore interactive visualization and analysis

Troubleshooting

Resolve common issues and errors
