
Quickstart

This guide shows you how to run inference with Alpamayo 1 and generate trajectory predictions with Chain-of-Causation reasoning.

Prerequisites

Before starting, ensure you have:
  • Completed the installation steps
  • Activated your virtual environment
  • Authenticated with HuggingFace
  • An NVIDIA GPU with ≥24 GB VRAM

Run the test inference script

The simplest way to get started is to run the provided test script:
python src/alpamayo_r1/test_inference.py
The first run downloads example data and model weights (about 22 GB); subsequent runs use the cached copies.
This script will:
  1. Load a sample clip from the PhysicalAI-AV dataset
  2. Run inference to predict trajectories
  3. Generate Chain-of-Causation reasoning traces
  4. Compute the minADE (minimum Average Displacement Error) metric

Understanding the code

Here’s how the inference pipeline works:
Step 1: Load the dataset

Load a specific clip from the PhysicalAI-AV dataset:
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset

clip_id = "030c760c-ae38-49aa-9ad8-f5650a545d26"
data = load_physical_aiavdataset(clip_id, t0_us=5_100_000)
The dataset includes multi-camera images, ego vehicle history (position and rotation), and ground truth trajectories.
Step 2: Load the model and processor

Load the pre-trained Alpamayo 1 model:
import torch
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
from alpamayo_r1 import helper

model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B", 
    dtype=torch.bfloat16
).to("cuda")

processor = helper.get_processor(model.tokenizer)
The model uses bfloat16 precision for efficient GPU memory usage.
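As rough arithmetic (assuming the "10B" in the model name means roughly ten billion parameters), bfloat16 halves the weight memory relative to float32, which is why the weights fit on a 24 GB GPU at all:

```python
# Back-of-envelope weight-memory estimate for a ~10B-parameter model.
# Weights only; activations and the KV cache need additional memory.
num_params = 10e9
bytes_per_param_bf16 = 2   # bfloat16: 2 bytes per parameter
bytes_per_param_fp32 = 4   # float32: 4 bytes per parameter

gb_bf16 = num_params * bytes_per_param_bf16 / 1e9
gb_fp32 = num_params * bytes_per_param_fp32 / 1e9
print(f"weights in bfloat16: {gb_bf16:.0f} GB, in float32: {gb_fp32:.0f} GB")
```

This estimate is also consistent with the roughly 22 GB download mentioned above.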
Step 3: Prepare the inputs

Create message format and tokenize inputs:
messages = helper.create_message(data["image_frames"].flatten(0, 1))

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=False,
    continue_final_message=True,
    return_dict=True,
    return_tensors="pt",
)

model_inputs = {
    "tokenized_data": inputs,
    "ego_history_xyz": data["ego_history_xyz"],
    "ego_history_rot": data["ego_history_rot"],
}
model_inputs = helper.to_device(model_inputs, "cuda")
Step 4: Run inference

Generate trajectory predictions with Chain-of-Causation reasoning:
torch.cuda.manual_seed_all(42)
with torch.autocast("cuda", dtype=torch.bfloat16):
    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
        data=model_inputs,
        top_p=0.98,
        temperature=0.6,
        num_traj_samples=1,
        max_generation_length=256,
        return_extra=True,
    )

# View Chain-of-Causation reasoning
print("Chain-of-Causation (per trajectory):\n", extra["cot"][0])
You can increase num_traj_samples to generate multiple trajectory hypotheses, but this requires more GPU memory.
Step 5: Evaluate predictions

Compare predictions against ground truth:
import numpy as np

gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()          # shape (2, T)
pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)  # shape (K, 2, T)
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)   # ADE per sample, shape (K,)
min_ade = diff.min()                                                 # best (lowest-error) sample
print("minADE:", min_ade, "meters")
The minADE (minimum Average Displacement Error) is the mean Euclidean distance between predicted and ground-truth waypoints, evaluated for the closest of the sampled trajectories.
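To see the computation in isolation, here is a toy illustration with synthetic numbers (not model output): three sampled trajectories with known lateral offsets from a straight-ahead ground truth.

```python
import numpy as np

# Synthetic example of the minADE computation: K=3 sampled
# trajectories vs. one ground-truth trajectory of T=64 waypoints.
T = 64
t = np.linspace(0.1, 6.4, T)
gt_xy = np.stack([t, np.zeros(T)])       # ground truth, shape (2, T): straight ahead

# Each sample is the ground truth shifted laterally by a known amount.
offsets = [0.0, 1.0, 3.0]                # lateral error per sample, in meters
pred_xy = np.stack([gt_xy + np.array([[0.0], [off]]) for off in offsets])  # (K, 2, T)

diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)  # ADE per sample
min_ade = diff.min()                     # the perfect sample wins: 0.0
```

Each sample's ADE equals its constant lateral offset, so `diff` recovers `[0.0, 1.0, 3.0]` and `min_ade` is 0.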

Complete example

Here’s the full inference script:
import torch
import numpy as np

from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
from alpamayo_r1 import helper

# Load dataset
clip_id = "030c760c-ae38-49aa-9ad8-f5650a545d26"
print(f"Loading dataset for clip_id: {clip_id}...")
data = load_physical_aiavdataset(clip_id, t0_us=5_100_000)
print("Dataset loaded.")
messages = helper.create_message(data["image_frames"].flatten(0, 1))

# Load model and processor
model = AlpamayoR1.from_pretrained("nvidia/Alpamayo-R1-10B", dtype=torch.bfloat16).to("cuda")
processor = helper.get_processor(model.tokenizer)

# Prepare inputs
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=False,
    continue_final_message=True,
    return_dict=True,
    return_tensors="pt",
)
model_inputs = {
    "tokenized_data": inputs,
    "ego_history_xyz": data["ego_history_xyz"],
    "ego_history_rot": data["ego_history_rot"],
}
model_inputs = helper.to_device(model_inputs, "cuda")

# Run inference
torch.cuda.manual_seed_all(42)
with torch.autocast("cuda", dtype=torch.bfloat16):
    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
        data=model_inputs,
        top_p=0.98,
        temperature=0.6,
        num_traj_samples=1,
        max_generation_length=256,
        return_extra=True,
    )

# View reasoning and metrics
print("Chain-of-Causation (per trajectory):\n", extra["cot"][0])

gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()
pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)
min_ade = diff.min()
print("minADE:", min_ade, "meters")

Understanding the outputs

Alpamayo 1 produces two key outputs:

Trajectory predictions

  • Format: pred_xyz with shape [batch_size, num_traj_sets, num_traj_samples, 64, 3]
  • Content: 64 waypoints representing 6.4 seconds of predicted vehicle motion (10 Hz)
  • Coordinates: XYZ positions in the ego vehicle’s coordinate frame
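Given the 10 Hz / 6.4 s horizon described above, mapping a waypoint index to its future timestamp is simple arithmetic (a sketch; the index-to-time convention is an assumption, not documented API):

```python
import numpy as np

# 64 waypoints, one every 0.1 s (10 Hz), spanning 6.4 s into the future.
NUM_WAYPOINTS = 64
DT = 0.1  # seconds between consecutive waypoints

timestamps = (np.arange(NUM_WAYPOINTS) + 1) * DT  # 0.1 s .. 6.4 s

# e.g. pred_xyz[0, 0, 0, i] would be the predicted ego position
# at timestamps[i] seconds after t0.
```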

Chain-of-Causation reasoning

  • Format: Natural language text in extra["cot"]
  • Content: Explanations of the causal factors influencing the predicted trajectory
  • Example: “The vehicle ahead is slowing down due to traffic. I should reduce speed and maintain safe following distance.”

Interactive notebook

For visual exploration and trajectory visualization, use the included Jupyter notebook:
jupyter notebook notebooks/inference.ipynb
The notebook includes:
  • Multi-camera image visualization
  • Trajectory plotting (predicted vs. ground truth)
  • Interactive parameter tuning
  • Matplotlib-based visualizations

Inference parameters

You can customize inference behavior with these parameters:
| Parameter             | Default | Description                                     |
| --------------------- | ------- | ----------------------------------------------- |
| top_p                 | 0.98    | Nucleus sampling threshold for token generation |
| temperature           | 0.6     | Sampling temperature (higher = more diverse)    |
| num_traj_samples      | 1       | Number of trajectory samples to generate        |
| max_generation_length | 256     | Maximum length for reasoning text generation    |
Increasing num_traj_samples generates multiple trajectory hypotheses but significantly increases GPU memory usage. Start with 1 and increase gradually.
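To build intuition for how top_p and temperature interact, here is a minimal numpy sketch of temperature-scaled nucleus (top-p) sampling. It illustrates the general technique only, not the model's internal sampler:

```python
import numpy as np

def nucleus_probs(logits, temperature=0.6, top_p=0.98):
    """Temperature-scale logits, then keep the smallest set of tokens
    whose cumulative probability reaches top_p (the 'nucleus')."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]         # most likely token first
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    kept = np.zeros_like(probs)
    kept[order[:cutoff]] = probs[order[:cutoff]]
    return kept / kept.sum()                # renormalize over the nucleus

logits = np.array([2.0, 1.0, 0.2, -1.0])
p = nucleus_probs(logits, temperature=0.6, top_p=0.9)
# Only tokens inside the nucleus keep nonzero probability.
```

Lowering temperature sharpens the distribution and lowering top_p shrinks the nucleus, so both push generation toward more deterministic outputs; the defaults (0.6, 0.98) keep most of the probability mass available.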

Expected variability

Vision-Language-Action models produce non-deterministic outputs due to:
  • Trajectory sampling during inference
  • Hardware differences across GPUs
  • Floating-point precision variations
With num_traj_samples=1, you may observe variance in minADE metrics across runs. This is expected behavior. For more stable evaluation, increase num_traj_samples or use the interactive notebook for visual sanity checks.

Next steps

Model architecture

Learn about the Vision-Language-Action architecture and Chain-of-Causation reasoning

HuggingFace model card

Read comprehensive details on inputs, outputs, and licensing

Research paper

Explore the technical details in the arXiv paper

Dataset

Browse the PhysicalAI-AV dataset on HuggingFace
