This guide walks you through running inference with Alpamayo 1 to generate trajectory predictions and Chain-of-Causation reasoning traces.
Prerequisites
Before running inference, ensure you have:
Completed the installation process
At least 24 GB GPU VRAM (e.g., RTX 3090, RTX 4090, A5000, H100)
Authenticated with HuggingFace and gained access to the model and dataset
GPUs with less than 24 GB VRAM will likely encounter CUDA out-of-memory errors during inference.
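A back-of-the-envelope estimate shows why roughly 24 GB is the practical floor. The constants below are assumptions (about 10 billion parameters stored at 2 bytes each in bfloat16); activations and the KV cache add overhead on top of the weights:

```python
# Rough VRAM estimate for a ~10B-parameter model in bfloat16.
# Weights alone need ~2 bytes per parameter; inference-time activations
# and KV cache push the practical requirement to ~24 GB.
num_params = 10e9
bytes_per_param = 2  # bfloat16
weights_gib = num_params * bytes_per_param / 1024**3
print(f"Weights alone: ~{weights_gib:.1f} GiB")
```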
Quick Start with Test Script
The fastest way to test inference is using the provided test script:
Navigate to the repository
Run the test inference script
python src/alpamayo_r1/test_inference.py
This script will download example data (relatively small) and model weights (22 GB). Download time depends on your network bandwidth: approximately 2.5 minutes on a 100 MB/s connection.
Review the output
The script will output:
Chain-of-Causation reasoning traces
Minimum Average Displacement Error (minADE) in meters
A note about output variance due to trajectory sampling
Understanding the Inference Pipeline
The inference pipeline consists of several key steps:
1. Load the Dataset
Load a clip from the PhysicalAI-AV dataset:
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
# Example clip ID
clip_id = "030c760c-ae38-49aa-9ad8-f5650a545d26"
print(f"Loading dataset for clip_id: {clip_id}...")
data = load_physical_aiavdataset(clip_id, t0_us=5_100_000)
print("Dataset loaded.")
The load_physical_aiavdataset function returns a dictionary containing:
image_frames: Multi-camera video frames (N_cameras, num_frames, 3, H, W)
ego_history_xyz: Historical ego trajectory positions
ego_history_rot: Historical ego trajectory rotations
ego_future_xyz: Ground truth future trajectory positions
ego_future_rot: Ground truth future trajectory rotations
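The fields above can be pictured with a small stand-in dictionary. The concrete sizes here (4 cameras, 8 frames, 480x832 images, 16 history steps) are illustrative assumptions; only the field names and the (N_cameras, num_frames, 3, H, W) layout of image_frames come from the description above:

```python
import numpy as np

# Illustrative stand-in for the dictionary returned by
# load_physical_aiavdataset; the exact sizes are assumptions.
data = {
    "image_frames": np.zeros((4, 8, 3, 480, 832), dtype=np.uint8),
    "ego_history_xyz": np.zeros((1, 16, 3)),
    "ego_history_rot": np.zeros((1, 16, 3, 3)),
    "ego_future_xyz": np.zeros((1, 1, 64, 3)),
    "ego_future_rot": np.zeros((1, 1, 64, 3, 3)),
}
for key, value in data.items():
    print(f"{key}: shape={value.shape}")
```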
2. Construct the Input Message
Construct the input message using the image frames:
from alpamayo_r1 import helper
messages = helper.create_message(data["image_frames"].flatten(0, 1))
The message format includes:
System prompt: “You are a driving assistant that generates safe and accurate actions.”
User input: Multi-camera images and trajectory history
Assistant response prompt: Starts with Chain-of-Causation reasoning
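As a rough mental model, the message list might look like the sketch below. The exact schema is an assumption based on common chat-template formats; only the three roles and the system-prompt text come from the description above:

```python
# Hypothetical sketch of the message list helper.create_message builds.
# The schema here is an assumption, not the library's actual format.
messages = [
    {"role": "system",
     "content": "You are a driving assistant that generates safe and accurate actions."},
    {"role": "user",
     "content": [{"type": "image"}] * 4},  # one placeholder per camera frame
    {"role": "assistant",
     "content": "Chain-of-Causation reasoning starts here..."},
]
print([m["role"] for m in messages])
```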
3. Load Model and Processor
Load the Alpamayo R1 model and create the data processor:
import torch
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B",
    dtype=torch.bfloat16,
).to("cuda")
processor = helper.get_processor(model.tokenizer)
The model uses torch.bfloat16 precision for efficient inference on modern GPUs.
4. Prepare Model Inputs
Tokenize and prepare the inputs for the model:
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=False,
    continue_final_message=True,
    return_dict=True,
    return_tensors="pt",
)
model_inputs = {
    "tokenized_data": inputs,
    "ego_history_xyz": data["ego_history_xyz"],
    "ego_history_rot": data["ego_history_rot"],
}
model_inputs = helper.to_device(model_inputs, "cuda")
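A helper like to_device typically just walks nested containers and moves every tensor-like leaf to the target device. This is a minimal sketch of that behavior, assuming (not confirmed from the source) that the actual helper works the same way:

```python
# Minimal sketch of a nested-dict device mover; an assumption about
# what helper.to_device does, not its actual implementation.
def to_device(obj, device):
    if isinstance(obj, dict):
        return {k: to_device(v, device) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_device(v, device) for v in obj)
    if hasattr(obj, "to"):  # torch.Tensor and tokenizer outputs support .to()
        return obj.to(device)
    return obj
```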
5. Run Inference
Generate trajectory predictions with Chain-of-Causation reasoning:
torch.cuda.manual_seed_all(42)
with torch.autocast("cuda", dtype=torch.bfloat16):
    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
        data=model_inputs,
        top_p=0.98,
        temperature=0.6,
        num_traj_samples=1,  # Increase for more trajectories
        max_generation_length=256,
        return_extra=True,
    )

# View Chain-of-Causation reasoning
print("Chain-of-Causation (per trajectory):\n", extra["cot"][0])
Inference Parameters Explained
top_p: Nucleus sampling parameter (0.98 for diverse outputs)
temperature: Sampling temperature (0.6 for balanced randomness)
num_traj_samples: Number of trajectory samples to generate
max_generation_length: Maximum tokens for reasoning generation
return_extra: Return additional outputs like CoC traces
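To build intuition for how top_p and temperature interact, here is a toy nucleus-sampling filter on a 4-token distribution. This is a generic illustration of the technique, not the model's internal sampler:

```python
import numpy as np

# Toy nucleus (top-p) filter: keep the smallest set of most-probable
# tokens whose cumulative probability stays within top_p, then renormalize.
def nucleus_filter(probs, top_p):
    order = np.argsort(probs)[::-1]      # indices, most probable first
    cumulative = np.cumsum(probs[order])
    keep = cumulative <= top_p
    keep[0] = True                       # always keep the top token
    mask = np.zeros_like(probs, dtype=bool)
    mask[order[keep]] = True
    filtered = np.where(mask, probs, 0.0)
    return filtered / filtered.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
temperature = 0.6                        # lower temperature sharpens the distribution
probs = np.exp(logits / temperature)
probs /= probs.sum()
print("softmax probs:", np.round(probs, 3))
print("after top_p=0.98:", np.round(nucleus_filter(probs, 0.98), 3))
```

Lower temperatures concentrate probability mass on the top tokens, so fewer candidates survive the top_p cutoff; higher values of both parameters yield more diverse reasoning and trajectories.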
6. Evaluate Results
Compute the minimum Average Displacement Error (minADE):
import numpy as np
gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()
pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)
min_ade = diff.min()
print("minADE:", min_ade, "meters")
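The metric itself needs no GPU: minADE is the smallest per-sample average waypoint distance to the ground truth. The same computation on synthetic numpy data (shapes and noise scale here are purely illustrative):

```python
import numpy as np

# Standalone minADE illustration: 3 sampled trajectories vs. one
# ground-truth path, 64 waypoints each, in 2D (x, y).
rng = np.random.default_rng(0)
gt_xy = np.cumsum(rng.normal(size=(2, 64)), axis=1)              # (2, 64)
pred_xy = gt_xy[None] + rng.normal(scale=0.5, size=(3, 2, 64))   # (3, 2, 64)

# Per-sample ADE: mean Euclidean distance over waypoints; minADE is the best.
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)
min_ade = diff.min()
print("per-sample ADE:", np.round(diff, 3))
print("minADE:", round(min_ade, 3), "meters")
```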
Generating Multiple Trajectories
To generate multiple trajectory samples and reasoning traces, increase the num_traj_samples parameter:
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    top_p=0.98,
    temperature=0.6,
    num_traj_samples=5,  # Generate 5 different trajectories
    max_generation_length=256,
    return_extra=True,
)

# Each trajectory has its own CoC reasoning
for i, cot in enumerate(extra["cot"][0]):
    print(f"\nTrajectory {i + 1} reasoning:\n", cot)
Increasing num_traj_samples requires more GPU memory. If you encounter OOM errors, reduce this value or see the troubleshooting guide.
Understanding the Output
The model returns three outputs:
Predicted Trajectories (pred_xyz, pred_rot)
pred_xyz: Shape [batch_size, num_traj_sets, num_traj_samples, 64, 3]
64 waypoints at 10 Hz (6.4 second horizon)
XYZ coordinates in the ego frame at t0
pred_rot: Shape [batch_size, num_traj_sets, num_traj_samples, 64, 3, 3]
Rotation matrices for each waypoint
extra["cot"]: Chain-of-Causation reasoning traces
Shape: [batch_size, num_traj_sets, num_traj_samples]
Text descriptions of the driving reasoning process
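The waypoint timing follows directly from those shapes: 64 waypoints at 10 Hz span a 6.4-second horizon. A small sketch of the timestamps relative to t0 (assuming the first waypoint lies one step, 0.1 s, after t0, consistent with the stated horizon):

```python
import numpy as np

# 64 waypoints at 10 Hz -> 0.1 s spacing, 6.4 s horizon.
num_waypoints, rate_hz = 64, 10.0
timestamps = (np.arange(num_waypoints) + 1) / rate_hz  # seconds after t0
print(f"step: {timestamps[1] - timestamps[0]:.1f} s, "
      f"horizon: {timestamps[-1]:.1f} s")
```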
Complete Example
Here’s the complete inference script from src/alpamayo_r1/test_inference.py:16-78:
import torch
import numpy as np
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
from alpamayo_r1 import helper
# Example clip ID
clip_id = "030c760c-ae38-49aa-9ad8-f5650a545d26"
print(f"Loading dataset for clip_id: {clip_id}...")
data = load_physical_aiavdataset(clip_id, t0_us=5_100_000)
print("Dataset loaded.")

messages = helper.create_message(data["image_frames"].flatten(0, 1))

model = AlpamayoR1.from_pretrained("nvidia/Alpamayo-R1-10B", dtype=torch.bfloat16).to("cuda")
processor = helper.get_processor(model.tokenizer)

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=False,
    continue_final_message=True,
    return_dict=True,
    return_tensors="pt",
)
model_inputs = {
    "tokenized_data": inputs,
    "ego_history_xyz": data["ego_history_xyz"],
    "ego_history_rot": data["ego_history_rot"],
}
model_inputs = helper.to_device(model_inputs, "cuda")

torch.cuda.manual_seed_all(42)
with torch.autocast("cuda", dtype=torch.bfloat16):
    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
        data=model_inputs,
        top_p=0.98,
        temperature=0.6,
        num_traj_samples=1,
        max_generation_length=256,
        return_extra=True,
    )

print("Chain-of-Causation (per trajectory):\n", extra["cot"][0])

gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()
pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)
diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)
min_ade = diff.min()
print("minADE:", min_ade, "meters")
Next Steps
Notebook Tutorial: Explore interactive visualization and analysis
Troubleshooting: Resolve common issues and errors