This guide helps you resolve common issues when working with Alpamayo 1.

CUDA Out-of-Memory Errors

Symptoms

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate X.XX GiB

Solutions

1. Verify GPU requirements

Ensure you have at least 24 GB VRAM. Check your GPU memory:
nvidia-smi
GPUs with less than 24 GB VRAM (e.g., RTX 3060, RTX 3070, or most consumer GPUs with 16 GB or less) will likely encounter OOM errors during inference.
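Beyond eyeballing nvidia-smi output, the VRAM check can be scripted. A minimal sketch (the helper names are ours, not part of the repo; the `--query-gpu` flags are standard nvidia-smi options):

```python
import shutil
import subprocess

def parse_vram_mib(csv_text):
    """Parse `nvidia-smi --query-gpu=memory.total` CSV output into MiB values."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]

def gpus_meet_requirement(min_gib=24):
    """Return True if every visible GPU has at least `min_gib` GiB of VRAM."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver/tooling on this machine
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    # 24 GiB = 24576 MiB
    return all(mib >= min_gib * 1024 for mib in parse_vram_mib(out))
```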
2. Reduce trajectory samples

Lower the num_traj_samples parameter in your inference code:
# Before (may cause OOM)
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    num_traj_samples=5,  # Generates 5 trajectories
    ...
)

# After (more memory efficient)
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    num_traj_samples=1,  # Generates 1 trajectory
    ...
)
The test script uses num_traj_samples=1 by default for GPU memory compatibility. You can increase this value on higher-memory GPUs.
3. Close other GPU applications

Free up GPU memory by closing other applications:
# Check GPU processes
nvidia-smi

# Kill a specific process (replace PID with actual process ID)
kill -9 PID
4. Clear GPU cache

In Python/Jupyter, clear the CUDA cache:
import torch
torch.cuda.empty_cache()
For persistent issues, restart your Python kernel or terminal session.

Tested Hardware Configurations

The following GPUs have been successfully tested:
| GPU Model | VRAM     | Status                 |
|-----------|----------|------------------------|
| RTX 3090  | 24 GB    | ✅ Tested              |
| RTX 4090  | 24 GB    | ✅ Tested              |
| A5000     | 24 GB    | ✅ Tested              |
| A100      | 40/80 GB | ✅ Tested              |
| H100      | 80 GB    | ✅ Tested              |
| RTX 3060  | 12 GB    | ❌ Insufficient memory |
| RTX 3070  | 8 GB     | ❌ Insufficient memory |

Flash Attention Issues

Symptoms

ImportError: Flash Attention is not installed or incompatible with your system
or
RuntimeError: FlashAttention only supports Ampere GPUs or newer

Solution

The model uses Flash Attention 2 by default for improved performance. If you encounter compatibility issues with your GPU architecture:
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
import torch

# Load model with PyTorch's scaled dot-product attention instead
model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B",
    dtype=torch.bfloat16,
    attn_implementation="sdpa"  # Use PyTorch native attention
).to("cuda")
Flash Attention 2 requires NVIDIA GPUs with compute capability ≥ 8.0 (Ampere or newer: RTX 30-series, A-series, H-series). Older architectures should use "sdpa" or "eager".
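The compute-capability rule above can be turned into a small load-time check. A sketch under our own naming (the selection logic here is ours, not part of the library; on a live system you would feed it `torch.cuda.get_device_capability(0)`):

```python
def pick_attn_implementation(compute_capability):
    """Choose an attention backend from a (major, minor) compute capability.

    Flash Attention 2 requires compute capability >= 8.0 (Ampere or newer);
    older architectures fall back to PyTorch's native SDPA.
    """
    major, _minor = compute_capability
    return "flash_attention_2" if major >= 8 else "sdpa"

# Hypothetical usage with PyTorch:
# import torch
# impl = pick_attn_implementation(torch.cuda.get_device_capability(0))
# model = AlpamayoR1.from_pretrained(..., attn_implementation=impl)
```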

HuggingFace Authentication Issues

Symptom 1: Access Denied

HTTPError: 401 Client Error: Unauthorized for url

Solution: Authenticate and Request Access

1. Request access to gated resources

Request access to the gated model (nvidia/Alpamayo-R1-10B) and the PhysicalAI-AV dataset pages on HuggingFace.
Access approval may take some time. You’ll receive an email notification when approved.
2. Create a HuggingFace access token

  1. Go to HuggingFace Settings > Tokens
  2. Click “New token”
  3. Select “Read” permissions
  4. Copy the generated token
3. Authenticate with the CLI

# Install HuggingFace CLI if needed
pip install huggingface_hub

# Login with your token
huggingface-cli login
Paste your token when prompted.
4. Verify authentication

huggingface-cli whoami
This should display your HuggingFace username.

Symptom 2: Token Not Found

OSError: You are trying to access a gated repo.

Solution: Set Token Environment Variable

# Set token as environment variable
export HF_TOKEN="your_token_here"

# Or use in Python
import os
os.environ["HF_TOKEN"] = "your_token_here"
Alternatively, pass the token directly:
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1

model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B",
    dtype=torch.bfloat16,
    token="your_token_here"  # Pass token explicitly
).to("cuda")
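The two approaches (environment variable vs. explicit argument) can be unified in a small resolver so scripts fail fast with a clear message instead of a mid-download 401. This helper is our own convenience wrapper, not part of the repo:

```python
import os

def resolve_hf_token(explicit=None):
    """Resolve a HuggingFace token: an explicit argument wins, then HF_TOKEN.

    Raises early with a clear message if neither is set.
    """
    token = explicit or os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "No HuggingFace token found: pass token=... or set the HF_TOKEN "
            "environment variable (see 'HuggingFace Authentication Issues')."
        )
    return token

# Hypothetical usage:
# model = AlpamayoR1.from_pretrained("nvidia/Alpamayo-R1-10B", token=resolve_hf_token())
```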

Installation Issues

Python Version Mismatch

Symptom:
Requires-Python >=3.12,<3.13
Solution: Alpamayo 1 requires Python 3.12.x specifically. Check your version:
python --version
If you have a different version, install Python 3.12:
# uv automatically manages Python versions
uv venv ar1_venv --python 3.12
source ar1_venv/bin/activate
uv sync --active
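For scripts that want to fail early instead of hitting the resolver error, the `>=3.12,<3.13` constraint is easy to encode. A sketch (the function is ours; it mirrors the Requires-Python spec above):

```python
import sys

def python_version_ok(version_info=sys.version_info):
    """True if the interpreter satisfies Alpamayo 1's Requires-Python >=3.12,<3.13."""
    return (3, 12) <= (version_info[0], version_info[1]) < (3, 13)

# Example guard at the top of a script:
# if not python_version_ok():
#     sys.exit(f"Python 3.12.x required, found {sys.version.split()[0]}")
```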

Dependency Conflicts

Symptom:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
Solution: Use uv for dependency management as recommended:
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# Create fresh environment
uv venv ar1_venv
source ar1_venv/bin/activate
uv sync --active
uv provides faster, more reliable dependency resolution compared to pip.

Model Download Issues

Slow Download Speed

Symptom: Model weights (22 GB) are downloading very slowly.

Solutions:

For reference, downloads take approximately 2.5 minutes on a 100 MB/s wired connection. WiFi connections may be significantly slower.

HuggingFace Hub automatically resumes interrupted downloads. If a download fails, simply run the script again:
python src/alpamayo_r1/test_inference.py

Alternatively, download the model weights manually:
huggingface-cli download nvidia/Alpamayo-R1-10B

Ensure you have at least 30 GB of free disk space:
df -h
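The disk-space precondition can also be checked from Python before kicking off a download. A minimal sketch using only the standard library (the helper name and the 30 GB threshold default are taken from the requirement above):

```python
import shutil

def enough_disk_space(path=".", required_gib=30):
    """True if the filesystem containing `path` has at least `required_gib` GiB free."""
    free_gib = shutil.disk_usage(path).free / (1024 ** 3)
    return free_gib >= required_gib

# Example guard before downloading:
# if not enough_disk_space():
#     raise SystemExit("Need at least 30 GB free for the model weights")
```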

Download Verification Failed

Symptom:
OSError: Unable to load weights from checkpoint
Solution: Clear the cached download and retry:
# Remove cached model files
rm -rf ~/.cache/huggingface/hub/models--nvidia--Alpamayo-R1-10B

# Re-download
python src/alpamayo_r1/test_inference.py

Runtime Errors

Dataset Loading Errors

Symptom:
KeyError: clip_id not found in dataset
Solution: Ensure you’re using a valid clip ID from the dataset:
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset

# Use the example clip ID from the test script
clip_id = "030c760c-ae38-49aa-9ad8-f5650a545d26"
data = load_physical_aiavdataset(clip_id, t0_us=5_100_000)
Or load from the curated list:
import pandas as pd

clip_ids = pd.read_parquet("clip_ids.parquet")["clip_id"].tolist()
clip_id = clip_ids[0]  # Use first clip
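The example clip ID above is a UUID string, so a cheap format check can catch typos before the dataset loader raises a KeyError. This validation helper is our own suggestion, not part of the library, and assumes all clip IDs follow the UUID format seen in the example:

```python
import uuid

def is_valid_clip_id(clip_id):
    """True if clip_id parses as a canonical lowercase UUID string."""
    try:
        return str(uuid.UUID(clip_id)) == clip_id.lower()
    except (ValueError, AttributeError):
        return False
```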

Non-Deterministic Output Variance

Symptom: Getting different minADE values across runs.

Explanation: This is expected behavior. From src/alpamayo_r1/test_inference.py:73-77:
print(
    "Note: VLA-reasoning models produce nondeterministic outputs due to trajectory sampling, "
    "hardware differences, etc. With num_traj_samples=1 (set for GPU memory compatibility), "
    "variance in minADE is expected. For visual sanity checks, see notebooks/inference.ipynb"
)
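minADE (minimum Average Displacement Error) scores only the best of the sampled trajectories, which is why it fluctuates with num_traj_samples=1 and tends to improve as more samples are drawn. A pure-Python sketch of the metric for intuition (assuming 2D waypoints; the actual script presumably computes this on tensors):

```python
import math

def ade(pred, gt):
    """Average L2 displacement between one predicted trajectory and ground truth."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def min_ade(samples, gt):
    """minADE: the ADE of the closest trajectory among all sampled trajectories."""
    return min(ade(s, gt) for s in samples)
```

With one sample, minADE is just that sample's ADE; with several, a single lucky draw lowers the score, so run-to-run variance shrinks.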
Mitigation:
1. Set random seeds

import random
import torch

random.seed(42)
torch.manual_seed(42)           # seeds the CPU RNG
torch.cuda.manual_seed_all(42)  # seeds all CUDA devices
Note: This improves but doesn’t guarantee exact reproducibility; trajectory sampling and hardware nondeterminism remain.
2. Increase trajectory samples

Generate multiple trajectories and use the best:
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    num_traj_samples=5,  # Generate multiple samples
    ...
)
3. Use the notebook for visual validation

See the notebook tutorial for trajectory visualization to qualitatively assess model performance.

Performance Issues

Slow Inference Speed

Symptom: Inference takes longer than expected.

Optimizations:

1. Use bfloat16 precision:
# Always use bfloat16 for optimal performance
model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B",
    dtype=torch.bfloat16  # Not float16 or float32
).to("cuda")

2. Run the sampling call under autocast:
with torch.autocast("cuda", dtype=torch.bfloat16):
    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(...)

3. Keep Flash Attention 2 enabled (the default) on supported GPUs:
# Default behavior (fastest if supported)
model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B",
    dtype=torch.bfloat16
    # attn_implementation defaults to "flash_attention_2"
).to("cuda")

4. Cap the reasoning-trace length:
# Shorter reasoning traces = faster inference
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    max_generation_length=128,  # Reduced from 256
    ...
)
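To verify whether a given change actually helps, it is worth timing the sampling call. A small stdlib timer (our own utility, not part of the repo; note that accurate GPU timing also needs a device synchronize before reading the clock):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, sink):
    """Record elapsed wall-clock seconds for the enclosed block into `sink[label]`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # For GPU work, call torch.cuda.synchronize() here first so the
        # measurement includes queued kernels, not just launch time.
        sink[label] = time.perf_counter() - start

# Hypothetical usage:
# timings = {}
# with timed("sampling", timings):
#     pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(...)
# print(timings)
```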

Getting Help

If you encounter issues not covered in this guide:

Model Card

Check the HuggingFace Model Card for comprehensive model details

Paper

Read the research paper for architectural and methodological details

Dataset

Visit the PhysicalAI-AV Dataset page for data-related questions

GitHub Issues

Open an issue on the GitHub repository for bug reports

FAQ

Can I run the model on AMD GPUs or Apple Silicon?
No, the model requires NVIDIA CUDA-compatible GPUs. AMD and Apple Silicon are not currently supported.

Can I run inference on CPU?
CPU inference is not recommended due to:
  • Extremely slow performance (hours per sample)
  • High memory requirements (>64 GB RAM)
  • Lack of optimization for CPU execution

Can I quantize the model to fit on a smaller GPU?
While technically possible, the released model has been tested with bfloat16 precision on 24+ GB GPUs. Quantization (int8, int4) is not officially supported and may degrade performance.

Why do my results differ across runs or from reported numbers?
Several factors can cause differences:
  • Different random seeds
  • Hardware variations
  • Number of trajectory samples (num_traj_samples)
  • This release doesn’t include RL post-training from the paper
See “Non-Deterministic Output Variance” above for details.
