This guide helps you resolve common issues when working with Alpamayo 1.

CUDA Out-of-Memory Errors

Symptoms

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate X.XX GiB

Solutions

1. Verify GPU requirements

Ensure you have at least 24 GB VRAM. Check your GPU memory:
nvidia-smi
GPUs with less than 24 GB VRAM (e.g., RTX 3060, RTX 3070, or most consumer GPUs with 16 GB or less) will likely encounter OOM errors during inference.
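Beyond eyeballing nvidia-smi output, the VRAM check can be scripted. A minimal sketch (the helper names are ours, not part of the repo; the `--query-gpu` flags are standard nvidia-smi options):

```python
import shutil
import subprocess

def parse_vram_mib(csv_text):
    """Parse `nvidia-smi --query-gpu=memory.total` CSV output into MiB values."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]

def gpus_meet_requirement(min_gib=24):
    """Return True if every visible GPU has at least `min_gib` GiB of VRAM."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver/tooling on this machine
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    # 24 GiB = 24576 MiB
    return all(mib >= min_gib * 1024 for mib in parse_vram_mib(out))
```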
2. Reduce trajectory samples

Lower the num_traj_samples parameter in your inference code:
# Before (may cause OOM)
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    num_traj_samples=5,  # Generates 5 trajectories
    ...
)

# After (more memory efficient)
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    num_traj_samples=1,  # Generates 1 trajectory
    ...
)
The test script uses num_traj_samples=1 by default for GPU memory compatibility. You can increase this value on higher-memory GPUs.
3. Close other GPU applications

Free up GPU memory by closing other applications:
# Check GPU processes
nvidia-smi

# Kill a specific process (replace PID with actual process ID)
kill -9 PID
4. Clear GPU cache

In Python/Jupyter, clear the CUDA cache:
import torch
torch.cuda.empty_cache()
For persistent issues, restart your Python kernel or terminal session.

Tested Hardware Configurations

The following GPUs have been successfully tested:
| GPU Model | VRAM     | Status                 |
|-----------|----------|------------------------|
| RTX 3090  | 24 GB    | ✅ Tested              |
| RTX 4090  | 24 GB    | ✅ Tested              |
| A5000     | 24 GB    | ✅ Tested              |
| A100      | 40/80 GB | ✅ Tested              |
| H100      | 80 GB    | ✅ Tested              |
| RTX 3060  | 12 GB    | ❌ Insufficient memory |
| RTX 3070  | 8 GB     | ❌ Insufficient memory |

Flash Attention Issues

Symptoms

ImportError: Flash Attention is not installed or incompatible with your system
or
RuntimeError: FlashAttention only supports Ampere GPUs or newer

Solution

The model uses Flash Attention 2 by default for improved performance. If you encounter compatibility issues with your GPU architecture:
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
import torch

# Load model with PyTorch's scaled dot-product attention instead
model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B",
    dtype=torch.bfloat16,
    attn_implementation="sdpa"  # Use PyTorch native attention
).to("cuda")
Flash Attention 2 requires NVIDIA GPUs with compute capability ≥ 8.0 (Ampere or newer: RTX 30-series, A-series, H-series). Older architectures should use "sdpa" or "eager".
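The compute-capability rule above can be turned into a small load-time check. A sketch under our own naming (the selection logic here is ours, not part of the library; on a live system you would feed it `torch.cuda.get_device_capability(0)`):

```python
def pick_attn_implementation(compute_capability):
    """Choose an attention backend from a (major, minor) compute capability.

    Flash Attention 2 requires compute capability >= 8.0 (Ampere or newer);
    older architectures fall back to PyTorch's native SDPA.
    """
    major, _minor = compute_capability
    return "flash_attention_2" if major >= 8 else "sdpa"

# Hypothetical usage with PyTorch:
# import torch
# impl = pick_attn_implementation(torch.cuda.get_device_capability(0))
# model = AlpamayoR1.from_pretrained(..., attn_implementation=impl)
```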

HuggingFace Authentication Issues

Symptom 1: Access Denied

HTTPError: 401 Client Error: Unauthorized for url

Solution: Authenticate and Request Access

1. Request access to gated resources

Request access to the gated model (nvidia/Alpamayo-R1-10B) and the PhysicalAI-AV dataset pages on HuggingFace.
Access approval may take some time. You’ll receive an email notification when approved.
2. Create a HuggingFace access token

  1. Go to HuggingFace Settings > Tokens
  2. Click “New token”
  3. Select “Read” permissions
  4. Copy the generated token
3. Authenticate with the CLI

# Install HuggingFace CLI if needed
pip install huggingface_hub

# Login with your token
huggingface-cli login
Paste your token when prompted.
4. Verify authentication

huggingface-cli whoami
This should display your HuggingFace username.

Symptom 2: Token Not Found

OSError: You are trying to access a gated repo.

Solution: Set Token Environment Variable

# Set token as environment variable
export HF_TOKEN="your_token_here"

# Or use in Python
import os
os.environ["HF_TOKEN"] = "your_token_here"
Alternatively, pass the token directly:
from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1

model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B",
    dtype=torch.bfloat16,
    token="your_token_here"  # Pass token explicitly
).to("cuda")
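The two approaches (environment variable vs. explicit argument) can be unified in a small resolver so scripts fail fast with a clear message instead of a mid-download 401. This helper is our own convenience wrapper, not part of the repo:

```python
import os

def resolve_hf_token(explicit=None):
    """Resolve a HuggingFace token: an explicit argument wins, then HF_TOKEN.

    Raises early with a clear message if neither is set.
    """
    token = explicit or os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "No HuggingFace token found: pass token=... or set the HF_TOKEN "
            "environment variable (see 'HuggingFace Authentication Issues')."
        )
    return token

# Hypothetical usage:
# model = AlpamayoR1.from_pretrained("nvidia/Alpamayo-R1-10B", token=resolve_hf_token())
```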

Installation Issues

Python Version Mismatch

Symptom:
Requires-Python >=3.12,<3.13
Solution: Alpamayo 1 requires Python 3.12.x specifically. Check your version:
python --version
If you have a different version, install Python 3.12:
# uv automatically manages Python versions
uv venv ar1_venv --python 3.12
source ar1_venv/bin/activate
uv sync --active
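For scripts that want to fail early instead of hitting the resolver error, the `>=3.12,<3.13` constraint is easy to encode. A sketch (the function is ours; it mirrors the Requires-Python spec above):

```python
import sys

def python_version_ok(version_info=sys.version_info):
    """True if the interpreter satisfies Alpamayo 1's Requires-Python >=3.12,<3.13."""
    return (3, 12) <= (version_info[0], version_info[1]) < (3, 13)

# Example guard at the top of a script:
# if not python_version_ok():
#     sys.exit(f"Python 3.12.x required, found {sys.version.split()[0]}")
```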

Dependency Conflicts

Symptom:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
Solution: Use uv for dependency management as recommended:
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# Create fresh environment
uv venv ar1_venv
source ar1_venv/bin/activate
uv sync --active
uv provides faster, more reliable dependency resolution compared to pip.

Model Download Issues

Slow Download Speed

Symptom: Model weights (22 GB) are downloading very slowly.

Solutions:

For reference, downloads take approximately 2.5 minutes on a 100 MB/s wired connection. WiFi connections may be significantly slower.

HuggingFace Hub automatically resumes interrupted downloads. If a download fails, simply run the script again:
python src/alpamayo_r1/test_inference.py

Alternatively, download the model weights manually:
huggingface-cli download nvidia/Alpamayo-R1-10B

Ensure you have at least 30 GB of free disk space:
df -h
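The disk-space precondition can also be checked from Python before kicking off a download. A minimal sketch using only the standard library (the helper name and the 30 GB threshold default are taken from the requirement above):

```python
import shutil

def enough_disk_space(path=".", required_gib=30):
    """True if the filesystem containing `path` has at least `required_gib` GiB free."""
    free_gib = shutil.disk_usage(path).free / (1024 ** 3)
    return free_gib >= required_gib

# Example guard before downloading:
# if not enough_disk_space():
#     raise SystemExit("Need at least 30 GB free for the model weights")
```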

Download Verification Failed

Symptom:
OSError: Unable to load weights from checkpoint
Solution: Clear the cached download and retry:
# Remove cached model files
rm -rf ~/.cache/huggingface/hub/models--nvidia--Alpamayo-R1-10B

# Re-download
python src/alpamayo_r1/test_inference.py

Runtime Errors

Dataset Loading Errors

Symptom:
KeyError: clip_id not found in dataset
Solution: Ensure you’re using a valid clip ID from the dataset:
from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset

# Use the example clip ID from the test script
clip_id = "030c760c-ae38-49aa-9ad8-f5650a545d26"
data = load_physical_aiavdataset(clip_id, t0_us=5_100_000)
Or load from the curated list:
import pandas as pd

clip_ids = pd.read_parquet("clip_ids.parquet")["clip_id"].tolist()
clip_id = clip_ids[0]  # Use first clip
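The example clip ID above is a UUID string, so a cheap format check can catch typos before the dataset loader raises a KeyError. This validation helper is our own suggestion, not part of the library, and assumes all clip IDs follow the UUID format seen in the example:

```python
import uuid

def is_valid_clip_id(clip_id):
    """True if clip_id parses as a canonical lowercase UUID string."""
    try:
        return str(uuid.UUID(clip_id)) == clip_id.lower()
    except (ValueError, AttributeError):
        return False
```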

Non-Deterministic Output Variance

Symptom: Getting different minADE values across runs.

Explanation: This is expected behavior. From src/alpamayo_r1/test_inference.py:73-77:
print(
    "Note: VLA-reasoning models produce nondeterministic outputs due to trajectory sampling, "
    "hardware differences, etc. With num_traj_samples=1 (set for GPU memory compatibility), "
    "variance in minADE is expected. For visual sanity checks, see notebooks/inference.ipynb"
)
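minADE (minimum Average Displacement Error) scores only the best of the sampled trajectories, which is why it fluctuates with num_traj_samples=1 and tends to improve as more samples are drawn. A pure-Python sketch of the metric for intuition (assuming 2D waypoints; the actual script presumably computes this on tensors):

```python
import math

def ade(pred, gt):
    """Average L2 displacement between one predicted trajectory and ground truth."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def min_ade(samples, gt):
    """minADE: the ADE of the closest trajectory among all sampled trajectories."""
    return min(ade(s, gt) for s in samples)
```

With one sample, minADE is just that sample's ADE; with several, a single lucky draw lowers the score, so run-to-run variance shrinks.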
Mitigation:
1. Set random seeds

import random
import torch

random.seed(42)
torch.manual_seed(42)           # seeds the CPU RNG
torch.cuda.manual_seed_all(42)  # seeds all CUDA devices
Note: This improves but doesn’t guarantee exact reproducibility; trajectory sampling and hardware nondeterminism remain.
2. Increase trajectory samples

Generate multiple trajectories and use the best:
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    num_traj_samples=5,  # Generate multiple samples
    ...
)
3. Use the notebook for visual validation

See the notebook tutorial for trajectory visualization to qualitatively assess model performance.

Performance Issues

Slow Inference Speed

Symptom: Inference takes longer than expected.

Optimizations:

1. Use bfloat16 precision:
# Always use bfloat16 for optimal performance
model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B",
    dtype=torch.bfloat16  # Not float16 or float32
).to("cuda")

2. Run the sampling call under autocast:
with torch.autocast("cuda", dtype=torch.bfloat16):
    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(...)

3. Keep Flash Attention 2 enabled (the default) on supported GPUs:
# Default behavior (fastest if supported)
model = AlpamayoR1.from_pretrained(
    "nvidia/Alpamayo-R1-10B",
    dtype=torch.bfloat16
    # attn_implementation defaults to "flash_attention_2"
).to("cuda")

4. Cap the reasoning-trace length:
# Shorter reasoning traces = faster inference
pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
    data=model_inputs,
    max_generation_length=128,  # Reduced from 256
    ...
)
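To verify whether a given change actually helps, it is worth timing the sampling call. A small stdlib timer (our own utility, not part of the repo; note that accurate GPU timing also needs a device synchronize before reading the clock):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, sink):
    """Record elapsed wall-clock seconds for the enclosed block into `sink[label]`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # For GPU work, call torch.cuda.synchronize() here first so the
        # measurement includes queued kernels, not just launch time.
        sink[label] = time.perf_counter() - start

# Hypothetical usage:
# timings = {}
# with timed("sampling", timings):
#     pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(...)
# print(timings)
```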

Getting Help

If you encounter issues not covered in this guide:

Model Card

Check the HuggingFace Model Card for comprehensive model details

Paper

Read the research paper for architectural and methodological details

Dataset

Visit the PhysicalAI-AV Dataset page for data-related questions

GitHub Issues

Open an issue on the GitHub repository for bug reports

FAQ

Can I run the model on AMD GPUs or Apple Silicon?
No, the model requires NVIDIA CUDA-compatible GPUs. AMD and Apple Silicon are not currently supported.

Can I run inference on CPU?
CPU inference is not recommended due to:
  • Extremely slow performance (hours per sample)
  • High memory requirements (>64 GB RAM)
  • Lack of optimization for CPU execution

Can I quantize the model to fit on a smaller GPU?
While technically possible, the released model has been tested with bfloat16 precision on 24+ GB GPUs. Quantization (int8, int4) is not officially supported and may degrade performance.

Why do my results differ across runs or from reported numbers?
Several factors can cause differences:
  • Different random seeds
  • Hardware variations
  • Number of trajectory samples (num_traj_samples)
  • This release doesn’t include RL post-training from the paper
See “Non-Deterministic Output Variance” above for details.
