CUDA Out-of-Memory Errors
Symptoms
Solutions
Reduce trajectory samples
Lower the num_traj_samples parameter in your inference code. The test script uses num_traj_samples=1 by default for GPU memory compatibility. You can increase this value on higher-memory GPUs.
Tested Hardware Configurations
The following GPUs have been tested:

| GPU Model | VRAM | Status |
|---|---|---|
| RTX 3090 | 24 GB | ✅ Tested |
| RTX 4090 | 24 GB | ✅ Tested |
| A5000 | 24 GB | ✅ Tested |
| A100 | 40/80 GB | ✅ Tested |
| H100 | 80 GB | ✅ Tested |
| RTX 3060 | 12 GB | ❌ Insufficient memory |
| RTX 3070 | 8 GB | ❌ Insufficient memory |
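To see where a machine falls in the table above, one option is to read each GPU's total memory from nvidia-smi. The sketch below is an illustration and not part of the repository: the function names, the 24 GB threshold (the smallest size marked as tested above), and the 0.5 GB tolerance are all assumptions.

```python
import subprocess

# Assumption: 24 GB is the smallest VRAM size marked "Tested" in the table.
MIN_VRAM_GB = 24.0


def parse_gpu_memory(csv_text: str) -> list[tuple[str, float]]:
    """Parse 'name, memory.total' CSV rows (values in MiB) into (name, GiB) pairs."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so GPU names containing commas stay intact.
        name, mem_mib = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, float(mem_mib) / 1024))
    return gpus


def query_gpus() -> list[tuple[str, float]]:
    """Ask nvidia-smi for the name and total memory of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


def meets_minimum(vram_gib: float, minimum_gb: float = MIN_VRAM_GB,
                  tolerance_gb: float = 0.5) -> bool:
    """Allow a small tolerance: 24 GB cards often report slightly under 24 GiB."""
    return vram_gib >= minimum_gb - tolerance_gb
```

Calling query_gpus() on a machine with an NVIDIA driver installed returns one (name, GiB) pair per GPU, and each pair can be passed through meets_minimum() before launching inference.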
Flash Attention Issues
Symptoms
Solution
The model uses Flash Attention 2 by default for improved performance. If you encounter compatibility issues with your GPU architecture, switch to a different attention implementation.
Flash Attention 2 requires NVIDIA GPUs with compute capability ≥ 8.0 (Ampere or newer: RTX 30-series, A-series, H-series). Older architectures should use "sdpa" or "eager".
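The compute capability rule can be written as a small selection helper. This is a minimal sketch: pick_attn_implementation is a hypothetical name, and it assumes the model is loaded through a Transformers-style from_pretrained call that accepts an attn_implementation argument, which this guide does not confirm for this model's loader.

```python
def pick_attn_implementation(compute_capability: tuple[int, int]) -> str:
    """Return an attention backend name for a (major, minor) compute capability.

    Flash Attention 2 needs compute capability >= 8.0, so only the major
    version has to be inspected. "sdpa" is used as the fallback here;
    "eager" is the other option named in this guide.
    """
    major, _minor = compute_capability
    return "flash_attention_2" if major >= 8 else "sdpa"
```

With PyTorch installed, torch.cuda.get_device_capability() returns the (major, minor) tuple for the current device, so the result of pick_attn_implementation(torch.cuda.get_device_capability()) could be passed as the attn_implementation value when loading the model.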
HuggingFace Authentication Issues
Symptom 1: Access Denied
Solution: Authenticate and Request Access
Request access to gated resources
Visit these pages and request access:
Create a HuggingFace access token
- Go to HuggingFace Settings > Tokens
- Click “New token”
- Select “Read” permissions
- Copy the generated token
Symptom 2: Token Not Found
Solution: Set Token Environment Variable
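Before starting a download, it can help to confirm that the token is actually visible to the Python process. The sketch below assumes the token is exported as HF_TOKEN (the variable read by huggingface_hub) or the older HUGGING_FACE_HUB_TOKEN; find_hf_token is a hypothetical helper, not a function from the repository.

```python
import os
from typing import Mapping, Optional


def find_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty HuggingFace token found in the environment."""
    env = os.environ if env is None else env
    # HF_TOKEN is the current variable name; the second is the legacy name.
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        token = env.get(name, "").strip()
        if token:
            return token
    return None


if find_hf_token() is None:
    print("No HuggingFace token found; export HF_TOKEN before running inference.")
```

If the check reports a missing token even though it was set, the variable was probably exported in a different shell session than the one running the script.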
Installation Issues
Python Version Mismatch
Symptom:
Dependency Conflicts
Symptom:
Solution: Use uv for dependency management, as recommended. uv provides faster, more reliable dependency resolution than pip.
Model Download Issues
Slow Download Speed
Symptom: Model weights (22 GB) are downloading very slowly.
Solutions:
Use a wired connection
For reference, 22 GB takes roughly 3.7 minutes at a sustained 100 MB/s on a wired connection. Wi-Fi connections may be significantly slower.
Resume interrupted downloads
HuggingFace Hub automatically resumes interrupted downloads. If a download fails, simply run the script again.
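Rerunning by hand can also be automated with a small retry wrapper. This is a generic sketch, not code from the repository: retry is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(download: Callable[[], T], attempts: int = 3, delay_s: float = 5.0) -> T:
    """Call download() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except Exception as err:  # network error types differ between libraries
            last_error = err
            if attempt < attempts:
                time.sleep(delay_s)  # brief pause before the next attempt
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error
```

With huggingface_hub installed, the wrapper could be used as retry(lambda: snapshot_download(repo_id="<repo-id>")), where the repository ID is a placeholder. Because the Hub resumes partial files, each retry continues from what was already downloaded.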
Use HuggingFace CLI for manual download
Check disk space
Ensure you have at least 30 GB of free disk space.
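The free-space check can be done from Python with the standard library. In this sketch the helper names are hypothetical, and the path to check is an assumption: by default the HuggingFace cache lives under ~/.cache/huggingface unless HF_HOME points elsewhere, so that is the filesystem that usually matters.

```python
import shutil

REQUIRED_GB = 30  # minimum free space recommended in this guide


def free_space_gb(path: str = ".") -> float:
    """Return the free space, in GB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Check whether the filesystem containing path has the required free space."""
    return free_space_gb(path) >= required_gb


if not has_enough_space("."):
    print(f"Only {free_space_gb('.'):.1f} GB free; at least {REQUIRED_GB} GB is recommended.")
```

Pass the directory where the weights will be stored rather than the current directory if the cache sits on a different disk.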
Download Verification Failed
Symptom:
Runtime Errors
Dataset Loading Errors
Symptom:
Non-Deterministic Output Variance
Symptom: Getting different minADE values across runs.
Explanation: This is expected behavior. From src/alpamayo_r1/test_inference.py:73-77:
Use the notebook for visual validation
See the notebook tutorial for trajectory visualization to qualitatively assess model performance.
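To make the run-to-run variance concrete, the sketch below computes minADE using its common definition: the average displacement error (mean Euclidean distance per timestep) of the best sampled trajectory. It is an illustration with made-up 2D points and may differ in detail from the repository's implementation.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def ade(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Average displacement error: mean Euclidean distance per timestep."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)


def min_ade(samples: Sequence[Sequence[Point]], truth: Sequence[Point]) -> float:
    """minADE: the ADE of the sampled trajectory closest to the ground truth."""
    return min(ade(sample, truth) for sample in samples)


# Made-up data: one ground-truth path and two sampled predictions.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
sample_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
sample_b = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.5)]

print(min_ade([sample_a], truth))            # only one sample drawn
print(min_ade([sample_a, sample_b], truth))  # a second sample can only help
```

Because the minimum is taken over whichever trajectories happen to be sampled, a different random draw or a different num_traj_samples changes the result, and adding samples can lower minADE but never raise it.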
Performance Issues
Slow Inference Speed
Symptoms: Inference takes longer than expected.
Optimizations:
Use bfloat16 precision
Enable autocast
Use Flash Attention if supported
Reduce max_generation_length
Getting Help
If you encounter issues not covered in this guide:
Model Card
Check the HuggingFace Model Card for comprehensive model details
Paper
Read the research paper for architectural and methodological details
Dataset
Visit the PhysicalAI-AV Dataset page for data-related questions
GitHub Issues
Open an issue on the GitHub repository for bug reports
FAQ
Can I use this model on AMD GPUs or Apple Silicon?
No, the model requires NVIDIA CUDA-compatible GPUs. AMD GPUs and Apple Silicon are not currently supported.
What about inference on CPU?
CPU inference is not recommended due to:
- Extremely slow performance (hours per sample)
- High memory requirements (>64 GB RAM)
- Lack of optimization for CPU execution
Can I quantize the model to reduce memory usage?
While technically possible, the released model has been tested with bfloat16 precision on 24+ GB GPUs. Quantization (int8, int4) is not officially supported and may degrade performance.
Why does my minADE differ from the paper?
Several factors can cause differences:
- Different random seeds
- Hardware variations
- Number of trajectory samples (num_traj_samples)
- This release doesn't include RL post-training from the paper