When CPU isn’t fast enough, warp-md has turbo mode. Optional CUDA acceleration for when your agent needs speed.
No GPU? No problem. Without CUDA, warp-md gracefully falls back to CPU. Your agent’s code runs everywhere.

Requirements

  • NVIDIA GPU with CUDA support
  • CUDA Toolkit (driver + nvrtc)
  • CUDA_HOME or CUDA_PATH environment variable set

Setup

1. Verify CUDA Installation

nvcc --version
nvidia-smi
If these work, you’re halfway there.
2. Set Environment Variable

export CUDA_HOME=/usr/local/cuda
3. Build with CUDA Feature

cargo test -p traj-engine --features cuda
4. Verify GPU Detection

from warp_md import RgPlan
# device="auto" will use CUDA if available
# device="cuda" or "cuda:0" to force GPU
Your agent is now turbocharged.
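
The "auto" behavior described in step 4 can be sketched as a small helper. Note that `resolve_device` is a hypothetical name, and the toolkit checks below only approximate warp_md's real detection logic:

```python
import os
import shutil

def resolve_device(requested: str = "auto") -> str:
    """Illustrative sketch of an "auto" device choice: prefer CUDA when a
    toolkit looks present, otherwise fall back to CPU.
    (Hypothetical helper, not part of the warp_md API.)"""
    if requested != "auto":
        return requested  # explicit "cpu", "cuda", "cuda:0" pass through
    cuda_visible = (
        "CUDA_HOME" in os.environ
        or "CUDA_PATH" in os.environ
        or shutil.which("nvidia-smi") is not None
    )
    return "cuda" if cuda_visible else "cpu"

print(resolve_device("cpu"))   # explicit requests are honored
print(resolve_device("auto"))  # "cuda" or "cpu", depending on the machine
```

Explicit requests bypass detection entirely, which is why forcing `device="cuda"` on a machine without a GPU produces an error rather than a silent fallback.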

Device Selection

All analysis plans accept a device parameter:
from warp_md import RgPlan

# construct a plan for your selection (constructor arguments elided)
plan = RgPlan(...)

# Automatic (CUDA if available, else CPU)
rg = plan.run(traj, system, device="auto")

# Force CPU
rg = plan.run(traj, system, device="cpu")

# Force CUDA (first GPU)
rg = plan.run(traj, system, device="cuda")
rg = plan.run(traj, system, device="cuda:0")
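
The device strings above follow a simple backend[:index] convention. A minimal parser sketch (`parse_device` is a hypothetical helper; warp_md presumably does equivalent parsing internally):

```python
def parse_device(spec: str) -> tuple[str, int]:
    """Split a device string like "cuda:1" into (backend, index).
    A bare "cuda" means the first GPU.
    (Hypothetical helper, for illustration only.)"""
    backend, _, index = spec.partition(":")
    if backend not in ("cpu", "cuda"):
        raise ValueError(f"unknown device backend: {backend!r}")
    return backend, int(index) if index else 0

print(parse_device("cuda"))    # ('cuda', 0)
print(parse_device("cuda:1"))  # ('cuda', 1)
print(parse_device("cpu"))     # ('cpu', 0)
```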

What Gets Accelerated?

| Category | GPU-Accelerated Analyses |
|----------|--------------------------|
| Core | Rg, RMSD, MSD, RDF |
| Polymer | End-to-end, contour length, chain Rg, bond histograms |
| Orientation | RotAcf (orientation extraction) |
| Transport | Conductivity (group COM), Dielectric, Dipole alignment |
| Structure | Ion-pair correlation, Structure factor (RDF kernel) |
| Spatial | Water occupancy grid |
| Thermodynamic | Equipartition (group kinetic energy) |
| Bonding | H-bond counts (distance + angle) |
CPU-only: Persistence length and some reductions remain CPU-based. Not everything parallelizes well.

Performance Tips

Chunk frames: For large trajectories, adjust chunk_frames to optimize memory transfer:
rg = plan.run(traj, system, device="cuda", chunk_frames=1000)
Multi-tau for long trajectories: Use bounded-memory mode for MSD/ACF:
msd = MsdPlan(selection, group_by="resid", lag_mode="multi_tau")
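
Multi-tau keeps memory bounded by sampling lags densely at short times and doubling the spacing at each level, so the number of stored lags grows roughly logarithmically with trajectory length. A generic sketch of the scheme (warp_md's exact spacing may differ):

```python
def multi_tau_lags(n_frames: int, per_level: int = 8) -> list[int]:
    """Generate multi-tau lag times: a fixed number of lags per level,
    with the spacing between lags doubling at each level.
    (Generic multi-tau scheme for illustration, not warp_md's exact one.)"""
    lags: list[int] = []
    lag, spacing = 1, 1
    while lag < n_frames:
        for _ in range(per_level):
            if lag >= n_frames:
                break
            lags.append(lag)
            lag += spacing
        spacing *= 2
    return lags

dense = list(range(1, 100_000))   # every lag: ~100k entries
sparse = multi_tau_lags(100_000)  # multi-tau: roughly a hundred entries
print(len(dense), len(sparse))
```

This is why multi-tau is the recommended `lag_mode` for the 100k-frame MSD runs in the benchmarks below: the reduction in stored lags translates directly into less GPU memory traffic.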

Troubleshooting

CUDA toolkit not found at build time: Set CUDA_HOME or CUDA_PATH to your CUDA installation:
export CUDA_HOME=/usr/local/cuda
The cudarc crate needs this breadcrumb to find the runtime.
Out of GPU memory: Reduce chunk_frames or switch to CPU:
rg = plan.run(traj, system, device="cpu")
Sometimes discretion is the better part of valor.
Kernel compilation errors: Ensure nvrtc is available. It ships with the CUDA Toolkit.

Benchmarks

Typical speedups on modern GPUs (RTX 3080):

| Analysis | CPU Time | GPU Time | Speedup |
|----------|----------|----------|---------|
| Rg (10k frames) | 2.5s | 0.3s | ~8x |
| RDF (10k frames) | 45s | 3s | ~15x |
| MSD (100k frames) | 120s | 8s | ~15x |
Actual speedup depends on system size, trajectory length, and hardware. Your mileage may vary, but it's usually impressive.

Feature Flags in Cargo.toml

CUDA support is enabled via the cuda feature flag in crates/traj-gpu/Cargo.toml:
[package]
name = "traj-gpu"
version = "0.1.5"
edition = "2021"
license = "MIT"

[dependencies]
traj-core = { path = "../traj-core" }
traj-kernels = { path = "../traj-kernels" }
cudarc = { version = "0.19", optional = true, features = ["driver", "nvrtc", "cuda-12040"] }

[features]
cuda = ["cudarc"]
Key points:
  • cudarc is an optional dependency
  • Feature flag: --features cuda
  • Pinned toolkit version: CUDA 12.4 (the cuda-12040 feature)
