When CPU isn’t fast enough, warp-md has turbo mode. Optional CUDA acceleration for when your agent needs speed.
No GPU? No problem. Without CUDA, warp-md gracefully falls back to CPU. Your agent’s code runs everywhere.

Requirements

  • NVIDIA GPU with CUDA support
  • CUDA Toolkit (driver + nvrtc)
  • CUDA_HOME or CUDA_PATH environment variable set

Setup

1. Verify CUDA Installation

nvcc --version
nvidia-smi
If these work, you’re halfway there.
2. Set Environment Variable

export CUDA_HOME=/usr/local/cuda
3. Build with CUDA Feature

cargo test -p traj-engine --features cuda
4. Verify GPU Detection

from warp_md import RgPlan
# device="auto" will use CUDA if available
# device="cuda" or "cuda:0" to force GPU
Your agent is now turbocharged.
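
The "auto" behavior described in step 4 can be sketched as a small helper. Note that `resolve_device` is a hypothetical name, and the toolkit checks below only approximate warp_md's real detection logic:

```python
import os
import shutil

def resolve_device(requested: str = "auto") -> str:
    """Illustrative sketch of an "auto" device choice: prefer CUDA when a
    toolkit looks present, otherwise fall back to CPU.
    (Hypothetical helper, not part of the warp_md API.)"""
    if requested != "auto":
        return requested  # explicit "cpu", "cuda", "cuda:0" pass through
    cuda_visible = (
        "CUDA_HOME" in os.environ
        or "CUDA_PATH" in os.environ
        or shutil.which("nvidia-smi") is not None
    )
    return "cuda" if cuda_visible else "cpu"

print(resolve_device("cpu"))   # explicit requests are honored
print(resolve_device("auto"))  # "cuda" or "cpu", depending on the machine
```

Explicit requests bypass detection entirely, which is why forcing `device="cuda"` on a machine without a GPU produces an error rather than a silent fallback.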

Device Selection

All analysis plans accept a device parameter:
from warp_md import RgPlan

# construct a plan for your selection (constructor arguments elided)
plan = RgPlan(...)

# Automatic (CUDA if available, else CPU)
rg = plan.run(traj, system, device="auto")

# Force CPU
rg = plan.run(traj, system, device="cpu")

# Force CUDA (first GPU)
rg = plan.run(traj, system, device="cuda")
rg = plan.run(traj, system, device="cuda:0")
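
The device strings above follow a simple backend[:index] convention. A minimal parser sketch (`parse_device` is a hypothetical helper; warp_md presumably does equivalent parsing internally):

```python
def parse_device(spec: str) -> tuple[str, int]:
    """Split a device string like "cuda:1" into (backend, index).
    A bare "cuda" means the first GPU.
    (Hypothetical helper, for illustration only.)"""
    backend, _, index = spec.partition(":")
    if backend not in ("cpu", "cuda"):
        raise ValueError(f"unknown device backend: {backend!r}")
    return backend, int(index) if index else 0

print(parse_device("cuda"))    # ('cuda', 0)
print(parse_device("cuda:1"))  # ('cuda', 1)
print(parse_device("cpu"))     # ('cpu', 0)
```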

What Gets Accelerated?

| Category | GPU-Accelerated Analyses |
|----------|--------------------------|
| Core | Rg, RMSD, MSD, RDF |
| Polymer | End-to-end, contour length, chain Rg, bond histograms |
| Orientation | RotAcf (orientation extraction) |
| Transport | Conductivity (group COM), Dielectric, Dipole alignment |
| Structure | Ion-pair correlation, Structure factor (RDF kernel) |
| Spatial | Water occupancy grid |
| Thermodynamic | Equipartition (group kinetic energy) |
| Bonding | H-bond counts (distance + angle) |
CPU-only: Persistence length and some reductions remain CPU-based. Not everything parallelizes well.

Performance Tips

Chunk frames: For large trajectories, adjust chunk_frames to optimize memory transfer:
rg = plan.run(traj, system, device="cuda", chunk_frames=1000)
Multi-tau for long trajectories: Use bounded-memory mode for MSD/ACF:
msd = MsdPlan(selection, group_by="resid", lag_mode="multi_tau")
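
Multi-tau keeps memory bounded by sampling lags densely at short times and doubling the spacing at each level, so the number of stored lags grows roughly logarithmically with trajectory length. A generic sketch of the scheme (warp_md's exact spacing may differ):

```python
def multi_tau_lags(n_frames: int, per_level: int = 8) -> list[int]:
    """Generate multi-tau lag times: a fixed number of lags per level,
    with the spacing between lags doubling at each level.
    (Generic multi-tau scheme for illustration, not warp_md's exact one.)"""
    lags: list[int] = []
    lag, spacing = 1, 1
    while lag < n_frames:
        for _ in range(per_level):
            if lag >= n_frames:
                break
            lags.append(lag)
            lag += spacing
        spacing *= 2
    return lags

dense = list(range(1, 100_000))   # every lag: ~100k entries
sparse = multi_tau_lags(100_000)  # multi-tau: roughly a hundred entries
print(len(dense), len(sparse))
```

This is why multi-tau is the recommended `lag_mode` for the 100k-frame MSD runs in the benchmarks below: the reduction in stored lags translates directly into less GPU memory traffic.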

Troubleshooting

CUDA toolkit not found at build time: Set CUDA_HOME or CUDA_PATH to your CUDA installation:
export CUDA_HOME=/usr/local/cuda
The cudarc crate needs this breadcrumb to find the runtime.
Out of GPU memory: Reduce chunk_frames or switch to CPU:
rg = plan.run(traj, system, device="cpu")
Sometimes discretion is the better part of valor.
Kernel compilation errors: Ensure nvrtc is available. It ships with the CUDA Toolkit.

Benchmarks

Typical speedups on modern GPUs (RTX 3080):

| Analysis | CPU Time | GPU Time | Speedup |
|----------|----------|----------|---------|
| Rg (10k frames) | 2.5s | 0.3s | ~8x |
| RDF (10k frames) | 45s | 3s | ~15x |
| MSD (100k frames) | 120s | 8s | ~15x |
Actual speedup depends on system size, trajectory length, and hardware. Your mileage may vary, but it's usually impressive.

Feature Flags in Cargo.toml

CUDA support is enabled via the cuda feature flag in crates/traj-gpu/Cargo.toml:
[package]
name = "traj-gpu"
version = "0.1.5"
edition = "2021"
license = "MIT"

[dependencies]
traj-core = { path = "../traj-core" }
traj-kernels = { path = "../traj-kernels" }
cudarc = { version = "0.19", optional = true, features = ["driver", "nvrtc", "cuda-12040"] }

[features]
cuda = ["cudarc"]
Key points:
  • cudarc is an optional dependency
  • Feature flag: --features cuda
  • Pinned toolkit version: CUDA 12.4 (the cuda-12040 feature)
