cudaz is a Zig library that lets you harness NVIDIA GPU hardware directly from Zig programs. Instead of wrestling with raw CUDA C APIs, cudaz exposes a clean, idiomatic Zig interface that handles device initialization, memory transfers, kernel compilation, and result retrieval, so you can focus on writing the GPU logic that makes your application fast.
GPU Device Management
Initialize CUDA devices by ordinal or grab the default GPU with a single call. Manage device lifetimes safely using Zig’s defer pattern.
Type-Safe GPU Memory
CudaSlice(T) wraps device pointers with compile-time type information, preventing entire classes of pointer and size mismatch bugs.
NVRTC Kernel Compilation
Compile CUDA C kernels from inline strings or .cu files at runtime using NVIDIA’s NVRTC library, with no separate nvcc build step required.
PTX Loading and Execution
Load pre-compiled PTX assembly from text or a file path, then launch functions with full control over grid dimensions, block dimensions, and shared memory.
cuRAND Random Number Generation
Generate random numbers directly on the GPU using NVIDIA’s cuRAND library, without copying seed data back and forth between host and device.
Automatic CUDA Path Detection
cudaz scans standard installation directories (/usr, /usr/local/cuda, /opt/cuda, /usr/lib/cuda) so your builds work out of the box on most Linux and macOS setups.
Custom C Struct Support
Pass your own C-compatible structs to GPU kernels through CAPI, enabling complex data layouts beyond simple numeric arrays.
Typed Error Handling
CUDA and NVRTC error codes are mapped to Zig error unions (CudaError), giving you exhaustive, type-safe error handling at every GPU call site.
How it works
A typical cudaz program follows a straightforward five-step workflow:
- Device init: call CuDevice.default() (or CuDevice.new(ordinal)) to initialize the CUDA runtime and acquire a primary context for your GPU.
- Memory allocation: use device.htodCopy(T, &host_slice) to allocate GPU memory and copy your data to the device in one step, receiving a CudaSlice(T) back.
- Kernel compile: write your CUDA C kernel as an inline Zig string (or embed a .cu file), then call CuCompile.cudaText(source, .{}, allocator) to produce PTX bytecode via NVRTC.
- Kernel run: load the PTX with CuDevice.loadPtxText(ptx), look up your function by name with module.getFunc("name"), then launch it via function.run(params, LaunchConfig{...}).
- Data retrieval: call CuDevice.syncReclaim(T, allocator, cu_slice) to synchronize the device, copy results back to the host, and return them as a std.ArrayList(T).
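The five steps can be sketched end to end as follows. This is a minimal illustration, not a verified program: the function calls are the ones named in the list above, but the import paths (Cuda.Device, Cuda.Compile, Cuda.LaunchConfig), the free() cleanup methods, the device_ptr field, and the LaunchConfig field names are assumptions made for this example and should be checked against the cudaz API reference. Running it requires a CUDA-capable GPU and cudaz added as a build dependency.

```zig
const std = @import("std");
// Assumed import layout; check the cudaz docs for the exact names.
const Cuda = @import("cudaz");
const CuDevice = Cuda.Device;
const CuCompile = Cuda.Compile;
const CuLaunchConfig = Cuda.LaunchConfig;

// CUDA C kernel held as an inline Zig multiline string.
// Each thread adds 1 to one element of the array.
const increment_kernel =
    \\extern "C" __global__ void increment(float *out)
    \\{
    \\    int i = blockIdx.x * blockDim.x + threadIdx.x;
    \\    out[i] = out[i] + 1;
    \\}
;

pub fn main() !void {
    // An arena releases the PTX buffer and the result list together on exit.
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const allocator = arena.allocator();

    // 1. Device init.
    const device = try CuDevice.default();
    defer device.free(); // assumed cleanup method

    // 2. Memory allocation: allocate on the GPU and copy host data over.
    const data = [_]f32{ 1.2, 2.8, 0.123 };
    const cu_slice = try device.htodCopy(f32, &data);
    defer cu_slice.free(); // assumed cleanup method

    // 3. Kernel compile: CUDA C source to PTX through NVRTC.
    const ptx = try CuCompile.cudaText(increment_kernel, .{}, allocator);

    // 4. Kernel run: load the PTX, look up the function, launch it.
    const module = try CuDevice.loadPtxText(ptx);
    const function = try module.getFunc("increment");
    try function.run(
        .{&cu_slice.device_ptr}, // assumed field holding the device pointer
        CuLaunchConfig{ // assumed field names
            .block_dim = .{ 3, 1, 1 }, // one thread per element
            .grid_dim = .{ 1, 1, 1 },
            .shared_mem_bytes = 0,
        },
    );

    // 5. Data retrieval: synchronize and copy the results back to the host.
    const result = try CuDevice.syncReclaim(f32, allocator, cu_slice);
    std.debug.print("{any}\n", .{result.items});
}
```

The launch uses a single block of three threads because the input has three elements and the kernel does no bounds check. For larger inputs you would normally pass the element count to the kernel and guard the index.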
cudaz supports Linux and macOS only. Windows is explicitly not supported — the build system will return an error if you attempt to build on Windows.
cudaz was inspired by cudarc, the high-level Rust CUDA library by Corey Lowman. If you are coming from a Rust background, many of the design patterns will feel familiar.