cudaz: High-Level CUDA GPU Programming Library for Zig

cudaz is a Zig library that lets you harness NVIDIA GPU hardware directly from Zig programs. Instead of wrestling with raw CUDA C APIs, cudaz exposes a clean, idiomatic Zig interface that handles device initialization, memory transfers, kernel compilation, and result retrieval — so you can focus on writing the GPU logic that makes your application fast.

GPU Device Management

Initialize CUDA devices by ordinal or grab the default GPU with a single call. Manage device lifetimes safely using Zig’s defer pattern.

Type-Safe GPU Memory

CudaSlice(T) wraps device pointers with compile-time type information, preventing entire classes of pointer and size mismatch bugs.

NVRTC Kernel Compilation

Compile CUDA C kernels from inline strings or .cu files at runtime using NVIDIA’s NVRTC library — no separate nvcc build step required.

PTX Loading and Execution

Load pre-compiled PTX assembly from text or a file path, then launch functions with full control over grid dimensions, block dimensions, and shared memory.

cuRAND Random Number Generation

Generate random numbers directly on the GPU using NVIDIA’s cuRAND library, without copying seed data back and forth between host and device.

Automatic CUDA Path Detection

cudaz scans standard installation directories (/usr, /usr/local/cuda, /opt/cuda, /usr/lib/cuda) so your builds work out of the box on most Linux and macOS setups.

Custom C Struct Support

Pass your own C-compatible structs to GPU kernels through CAPI, enabling complex data layouts beyond simple numeric arrays.

Typed Error Handling

CUDA and NVRTC error codes are mapped to a Zig error set (CudaError), so every fallible GPU call returns an error union that you can propagate with try or handle explicitly, with full type safety, at the call site.

How it works

A typical cudaz program follows a straightforward five-step workflow:

1. Device init: call CuDevice.default() (or CuDevice.new(ordinal)) to initialize CUDA and acquire a primary context for your GPU.
2. Memory allocation: call device.htodCopy(T, &host_slice) to allocate GPU memory and copy your data to the device in one step. The call returns a CudaSlice(T).
3. Kernel compile: write your CUDA C kernel as an inline Zig string (or embed a .cu file), then call CuCompile.cudaText(source, .{}, allocator) to compile it to PTX via NVRTC.
4. Kernel run: load the PTX with CuDevice.loadPtxText(ptx), look up your function by name with module.getFunc("name"), then launch it with function.run(params, LaunchConfig{...}).
5. Data retrieval: call CuDevice.syncReclaim(T, allocator, cu_slice) to synchronize the device, copy the results back to the host, and return them as a std.ArrayList(T).
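
The five steps above can be sketched end to end as follows. This is a minimal illustration, not a verified program: the function names come from the workflow description, but several details are assumptions. These include the import name cudaz and the aliases Device, Compile, and LaunchConfig, the free() method on the device, the device_ptr field passed as the kernel parameter, the LaunchConfig field names, and the ownership of the returned PTX buffer. The increment kernel itself is purely illustrative.

```zig
const std = @import("std");
// Assumption: the package is imported as "cudaz" and exposes these names.
const cudaz = @import("cudaz");
const CuDevice = cudaz.Device;
const CuCompile = cudaz.Compile;
const LaunchConfig = cudaz.LaunchConfig;

// Illustrative CUDA C kernel: each thread adds 1 to one element.
// extern "C" keeps the symbol name unmangled so it can be looked up by name.
const increment_kernel =
    \\extern "C" __global__ void increment(float *out)
    \\{
    \\    int i = blockIdx.x * blockDim.x + threadIdx.x;
    \\    out[i] = out[i] + 1.0f;
    \\}
;

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // 1. Device init: acquire the default GPU.
    const device = try CuDevice.default();
    defer device.free(); // assumed name of the cleanup method

    // 2. Memory allocation: allocate on the device and copy host data over.
    const host_data = [_]f32{ 1.2, 2.8, 0.123 };
    const cu_slice = try device.htodCopy(f32, &host_data);
    // Explicit device-memory cleanup for cu_slice is omitted here: the text
    // above does not say whether syncReclaim releases it.

    // 3. Kernel compile: CUDA C source to PTX via NVRTC.
    const ptx = try CuCompile.cudaText(increment_kernel, .{}, allocator);
    defer allocator.free(ptx); // assumes the caller owns the PTX buffer

    // 4. Kernel run: load the PTX, look up the function, launch it.
    const module = try CuDevice.loadPtxText(ptx);
    const function = try module.getFunc("increment");
    try function.run(.{&cu_slice.device_ptr}, LaunchConfig{
        .grid_dim = .{ 1, 1, 1 }, // one block
        .block_dim = .{ 3, 1, 1 }, // one thread per element of host_data
        .shared_mem_bytes = 0,
    });

    // 5. Data retrieval: synchronize and copy the results back to the host.
    const result = try CuDevice.syncReclaim(f32, allocator, cu_slice);
    // On Zig versions where std.ArrayList is unmanaged, deinit takes the allocator.
    defer result.deinit();

    std.debug.print("{any}\n", .{result.items});
}
```

The launch uses a single block with one thread per element, which is enough for three values. Larger inputs would need a grid of several blocks and a bounds check inside the kernel.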

cudaz supports Linux and macOS only. Windows is explicitly not supported — the build system will return an error if you attempt to build on Windows.

cudaz was inspired by cudarc, the high-level Rust CUDA library by Corey Lowman. If you are coming from a Rust background, many of the design patterns will feel familiar.

