cudaz provides a complete pipeline for running CUDA kernels from Zig: write your kernel in CUDA C, compile it to PTX at runtime using NVRTC, load the PTX into a Module, look up the kernel by name as a Function, and finally launch it with a LaunchConfig. Each stage is a distinct, composable step, so you can cache PTX between runs, pre-compile kernels at startup, or load PTX from disk, whichever fits your workflow.
The Module and Function Types
Module wraps a CUmodule and represents a compiled PTX image loaded onto the device. You obtain a Module by calling one of the Device.loadPtx* functions.
Function wraps a CUfunction and represents a single kernel entry point within a module. You obtain a Function by calling module.getFunc("kernel_name").
Neither type manages its own lifetime — unloading a module is left to future cudaz versions. In practice, modules are typically kept alive for the full duration of the program.
Compiling CUDA Kernels with Compile
The Compile module wraps NVRTC (NVIDIA Runtime Compilation) and turns CUDA C source into a PTX string at runtime. Its two functions, cudaText and cudaFile, both return [:0]const u8: a sentinel-terminated PTX string that the caller must free with the same allocator that was passed in.
Compile.cudaText(cuda_src, options, allocator)
Compiles an inline CUDA C string. Pass null for options to use NVRTC defaults:
Compile.cudaFile(file, io, options, allocator)
Compiles a .cu file from disk. The file is read into a 1 MB buffer, null-terminated, and then passed to cudaText internally. The io value is a std.Io handle obtained from your program’s startup context:
Compile Options
The Options struct supports fine-grained NVRTC flags:
| Field | Type | NVRTC flag |
|---|---|---|
| ftz | ?bool | --ftz |
| prec_sqrt | ?bool | --prec-sqrt |
| prec_div | ?bool | --prec-div |
| use_fast_math | ?bool | --fmad |
| maxrregcount | ?usize | --maxrregcount |
| include_paths | [][]const u8 | --include-path |
| arch | [][]const u8 | -arch |
| macro | [][]const u8 | --define-macro |

Pass null to omit all options and let NVRTC use its defaults.
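To make the flags concrete, the sketch below shows roughly what options of this kind look like when handed to NVRTC directly through its C API. This page does not show how cudaz serializes the Options struct, so the exact strings are an assumption; the helper name compile_to_ptx and the chosen flag values are hypothetical.

```cuda
// Host-side sketch against the NVRTC C API (build with nvcc and link -lnvrtc).
#include <nvrtc.h>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: compiles `src` to PTX with a few explicit NVRTC flags.
// Returns an empty string on failure.
std::string compile_to_ptx(const char* src) {
    nvrtcProgram prog;
    if (nvrtcCreateProgram(&prog, src, "kernel.cu", 0, nullptr, nullptr) != NVRTC_SUCCESS)
        return {};

    // Each option is passed as one command-line style string.
    const char* opts[] = {
        "--ftz=true",
        "--prec-div=false",
        "--maxrregcount=32",
        "--define-macro=BLOCK=256",
    };
    if (nvrtcCompileProgram(prog, 4, opts) != NVRTC_SUCCESS) {
        // Print the compiler log so the failure is diagnosable.
        size_t log_size = 0;
        nvrtcGetProgramLogSize(prog, &log_size);
        std::vector<char> log(log_size + 1, '\0');
        nvrtcGetProgramLog(prog, log.data());
        fprintf(stderr, "%s\n", log.data());
        nvrtcDestroyProgram(&prog);
        return {};
    }

    size_t ptx_size = 0;  // reported size includes the trailing NUL
    nvrtcGetPTXSize(prog, &ptx_size);
    std::string ptx(ptx_size, '\0');
    nvrtcGetPTX(prog, &ptx[0]);
    nvrtcDestroyProgram(&prog);
    if (!ptx.empty()) ptx.pop_back();  // drop the trailing NUL
    return ptx;
}

int main() {
    const char* src = "extern \"C\" __global__ void noop() {}";
    std::string ptx = compile_to_ptx(src);
    printf("PTX bytes: %zu\n", ptx.size());
    return ptx.empty() ? 1 : 0;
}
```

Leaving a flag out of the list has the same effect as leaving the corresponding field unset: NVRTC falls back to its own default for that setting.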
Loading PTX
Once you have a PTX string or file, load it into a Module through the Device.
Device.loadPtxText(ptx)
Loads a PTX image from an in-memory sentinel-terminated string. This is the most common path — use the PTX returned by Compile.cudaText or Compile.cudaFile:
Device.loadPtx(PathBuffer)
Loads a PTX image directly from a file path on disk, bypassing the in-memory string entirely. Useful when you have pre-compiled .ptx files bundled with your application:
Getting a Kernel Function
module.getFunc(name) looks up a kernel entry point by its C symbol name and returns a Function:
The name argument must match the symbol exactly as it appears in the compiled PTX. Using extern "C" in your CUDA source (see below) ensures the symbol name is not mangled.
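As an illustration of the naming issue, here are two hypothetical kernels that differ only in linkage. The mangled name in the comment follows the Itanium C++ ABI scheme that CUDA device code normally uses; treat it as illustrative rather than as a name to depend on.

```cuda
// extern "C": the PTX entry is named plain `scale`,
// so a lookup for "scale" matches.
extern "C" __global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// No extern "C": the PTX entry gets a C++-mangled name
// (typically _Z13scale_mangledPffi), so a lookup for
// "scale_mangled" would not match any symbol.
__global__ void scale_mangled(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}
```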
Running a Kernel
function.run(params, cfg) launches the kernel via cuLaunchKernel. The params argument must be a Zig struct of pointers to the kernel arguments: each field must be a pointer, not a value. A params value that does not meet this requirement is rejected at compile time (@compileError).
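The pointer requirement mirrors cuLaunchKernel itself, whose kernelParams argument is an array of pointers to the argument values rather than the values themselves. The sketch below shows that raw driver-API call for a hypothetical kernel taking (float* data, int n). The helper name launch_add_one and the block size of 256 are assumptions, and this page does not show how cudaz builds the array from the params struct.

```cuda
// Host-side sketch against the CUDA driver API (link with -lcuda).
#include <cuda.h>

// Hypothetical helper: launches an already-loaded kernel `fn` whose signature
// is (float* data, int n). Assumes n > 0 and that a CUDA context is current
// on the calling thread.
CUresult launch_add_one(CUfunction fn, CUdeviceptr d_data, int n) {
    // One entry per kernel parameter, each pointing at that argument's storage.
    void* params[] = { &d_data, &n };

    unsigned int block = 256;
    // Round up so every element is covered by some thread.
    unsigned int grid = (static_cast<unsigned int>(n) + block - 1) / block;

    return cuLaunchKernel(fn,
                          grid, 1, 1,    // grid dimensions
                          block, 1, 1,   // block dimensions
                          0,             // dynamic shared memory, in bytes
                          nullptr,       // default stream
                          params,        // pointers to the arguments
                          nullptr);      // no extra launch options
}
```

Because the driver reads each argument through its pointer at launch time, the pointed-to values must remain valid until the call returns.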
Full End-to-End Example
Writing CUDA Kernels
CUDA kernels intended for use with cudaz must be declared with extern "C" to prevent C++ name mangling. Without it, the symbol name in the PTX will not match the string you pass to getFunc:
The if (i < n) bounds check is essential when n is not an exact multiple of the block size, which is the common case with LaunchConfig.for_num_elems.
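A small kernel in this style might look as follows. The name add_one and its parameters are illustrative, not taken from cudaz. With n = 1000 and a block size of 256, for example, the launch needs 4 blocks, i.e. 1024 threads, so the last 24 threads fall outside the array and must do nothing.

```cuda
// extern "C" keeps the PTX symbol name as plain `add_one`.
extern "C" __global__ void add_one(float* data, int n) {
    // Global index of this thread within a 1D launch.
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Threads in the final block may have i >= n; skip them so they
    // never read or write past the end of `data`.
    if (i < n) {
        data[i] += 1.0f;
    }
}
```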