The Compile module wraps NVIDIA’s NVRTC (NVIDIA Runtime Compilation) library so you can compile CUDA C source code into PTX at runtime, entirely from Zig. You can supply kernel source as an inline string or read it from a .cu file on disk. The resulting PTX string can be loaded directly onto a device with Device.loadPtxText. Compilation is controlled through the Options struct, whose fields map to NVRTC command-line flags.
Import
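A minimal sketch of the import, assuming the dependency is registered under the name cudaz in your build and exposes this module as Compile. Both names are assumptions, so adjust them to match your setup.

```zig
// Assumption: "cudaz" is the module name wired up in build.zig,
// and the Compile module is reachable as a public declaration on it.
const cudaz = @import("cudaz");
const Compile = cudaz.Compile;
```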
Functions
cudaText
Creates an nvrtcProgram from the source text, compiles it with the provided options, and extracts the PTX output. The returned PTX is a sentinel-terminated ([:0]) string allocated with allocator. The caller must free it.
The full CUDA C source code to compile. Does not need to be null-terminated.
Compilation options. Pass null to use all NVRTC defaults.
Allocator used for the returned PTX string and internal temporaries.
CompileError![:0]const u8: null-terminated PTX string owned by the caller
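As an illustration of a cudaText call, here is a hedged sketch that compiles an inline kernel with default options. It assumes the package is imported as cudaz, that the module is exposed as Compile, and that the arguments are passed in the order source, options, allocator. The kernel itself is a made-up example.

```zig
const std = @import("std");
// Assumption: package name and module path; adjust to your build.
const cudaz = @import("cudaz");
const Compile = cudaz.Compile;

// Illustrative kernel source as a multiline string literal.
// It does not need a trailing null terminator.
const kernel_src =
    \\extern "C" __global__ void increment(float *out)
    \\{
    \\    int i = blockIdx.x * blockDim.x + threadIdx.x;
    \\    out[i] = out[i] + 1.0f;
    \\}
;

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // Passing null requests the NVRTC defaults.
    const ptx = try Compile.cudaText(kernel_src, null, allocator);
    // The PTX string is owned by the caller, so release it when done.
    defer allocator.free(ptx);

    std.debug.print("compiled {d} bytes of PTX\n", .{ptx.len});
}
```

The resulting ptx slice is what you would hand to Device.loadPtxText.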
When compilation fails (e.g., a syntax error in the kernel), cudaz prints the NVRTC compiler log to stderr before returning error.NVRTC_ERROR_COMPILATION. Check the printed log for line numbers and error messages.
cudaFile
Reads a .cu file from disk and compiles it to PTX. Allocates up to 1 MiB to buffer the file contents, then delegates to cudaText; a file that exceeds the buffer fails with error.StreamTooLong. The returned PTX string is allocated with allocator and must be freed by the caller.
An open file handle pointing to the CUDA source file to compile.
The I/O interface used to read the file.
Compilation options. Pass null to use NVRTC defaults.
Allocator for the returned PTX string and internal buffers.
![:0]const u8: null-terminated PTX string owned by the caller
cudaProgram
Compiles an existing nvrtcProgram. This lower-level function is useful when you need to manage the NVRTC program lifecycle yourself (for example, to attach headers or set program names). cudaText calls this internally.
An nvrtcProgram handle created via nvrtcCreateProgram.
Compilation options. Pass null for defaults.
Allocator for building the options array and fetching the compiler log on error.
CompileError!void
getPtx
Extracts the PTX from an nvrtcProgram that has already been compiled with cudaProgram. Queries the PTX size via nvrtcGetPTXSize, allocates a buffer, and fills it with nvrtcGetPTX. The returned string is owned by the caller.
A successfully compiled nvrtcProgram.
Allocator for the returned PTX buffer.
CompileError![:0]const u8
Options
The Options struct maps Zig fields to NVRTC compiler flags. All fields are optional; unset fields are omitted from the compiler invocation.
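To make the "unset fields are omitted" rule concrete, here is a self-contained sketch of how optional fields can be rendered into flag strings. It is not cudaz's implementation: SketchOptions, its field names, boolText, and renderFlags are all hypothetical, and only the flag spellings are taken from this page.

```zig
const std = @import("std");

// Hypothetical stand-in for the library's Options struct. The field names
// are assumptions made for this sketch, not the real cudaz fields.
const SketchOptions = struct {
    ftz: ?bool = null,
    fmad: ?bool = null,
    maxrregcount: ?u32 = null,
    include_paths: []const []const u8 = &.{},
};

fn boolText(v: bool) []const u8 {
    return if (v) "true" else "false";
}

// Writes one flag per line into `out` and returns the written slice.
// Null fields and empty slices contribute nothing.
fn renderFlags(opts: SketchOptions, out: []u8) ![]const u8 {
    var len: usize = 0;
    if (opts.ftz) |v| {
        len += (try std.fmt.bufPrint(out[len..], "--ftz={s}\n", .{boolText(v)})).len;
    }
    if (opts.fmad) |v| {
        len += (try std.fmt.bufPrint(out[len..], "--fmad={s}\n", .{boolText(v)})).len;
    }
    if (opts.maxrregcount) |n| {
        len += (try std.fmt.bufPrint(out[len..], "--maxrregcount={d}\n", .{n})).len;
    }
    for (opts.include_paths) |dir| {
        len += (try std.fmt.bufPrint(out[len..], "--include-path={s}\n", .{dir})).len;
    }
    return out[0..len];
}

pub fn main() !void {
    var buf: [256]u8 = undefined;
    // fmad is left unset, so no --fmad flag is emitted.
    const flags = try renderFlags(.{
        .ftz = true,
        .maxrregcount = 32,
        .include_paths = &.{"kernels/include"},
    }, &buf);
    std.debug.print("{s}", .{flags});
}
```

Modelling each scalar flag as an optional lets "not specified" stay distinct from an explicit false, which is why a null field produces no flag at all rather than a =false variant.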
Flush denormal floating-point values to zero. Maps to --ftz=true / --ftz=false.
Use IEEE-compliant square root. Maps to --prec-sqrt=true / --prec-sqrt=false. Disable for faster but less precise sqrtf.
Use IEEE-compliant division. Maps to --prec-div=true / --prec-div=false. Disable for faster reciprocal approximation.
Enable fused multiply-add (FMA) instructions. Maps to --fmad=true / --fmad=false. Improves throughput at the cost of strict IEEE rounding.
Maximum number of registers each compiled thread may use. Maps to --maxrregcount=N. Lower values increase occupancy but may spill to local memory.
Additional header search directories. Each entry maps to --include-path=<dir>. Defaults to an empty slice.
Target GPU compute architectures. Each entry maps to -arch=<arch> (e.g., "compute_86" for Ampere). Defaults to an empty slice (NVRTC selects a default).
Preprocessor macro definitions. Each entry maps to --define-macro=<name>. Defaults to an empty slice.
CompileError
| Error set | When it occurs |
|---|---|
| std.mem.Allocator.Error | Allocation failure while building option arrays or the PTX buffer |
| NvrtcError.Error | Any NVRTC API call failure, including NVRTC_ERROR_COMPILATION |
| error{StreamTooLong} | File read overflow when using cudaFile |
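A hedged sketch of handling these errors at the call site. It assumes the cudaz import name, the Compile module path, and the cudaText argument order (source, options, allocator). The broken kernel is a made-up example that is missing a semicolon on purpose.

```zig
const std = @import("std");
// Assumption: package name and module path; adjust to your build.
const cudaz = @import("cudaz");
const Compile = cudaz.Compile;

// Deliberately invalid CUDA C: the assignment lacks a semicolon.
const bad_src =
    \\extern "C" __global__ void broken(float *out)
    \\{
    \\    out[0] = 1.0f
    \\}
;

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const ptx = Compile.cudaText(bad_src, null, allocator) catch |err| switch (err) {
        // The NVRTC log has already been printed to stderr at this point.
        error.NVRTC_ERROR_COMPILATION => {
            std.debug.print("kernel failed to compile; see the NVRTC log on stderr\n", .{});
            return;
        },
        // Allocation failures and other NVRTC errors are passed up unchanged.
        else => return err,
    };
    defer allocator.free(ptx);
}
```

Separating the compilation error from the rest lets a program report a bad kernel to the user while still treating allocation or NVRTC API failures as fatal.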