The Compile module wraps NVIDIA’s NVRTC (NVIDIA Runtime Compilation) library so you can compile CUDA C source code into PTX at runtime, entirely from Zig. You can supply kernel source as an inline string or as a .cu file on disk. The resulting PTX string can be loaded onto a device with Device.loadPtxText. Compiler flags are set through the Options struct, whose fields map to NVRTC command-line options.

Import

const CuCompile = @import("cudaz").Compile;

Functions

cudaText

pub fn cudaText(
    cuda_text: []const u8,
    options: ?Options,
    allocator: std.mem.Allocator,
) CompileError![:0]const u8
Compiles an inline CUDA C source string to PTX. Internally, it creates an nvrtcProgram, compiles it with the provided options, and extracts the PTX output. The returned PTX is a sentinel-terminated ([:0]) string allocated with allocator; the caller must free it.

cuda_text ([]const u8, required): The full CUDA C source code to compile. Does not need to be null-terminated.
options (?Options): Compilation options. Pass null to use all NVRTC defaults.
allocator (std.mem.Allocator, required): Allocator used for the returned PTX string and internal temporaries.
Returns: CompileError![:0]const u8, a null-terminated PTX string owned by the caller.

// Assumes `const Cuda = @import("cudaz");` and an `allocator` are in scope.
const kernel_src =
    \\extern "C" __global__ void add(float* a, float* b, float* c, int n) {
    \\    int i = blockIdx.x * blockDim.x + threadIdx.x;
    \\    if (i < n) c[i] = a[i] + b[i];
    \\}
;

const ptx = try CuCompile.cudaText(kernel_src, null, allocator);
defer allocator.free(ptx);

const module = try Cuda.Device.loadPtxText(ptx);
const func = try module.getFunc("add");
When compilation fails (e.g., a syntax error in the kernel), cudaz prints the NVRTC compiler log to stderr before returning error.NVRTC_ERROR_COMPILATION. Check the printed log for line numbers and error messages.

cudaFile

pub fn cudaFile(
    cuda_path: std.Io.File,
    io: std.Io,
    options: ?Options,
    allocator: std.mem.Allocator,
) ![:0]const u8
Reads a .cu file from disk and compiles it to PTX. It allocates up to 1 MiB to buffer the file contents, then delegates to cudaText. The returned PTX string is allocated with allocator and must be freed by the caller.

cuda_path (std.Io.File, required): An open file handle pointing to the CUDA source file to compile.
io (std.Io, required): The I/O interface used to read the file.
options (?Options): Compilation options. Pass null to use NVRTC defaults.
allocator (std.mem.Allocator, required): Allocator for the returned PTX string and internal buffers.
Returns: ![:0]const u8, a null-terminated PTX string owned by the caller.
The internal read buffer is fixed at 1 MiB, so CUDA source files larger than 1 MiB cannot be read in full by cudaFile. For larger kernel files, read the file yourself and pass its contents to cudaText.

cudaProgram

pub fn cudaProgram(
    prg: nvrtc.nvrtcProgram,
    options: ?Options,
    allocator: std.mem.Allocator,
) CompileError!void
Compiles an already-created nvrtcProgram. This lower-level function is useful when you need to manage the NVRTC program lifecycle yourself (for example, to attach headers or set program names). cudaText calls this internally.
prg (nvrtc.nvrtcProgram, required): An nvrtcProgram handle created via nvrtcCreateProgram.
options (?Options): Compilation options. Pass null for defaults.
allocator (std.mem.Allocator, required): Allocator for building the options array and fetching the compiler log on error.
Returns: CompileError!void
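nvrtcCompileProgram takes its flags as a count plus an array of C strings (`const char *const *options`), which is why cudaProgram needs an allocator to build an options array. The following is a minimal sketch of that marshalling step in plain Zig. It assumes the flags are already formatted as strings. toCArray and freeCArray are hypothetical helpers, not cudaz functions, and the library's own implementation may differ.

```zig
const std = @import("std");

/// Hypothetical helper: copies each flag into a NUL-terminated buffer and
/// collects the pointers, matching the `const char *const *options` argument
/// of nvrtcCompileProgram. Release the result with freeCArray.
fn toCArray(allocator: std.mem.Allocator, flags: []const []const u8) std.mem.Allocator.Error![][*:0]const u8 {
    const ptrs = try allocator.alloc([*:0]const u8, flags.len);
    var done: usize = 0;
    // On failure, free the copies made so far and the pointer array itself.
    errdefer {
        for (ptrs[0..done]) |p| allocator.free(std.mem.span(p));
        allocator.free(ptrs);
    }
    for (flags, 0..) |flag, i| {
        // dupeZ allocates flag.len + 1 bytes and writes the 0 sentinel.
        ptrs[i] = (try allocator.dupeZ(u8, flag)).ptr;
        done += 1;
    }
    return ptrs;
}

/// Frees every copied flag, then the pointer array.
fn freeCArray(allocator: std.mem.Allocator, ptrs: []const [*:0]const u8) void {
    // std.mem.span recovers the [:0] slice, so free releases len + 1 bytes.
    for (ptrs) |p| allocator.free(std.mem.span(p));
    allocator.free(ptrs);
}
```

In this sketch the pointer array's `.ptr` and `.len` correspond to the options and numOptions arguments of the C call.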

getPtx

pub fn getPtx(
    prg: nvrtc.nvrtcProgram,
    allocator: std.mem.Allocator,
) CompileError![:0]const u8
Extracts the compiled PTX text from an nvrtcProgram that has already been compiled with cudaProgram. Queries the PTX size via nvrtcGetPTXSize, allocates a buffer, and fills it with nvrtcGetPTX. The returned string is owned by the caller.
prg (nvrtc.nvrtcProgram, required): A successfully compiled nvrtcProgram.
allocator (std.mem.Allocator, required): Allocator for the returned PTX buffer.
Returns: CompileError![:0]const u8, a null-terminated PTX string owned by the caller.
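The size-query, allocate, and fill sequence described for getPtx can be sketched in plain Zig. nvrtcGetPTXSize reports a size that includes the trailing NUL. The buffer can therefore be exposed as a [:0]const u8 whose length excludes the terminator, and allocator.free then releases the full allocation. FakeProgram and extractPtx are hypothetical stand-ins for the NVRTC calls and for getPtx itself. They only illustrate the pattern and are not the cudaz source.

```zig
const std = @import("std");

// Hypothetical stand-in for a compiled nvrtcProgram that only holds PTX text.
const FakeProgram = struct {
    ptx: []const u8,

    // Plays the role of nvrtcGetPTXSize: the size includes the trailing NUL.
    fn getPtxSize(self: FakeProgram) usize {
        return self.ptx.len + 1;
    }

    // Plays the role of nvrtcGetPTX: writes the PTX followed by a NUL into `buf`.
    fn getPtx(self: FakeProgram, buf: []u8) void {
        @memcpy(buf[0..self.ptx.len], self.ptx);
        buf[self.ptx.len] = 0;
    }
};

/// Hypothetical sketch of getPtx: query the size, allocate, fill, then expose
/// the buffer as a sentinel-terminated slice owned by the caller.
fn extractPtx(prg: FakeProgram, allocator: std.mem.Allocator) std.mem.Allocator.Error![:0]const u8 {
    const size = prg.getPtxSize(); // includes the NUL terminator
    const buf = try allocator.alloc(u8, size);
    prg.getPtx(buf);
    // The length excludes the NUL, which stays in place as the sentinel
    // (checked at runtime in safe build modes). allocator.free counts the
    // sentinel, so freeing the result releases all `size` bytes.
    return buf[0 .. size - 1 :0];
}
```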

Options

The Options struct maps Zig fields to NVRTC compiler flags. Every field has a default: the ?bool and ?usize fields are unset (null) and the slice fields are empty. A field left unset or empty adds nothing to the compiler invocation.
const options = CuCompile.Options{
    .use_fast_math = true,
    .arch = &[_][]const u8{"compute_86"},
};
ftz (?bool): Flush denormal floating-point values to zero. Maps to --ftz=true / --ftz=false.
prec_sqrt (?bool): Use IEEE-compliant square root. Maps to --prec-sqrt=true / --prec-sqrt=false. Disable for a faster but less precise sqrtf.
prec_div (?bool): Use IEEE-compliant division. Maps to --prec-div=true / --prec-div=false. Disable for a faster reciprocal approximation.
use_fast_math (?bool): Enable fused multiply-add (FMA) instructions. Maps to --fmad=true / --fmad=false. Improves throughput at the cost of strict IEEE rounding.
maxrregcount (?usize): Maximum number of registers each compiled thread may use. Maps to --maxrregcount=N. Lower values can increase occupancy but may spill registers to local memory.
include_paths ([][]const u8): Additional header search directories. Each entry maps to --include-path=<dir>. Defaults to an empty slice.
arch ([][]const u8): Target GPU compute architectures. Each entry maps to -arch=<arch> (e.g., "compute_86" for Ampere). Defaults to an empty slice, in which case NVRTC selects a default.
macro ([][]const u8): Preprocessor macro definitions. Each entry maps to --define-macro=<name>. Defaults to an empty slice.
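The following minimal sketch makes the field-to-flag mapping concrete. It turns a subset of the fields into NVRTC flag strings, following the mapping listed above, including use_fast_math producing --fmad. SketchOptions, buildFlags, and freeFlags are hypothetical names, and this is not the cudaz implementation. The sketch declares its list fields as []const []const u8 so that string literals can be passed directly. The output order is also an arbitrary choice made here.

```zig
const std = @import("std");

// Hypothetical mirror of part of CuCompile.Options, for illustration only.
const SketchOptions = struct {
    ftz: ?bool = null,
    use_fast_math: ?bool = null,
    maxrregcount: ?usize = null,
    arch: []const []const u8 = &.{},
    macro: []const []const u8 = &.{},
};

fn boolText(b: bool) []const u8 {
    return if (b) "true" else "false";
}

/// Builds one flag string per set field, following the documented mapping.
/// Null fields produce no flag. Free the result with freeFlags.
fn buildFlags(allocator: std.mem.Allocator, opts: SketchOptions) std.mem.Allocator.Error![][]u8 {
    // Count first so the flag array can be allocated at its exact size.
    var count: usize = opts.arch.len + opts.macro.len;
    if (opts.ftz != null) count += 1;
    if (opts.use_fast_math != null) count += 1;
    if (opts.maxrregcount != null) count += 1;

    const flags = try allocator.alloc([]u8, count);
    var n: usize = 0;
    errdefer {
        for (flags[0..n]) |f| allocator.free(f);
        allocator.free(flags);
    }
    if (opts.ftz) |v| {
        flags[n] = try std.fmt.allocPrint(allocator, "--ftz={s}", .{boolText(v)});
        n += 1;
    }
    if (opts.use_fast_math) |v| {
        flags[n] = try std.fmt.allocPrint(allocator, "--fmad={s}", .{boolText(v)});
        n += 1;
    }
    if (opts.maxrregcount) |r| {
        flags[n] = try std.fmt.allocPrint(allocator, "--maxrregcount={d}", .{r});
        n += 1;
    }
    for (opts.arch) |target| {
        flags[n] = try std.fmt.allocPrint(allocator, "-arch={s}", .{target});
        n += 1;
    }
    for (opts.macro) |def| {
        flags[n] = try std.fmt.allocPrint(allocator, "--define-macro={s}", .{def});
        n += 1;
    }
    return flags;
}

fn freeFlags(allocator: std.mem.Allocator, flags: [][]u8) void {
    for (flags) |f| allocator.free(f);
    allocator.free(flags);
}
```

Counting before allocating keeps the sketch independent of container APIs, at the cost of listing each optional field twice.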

CompileError

const CompileError = std.mem.Allocator.Error || Error.NvrtcError.Error || error{StreamTooLong};
CompileError is the error set returned by the compilation functions. It combines three sources:
| Error set | When it occurs |
|---|---|
| std.mem.Allocator.Error | Allocation failure while building option arrays or the PTX buffer |
| NvrtcError.Error | Any NVRTC API call failure, including NVRTC_ERROR_COMPILATION |
| error{StreamTooLong} | File read overflow when using cudaFile |
The most common error in practice is error.NVRTC_ERROR_COMPILATION. When it occurs, cudaz automatically fetches the NVRTC program log and prints it to stderr. Read that output to diagnose syntax errors or missing includes in your CUDA kernel source.
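Because CompileError is an ordinary merged error set, a caller can single out NVRTC_ERROR_COMPILATION and propagate everything else. The sketch below uses a hypothetical stand-in, NvrtcErrorSketch, in place of cudaz's Error.NvrtcError.Error, which covers the other NVRTC error codes as well. It also uses a hypothetical helper, ptxOrNull, that treats a kernel compile failure as recoverable. Whether that policy suits your program is a design choice, not something cudaz prescribes.

```zig
const std = @import("std");

// Hypothetical stand-in for cudaz's Error.NvrtcError.Error (the real set has more members).
const NvrtcErrorSketch = error{ NVRTC_ERROR_COMPILATION, NVRTC_ERROR_INVALID_INPUT };

// Same shape as the documented CompileError: error sets merge with `||`.
const CompileErrorSketch = std.mem.Allocator.Error || NvrtcErrorSketch || error{StreamTooLong};

/// Hypothetical helper: returns the PTX on success, null when the kernel itself
/// failed to compile (cudaz has already printed the NVRTC log by then), and
/// propagates every other error to the caller.
fn ptxOrNull(result: CompileErrorSketch![:0]const u8) CompileErrorSketch!?[:0]const u8 {
    const ptx = result catch |err| switch (err) {
        error.NVRTC_ERROR_COMPILATION => return null,
        else => return err,
    };
    return ptx;
}
```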
