After compiling CUDA C source to PTX, you load it onto a device as a Module and then retrieve individual kernel entry points as Function handles. Calling func.run(...) dispatches the kernel on the GPU with the grid and block dimensions you specify in a LaunchConfig. The Module and Function types are thin, zero-overhead wrappers around the CUDA driver API’s CUmodule and CUfunction handles.

Module

A Module represents a loaded PTX image on a CUDA device. You obtain a Module by calling Device.loadPtxText or Device.loadPtx.

Fields

Field | Type | Description
cu_module | cuda.CUmodule | Underlying CUDA driver module handle

getFunc

pub fn getFunc(self: Module, name: []const u8) CudaError.Error!Function
Looks up a kernel function by its C symbol name within the module and returns a Function handle. Uses cuModuleGetFunction internally.
self (Module, required): The loaded PTX module to query.
name ([]const u8, required): The C-linkage name of the kernel function (without name mangling). Must match the symbol as it appears in the compiled PTX.
Returns: CudaError.Error!Function
The kernel must be declared with extern "C" linkage in your CUDA source to prevent C++ name mangling. The typical declaration is extern "C" __global__ void my_kernel(...). The name string you pass to getFunc must match this symbol exactly.

Example

const ptx = try Cuda.Compile.cudaText(kernel_source, null, allocator);
defer allocator.free(ptx);

const module = try Cuda.Device.loadPtxText(ptx);
const func = try module.getFunc("my_kernel");

Function

A Function is a handle to a single __global__ kernel entry point within a loaded Module. You use it to launch the kernel on the GPU.

Fields

Field | Type | Description
cu_func | cuda.CUfunction | Underlying CUDA driver function handle

run

pub fn run(self: Function, params: anytype, cfg: LaunchConfig) CudaError.Error!void
Launches the kernel using cuLaunchKernel. Grid and block dimensions are taken from cfg. The kernel arguments are passed via params, which must be a struct literal where each field is a pointer to a kernel argument value.
self (Function, required): The kernel function handle to launch.
params (anytype, required): A struct literal whose fields are pointers to the kernel arguments, in the same order as the kernel’s C parameter list. Must be a struct; arrays are rejected at compile time.
cfg (LaunchConfig, required): Grid and block dimensions and shared memory size for this launch. See LaunchConfig.
Returns: CudaError.Error!void
params must be a struct literal at the call site. Passing an array or any non-struct type causes a compile error: "Invalid params type, must be a struct". Additionally, each field must be a pointer (&value) — passing values directly will produce incorrect results or a runtime crash.
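The pointer requirement follows from the driver API: cuLaunchKernel takes its kernelParams argument as an array of pointers, one per kernel argument, and reads each value through its pointer. The following is a minimal sketch of how a tuple of pointers could be flattened into that layout. It is not cudaz's actual implementation; packParams is a hypothetical helper, and the u64 variable merely stands in for a CUdeviceptr value.

```zig
const std = @import("std");

/// Hypothetical helper (not part of cudaz): turns a tuple of pointers into the
/// array-of-pointers layout that cuLaunchKernel accepts as `kernelParams`.
fn packParams(params: anytype) [std.meta.fields(@TypeOf(params)).len]?*anyopaque {
    const fields = std.meta.fields(@TypeOf(params));
    var packed_args: [fields.len]?*anyopaque = undefined;
    inline for (fields, 0..) |field, i| {
        // Each field already points at an argument value; only the pointee
        // type is erased here. A non-pointer field fails to compile.
        packed_args[i] = @constCast(@field(params, field.name));
    }
    return packed_args;
}

pub fn main() void {
    var device_ptr: u64 = 0x1000; // stand-in for a CUdeviceptr value (assumption)
    var factor: f32 = 2.0;
    var n: i32 = 4;

    const args = packParams(.{ &device_ptr, &factor, &n });

    // One entry per kernel argument, in declaration order.
    std.debug.assert(args.len == 3);
    // Entry 1 is the address of `factor`, not a copy of its value.
    std.debug.assert(args[1].? == @as(*anyopaque, &factor));
}
```

Because the array holds addresses, the argument variables must stay alive until the launch call returns, which is why the example below declares factor and n as locals before calling run.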

Example

const kernel_source =
    \\extern "C" __global__ void scale(float* data, float factor, int n) {
    \\    int i = blockIdx.x * blockDim.x + threadIdx.x;
    \\    if (i < n) data[i] *= factor;
    \\}
;

const device = try Cuda.Device.default();
defer device.deinit();

const ptx = try Cuda.Compile.cudaText(kernel_source, null, allocator);
defer allocator.free(ptx);

const module = try Cuda.Device.loadPtxText(ptx);
const func = try module.getFunc("scale");

const host_data = [_]f32{ 1.0, 2.0, 3.0, 4.0 };
const cu_slice = try device.htodCopy(f32, &host_data);
defer cu_slice.free();

var factor: f32 = 2.0;
var n: i32 = 4;

try func.run(
    .{ &cu_slice.device_ptr, &factor, &n },
    Cuda.LaunchConfig.for_num_elems(4),
);

const results = try Cuda.Device.syncReclaim(f32, allocator, cu_slice);
defer results.deinit();
// results.items == { 2.0, 4.0, 6.0, 8.0 }
For kernels that operate on CudaSlice buffers, pass &cu_slice.device_ptr rather than the slice itself. The kernel parameter is a CUdeviceptr (an integer type holding a device address), so you pass the address of the device_ptr field and the driver reads the value from that address at launch time.
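The if (i < n) guard in the scale kernel matters whenever the launch covers more threads than there are elements, which happens when the element count is not a multiple of the block size. The sketch below shows the usual ceiling-division arithmetic for sizing a one-dimensional grid. blocksFor is a hypothetical helper and the block size of 256 is an assumption chosen for illustration; the values LaunchConfig.for_num_elems actually picks may differ.

```zig
const std = @import("std");

/// Hypothetical helper (not part of cudaz): number of blocks needed so that
/// blocks * threads_per_block >= n, using ceiling division.
fn blocksFor(n: u32, threads_per_block: u32) u32 {
    std.debug.assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

pub fn main() void {
    const n: u32 = 1000;
    const threads_per_block: u32 = 256; // assumed block size, illustration only

    const blocks = blocksFor(n, threads_per_block);
    const launched = blocks * threads_per_block;

    // 4 blocks of 256 threads launch 1024 threads, so 24 threads see i >= n
    // and must be skipped by the kernel's bounds check.
    std.debug.print("blocks={d} launched={d} idle={d}\n", .{ blocks, launched, launched - n });
}
```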
