cudaz: CUDA GPU Library for Zig

cudaz gives Zig developers a high-level, type-safe interface to NVIDIA GPU programming via CUDA. It handles device initialization, memory management, runtime kernel compilation through NVRTC, and random number generation through cuRAND, and it reports failures from all three libraries as ordinary Zig errors.

Installation

Add cudaz to your Zig project with a single fetch command

Quickstart

Allocate GPU memory and run your first CUDA kernel in minutes

API Reference

Complete reference for every public type and function

Examples

Working examples for arrays, custom types, and matrix ops

What cudaz provides

GPU Memory Management

Type-safe CudaSlice(T) for allocating, copying, and freeing GPU buffers

Runtime Kernel Compilation

Compile .cu source or inline CUDA text to PTX at runtime using NVRTC

Kernel Execution

Load PTX modules and launch kernels with configurable grid/block dimensions

Random Number Generation

Generate uniform random f32 arrays on the GPU via cuRAND

Auto CUDA Detection

Automatically locates your CUDA installation on Linux and macOS

Typed Error Handling

CUDA, NVRTC, and cuRAND errors surface as first-class Zig error values

Quick example

Run a CUDA kernel that increments every element of an array in parallel:

quickstart.zig

const std = @import("std");
const Cuda = @import("cudaz");

// The kernel has no bounds check, so launch exactly one thread per element.
const increment_kernel =
    \\extern "C" __global__ void increment(float *out)
    \\{
    \\    int i = blockIdx.x * blockDim.x + threadIdx.x;
    \\    out[i] = out[i] + 1;
    \\}
;

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Initialize GPU 0
    const device = try Cuda.Device.default();
    defer device.deinit();

    // Copy array to GPU
    const data = [_]f32{ 1.2, 2.8, 0.123 };
    const cu_slice = try device.htodCopy(f32, &data);
    defer cu_slice.free();

    // Compile and run kernel
    const ptx = try Cuda.Compile.cudaText(increment_kernel, .{}, allocator);
    defer allocator.free(ptx);
    const module = try Cuda.Device.loadPtxText(ptx);
    const function = try module.getFunc("increment");
    try function.run(
        .{&cu_slice.device_ptr},
        Cuda.LaunchConfig{ .block_dim = .{ 3, 1, 1 }, .grid_dim = .{ 1, 1, 1 }, .shared_mem_bytes = 0 },
    );

    // Retrieve results
    var result = try Cuda.Device.syncReclaim(f32, allocator, cu_slice);
    defer result.deinit(allocator);
    // result.items is approximately { 2.2, 3.8, 1.123 } (subject to f32 rounding)
}

Install cudaz

Run zig fetch --save https://github.com/akhildevelops/cudaz/archive/0.4.0.tar.gz in your project directory.

Configure build.zig

Add the cudaz module and link cuda, nvrtc, and libc in your build.zig.
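As a rough illustration of this step, a build.zig might look like the sketch below. It assumes a Zig 0.13-style std.Build API, that `zig fetch --save` stored the dependency under the key `cudaz`, and that the package exposes a module also named `cudaz`. The executable name and source path are placeholders. Check the Installation page for the exact names and for any CUDA path options the package may accept.

```zig
// build.zig: a minimal sketch under the assumptions stated above.
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Assumed: "cudaz" is the dependency key recorded in build.zig.zon.
    const cudaz_dep = b.dependency("cudaz", .{});

    const exe = b.addExecutable(.{
        .name = "my-gpu-app", // placeholder name
        .root_source_file = b.path("src/main.zig"), // placeholder path
        .target = target,
        .optimize = optimize,
    });

    // Assumed: the package exports its module as "cudaz".
    exe.root_module.addImport("cudaz", cudaz_dep.module("cudaz"));

    // Link the libraries named in this step: libc, the CUDA driver, and NVRTC.
    exe.linkLibC();
    exe.linkSystemLibrary("cuda");
    exe.linkSystemLibrary("nvrtc");

    b.installArtifact(exe);
}
```

If the linker cannot find `cuda` or `nvrtc`, the CUDA library directory is probably not on the default search path. `exe.addLibraryPath` is the standard way to add one.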

Initialize a device

Call Cuda.Device.default() to set up GPU 0 and get a Device handle.
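Because initialization returns a Zig error union, a failure can be handled with `catch` like any other error. The sketch below uses only `Cuda.Device.default()` and `device.deinit()` as they appear in the quick example. The message text is illustrative, and the specific error names cudaz returns are not assumed here.

```zig
const std = @import("std");
const Cuda = @import("cudaz");

pub fn main() !void {
    // Report the failure by name, then propagate it to the caller.
    const device = Cuda.Device.default() catch |err| {
        std.debug.print("could not initialize GPU 0: {s}\n", .{@errorName(err)});
        return err;
    };
    defer device.deinit();

    // ... allocate memory and launch kernels with `device` here ...
}
```

Using `try` instead of `catch` gives the same propagation without the custom message.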

Run a kernel

Compile your CUDA source, load the PTX, and call function.run() with your data and a LaunchConfig.
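The quick example launches one block of 3 threads for 3 elements. For larger arrays, a common CUDA pattern is to fix the threads per block and round the block count up so every element is covered. The helper below sketches that arithmetic in plain Zig. `Dims` and `launchDims` are hypothetical names, not part of cudaz, and the resulting values would still need to be copied into a `Cuda.LaunchConfig`.

```zig
const std = @import("std");

/// Hypothetical helper type (not part of cudaz): 1-D launch dimensions.
const Dims = struct {
    block_dim: [3]u32,
    grid_dim: [3]u32,
};

/// Returns dimensions such that grid_dim[0] * block_dim[0] >= n.
fn launchDims(n: u32, threads_per_block: u32) Dims {
    std.debug.assert(threads_per_block > 0);
    // Ceiling division; the only error case is a zero divisor, ruled out above.
    const blocks = std.math.divCeil(u32, n, threads_per_block) catch unreachable;
    return .{
        .block_dim = .{ threads_per_block, 1, 1 },
        // A grid of zero blocks is not a valid launch, so use at least one.
        .grid_dim = .{ if (blocks == 0) 1 else blocks, 1, 1 },
    };
}

pub fn main() void {
    const dims = launchDims(1000, 256);
    // prints "grid=4 block=256"
    std.debug.print("grid={d} block={d}\n", .{ dims.grid_dim[0], dims.block_dim[0] });
}
```

Rounding up means the last block can contain threads past the end of the array (24 extra threads in the 1000-element case). A kernel launched this way needs an element-count parameter and a guard such as `if (i < n)` around the write. The `increment` kernel in the quick example has no such guard, which is why it is launched with exactly 3 threads.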

cudaz supports Linux and macOS. Windows is not currently supported.
