Device is the central type in cudaz. It represents a single NVIDIA GPU and its primary CUDA context. You use a Device to allocate GPU memory, copy data between host and device, load compiled PTX modules, and free resources. Most cudaz workflows begin by calling Device.default() or Device.new(gpu) and end with device.deinit().
Import
Fields
| Field | Type | Description |
|---|---|---|
| device | cuda.CUdevice | Underlying CUDA device handle from the driver API |
| primary_context | cuda.CUcontext | The primary context retained for this device |
| ordinal | u16 | Zero-based GPU index used to select this device |
Initialization
default
Creates a Device for GPU 0. Calls cuInit, retrieves device 0, retains its primary context, and sets it as the current context. Equivalent to Device.new(0).
Returns: CudaError.Error!Device
new
Creates a Device for the GPU at the specified ordinal index.
Zero-based index of the GPU to initialize. Pass 0 for the first GPU, 1 for the second, and so on.
Returns: CudaError.Error!Device
deinit
Releases the device's primary context by calling cuDevicePrimaryCtxRelease. Panics if the release fails. Call this with defer immediately after a successful default() or new().
Pointer to the Device to release.
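The initialize-then-defer pattern described above can be sketched as follows. This is a minimal sketch, not the library's canonical example: the module name `cudaz` and the `Cuda.Device` export path are assumptions, since the import snippet is not reproduced on this page, and running it requires a machine with an NVIDIA GPU and driver.

```zig
const std = @import("std");
// Assumption: the package is exposed to the build as "cudaz" and exports Device.
const Cuda = @import("cudaz");
const Device = Cuda.Device;

pub fn main() !void {
    // Initialize GPU 0; Device.new(0) is documented as equivalent.
    const device = try Device.default();
    // Release the primary context when main returns, on success or on error.
    defer device.deinit();

    // ordinal is the documented u16 field holding the zero-based GPU index.
    std.debug.print("using GPU ordinal {d}\n", .{device.ordinal});
}
```

Placing the defer directly after the successful call keeps the release paired with the retain even if a later `try` returns early.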
Memory Allocation
alloc
Allocates GPU memory for length elements of type T. The returned CudaSlice(T) holds the device pointer and length. The caller is responsible for freeing the slice with slice.free().
The device on which to allocate memory.
Comptime element type (e.g., f32, i32, u8).
Number of elements to allocate. Total bytes allocated is @sizeOf(T) * length.
Returns: CudaError.Error!CudaSlice(T)
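To make the size rule concrete, the sketch below computes `@sizeOf(T) * length` in plain Zig, without touching the GPU. The helper `allocationBytes` is hypothetical and exists only for illustration; it is not part of cudaz.

```zig
const std = @import("std");

// Hypothetical helper (not part of cudaz): bytes needed for `length` elements of T.
fn allocationBytes(comptime T: type, length: usize) usize {
    return @sizeOf(T) * length;
}

test "allocation size follows @sizeOf(T) * length" {
    // 1024 f32 elements occupy 4 bytes each.
    try std.testing.expectEqual(@as(usize, 4096), allocationBytes(f32, 1024));
    // 1024 u8 elements occupy 1 byte each.
    try std.testing.expectEqual(@as(usize, 1024), allocationBytes(u8, 1024));
    // 1024 i32 elements occupy 4 bytes each.
    try std.testing.expectEqual(@as(usize, 4096), allocationBytes(i32, 1024));
}
```

The checks can be run with `zig test` on the file containing the snippet.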
allocZeros
Allocates GPU memory for length elements of type data_type and immediately zero-initializes every byte using cuMemsetD8_v2. Equivalent to alloc followed by memsetZeros.
The device on which to allocate memory.
Comptime element type.
Number of elements to allocate and zero.
Returns: CudaError.Error!CudaSlice(data_type)
allocR
Runtime-typed variant of alloc. Allocates GPU memory with the element type given by a runtime DType value instead of a comptime type. Returns a CudaSliceR whose element_type field carries the DType for later dispatch.
The device on which to allocate memory.
Runtime element type: .f16 (2 bytes) or .f32 (4 bytes).
Number of elements to allocate.
Returns: CudaError.Error!CudaSliceR
allocZerosR
Runtime-typed variant of allocZeros. Allocates and zero-initializes GPU memory with the element type determined at runtime.
The device on which to allocate memory.
Runtime element type.
Number of elements to allocate and zero.
Returns: CudaError.Error!CudaSliceR
Host ↔ Device Transfers
htodCopy
Allocates a new CudaSlice(T) on the device and copies all elements from the host slice src into it in one operation. The returned slice has the same length as src.
The device to allocate on and copy to.
Comptime element type.
Host slice to copy from. The entire slice is transferred.
Returns: CudaError.Error!CudaSlice(T)
htodCopyInto
Copies the host slice src into an already-allocated CudaSlice(T). Asserts that src.len == destination.len. Use this when you want to reuse an existing GPU allocation rather than allocating a new one.
Comptime element type.
Host slice to copy from. Must have the same length as destination.
Pre-allocated GPU slice to copy into.
Returns: CudaError.Error!void
dtohCopy
Allocates a host buffer with allocator and copies the contents of slice into it. The returned []T is owned by the caller and must be freed with allocator.free(...).
Comptime element type.
Allocator used to create the host buffer.
GPU slice to copy from.
Returns: ![]T, a heap-allocated host slice
syncReclaim
Copies the contents of a GPU slice into a host ArrayList(T). The returned ArrayList owns its memory; call arr.deinit() when done. This is a convenience wrapper around dtohCopy for callers that prefer ArrayList.
Comptime element type.
Allocator for the ArrayList.
GPU slice to copy from.
Returns: !std.ArrayList(T)
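A host-to-device-to-host round trip using htodCopy and syncReclaim might look like the sketch below. Several details are assumptions rather than confirmed API: the module name `cudaz`, the method-call form `device.htodCopy(...)`, and calling `syncReclaim` on the `Device` namespace (inferred from its parameter list, which does not include a device). It needs an NVIDIA GPU to run.

```zig
const std = @import("std");
// Assumption: the package is exposed to the build as "cudaz" and exports Device.
const Cuda = @import("cudaz");
const Device = Cuda.Device;

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    const device = try Device.default();
    defer device.deinit();

    // Host data to upload; &host coerces to a slice of f32.
    const host = [_]f32{ 1.0, 2.0, 3.0 };

    // Allocate on the GPU and copy the whole host slice in one call.
    const gpu_slice = try device.htodCopy(f32, &host);
    defer gpu_slice.free();

    // Copy the GPU contents back into a host ArrayList(f32).
    const back = try Device.syncReclaim(f32, allocator, gpu_slice);
    defer back.deinit();

    std.debug.print("{any}\n", .{back.items});
}
```

The defers run in reverse order, so the host list is released first, then the GPU slice, and the device context last.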
syncReclaimR
Runtime-typed variant of syncReclaim. Copies a CudaSliceR to an ArrayList(T). You must supply the concrete T at the call site; it must match the runtime DType stored in slice.element_type.
Comptime element type. Must match slice.element_type.
Allocator for the ArrayList.
Runtime-typed GPU slice to copy from.
Returns: !std.ArrayList(T)
PTX Loading
loadPtx
Loads a PTX file from disk into a CUmodule using cuModuleLoad. The PathBuffer wraps a null-terminated path string.
Null-terminated path to the .ptx file to load.
Returns: CudaError.Error!Module
loadPtxText
Loads PTX from an in-memory string using cuModuleLoadData. Use this after compiling with Cuda.Compile.cudaText or cudaFile.
Null-terminated PTX string, as returned by the Compile functions.
Returns: CudaError.Error!Module
Memory Utilities
memsetZeros
Sets every byte of a CudaSlice(T) to zero using cuMemsetD8_v2. Useful for resetting a previously allocated buffer.
Comptime element type of the slice.
The GPU buffer to zero-fill.
Returns: CudaError.Error!void
memsetZerosR
Runtime-typed variant of memsetZeros. Sets every byte of a CudaSliceR to zero using cuMemsetD8_v2. The byte count is computed as dtype.size() * cuda_slice.len.
Runtime element type: .f16 (2 bytes) or .f32 (4 bytes).
The runtime-typed GPU buffer to zero-fill.
Returns: CudaError.Error!void
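The byte-count rule for runtime-typed buffers can be illustrated in plain Zig. The `DType` enum below is a hypothetical stand-in that mirrors only the two documented variants and their sizes; it is not cudaz's own definition, and `byteCount` is an illustrative helper.

```zig
const std = @import("std");

// Hypothetical stand-in for cudaz's DType, limited to the documented variants.
const DType = enum {
    f16,
    f32,

    // Size of one element in bytes.
    fn size(self: DType) usize {
        return switch (self) {
            .f16 => 2,
            .f32 => 4,
        };
    }
};

// Hypothetical helper: bytes covered by a runtime-typed buffer of `len` elements.
fn byteCount(dtype: DType, len: usize) usize {
    return dtype.size() * len;
}

test "byte count is dtype.size() * len" {
    try std.testing.expectEqual(@as(usize, 200), byteCount(.f16, 100));
    try std.testing.expectEqual(@as(usize, 400), byteCount(.f32, 100));
}
```

Because the element size is only known at runtime, the memset length has to be derived from the stored DType rather than from `@sizeOf(T)`.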
free
Frees a raw device pointer using cuMemFree_v2. Prefer calling slice.free() on a CudaSlice or CudaSliceR instead, which handles the pointer internally.
The raw device pointer to free.
Returns: !void