cudaz provides two GPU buffer types: CudaSlice(T) and CudaSliceR. CudaSlice(T) is a generic, comptime-typed slice: T is fixed at compile time, giving you type-safe access to GPU memory. CudaSliceR is the runtime-typed counterpart backed by the DType enum, useful when the element type is only known at runtime (for example, when dispatching over f16 or f32 based on user input). Both types hold a device_ptr (a CUdeviceptr), a len, and a reference back to the originating Device.
CudaSlice(T)
Fields
| Field | Type | Description |
|---|---|---|
| device_ptr | cuda.CUdeviceptr | The raw GPU device pointer |
| len | usize | Number of elements of type T |
| device | Device | The device that owns this allocation |
Allocation
device.alloc(T, length) allocates uninitialized GPU memory for length elements of type T. The memory contents are undefined until written.
device.allocZeros(T, length) allocates GPU memory for length elements and zeroes every byte using cuMemsetD8_v2. Use this when you need a known-zero starting state.
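The difference between the two allocators can be sketched as follows. This is a minimal example, not the library's reference usage: it assumes the package is imported as cudaz, that Device.default() opens the first GPU (as in the project README), and that alloc and allocZeros return error unions. The helper name readBackZeros is hypothetical.

```zig
const std = @import("std");
const cudaz = @import("cudaz"); // assumption: package import name
const Device = cudaz.Device;

/// Hypothetical helper: allocates both kinds of buffer and reads the zeroed one back.
/// The caller owns the returned host slice.
fn readBackZeros(allocator: std.mem.Allocator, device: Device, n: usize) ![]f32 {
    // Contents are undefined: only useful as a destination for a later copy or kernel.
    const scratch = try device.alloc(f32, n);
    defer scratch.free();

    // Every byte is zero, so every f32 element reads back as 0.0.
    const zeros = try device.allocZeros(f32, n);
    defer zeros.free();

    return Device.dtohCopy(f32, allocator, zeros);
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    // Assumption: Device.default() selects the first GPU.
    const device = try Device.default();
    defer device.free();

    const host = try readBackZeros(allocator, device, 8);
    defer allocator.free(host);
    std.debug.print("{any}\n", .{host});
}
```

Reading scratch back instead would be legal but would return whatever bytes happened to be in that region of device memory.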
Freeing Memory
cu_slice.free() calls cuMemFree_v2 on the underlying device pointer and panics on failure. Always release allocations when they are no longer needed. The defer pattern is the safest way to do this:
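A minimal sketch of the defer pattern, assuming the package is imported as cudaz and that Device.default() and device.alloc return error unions. The helper allocRounds is hypothetical. It shows that defer is tied to the enclosing block, so a buffer allocated inside a loop body is released at the end of every iteration rather than piling up until the function returns.

```zig
const std = @import("std");
const cudaz = @import("cudaz"); // assumption: package import name
const Device = cudaz.Device;

/// Hypothetical helper: allocates and releases one temporary buffer per round.
/// Returns the number of rounds completed.
fn allocRounds(device: Device, rounds: usize) !usize {
    var done: usize = 0;
    while (done < rounds) : (done += 1) {
        const tmp = try device.alloc(f32, 1024);
        // Scoped to the loop body: runs at the end of each iteration, and also
        // if a later `try` inside this body returns an error.
        defer tmp.free();

        // Work that uses `tmp` would go here.
    }
    return done;
}

pub fn main() !void {
    // Assumption: Device.default() selects the first GPU.
    const device = try Device.default();
    defer device.free();

    const rounds = try allocRounds(device, 3);
    std.debug.print("completed {d} rounds\n", .{rounds});
}
```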
Cloning
cu_slice.clone() performs a device-to-device copy and returns a new, independently owned CudaSlice(T) of the same length. The original slice is not modified. Both the original and the clone must be freed separately:
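The independence of a clone can be illustrated with a short sketch. It assumes the package is imported as cudaz, that clone and htodCopyInto return error unions, and that host arrays coerce to the slice type these functions expect. The helper cloneKeepsOldValues is hypothetical.

```zig
const std = @import("std");
const cudaz = @import("cudaz"); // assumption: package import name
const Device = cudaz.Device;

/// Hypothetical helper: clones a buffer, overwrites the original, and returns
/// the clone's contents. The caller owns the returned host slice.
fn cloneKeepsOldValues(allocator: std.mem.Allocator, device: Device) ![]f32 {
    const before = [_]f32{ 1, 2, 3 };
    const after = [_]f32{ 9, 9, 9 };

    const original = try device.htodCopy(f32, &before);
    defer original.free();

    // Device-to-device copy into a new allocation with its own lifetime.
    const copy = try original.clone();
    defer copy.free();

    // Overwriting the original does not touch the clone.
    try Device.htodCopyInto(f32, &after, original);

    return Device.dtohCopy(f32, allocator, copy);
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    // Assumption: Device.default() selects the first GPU.
    const device = try Device.default();
    defer device.free();

    const host = try cloneKeepsOldValues(allocator, device);
    defer allocator.free(host);
    std.debug.print("{any}\n", .{host});
}
```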
Host-to-Device Transfers
device.htodCopy(T, src_slice)
Allocates a new GPU buffer of the same length as src_slice and copies the host data into it in one call. This is the most convenient way to move a host slice to the GPU:
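A minimal upload sketch, modeled on the call shown in the project README. The import name cudaz, the use of Device.default(), and the helper uploadedLen are assumptions made for illustration.

```zig
const std = @import("std");
const cudaz = @import("cudaz"); // assumption: package import name
const Device = cudaz.Device;

/// Hypothetical helper: uploads `src` and reports the device buffer length.
fn uploadedLen(device: Device, src: []const f32) !usize {
    // Allocation and copy happen in a single call.
    const gpu = try device.htodCopy(f32, src);
    defer gpu.free();
    return gpu.len;
}

pub fn main() !void {
    // Assumption: Device.default() selects the first GPU.
    const device = try Device.default();
    defer device.free();

    const host = [_]f32{ 1.2, 2.8, 0.123 };
    const n = try uploadedLen(device, &host);
    std.debug.assert(n == host.len);
    std.debug.print("uploaded {d} elements\n", .{n});
}
```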
Device.htodCopyInto(T, src, dst)
Copies from a host slice src into an already-allocated CudaSlice(T) dst. The lengths of src and dst must be equal — an assertion failure is triggered at runtime if they differ. Use this when you want to reuse an existing GPU buffer:
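Buffer reuse might look like the sketch below, which uploads several fixed-size batches into one device allocation. It assumes the package is imported as cudaz and that htodCopyInto returns an error union. The helper uploadBatches and the batch size of 4 are hypothetical.

```zig
const std = @import("std");
const cudaz = @import("cudaz"); // assumption: package import name
const Device = cudaz.Device;

/// Hypothetical helper: streams each batch through a single device buffer and
/// returns the buffer's final contents. The caller owns the returned slice.
fn uploadBatches(allocator: std.mem.Allocator, device: Device, batches: []const [4]f32) ![]f32 {
    // Allocated once, sized for exactly one batch.
    const buf = try device.alloc(f32, 4);
    defer buf.free();

    for (batches) |*batch| {
        // batch.len == buf.len == 4, so the length assertion holds.
        try Device.htodCopyInto(f32, batch, buf);
        // A kernel launch that consumes `buf` would go here.
    }
    return Device.dtohCopy(f32, allocator, buf);
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    // Assumption: Device.default() selects the first GPU.
    const device = try Device.default();
    defer device.free();

    const batches = [_][4]f32{ .{ 1, 2, 3, 4 }, .{ 5, 6, 7, 8 } };
    const last = try uploadBatches(allocator, device, &batches);
    defer allocator.free(last);
    std.debug.print("{any}\n", .{last});
}
```

Compared with calling htodCopy per batch, this avoids one allocation and one free per iteration.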
Device-to-Host Transfers
Device.dtohCopy(T, allocator, slice)
Copies GPU memory into a freshly allocated host slice []T. The caller owns the returned slice and must free it with the same allocator:
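A round trip through the GPU can be sketched as follows, assuming the package is imported as cudaz and that dtohCopy takes its arguments in the order given in the heading. The helper roundTrip is hypothetical.

```zig
const std = @import("std");
const cudaz = @import("cudaz"); // assumption: package import name
const Device = cudaz.Device;

/// Hypothetical helper: uploads `src`, then copies it straight back.
/// The returned slice is host memory owned by the caller.
fn roundTrip(allocator: std.mem.Allocator, device: Device, src: []const f32) ![]f32 {
    const gpu = try device.htodCopy(f32, src);
    defer gpu.free(); // the device buffer is no longer needed once the copy returns
    return Device.dtohCopy(f32, allocator, gpu);
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    // Assumption: Device.default() selects the first GPU.
    const device = try Device.default();
    defer device.free();

    const host = try roundTrip(allocator, device, &[_]f32{ 0.5, 1.5, 2.5 });
    defer allocator.free(host); // same allocator that was passed to dtohCopy
    std.debug.print("{any}\n", .{host});
}
```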
Device.syncReclaim(T, allocator, slice)
Copies GPU memory into a std.ArrayList(T). The caller owns the returned list. This is convenient when you need the result as a resizable list rather than a plain slice:
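The sketch below reclaims a buffer as a list and then grows it on the host. It assumes the package is imported as cudaz and that the returned std.ArrayList(T) is the managed variant that stores its allocator, as the deinit() call in the project README suggests. On Zig versions where std.ArrayList is unmanaged, append and deinit would need the allocator passed explicitly. The helper reclaimAndExtend is hypothetical.

```zig
const std = @import("std");
const cudaz = @import("cudaz"); // assumption: package import name
const Device = cudaz.Device;

/// Hypothetical helper: reads a device buffer back as a list and appends one
/// more element on the host. The caller owns the returned list.
fn reclaimAndExtend(allocator: std.mem.Allocator, device: Device) !std.ArrayList(f32) {
    const gpu = try device.htodCopy(f32, &[_]f32{ 1, 2, 3 });
    defer gpu.free();

    var list = try Device.syncReclaim(f32, allocator, gpu);
    errdefer list.deinit(); // release the list if the append below fails

    // Assumption: managed ArrayList API (no allocator argument).
    try list.append(4);
    return list;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;
    // Assumption: Device.default() selects the first GPU.
    const device = try Device.default();
    defer device.free();

    const list = try reclaimAndExtend(allocator, device);
    defer list.deinit();
    std.debug.print("{any}\n", .{list.items});
}
```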
CudaSliceR — Runtime-Typed Buffers
CudaSliceR is used when the element type is determined at runtime. It carries an element_type: DType field alongside device_ptr, len, and device.
The DType enum currently supports two variants:
| Variant | Zig type | Size |
|---|---|---|
| .f16 | f16 | 2 bytes |
| .f32 | f32 | 4 bytes |
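To show how a runtime element type maps to a byte count, here is a standalone sketch that mirrors the table above. DTypeSketch, byteSize, and bufferBytes are hypothetical stand-ins written for this example. They are not the library's DType, which may expose different helpers.

```zig
const std = @import("std");

// Hypothetical stand-in for the library's DType enum, mirroring the table above.
const DTypeSketch = enum {
    f16,
    f32,

    /// Size in bytes of one element of this type.
    fn byteSize(self: DTypeSketch) usize {
        return switch (self) {
            .f16 => @sizeOf(f16),
            .f32 => @sizeOf(f32),
        };
    }
};

/// Bytes needed for `len` elements of a type chosen at runtime.
fn bufferBytes(dtype: DTypeSketch, len: usize) usize {
    return dtype.byteSize() * len;
}

pub fn main() void {
    // Prints "2048 4096": 1024 elements at 2 and 4 bytes each.
    std.debug.print("{d} {d}\n", .{ bufferBytes(.f16, 1024), bufferBytes(.f32, 1024) });
}
```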
To copy a CudaSliceR back to the host, use Device.syncReclaimR, providing the concrete Zig type as a comptime parameter:
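One way to dispatch on the runtime type is sketched below. Several details are assumptions: the argument order (T, allocator, slice) is borrowed from syncReclaim, the return type is taken to be a managed std.ArrayList(T), and the slice parameter is declared anytype because the export path of CudaSliceR is not stated here. The helper hostSum is hypothetical.

```zig
const std = @import("std");
const cudaz = @import("cudaz"); // assumption: package import name
const Device = cudaz.Device;

/// Hypothetical helper: sums a runtime-typed buffer on the host as f32.
/// `slice` is expected to be a CudaSliceR.
fn hostSum(allocator: std.mem.Allocator, slice: anytype) !f32 {
    var total: f32 = 0;
    switch (slice.element_type) {
        .f16 => {
            // Assumed signature: syncReclaimR(T, allocator, slice).
            const list = try Device.syncReclaimR(f16, allocator, slice);
            defer list.deinit();
            for (list.items) |x| total += @as(f32, x); // widen f16 to f32
        },
        .f32 => {
            const list = try Device.syncReclaimR(f32, allocator, slice);
            defer list.deinit();
            for (list.items) |x| total += x;
        },
    }
    return total;
}
```

The switch turns the runtime element_type into a comptime type in each branch, which is what lets the concrete Zig type be passed as a comptime parameter.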
CudaSliceR also supports .clone(), which performs a device-to-device copy and returns a new CudaSliceR with the same element_type.
Memory Ownership
cudaz does not use any automatic reference counting. Every CudaSlice(T) or CudaSliceR that you allocate must be explicitly freed. The idiomatic pattern is to call .free() via defer immediately after allocation:
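A minimal sketch of this pattern with two buffers, assuming the package is imported as cudaz and that Device.default(), allocZeros, and clone return error unions. The helper ownTwo is hypothetical. Zig runs deferred statements in reverse order of registration, so the buffers are released before the device.

```zig
const std = @import("std");
const cudaz = @import("cudaz"); // assumption: package import name
const Device = cudaz.Device;

/// Hypothetical helper: owns two buffers for the duration of the call and
/// returns their combined element count.
fn ownTwo(device: Device) !usize {
    const a = try device.allocZeros(f32, 16);
    defer a.free(); // registered first, runs after b.free()

    const b = try a.clone();
    defer b.free(); // registered last, runs first

    return a.len + b.len;
}

pub fn main() !void {
    // Assumption: Device.default() selects the first GPU.
    const device = try Device.default();
    defer device.free(); // runs last, after ownTwo has released both buffers

    const total = try ownTwo(device);
    std.debug.print("owned {d} elements across two buffers\n", .{total});
}
```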
a and b are independently owned and must each be freed. Forgetting to free either one leaks GPU memory for the lifetime of the process.