Framework Architecture

Neurenix is built on a flexible, modular architecture that enables seamless switching between different hardware backends at runtime. The framework consists of three core systems:

Hot-Swappable Backends

Switch between CPU, GPU, TPU, and other accelerators without code changes

Genesis System

Intelligent hardware detection and automatic device selection

Device Manager

Centralized device orchestration and memory management

Core Components

Device Manager

The DeviceManager is a singleton that orchestrates all device operations and provides hot-swappable backend functionality.
from neurenix.device_manager import DeviceManager
from neurenix.device import Device, DeviceType

# Get the device manager instance
manager = DeviceManager()

# Check available devices
available = manager.get_available_devices()
print(f"Available devices: {available}")

# Set active device
manager.active_device = Device(DeviceType.CUDA, 0)

# Get memory statistics
memory_stats = manager.get_memory_stats()
print(memory_stats)
Note: Because DeviceManager is a singleton, every DeviceManager() call returns the same instance, so the active device and memory tracking are shared across your application.
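
To make the singleton behavior concrete, here is a minimal pure-Python sketch of the pattern. SketchDeviceManager and its placeholder state are hypothetical stand-ins written for illustration; this is not Neurenix's actual implementation, which may be structured differently.

```python
import threading


class SketchDeviceManager:
    """Hypothetical stand-in showing the singleton pattern, not Neurenix's code."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Create the shared instance once; later calls return the same object.
        with cls._lock:
            if cls._instance is None:
                instance = super().__new__(cls)
                instance.active_device = "cpu"  # placeholder state
                cls._instance = instance
        return cls._instance


first = SketchDeviceManager()
second = SketchDeviceManager()
second.active_device = "cuda:0"

print(first is second)       # True
print(first.active_device)   # cuda:0
```

A change made through one reference is visible through every other reference, which is why a single manager can coordinate devices for the whole application.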

Genesis: Intelligent Device Selection

Genesis automatically detects available hardware and selects the optimal device for your workload.
from neurenix.device_manager import Genesis
from neurenix.tensor import Tensor

# Initialize Genesis
genesis = Genesis()

# Automatic device selection based on workload
training_device = genesis.select_device(
    workload_type="training",
    tensor_shape=(1024, 1024)
)

inference_device = genesis.select_device(
    workload_type="inference",
    tensor_shape=(1, 512)
)

print(f"Training on: {training_device}")
print(f"Inference on: {inference_device}")
Genesis prioritizes TPUs for inference workloads and CUDA/ROCm devices for training, automatically falling back to CPU when specialized hardware is unavailable.

Workload-Specific Selection

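from neurenix.device_manager import Genesis
# Assumption: the layer classes are importable from neurenix.nn;
# adjust this import to match your installation.
from neurenix.nn import Sequential, Linear, ReLU

genesis = Genesis()

# Genesis optimizes for training workloads
device = genesis.select_device(workload_type="training")
# Preference: CUDA > ROCm > TPU > CPU

# Build a small classifier and move it to the selected device
model = Sequential(
    Linear(784, 512),
    ReLU(),
    Linear(512, 10)
)
model.to(device)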
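
The preference order and CPU fallback described above can be pictured as a simple ordered lookup. The sketch below is an illustration only: pick_device and PREFERENCES are hypothetical names, the inference order beyond "TPU first, CPU last" is an assumption, and the real Genesis logic may also weigh memory and benchmark results.

```python
# Training order follows the documented CUDA > ROCm > TPU > CPU ranking.
# The inference order is an assumption that only encodes "TPU first, CPU last".
PREFERENCES = {
    "training": ["cuda", "rocm", "tpu", "cpu"],
    "inference": ["tpu", "cuda", "rocm", "cpu"],
}


def pick_device(workload_type, available):
    """Return the first preferred device type that is available."""
    for device_type in PREFERENCES[workload_type]:
        if device_type in available:
            return device_type
    return "cpu"  # universal fallback when nothing else matches


print(pick_device("training", {"cpu", "rocm"}))          # rocm
print(pick_device("inference", {"cpu", "tpu", "cuda"}))  # tpu
print(pick_device("training", set()))                    # cpu
```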

Supported Backends

Neurenix supports an extensive range of hardware backends:
| Backend | DeviceType | Use Case |
| --- | --- | --- |
| CPU | DeviceType.CPU | Universal fallback, debugging |
| NVIDIA CUDA | DeviceType.CUDA | GPU training and inference |
| AMD ROCm | DeviceType.ROCM | AMD GPU acceleration |
| Google TPU | DeviceType.TPU | Large-scale ML workloads |
| WebGPU | DeviceType.WEBGPU | Browser-based inference |
| Vulkan | DeviceType.VULKAN | Cross-platform GPU compute |
| OpenCL | DeviceType.OPENCL | Heterogeneous computing |
| Intel oneAPI | DeviceType.ONEAPI | Intel hardware acceleration |
| DirectML | DeviceType.DIRECTML | Windows ML acceleration |
| TensorRT | DeviceType.TENSORRT | NVIDIA optimized inference |
| ARM | DeviceType.ARM | Mobile and edge devices |

Device Benchmarking

Genesis can benchmark your hardware to optimize device selection:
from neurenix.device_manager import Genesis

genesis = Genesis()

# Run benchmarks on all available devices
benchmark_results = genesis.benchmark_devices()

for device, score in benchmark_results.items():
    print(f"{device}: {score:.2f} GFLOPS")
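
For intuition about what a GFLOPS score means, the sketch below times a small matrix multiply and converts the elapsed time to throughput. It assumes the common convention of counting 2 * n**3 floating-point operations for an n x n matmul; how Genesis actually computes its scores is not stated here, and the helper names are hypothetical.

```python
import time


def matmul(a, b):
    """Naive matrix multiply on nested lists (about 2 * n**3 operations)."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def gflops(n, seconds):
    """Throughput of an n x n matmul, counting 2 * n**3 floating-point ops."""
    return 2 * n**3 / seconds / 1e9


n = 64
a = [[1.0] * n for _ in range(n)]
b = [[2.0] * n for _ in range(n)]

start = time.perf_counter()
c = matmul(a, b)
elapsed = time.perf_counter() - start

print(f"pure-Python matmul: {gflops(n, elapsed):.4f} GFLOPS")
```

The printed number depends on the machine, so treat it as a relative measure for comparing devices rather than an absolute rating.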

Memory Management

The DeviceManager tracks memory usage across all devices:
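from neurenix.device_manager import DeviceManager
from neurenix.device import Device, DeviceType

manager = DeviceManager()

# Get memory stats for a specific device
device = Device(DeviceType.CUDA, 0)
stats = manager.get_memory_stats(device)

# Values are divided by 1e9 to report gigabytes
print(f"Total: {stats['total'] / 1e9:.2f} GB")
print(f"Used: {stats['used'] / 1e9:.2f} GB")
print(f"Available: {stats['available'] / 1e9:.2f} GB")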
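
One practical use of these statistics is checking whether a tensor will fit before allocating it. The following sketch assumes the stats dictionary reports bytes under the 'total', 'used', and 'available' keys shown above, and that elements are 4-byte float32 values; tensor_bytes, fits, and the headroom factor are hypothetical helpers, not part of Neurenix.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Size of a dense tensor; 4 bytes per element assumes float32."""
    return prod(shape) * bytes_per_element


def fits(shape, stats, headroom=0.9):
    """True if the tensor fits within `headroom` of the available memory."""
    return tensor_bytes(shape) <= stats["available"] * headroom


# Example stats with made-up values, shaped like the dictionary used above.
stats = {"total": 8e9, "used": 6e9, "available": 2e9}

print(tensor_bytes((1024, 1024)))    # 4194304
print(fits((1024, 1024), stats))     # True
print(fits((32768, 32768), stats))   # False
```

Leaving some headroom is a design choice: it reserves space for intermediate buffers that a framework typically allocates during computation.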

Device Synchronization

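GPU operations run asynchronously, so synchronize before reading results or measuring time to make sure the computation has finished:
from neurenix.device_manager import DeviceManager
from neurenix.device import Device, DeviceType
from neurenix.tensor import Tensor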

manager = DeviceManager()
device = Device(DeviceType.CUDA, 0)

# Perform async operations
tensor_a = Tensor.randn((1000, 1000), device=device)
tensor_b = Tensor.randn((1000, 1000), device=device)
result = tensor_a.matmul(tensor_b)

# Synchronize to ensure completion
manager.synchronize(device)

print("Computation complete")
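
Synchronization matters most when timing code, because an asynchronous call can return before the device has finished. The helper below is a minimal sketch of that idea; timed is a hypothetical name, and the stand-in callables exist only so the example runs without a GPU or Neurenix installed.

```python
import time


def timed(fn, synchronize):
    """Run fn, wait for the device via synchronize(), return (result, seconds)."""
    synchronize()  # drain earlier queued work before starting the clock
    start = time.perf_counter()
    result = fn()
    synchronize()  # wait for fn's work to finish before stopping the clock
    return result, time.perf_counter() - start


# Stand-ins so the sketch runs anywhere. With Neurenix you would pass
# lambda: manager.synchronize(device) as the second argument.
sync_calls = []
result, seconds = timed(lambda: sum(range(1000)), lambda: sync_calls.append(1))

print(result)           # 499500
print(len(sync_calls))  # 2
```

Without the second synchronize call, the measured time would cover only how long it took to queue the work, not how long the device took to execute it.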

Architecture Benefits

Portability

Write once, run on any hardware backend without modification

Performance

Automatic selection of optimal hardware for each workload

Flexibility

Hot-swap between devices at runtime for testing and optimization

Simplicity

High-level API abstracts hardware complexity
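
As an illustration of the hot-swapping benefit, a temporary device switch can be wrapped in a context manager that restores the previous device afterwards. This is a sketch under assumptions: use_device is a hypothetical helper, it relies only on the settable active_device attribute shown earlier, and a SimpleNamespace stands in for DeviceManager so the example runs without Neurenix.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def use_device(manager, device):
    """Temporarily set manager.active_device, restoring the old value on exit."""
    previous = manager.active_device
    manager.active_device = device
    try:
        yield device
    finally:
        manager.active_device = previous


# SimpleNamespace stands in for DeviceManager in this sketch.
manager = SimpleNamespace(active_device="cpu")

with use_device(manager, "cuda:0"):
    print(manager.active_device)  # cuda:0
print(manager.active_device)      # cpu
```

Restoring the device in a finally block keeps the manager consistent even if the code inside the with block raises an exception.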

Best Practices

Recommendation: Let Genesis handle device selection for production workloads. Manual device selection is best reserved for debugging and specific optimization scenarios.
  1. Use Genesis for automatic selection - It considers memory, performance, and workload type
  2. Synchronize before timing - GPU operations are asynchronous
  3. Monitor memory usage - Especially important for large models on GPU
  4. Benchmark your hardware - Run genesis.benchmark_devices() once to optimize future selections
