The CPU backend is ggml’s built-in execution target. It requires no external dependencies, works on every supported platform, and is always available as a fallback when no GPU backend is present.
Initialization
Call ggml_backend_load_all() before using ggml_backend_init_best() or ggml_backend_init_by_type() so that all compiled-in backends are registered.
Thread configuration
The CPU backend parallelises operations across threads. Set the thread count after initialization with ggml_backend_cpu_set_n_threads(backend, n).
Custom thread pool
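For finer control, including thread affinity and NUMA awareness, create a ggml_threadpool and attach it to the backend with ggml_backend_cpu_set_threadpool(backend, pool). The pool is managed with the following functions: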
| Function | Description |
|---|---|
| ggml_threadpool_new(params) | Create a thread pool with the given parameters |
| ggml_threadpool_free(pool) | Destroy the thread pool |
| ggml_threadpool_get_n_threads(pool) | Query the thread count |
| ggml_threadpool_pause(pool) | Suspend worker threads |
| ggml_threadpool_resume(pool) | Resume suspended threads |
NUMA support
On systems with multiple NUMA nodes, initialise ggml’s NUMA support by calling ggml_numa_init() with one of the strategies below before creating backends:
| Strategy | Description |
|---|---|
| GGML_NUMA_STRATEGY_DISABLED | No NUMA awareness (default) |
| GGML_NUMA_STRATEGY_DISTRIBUTE | Distribute threads across nodes |
| GGML_NUMA_STRATEGY_ISOLATE | Pin all threads to one node |
| GGML_NUMA_STRATEGY_NUMACTL | Honour numactl binding from the shell |
| GGML_NUMA_STRATEGY_MIRROR | Mirror allocation across nodes |
SIMD optimisations
ggml detects CPU features at runtime and selects the most capable implementation for each operation. You can query which extensions are available on x86, ARM, and other architectures through the ggml_cpu_has_*() family of functions.
Abort callback
You can register a callback that the CPU backend calls periodically during graph compute, using ggml_backend_cpu_set_abort_callback(backend, cb, data). Return true from the callback to abort execution.
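One possible shape for such a callback is a check against a cancellation flag. The sketch below is plain C11 and deliberately does not include any ggml header. The names `g_cancel_requested` and `should_abort` are hypothetical, and the registration call appears only as a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical cancellation flag, set from another thread (for example a UI
// thread) while a graph is being computed. Static storage starts as false.
static atomic_bool g_cancel_requested;

// Callback with the shape bool (*)(void * data): return true to abort.
static bool should_abort(void * data) {
    atomic_bool * flag = data;
    return atomic_load(flag);
}

// Registration, assuming `backend` is an initialized CPU backend:
//   ggml_backend_cpu_set_abort_callback(backend, should_abort, &g_cancel_requested);
```

Because the callback runs repeatedly during compute, it should stay cheap; an atomic load keeps it free of locks.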
Reference implementations
For debugging or correctness testing, force the backend to use unoptimised scalar code with ggml_backend_cpu_set_use_ref(backend, true).
Build configuration
The CPU backend is compiled into ggml unconditionally. No additional CMake flags are required. SIMD paths are enabled automatically when the target compiler supports them.
API summary
| Function | Description |
|---|---|
| ggml_backend_cpu_init() | Create a CPU backend instance |
| ggml_backend_is_cpu(backend) | Check whether a backend is the CPU backend |
| ggml_backend_cpu_set_n_threads(backend, n) | Set the thread count |
| ggml_backend_cpu_set_threadpool(backend, pool) | Attach a custom thread pool |
| ggml_backend_cpu_set_abort_callback(backend, cb, data) | Register an abort callback |
| ggml_backend_cpu_set_use_ref(backend, use_ref) | Force reference (scalar) implementations |
| ggml_backend_cpu_reg() | Return the CPU backend registry entry |
