Rockchip SoCs such as the RK3588 contain three independent NPU cores (cores 0, 1, and 2). EZ RKNN Async provides two complementary strategies for spreading work across them: data parallelism via schedule, where each inference task is sent to a different core in round-robin order, and tensor parallelism via tp_mode, where a single inference task is executed collaboratively by multiple cores. The two strategies are mutually exclusive — set one or the other, never both.

Data parallelism with schedule

When schedule is set, the session creates one worker thread group per listed core and dispatches tasks using a modulo assignment:
coreId = schedule[taskId % len(schedule)]
This means a schedule of [0, 1, 2] sends task 0 to core 0, task 1 to core 1, task 2 to core 2, task 3 to core 0 again, and so on. You can bias distribution by repeating a core: [0, 0, 1] sends two-thirds of tasks to core 0.
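The assignment rule is plain modulo arithmetic, so you can preview it without a board. A standalone sketch of the dispatch formula above (illustration only, not library code):

def core_for_task(task_id: int, schedule: list[int]) -> int:
    # Mirrors the dispatch rule: coreId = schedule[taskId % len(schedule)]
    return schedule[task_id % len(schedule)]

# Equal rotation across all three RK3588 cores
print([core_for_task(t, [0, 1, 2]) for t in range(6)])  # [0, 1, 2, 0, 1, 2]

# Biased rotation: two-thirds of tasks land on core 0
print([core_for_task(t, [0, 0, 1]) for t in range(6)])  # [0, 0, 1, 0, 0, 1]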

1. Install and import

from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options
import numpy as np

2. Create a session with schedule

Pass a list of core indices to schedule. The example below rotates tasks equally across all three cores on the RK3588:
opts = make_provider_options(
    schedule=[0, 1, 2],   # round-robin across all three cores
    threads_per_core=1,   # one worker thread per core
)

sess = InferenceSession("model.rknn", provider_options=opts)

3. Run inference as usual

Nothing changes in how you call run, run_async, or run_pipeline. The dispatch is transparent.
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = sess.run(None, {"input": input_data})
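To actually overlap work across the scheduled cores, several tasks must be in flight at once; a single blocking run loop exercises only one core at a time. A minimal throughput sketch wrapping run in a thread pool (this assumes concurrent run calls are supported, which the per-core worker design suggests; run_async or run_pipeline may be the more idiomatic route, so treat this as an illustration):

from concurrent.futures import ThreadPoolExecutor

frames = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(9)]

# Submit independent frames concurrently so the session's round-robin
# dispatch can keep cores 0, 1, and 2 busy at the same time.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda f: sess.run(None, {"input": f}), frames))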

threads_per_core

Each core in the schedule gets threads_per_core worker threads (default 1). Increasing it lets a single core process multiple tasks concurrently, each in a separate RKNN context. For latency-sensitive workloads, keep it at 1; increase it to hide I/O stalls within a core.
opts = make_provider_options(
    schedule=[0, 1],
    threads_per_core=2,  # two concurrent inference threads per core
)
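For sizing, the total number of worker threads, and hence separate RKNN contexts, implied by the description above is simply the product of the two settings:

schedule = [0, 1]
threads_per_core = 2

# One worker thread (each with its own RKNN context) per scheduled core,
# times threads_per_core: 2 cores x 2 threads = 4 tasks in flight at once.
total_workers = len(schedule) * threads_per_core  # 4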

Tensor parallelism with tp_mode

In tensor-parallel mode, RKNN splits a single model’s computation across multiple cores simultaneously. The model must be compiled with tensor-parallel support. Set tp_mode to one of the accepted strings:
Value       Cores used
"auto"      RKNN chooses (default when neither option is set)
"all"       All available cores
"0"         Core 0 only
"1"         Core 1 only
"2"         Core 2 only
"0,1"       Cores 0 and 1
"0,1,2"     All three cores on RK3588
opts = make_provider_options(
    tp_mode="0,1,2",  # tensor-parallel across all three cores
)

sess = InferenceSession("model.rknn", provider_options=opts)
Tensor-parallel mode can lower latency for large models that are memory-bandwidth-bound, but it has no benefit for small models. Benchmark both modes for your specific model and SoC.
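A minimal benchmark sketch for that comparison, reusing input_data from the example above (iteration count and input shape are illustrative; adjust for your model):

import time

def bench(opts, iterations=100):
    sess = InferenceSession("model.rknn", provider_options=opts)
    sess.run(None, {"input": input_data})  # warm-up
    start = time.perf_counter()
    for _ in range(iterations):
        sess.run(None, {"input": input_data})
    return (time.perf_counter() - start) / iterations

# Note: this loop measures single-call latency, which is where tp_mode can
# help. schedule pays off in throughput under concurrent submission, so pair
# it with the thread-pool pattern shown earlier for a fair throughput number.
print("tp_mode 0,1,2:  ", bench(make_provider_options(tp_mode="0,1,2")))
print("schedule [0,1,2]:", bench(make_provider_options(schedule=[0, 1, 2])))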

Choosing between schedule and tp_mode

Choose schedule when:
  • You want higher throughput: multiple small tasks processed in parallel.
  • Your model runs fast enough on one core and latency is already acceptable.
  • You are processing a stream of independent frames.
opts = make_provider_options(schedule=[0, 1, 2])
Prefer tp_mode instead when single-task latency for a large, bandwidth-bound model matters more than aggregate throughput. Setting both schedule and tp_mode in the same call to make_provider_options raises a ValueError at session creation time; pick one strategy per session.
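Because the conflict surfaces as a ValueError, it can be guarded like any other configuration error (a small sketch):

try:
    opts = make_provider_options(schedule=[0, 1, 2], tp_mode="all")
    sess = InferenceSession("model.rknn", provider_options=opts)  # raises here
except ValueError as err:
    print(f"conflicting provider options: {err}")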

Default behavior

If you set neither schedule nor tp_mode, the session defaults to tp_mode="auto", which lets the RKNN runtime decide the core assignment. This is equivalent to:
opts = make_provider_options()  # tp_mode defaults to "auto"
