Rockchip SoCs such as the RK3588 contain three independent NPU cores (cores 0, 1, and 2). EZ RKNN Async provides two complementary strategies for spreading work across them: data parallelism via schedule, where each inference task is sent to a different core in round-robin order, and tensor parallelism via tp_mode, where a single inference task is executed collaboratively by multiple cores.
The two strategies are mutually exclusive — set one or the other, never both.
Data parallelism with schedule
When schedule is set, the session creates one worker thread group per listed core and dispatches tasks using a modulo assignment:
schedule of [0, 1, 2] sends task 0 to core 0, task 1 to core 1, task 2 to core 2, task 3 to core 0 again, and so on. You can bias distribution by repeating a core: [0, 0, 1] sends two-thirds of tasks to core 0.
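The modulo assignment can be sketched in a few lines (core_for_task is an illustrative helper, not part of the library's API):

```python
def core_for_task(schedule, task_index):
    """Round-robin dispatch: task i goes to schedule[i % len(schedule)]."""
    return schedule[task_index % len(schedule)]

# Equal rotation across all three RK3588 cores:
print([core_for_task([0, 1, 2], i) for i in range(6)])  # [0, 1, 2, 0, 1, 2]

# Biased distribution: two-thirds of tasks land on core 0:
print([core_for_task([0, 0, 1], i) for i in range(6)])  # [0, 0, 1, 0, 0, 1]
```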
Create a session with schedule
Pass a list of core indices to schedule. For example, [0, 1, 2] puts all three RK3588 cores in equal rotation.
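The real constructor is not shown in this excerpt, so the following is a toy stand-in that mimics the documented behavior: one worker thread per scheduled core, with tasks dispatched in submission order by modulo assignment. The class and method names here are assumptions for illustration, not the library's API.

```python
import queue
import threading

class RoundRobinSession:
    """Toy stand-in for the real session object (names are assumptions).

    One worker thread per scheduled core; tasks are dispatched in
    submission order using modulo assignment, as described above.
    """

    def __init__(self, schedule):
        self.schedule = schedule
        self.queues = [queue.Queue() for _ in schedule]
        self.results = {}
        self._task_no = 0
        for slot in range(len(schedule)):
            threading.Thread(target=self._worker, args=(slot,), daemon=True).start()

    def _worker(self, slot):
        core = self.schedule[slot]  # this worker's NPU core index
        while True:
            task_no, fn = self.queues[slot].get()
            self.results[task_no] = (core, fn())  # record (core used, result)
            self.queues[slot].task_done()

    def submit(self, fn):
        slot = self._task_no % len(self.schedule)  # round-robin dispatch
        self.queues[slot].put((self._task_no, fn))
        self._task_no += 1

    def join(self):
        for q in self.queues:
            q.join()

session = RoundRobinSession([0, 1, 2])
for i in range(6):
    session.submit(lambda i=i: i * i)  # stand-in for an inference call
session.join()
cores = [session.results[i][0] for i in range(6)]
print(cores)  # [0, 1, 2, 0, 1, 2]
```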
threads_per_core
Each core in the schedule gets threads_per_core worker threads (default 1). Increasing this allows a single core to process multiple tasks concurrently using separate RKNN contexts. For latency-sensitive workloads keep this at 1; increase it if you want to hide I/O stalls within a core.
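The resulting pool therefore holds len(schedule) * threads_per_core workers. A sketch with an illustrative build_workers helper (not the library's API):

```python
def build_workers(schedule, threads_per_core=1):
    """One worker per (core, thread slot); each worker would own its own
    RKNN context, so a core can run several tasks concurrently."""
    return [(core, slot) for core in schedule for slot in range(threads_per_core)]

print(build_workers([0, 1], threads_per_core=2))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```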
Tensor parallelism with tp_mode
In tensor-parallel mode, RKNN splits a single model’s computation across multiple cores simultaneously. The model must be compiled with tensor-parallel support. Set tp_mode to one of the accepted strings:
| Value | Cores used |
|---|---|
"auto" | RKNN chooses (default when neither option is set) |
"all" | All available cores |
"0" | Core 0 only |
"1" | Core 1 only |
"2" | Core 2 only |
"0,1" | Cores 0 and 1 |
"0,1,2" | All three cores on RK3588 |
Choosing between schedule and tp_mode
- Use schedule when:
  - You want higher throughput: multiple small tasks processed in parallel.
  - Your model runs fast enough on one core and latency is already acceptable.
  - You are processing a stream of independent frames.
- Use tp_mode when:
  - A single inference is too slow on one core and you need to reduce per-task latency.
  - Your model was compiled with tensor-parallel support.
Default behavior
If you set neither schedule nor tp_mode, the session defaults to tp_mode="auto", which lets the RKNN runtime decide the core assignment. This is equivalent to passing tp_mode="auto" explicitly.
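The mutual-exclusion and defaulting rules together can be sketched as a small helper (resolve_core_options is illustrative, not the library's API):

```python
def resolve_core_options(schedule=None, tp_mode=None):
    """Mirror the documented rules: the two options are mutually exclusive,
    and when neither is set the session falls back to tp_mode='auto'."""
    if schedule is not None and tp_mode is not None:
        raise ValueError("schedule and tp_mode are mutually exclusive")
    if schedule is None and tp_mode is None:
        tp_mode = "auto"  # let the RKNN runtime decide
    return schedule, tp_mode

print(resolve_core_options())                 # (None, 'auto')
print(resolve_core_options(schedule=[0, 1]))  # ([0, 1], None)
```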