

RknnProviderOptions is a TypedDict that defines all accepted keys for the provider_options argument of InferenceSession. make_provider_options() is a keyword-only helper function that constructs and returns a validated RknnProviderOptions dict. Both are exported from ztu_somemodelruntime_ez_rknn_async. Using the helper is recommended over constructing the dict manually: it validates that schedule and tp_mode are not both set and gives you editor autocompletion and type checking via the TypedDict return type.
from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

opts = make_provider_options(
    layout="nchw",
    max_queue_size=4,
    schedule=[0, 1, 2],
)
session = InferenceSession("model.rknn", provider_options=opts)

make_provider_options()

make_provider_options(
    *,
    layout: LayoutLike = "original",
    max_queue_size: int = 3,
    threads_per_core: int = 1,
    submit_timeout_ms: int = 10000,
    sequential_callbacks: bool = True,
    schedule: ScheduleLike | None = None,
    tp_mode: TpModeLike | None = None,
    enable_pacing: bool = False,
    disable_dup_context: bool = False,
    custom_op_paths: PathLike | Sequence[PathLike] | None = None,
    custom_op_default_path: bool = False,
) -> RknnProviderOptions
All parameters are keyword-only. Omit any parameter to accept its default.
Passing both schedule and tp_mode raises ValueError immediately. Set only one of the two scheduling options.
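The mutual-exclusion rule can be illustrated with a minimal re-implementation of the check (`validate_scheduling` is a hypothetical name for illustration; the library performs the equivalent check inside make_provider_options):

```python
def validate_scheduling(schedule=None, tp_mode=None):
    """Reject option sets that enable both scheduling modes at once."""
    if schedule is not None and tp_mode is not None:
        raise ValueError(
            "schedule and tp_mode are mutually exclusive; set only one"
        )

# Either option alone is fine; both together raise immediately.
validate_scheduling(schedule=[0, 1, 2])
validate_scheduling(tp_mode="all")
try:
    validate_scheduling(schedule=[0], tp_mode="auto")
except ValueError as e:
    print(e)
```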

Input layout

layout
LayoutLike
default:"original"
Controls how the session interprets the dimension order of 4-D input tensors. Valid values:
  • "nchw": Native RKNN format (alias for "original")
  • "original": Native RKNN format (default)
  • "nchw_software": Accept NCHW input and transpose to NHWC in software before submitting to the NPU (alias for "original_software")
  • "original_software": Same as "nchw_software"
  • "nhwc": Pass NHWC input directly to the NPU
  • "any": Bypass layout validation; use when the model does not have a 4-D spatial layout
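As a rough illustration of what the software-transpose layouts imply for input shapes (`nchw_to_nhwc_shape` is illustrative only, not part of the library):

```python
def nchw_to_nhwc_shape(shape):
    """Reorder a 4-D NCHW shape tuple (N, C, H, W) into NHWC (N, H, W, C)."""
    n, c, h, w = shape
    return (n, h, w, c)

# A typical 224x224 RGB input:
print(nchw_to_nhwc_shape((1, 3, 224, 224)))  # (1, 224, 224, 3)
```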

Queue and threading

max_queue_size
int
default:"3"
Maximum number of inference tasks that may be in-flight in the async queue simultaneously. When the queue is full, new submissions block for up to submit_timeout_ms milliseconds before raising RuntimeError. Must be greater than 0.
threads_per_core
int
default:"1"
Number of worker threads created per unique NPU core. When combined with a multi-core schedule, the total number of worker threads is threads_per_core × len(unique cores in schedule). Must be greater than 0.
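The worker-count formula above can be written out directly (a sketch; the library computes this internally):

```python
def total_workers(schedule, threads_per_core=1):
    """Worker threads = threads_per_core x number of unique cores in the schedule."""
    return threads_per_core * len(set(schedule))

print(total_workers([0, 1, 2], threads_per_core=2))  # 6
print(total_workers([0, 1, 0, 1]))                   # 2 (duplicates don't add workers)
```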
submit_timeout_ms
int
default:"10000"
Maximum time in milliseconds that run, run_async, or run_pipeline will wait when the task or callback queue is saturated before raising RuntimeError. Must be greater than 0.
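The saturation behaviour can be modelled with a bounded stdlib queue (the real task queue is internal; this only mirrors the documented block-then-raise semantics):

```python
import queue

def submit(task_queue, task, submit_timeout_ms=10000):
    """Block until a slot frees up, then enqueue; raise RuntimeError on timeout."""
    try:
        task_queue.put(task, timeout=submit_timeout_ms / 1000.0)
    except queue.Full:
        raise RuntimeError(f"submission timed out after {submit_timeout_ms} ms")

q = queue.Queue(maxsize=3)               # max_queue_size=3
for i in range(3):
    submit(q, i)
try:
    submit(q, 3, submit_timeout_ms=10)   # queue is full and nobody is draining it
except RuntimeError as e:
    print(e)
```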

Callback ordering

sequential_callbacks
bool
default:"True"
When True, run_async callbacks are emitted in the order tasks were submitted, regardless of which NPU core finished first. When False, callbacks are fired as soon as each task completes, which can reduce latency at the cost of out-of-order delivery.
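A minimal reorder buffer shows how in-order delivery can be achieved even when cores finish out of order (illustrative only; the library's internal mechanism may differ):

```python
class SequentialDispatcher:
    """Buffer completions and emit them in submission order."""
    def __init__(self):
        self.next_id = 0     # next submission id to deliver
        self.pending = {}    # finished tasks waiting for their turn
        self.delivered = []

    def complete(self, task_id, result):
        self.pending[task_id] = result
        # Flush every result that is now contiguous with the delivery cursor.
        while self.next_id in self.pending:
            self.delivered.append(self.pending.pop(self.next_id))
            self.next_id += 1

d = SequentialDispatcher()
d.complete(2, "c")   # core 2 finished first
d.complete(0, "a")   # 0 is delivered; 1 still blocks 2
d.complete(1, "b")
print(d.delivered)   # ['a', 'b', 'c']
```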

Core scheduling

schedule
ScheduleLike
Enables data-parallel scheduling across multiple NPU cores. Tasks are assigned to cores in round-robin order. Accepts:
  • An int — a single core index (e.g. 0).
  • A comma-separated string — e.g. "0,1,2".
  • A sequence of ints — e.g. [0, 1, 2].
All core indices must be non-negative. An empty schedule is rejected. Mutually exclusive with tp_mode.
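The three accepted forms all normalize to the same list of core indices. A sketch of that normalization (`normalize_schedule` is illustrative, not a library function):

```python
def normalize_schedule(schedule):
    """Coerce ScheduleLike (int | str | sequence of ints) to a list of core indices."""
    if isinstance(schedule, int):
        cores = [schedule]
    elif isinstance(schedule, str):
        cores = [int(part) for part in schedule.split(",")]
    else:
        cores = [int(c) for c in schedule]
    if not cores:
        raise ValueError("schedule must not be empty")
    if any(c < 0 for c in cores):
        raise ValueError("core indices must be non-negative")
    return cores

print(normalize_schedule(0))        # [0]
print(normalize_schedule("0,1,2"))  # [0, 1, 2]
print(normalize_schedule([0, 1]))   # [0, 1]
```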
tp_mode
TpModeLike
Enables tensor-parallel mode using an RKNN core mask. All worker contexts are assigned the specified core mask, and the NPU splits the computation across the selected cores for each individual inference. Defaults to RKNN_NPU_CORE_AUTO when neither schedule nor tp_mode is specified. Valid values: "auto", "all", "0", "1", "2", "0,1", "0,1,2". Mutually exclusive with schedule.
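The tp_mode strings plausibly map onto RKNN core-mask constants. The numeric values below are an assumption taken from the rknn_api.h RKNN_NPU_CORE_* enum, not something this library documents, and may differ from what it actually passes:

```python
# Assumed mapping from tp_mode strings to RKNN core-mask values.
TP_MODE_TO_MASK = {
    "auto":  0x0,      # RKNN_NPU_CORE_AUTO
    "0":     0x1,      # RKNN_NPU_CORE_0
    "1":     0x2,      # RKNN_NPU_CORE_1
    "2":     0x4,      # RKNN_NPU_CORE_2
    "0,1":   0x3,      # RKNN_NPU_CORE_0_1
    "0,1,2": 0x7,      # RKNN_NPU_CORE_0_1_2
    "all":   0xFFFF,   # RKNN_NPU_CORE_ALL
}

print(hex(TP_MODE_TO_MASK["0,1,2"]))  # 0x7
```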

Throughput pacing

enable_pacing
bool
default:"False"
When True, the session measures an exponential moving average of per-task inference time and silently drops submissions that arrive faster than the NPU can sustain. This prevents the async queue from filling under burst load and produces smoother end-to-end throughput. The session retries dropped submissions transparently, so the behaviour is invisible at the Python level.
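The exponential moving average mentioned above can be sketched as follows (the smoothing factor and class are illustrative assumptions, not the library's actual implementation):

```python
class PacingEstimator:
    """Track an exponential moving average of per-task inference time."""
    def __init__(self, alpha=0.2):
        self.alpha = alpha    # weight given to the newest sample
        self.avg_ms = None

    def record(self, task_ms):
        if self.avg_ms is None:
            self.avg_ms = task_ms
        else:
            self.avg_ms = self.alpha * task_ms + (1 - self.alpha) * self.avg_ms
        return self.avg_ms

est = PacingEstimator()
for t in (10.0, 12.0, 11.0):
    est.record(t)
print(round(est.avg_ms, 2))  # 10.52
```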

Context duplication

disable_dup_context
bool
default:"False"
When True, each worker thread initialises its own RKNN context via rknn_init instead of cloning the primary context with rknn_dup_context. Independent initialisation is slower to start but avoids known RKNN stability issues that can occur when custom ops are registered on a duplicated context. Loading any custom op automatically forces this behaviour regardless of this flag’s value.

Custom operator plugins

custom_op_paths
PathLike | Sequence[PathLike]
One path or a sequence of paths to .so plugin files that export get_rknn_custom_op. In the RknnProviderOptions dict, both "custom_op_paths" and "custom_op_path" are accepted as aliases and their path lists are merged.
custom_op_default_path
bool
default:"False"
When True, the runtime scans the platform default plugin directory and loads any .so file whose name starts with librkcst_. In the RknnProviderOptions dict the alias "load_custom_ops_from_default_path" is also accepted.
Requesting custom op loading via either custom_op_paths or custom_op_default_path automatically forces disable_dup_context to True for the session and emits a UserWarning explaining the reason.
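The default-path scan described above amounts to a filename-prefix glob. A sketch under the assumption that `plugin_dir` stands in for the platform default directory (which is runtime-specific and not spelled out here):

```python
from pathlib import Path

def find_default_plugins(plugin_dir):
    """Return .so files whose name starts with librkcst_, as the default scan does."""
    return sorted(Path(plugin_dir).glob("librkcst_*.so"))
```

Files that do not match the `librkcst_` prefix are ignored even if they are valid shared objects.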

RknnProviderOptions TypedDict

RknnProviderOptions is a total=False TypedDict, meaning all keys are optional. It declares the same options as make_provider_options plus the aliases accepted by the runtime:
  • layout (LayoutLike)
  • max_queue_size (int)
  • threads_per_core (int)
  • submit_timeout_ms (int)
  • sequential_callbacks (bool)
  • schedule (ScheduleLike)
  • tp_mode (TpModeLike)
  • enable_pacing (bool)
  • disable_dup_context (bool)
  • custom_op_paths, alias custom_op_path (PathLike | Sequence[PathLike])
  • custom_op_default_path, alias load_custom_ops_from_default_path (bool)
Unknown keys in provider_options are rejected at session construction time with a RuntimeError that lists all accepted keys. This catches typos before a session is created.
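The rejection of unknown keys can be mirrored with a plain set check (the accepted-key set is taken from the table above; the library's actual error message will differ):

```python
ACCEPTED_KEYS = {
    "layout", "max_queue_size", "threads_per_core", "submit_timeout_ms",
    "sequential_callbacks", "schedule", "tp_mode", "enable_pacing",
    "disable_dup_context", "custom_op_paths", "custom_op_path",
    "custom_op_default_path", "load_custom_ops_from_default_path",
}

def check_keys(provider_options):
    """Reject unknown keys up front, mirroring the documented RuntimeError."""
    unknown = set(provider_options) - ACCEPTED_KEYS
    if unknown:
        raise RuntimeError(
            f"unknown provider_options keys {sorted(unknown)}; "
            f"accepted keys are {sorted(ACCEPTED_KEYS)}"
        )

check_keys({"layout": "nchw"})          # fine
try:
    check_keys({"max_queue_sise": 4})   # typo is caught before session creation
except RuntimeError as e:
    print(e)
```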

Type aliases

LayoutLike = Literal[
    "nchw", "original", "nchw_software", "original_software", "nhwc", "any"
]

TpModeLike = Literal["auto", "all", "0", "1", "2", "0,1", "0,1,2"]

ScheduleLike = Union[int, str, Sequence[int]]

PathLikeStr = Union[str, PathLike]

Examples

Data-parallel multi-core inference

from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

opts = make_provider_options(
    layout="nchw",
    max_queue_size=6,
    threads_per_core=1,
    schedule=[0, 1, 2],   # distribute across all three RK3588 NPU cores
)
session = InferenceSession("model.rknn", provider_options=opts)

Tensor-parallel inference

opts = make_provider_options(
    tp_mode="0,1,2",   # fuse all three cores for a single large model
)
session = InferenceSession("model.rknn", provider_options=opts)

Custom operator plugin

opts = make_provider_options(
    custom_op_paths="/usr/lib/my_custom_op.so",
    layout="nhwc",
)
session = InferenceSession("model_with_custom_op.rknn", provider_options=opts)

Passing options as a plain dict

from ztu_somemodelruntime_ez_rknn_async import InferenceSession, RknnProviderOptions

opts: RknnProviderOptions = {
    "layout": "nchw",
    "max_queue_size": 4,
    "schedule": [0, 1],
}
session = InferenceSession("model.rknn", provider_options=opts)
