Every runtime behaviour of an InferenceSession is controlled through the provider_options argument. You can supply options as a plain dict, as a RknnProviderOptions typed dict, or by calling the make_provider_options() helper, which validates your inputs and returns a typed dict ready for the session constructor. All options are optional — unspecified keys fall back to their documented defaults.

Two ways to pass options

from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

opts = make_provider_options(
    layout="nchw",
    max_queue_size=4,
    threads_per_core=2,
    schedule=[0, 1, 2],
)

sess = InferenceSession("model.rknn", provider_options=opts)

make_provider_options() raises a ValueError immediately if you supply both schedule and tp_mode. Passing a plain dict defers that check to the C++ layer when the session is constructed.

Unknown keys in provider_options are rejected at runtime with a descriptive error listing all accepted keys. This helps catch typos before a session is created.
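
For comparison, here is the same configuration via the plain-dict route. The keys are the documented option names; as noted above, the mutual-exclusion and unknown-key checks then run at session construction rather than up front.

from ztu_somemodelruntime_ez_rknn_async import InferenceSession

# Equivalent options as a plain dict; validation is deferred to the
# C++ layer when the session is constructed.
sess = InferenceSession(
    "model.rknn",
    provider_options={
        "layout": "nchw",
        "max_queue_size": 4,
        "threads_per_core": 2,
        "schedule": [0, 1, 2],
    },
)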

Option reference

Input layout

layout
LayoutLike
default:"\"original\""
Controls how the session interprets the dimension order of 4-D input tensors. The default "original" (alias "nchw") matches the native RKNN tensor format. See the layouts reference for a full breakdown of each value and when to use it.Valid values: "nchw", "original", "nchw_software", "original_software", "nhwc", "any".

Queue and threading

max_queue_size
int
default:"3"
Maximum number of inference tasks that may be in-flight in the async task queue at the same time. When the queue is full, new submissions block until a slot opens (up to submit_timeout_ms milliseconds) and raise RuntimeError if no slot becomes available in time.Must be greater than 0.
threads_per_core
int
default:"1"
Number of worker threads created per unique NPU core. When combined with a multi-core schedule, the total thread count is threads_per_core × len(unique cores in schedule).Must be greater than 0.
submit_timeout_ms
int
default:"10000"
How long (in milliseconds) a call to run, run_async, or run_pipeline will wait when the task or callback queue is saturated before raising RuntimeError. Set to a large value for batch workloads where occasional stalls are acceptable.Must be greater than 0.
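
A minimal sketch of handling a saturated queue follows. It assumes an onnxruntime-style run(output_names, input_feed) signature and an input named "input"; both are illustrative assumptions, not confirmed API details.

import numpy as np
from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

# Small queue and short timeout so saturation surfaces quickly.
opts = make_provider_options(max_queue_size=2, submit_timeout_ms=500)
sess = InferenceSession("model.rknn", provider_options=opts)

x = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input shape
try:
    outputs = sess.run(None, {"input": x})  # hypothetical signature and input name
except RuntimeError:
    # No queue slot opened within submit_timeout_ms; back off or shed load here.
    pass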

Callback ordering

sequential_callbacks
bool
default:"True"
When True, the callback thread fires run_async callbacks in the same order tasks were submitted, regardless of which NPU core finished first. When False, callbacks are fired as soon as each task completes, which can improve latency at the cost of out-of-order delivery.
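
A sketch of out-of-order delivery is below. The run_async call shape (output names, feed dict, completion callback) is an assumption made purely for illustration; consult the API reference for the real signature.

import numpy as np
from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

opts = make_provider_options(sequential_callbacks=False, schedule=[0, 1, 2])
sess = InferenceSession("model.rknn", provider_options=opts)

def on_done(task_id, outputs):
    # With sequential_callbacks=False, completions may arrive out of
    # submission order: whichever core finishes first reports first.
    print("finished task", task_id)

x = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input
for i in range(6):
    # Hypothetical signature; shown only to illustrate callback ordering.
    sess.run_async(None, {"input": x}, callback=lambda outputs, i=i: on_done(i, outputs))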

Core scheduling

schedule
ScheduleLike
Enables data-parallel scheduling across multiple NPU cores. Tasks are assigned to cores in round-robin order: core_id = schedule[task_id % len(schedule)]. Accepts an int (single core), a comma-separated string such as "0,1,2", or a list of ints. Mutually exclusive with tp_mode. See the scheduling reference for full details.
tp_mode
TpModeLike
Enables tensor-parallel mode using an RKNN core mask. All worker contexts are assigned the specified rknn_core_mask. Defaults to RKNN_NPU_CORE_AUTO when neither schedule nor tp_mode is set. Valid values: "auto", "all", "0", "1", "2", "0,1", "0,1,2". Mutually exclusive with schedule. See the scheduling reference for full details.
Setting both schedule and tp_mode raises ValueError from make_provider_options() and RuntimeError from the session constructor. Set only one.
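
The round-robin rule is plain modular arithmetic, so its effect is easy to preview without a device:

# core_id = schedule[task_id % len(schedule)], as documented above
schedule = [0, 1, 2]
for task_id in range(7):
    core_id = schedule[task_id % len(schedule)]
    print(f"task {task_id} -> core {core_id}")
# tasks 0..6 land on cores 0, 1, 2, 0, 1, 2, 0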

Throughput pacing

enable_pacing
bool
default:"False"
When True, the session measures the exponential moving average of per-task inference time (avg_proc_time_us with α = 0.95 for the old value, 0.05 for the new sample) and silently drops submissions that arrive faster than the NPU can process them. This keeps the queue from filling up under a burst load and produces smoother end-to-end throughput. Submissions dropped by the pacer return None from the internal async interface; at the Python level this is transparent because the session retries until the queue accepts the task.
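
The moving average itself is just the documented weighting; a small helper makes the α values concrete (how the runtime seeds the first sample is not specified here):

def update_avg_proc_time(avg_us: float, sample_us: float) -> float:
    # Documented weights: 0.95 for the old average, 0.05 for the new sample.
    return 0.95 * avg_us + 0.05 * sample_us

avg = 1000.0                        # assume a seeded average of 1000 µs
avg = update_avg_proc_time(avg, 1400.0)
print(avg)                          # 1020.0: a single burst moves the estimate slowly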

Context duplication

disable_dup_context
bool
default:"False"
When True, each worker thread calls rknn_init independently instead of calling rknn_dup_context to clone the initial context. Independent init is slower to start but avoids known RKNN stability issues that can occur when custom ops are registered on a duplicated context. Loading custom ops automatically forces this behaviour regardless of this flag’s value.

Custom operator plugins

custom_op_path
PathLike | Sequence[PathLike]
Path or list of paths to .so plugin files that export get_rknn_custom_op. Alias for custom_op_paths; both keys can be supplied together and their path lists are merged.
custom_op_paths
PathLike | Sequence[PathLike]
Alias for custom_op_path. Accepts a single path or a sequence of paths.
custom_op_default_path
bool
default:"False"
When True, the runtime scans the platform default plugin directory for .so files whose names start with librkcst_ and loads each one. On Linux the directory is /usr/lib/rknpu/op_plugins/; on Android it is /vendor/lib64/ (arm64) or /vendor/lib/ (arm32). Alias load_custom_ops_from_default_path is also accepted.
Requesting custom op loading (via any of the three options above) automatically sets disable_dup_context to True for the session and emits a UserWarning explaining the reason.
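
A sketch of plugin loading is below. The .so path is a placeholder; any real plugin must export get_rknn_custom_op as described above.

import warnings
from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

# Placeholder path; substitute a plugin that exports get_rknn_custom_op.
opts = make_provider_options(custom_op_path="/usr/lib/rknpu/op_plugins/librkcst_example.so")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sess = InferenceSession("model.rknn", provider_options=opts)
# Per the docs, this forces disable_dup_context=True for the session and
# emits a UserWarning explaining why; `caught` should contain it.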

Complete example

from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

opts = make_provider_options(
    layout="nchw",
    max_queue_size=6,
    threads_per_core=1,
    submit_timeout_ms=5000,
    sequential_callbacks=True,
    schedule=[0, 1, 2],       # data-parallel across all three RK3588 cores
    enable_pacing=False,
    disable_dup_context=False,
)

sess = InferenceSession("model.rknn", provider_options=opts)
