Every runtime behaviour of an InferenceSession is controlled through the provider_options argument. You can supply options as a plain dict, as a RknnProviderOptions typed dict, or by calling the make_provider_options() helper, which validates your inputs and returns a typed dict ready for the session constructor. All options are optional — unspecified keys fall back to their documented defaults.

Two ways to pass options

from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

opts = make_provider_options(
    layout="nchw",
    max_queue_size=4,
    threads_per_core=2,
    schedule=[0, 1, 2],
)

sess = InferenceSession("model.rknn", provider_options=opts)

make_provider_options() raises a ValueError immediately if you supply both schedule and tp_mode. Passing a plain dict defers that check to the C++ layer when the session is constructed.

Unknown keys in provider_options are rejected at runtime with a descriptive error listing all accepted keys. This helps catch typos before a session is created.
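
For comparison, here is the same configuration via the plain-dict route. The keys are the documented option names; as noted above, the mutual-exclusion and unknown-key checks then run at session construction rather than up front.

from ztu_somemodelruntime_ez_rknn_async import InferenceSession

# Equivalent options as a plain dict; validation is deferred to the
# C++ layer when the session is constructed.
sess = InferenceSession(
    "model.rknn",
    provider_options={
        "layout": "nchw",
        "max_queue_size": 4,
        "threads_per_core": 2,
        "schedule": [0, 1, 2],
    },
)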

Option reference

Input layout

layout
LayoutLike
default:"\"original\""
Controls how the session interprets the dimension order of 4-D input tensors. The default "original" (alias "nchw") matches the native RKNN tensor format. See the layouts reference for a full breakdown of each value and when to use it.Valid values: "nchw", "original", "nchw_software", "original_software", "nhwc", "any".

Queue and threading

max_queue_size
int
default:"3"
Maximum number of inference tasks that may be in-flight in the async task queue at the same time. When the queue is full, new submissions block until a slot opens (up to submit_timeout_ms milliseconds) and raise RuntimeError if no slot becomes available in time.Must be greater than 0.
threads_per_core
int
default:"1"
Number of worker threads created per unique NPU core. When combined with a multi-core schedule, the total thread count is threads_per_core × len(unique cores in schedule).Must be greater than 0.
submit_timeout_ms
int
default:"10000"
How long (in milliseconds) a call to run, run_async, or run_pipeline will wait when the task or callback queue is saturated before raising RuntimeError. Set to a large value for batch workloads where occasional stalls are acceptable.Must be greater than 0.
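
A minimal sketch of handling a saturated queue follows. It assumes an onnxruntime-style run(output_names, input_feed) signature and an input named "input"; both are illustrative assumptions, not confirmed API details.

import numpy as np
from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

# Small queue and short timeout so saturation surfaces quickly.
opts = make_provider_options(max_queue_size=2, submit_timeout_ms=500)
sess = InferenceSession("model.rknn", provider_options=opts)

x = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input shape
try:
    outputs = sess.run(None, {"input": x})  # hypothetical signature and input name
except RuntimeError:
    # No queue slot opened within submit_timeout_ms; back off or shed load here.
    pass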

Callback ordering

sequential_callbacks
bool
default:"True"
When True, the callback thread fires run_async callbacks in the same order tasks were submitted, regardless of which NPU core finished first. When False, callbacks are fired as soon as each task completes, which can improve latency at the cost of out-of-order delivery.
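
A sketch of out-of-order delivery is below. The run_async call shape (output names, feed dict, completion callback) is an assumption made purely for illustration; consult the API reference for the real signature.

import numpy as np
from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

opts = make_provider_options(sequential_callbacks=False, schedule=[0, 1, 2])
sess = InferenceSession("model.rknn", provider_options=opts)

def on_done(task_id, outputs):
    # With sequential_callbacks=False, completions may arrive out of
    # submission order: whichever core finishes first reports first.
    print("finished task", task_id)

x = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input
for i in range(6):
    # Hypothetical signature; shown only to illustrate callback ordering.
    sess.run_async(None, {"input": x}, callback=lambda outputs, i=i: on_done(i, outputs))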

Core scheduling

schedule
ScheduleLike
Enables data-parallel scheduling across multiple NPU cores. Tasks are assigned to cores in round-robin order: core_id = schedule[task_id % len(schedule)]. Accepts an int (single core), a comma-separated string such as "0,1,2", or a list of ints. Mutually exclusive with tp_mode. See the scheduling reference for full details.
tp_mode
TpModeLike
Enables tensor-parallel mode using an RKNN core mask. All worker contexts are assigned the specified rknn_core_mask. Defaults to RKNN_NPU_CORE_AUTO when neither schedule nor tp_mode is set. Valid values: "auto", "all", "0", "1", "2", "0,1", "0,1,2". Mutually exclusive with schedule. See the scheduling reference for full details.
Setting both schedule and tp_mode raises ValueError from make_provider_options() and RuntimeError from the session constructor. Set only one.
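
The round-robin rule is plain modular arithmetic, so its effect is easy to preview without a device:

# core_id = schedule[task_id % len(schedule)], as documented above
schedule = [0, 1, 2]
for task_id in range(7):
    core_id = schedule[task_id % len(schedule)]
    print(f"task {task_id} -> core {core_id}")
# tasks 0..6 land on cores 0, 1, 2, 0, 1, 2, 0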

Throughput pacing

enable_pacing
bool
default:"False"
When True, the session measures the exponential moving average of per-task inference time (avg_proc_time_us with α = 0.95 for the old value, 0.05 for the new sample) and silently drops submissions that arrive faster than the NPU can process them. This keeps the queue from filling up under a burst load and produces smoother end-to-end throughput. Submissions dropped by the pacer return None from the internal async interface; at the Python level this is transparent because the session retries until the queue accepts the task.
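
The moving average itself is just the documented weighting; a small helper makes the α values concrete (how the runtime seeds the first sample is not specified here):

def update_avg_proc_time(avg_us: float, sample_us: float) -> float:
    # Documented weights: 0.95 for the old average, 0.05 for the new sample.
    return 0.95 * avg_us + 0.05 * sample_us

avg = 1000.0                        # assume a seeded average of 1000 µs
avg = update_avg_proc_time(avg, 1400.0)
print(avg)                          # 1020.0: a single burst moves the estimate slowly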

Context duplication

disable_dup_context
bool
default:"False"
When True, each worker thread calls rknn_init independently instead of calling rknn_dup_context to clone the initial context. Independent init is slower to start but avoids known RKNN stability issues that can occur when custom ops are registered on a duplicated context. Loading custom ops automatically forces this behaviour regardless of this flag’s value.

Custom operator plugins

custom_op_path
PathLike | Sequence[PathLike]
Path or list of paths to .so plugin files that export get_rknn_custom_op. Alias for custom_op_paths; both keys can be supplied together and their path lists are merged.
custom_op_paths
PathLike | Sequence[PathLike]
Alias for custom_op_path. Accepts a single path or a sequence of paths.
custom_op_default_path
bool
default:"False"
When True, the runtime scans the platform default plugin directory for .so files whose names start with librkcst_ and loads each one. On Linux the directory is /usr/lib/rknpu/op_plugins/; on Android it is /vendor/lib64/ (arm64) or /vendor/lib/ (arm32). Alias load_custom_ops_from_default_path is also accepted.
Requesting custom op loading (via any of the three options above) automatically sets disable_dup_context to True for the session and emits a UserWarning explaining the reason.
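
A sketch of plugin loading is below. The .so path is a placeholder; any real plugin must export get_rknn_custom_op as described above.

import warnings
from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

# Placeholder path; substitute a plugin that exports get_rknn_custom_op.
opts = make_provider_options(custom_op_path="/usr/lib/rknpu/op_plugins/librkcst_example.so")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sess = InferenceSession("model.rknn", provider_options=opts)
# Per the docs, this forces disable_dup_context=True for the session and
# emits a UserWarning explaining why; `caught` should contain it.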

Complete example

from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

opts = make_provider_options(
    layout="nchw",
    max_queue_size=6,
    threads_per_core=1,
    submit_timeout_ms=5000,
    sequential_callbacks=True,
    schedule=[0, 1, 2],       # data-parallel across all three RK3588 cores
    enable_pacing=False,
    disable_dup_context=False,
)

sess = InferenceSession("model.rknn", provider_options=opts)
