
EZ RKNN Async exposes an InferenceSession class modelled after ONNX Runtime. If you have used ORT before, the patterns here will look familiar. This page walks through loading a model, inspecting its inputs and outputs, configuring the provider, and running synchronous inference.

Import the library

import numpy as np
import ztu_somemodelruntime_ez_rknn_async as ez
The public API exports five names: InferenceSession, ModelMetadata, NodeArg, the RknnProviderOptions options type, and the make_provider_options helper.
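
If you prefer explicit imports, the same names can be imported individually:

from ztu_somemodelruntime_ez_rknn_async import (
    InferenceSession,
    ModelMetadata,
    NodeArg,
    RknnProviderOptions,
    make_provider_options,
)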

Create an InferenceSession

Pass the path to a compiled .rknn model file. By default the session uses one worker thread on NPU core 0.
session = ez.InferenceSession("model.rknn")

Configure provider options

Use make_provider_options to control scheduling, layout, queue depth, and other runtime parameters. Pass the result as provider_options.
opts = ez.make_provider_options(
    layout="nchw",        # input layout: "nchw" (default) or "nhwc"
    max_queue_size=3,     # max pending async tasks
    threads_per_core=1,   # worker threads per NPU core
)

session = ez.InferenceSession("model.rknn", provider_options=opts)
make_provider_options accepts keyword arguments only and returns a typed RknnProviderOptions dict. All parameters are optional — omit any you do not need.
schedule and tp_mode are mutually exclusive. Setting both raises a ValueError. Use schedule for data-parallel multi-core distribution and tp_mode for tensor-parallel mode.
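
A quick sketch of those two rules; the option values below are placeholders, not documented names:

# make_provider_options is keyword-only, so positional arguments are rejected:
# ez.make_provider_options("nchw")  # raises TypeError

# schedule and tp_mode cannot be combined:
try:
    ez.make_provider_options(schedule="...", tp_mode="...")  # placeholder values
except ValueError as err:
    print(err)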

Inspect the model

List input and output names

print(session.input_names)   # e.g. ["images"]
print(session.output_names)  # e.g. ["output0", "output1"]

Inspect tensor shapes and types

get_inputs() and get_outputs() return lists of NodeArg objects. Each NodeArg has .name, .shape, and .type properties.
for node in session.get_inputs():
    print(node.name, node.shape, node.type)

for node in session.get_outputs():
    print(node.name, node.shape, node.type)
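
These properties also make it easy to validate a feed before running. The helper below is illustrative, not part of the library; it skips any dimension that is not a concrete integer, in case a model reports dynamic sizes:

def check_feed(session, feed):
    # Compare a dict feed against the model's declared input shapes.
    for node in session.get_inputs():
        arr = feed[node.name]
        for i, (want, got) in enumerate(zip(node.shape, arr.shape)):
            if isinstance(want, int) and want != got:
                raise ValueError(f"{node.name}: dim {i} expected {want}, got {got}")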

Read model metadata

meta = session.get_modelmeta()
print(meta.custom_metadata_map)  # dict[str, str]
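
Since custom_metadata_map is a plain string-to-string dict, you can iterate it directly:

for key, value in meta.custom_metadata_map.items():
    print(f"{key} = {value}")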

Run synchronous inference

session.run(output_names, input_feed) runs inference and returns a list of NumPy arrays — one per output.

Dict input feed

The most explicit form: a dict mapping input names to NumPy arrays.
input_data = np.random.randn(1, 3, 640, 640).astype(np.float32)

outputs = session.run(
    None,                          # None = return all outputs
    {"images": input_data},
)

print(type(outputs))    # <class 'list'>
print(outputs[0].shape) # shape of the first output tensor
Passing None as output_names returns all model outputs in order. Pass a list of strings to select specific outputs:
outputs = session.run(["output0"], {"images": input_data})

List input feed

When you know the order of the model's inputs, you can pass a plain list instead of a dict; arrays are matched to inputs by position:
outputs = session.run(None, [input_data])

Single-array input feed

For models with exactly one input, you can pass the array directly:
outputs = session.run(None, input_data)

Work with outputs

run always returns a list of NumPy arrays, regardless of the input form used:
outputs = session.run(None, {"images": input_data})

for i, out in enumerate(outputs):
    print(f"output[{i}]: shape={out.shape}, dtype={out.dtype}")

Full example

import numpy as np
import ztu_somemodelruntime_ez_rknn_async as ez

# Load model
session = ez.InferenceSession(
    "model.rknn",
    provider_options=ez.make_provider_options(layout="nchw"),
)

# Inspect model I/O
for node in session.get_inputs():
    print(f"input  {node.name}: shape={node.shape}, type={node.type}")

for node in session.get_outputs():
    print(f"output {node.name}: shape={node.shape}, type={node.type}")

# Prepare input matching the model's expected shape and dtype
inp = np.random.randn(1, 3, 640, 640).astype(np.float32)

# Run inference
results = session.run(None, {session.input_names[0]: inp})

print(f"Got {len(results)} output(s)")
for i, r in enumerate(results):
    print(f"  output[{i}]: {r.shape} {r.dtype}")

Next steps

Async inference

Use run_async with callbacks to overlap inference with other work.

Pipeline inference

Use run_pipeline to stream frames through the NPU with low latency.

Multi-core scheduling

Distribute requests across multiple NPU cores with the schedule option.

Provider options reference

Full reference for all make_provider_options parameters.
