
EZ RKNN Async exposes an InferenceSession class modelled after ONNX Runtime. If you have used ORT before, the patterns here will look familiar. This page walks through loading a model, inspecting its inputs and outputs, configuring the provider, and running synchronous inference.

Import the library

import numpy as np
import ztu_somemodelruntime_ez_rknn_async as ez
The public API exports five names: InferenceSession, ModelMetadata, NodeArg, the RknnProviderOptions options type, and the make_provider_options helper.
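
If you prefer explicit imports, the same names can be imported individually:

from ztu_somemodelruntime_ez_rknn_async import (
    InferenceSession,
    ModelMetadata,
    NodeArg,
    RknnProviderOptions,
    make_provider_options,
)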

Create an InferenceSession

Pass the path to a compiled .rknn model file. By default the session uses one worker thread on NPU core 0.
session = ez.InferenceSession("model.rknn")

Configure provider options

Use make_provider_options to control scheduling, layout, queue depth, and other runtime parameters. Pass the result as provider_options.
opts = ez.make_provider_options(
    layout="nchw",        # input layout: "nchw" (default) or "nhwc"
    max_queue_size=3,     # max pending async tasks
    threads_per_core=1,   # worker threads per NPU core
)

session = ez.InferenceSession("model.rknn", provider_options=opts)
make_provider_options accepts keyword arguments only and returns a typed RknnProviderOptions dict. All parameters are optional — omit any you do not need.
schedule and tp_mode are mutually exclusive. Setting both raises a ValueError. Use schedule for data-parallel multi-core distribution and tp_mode for tensor-parallel mode.
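
A quick sketch of those two rules; the option values below are placeholders, not documented names:

# make_provider_options is keyword-only, so positional arguments are rejected:
# ez.make_provider_options("nchw")  # raises TypeError

# schedule and tp_mode cannot be combined:
try:
    ez.make_provider_options(schedule="...", tp_mode="...")  # placeholder values
except ValueError as err:
    print(err)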

Inspect the model

List input and output names

print(session.input_names)   # e.g. ["images"]
print(session.output_names)  # e.g. ["output0", "output1"]

Inspect tensor shapes and types

get_inputs() and get_outputs() return lists of NodeArg objects. Each NodeArg has .name, .shape, and .type properties.
for node in session.get_inputs():
    print(node.name, node.shape, node.type)

for node in session.get_outputs():
    print(node.name, node.shape, node.type)
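
These properties also make it easy to validate a feed before running. The helper below is illustrative, not part of the library; it skips any dimension that is not a concrete integer, in case a model reports dynamic sizes:

def check_feed(session, feed):
    # Compare a dict feed against the model's declared input shapes.
    for node in session.get_inputs():
        arr = feed[node.name]
        for i, (want, got) in enumerate(zip(node.shape, arr.shape)):
            if isinstance(want, int) and want != got:
                raise ValueError(f"{node.name}: dim {i} expected {want}, got {got}")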

Read model metadata

meta = session.get_modelmeta()
print(meta.custom_metadata_map)  # dict[str, str]
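
Since custom_metadata_map is a plain string-to-string dict, you can iterate it directly:

for key, value in meta.custom_metadata_map.items():
    print(f"{key} = {value}")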

Run synchronous inference

session.run(output_names, input_feed) runs inference and returns a list of NumPy arrays — one per output.

Dict input feed

The most explicit form: a dict mapping input names to NumPy arrays.
input_data = np.random.randn(1, 3, 640, 640).astype(np.float32)

outputs = session.run(
    None,                          # None = return all outputs
    {"images": input_data},
)

print(type(outputs))    # <class 'list'>
print(outputs[0].shape) # shape of the first output tensor
Passing None as output_names returns all model outputs in order. Pass a list of strings to select specific outputs:
outputs = session.run(["output0"], {"images": input_data})

List input feed

When you know the order of the model's inputs, you can pass a plain list instead of a dict; arrays are matched to inputs by position:
outputs = session.run(None, [input_data])

Single-array input feed

For models with exactly one input, you can pass the array directly:
outputs = session.run(None, input_data)

Work with outputs

run always returns a list of NumPy arrays, regardless of the input form used:
outputs = session.run(None, {"images": input_data})

for i, out in enumerate(outputs):
    print(f"output[{i}]: shape={out.shape}, dtype={out.dtype}")

Full example

import numpy as np
import ztu_somemodelruntime_ez_rknn_async as ez

# Load model
session = ez.InferenceSession(
    "model.rknn",
    provider_options=ez.make_provider_options(layout="nchw"),
)

# Inspect model I/O
for node in session.get_inputs():
    print(f"input  {node.name}: shape={node.shape}, type={node.type}")

for node in session.get_outputs():
    print(f"output {node.name}: shape={node.shape}, type={node.type}")

# Prepare input matching the model's expected shape and dtype
inp = np.random.randn(1, 3, 640, 640).astype(np.float32)

# Run inference
results = session.run(None, {session.input_names[0]: inp})

print(f"Got {len(results)} output(s)")
for i, r in enumerate(results):
    print(f"  output[{i}]: {r.shape} {r.dtype}")

Next steps

Async inference

Use run_async with callbacks to overlap inference with other work.

Pipeline inference

Use run_pipeline to stream frames through the NPU with low latency.

Multi-core scheduling

Distribute requests across multiple NPU cores with the schedule option.

Provider options reference

Full reference for all make_provider_options parameters.
