
EZ RKNN Async was designed to minimize the effort required to move an existing onnxruntime codebase to Rockchip NPU acceleration. The InferenceSession class shares the same constructor signature, the same run, get_inputs, get_outputs, and get_modelmeta methods, and the same input_feed conventions. In most cases the migration is a single import change.

Import change

# Before
import onnxruntime as ort
sess = ort.InferenceSession("model.onnx")

# After
from ztu_somemodelruntime_ez_rknn_async import InferenceSession
sess = InferenceSession("model.rknn")
Your model must first be converted from ONNX to RKNN format with the Rockchip RKNN Toolkit before it can be passed to the session.
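
The conversion itself happens outside this library. A minimal sketch with RKNN-Toolkit2 (the target platform and quantization settings below are placeholders; adjust them to your SoC and model):

from rknn.api import RKNN  # RKNN-Toolkit2, run on the build host

rknn = RKNN()
rknn.config(target_platform="rk3588")   # placeholder SoC
rknn.load_onnx(model="model.onnx")      # import the ONNX graph
rknn.build(do_quantization=False)       # or True with a calibration dataset
rknn.export_rknn("model.rknn")          # this file is what you pass to InferenceSession
rknn.release()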

Constructor and provider options

Both libraries accept providers and provider_options keyword arguments. ORT uses them to select CUDA, TensorRT, or CPU execution. EZ RKNN Async ignores providers (RKNN is the only backend) and reads NPU configuration from provider_options.
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider"],
    provider_options=[{"device_id": 0}],
)
sess_options (ort.SessionOptions) is accepted in the constructor but silently ignored — it exists only so existing code that passes a SessionOptions object does not raise.
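
A minimal sketch of the equivalent EZ RKNN Async constructor call (calling make_provider_options() with no arguments for defaults is an assumption; see its documentation for the actual NPU options):

from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

sess = InferenceSession(
    "model.rknn",
    providers=["CUDAExecutionProvider"],        # accepted for compatibility, ignored
    provider_options=make_provider_options(),   # assumed defaults; NPU settings go here
)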

run: identical calling convention

run behaves the same as ORT: pass None as output_names to get all outputs, or a list of output name strings to filter. input_feed accepts a dict, a list/tuple, or a single numpy.ndarray (for single-input models).
outputs = sess.run(
    None,
    {"input": input_array},
)
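
All three accepted input_feed forms, sketched for a single-input model whose input is named "input" (the output name "output" is illustrative):

import numpy as np

x = np.zeros((1, 3, 224, 224), dtype=np.float32)   # shape is illustrative

outputs = sess.run(None, {"input": x})             # dict keyed by input name (ORT style)
outputs = sess.run(None, [x])                      # list/tuple in get_inputs() order
outputs = sess.run(None, x)                        # bare ndarray, single-input models only
outputs = sess.run(["output"], {"input": x})       # filter by output name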

get_inputs, get_outputs, get_modelmeta

All three methods are present and return objects with the same attribute names as their ORT equivalents.
for node in sess.get_inputs():
    print(node.name, node.shape, node.type)

for node in sess.get_outputs():
    print(node.name, node.shape, node.type)

meta = sess.get_modelmeta()
print(meta.custom_metadata_map)  # contains "rknn_custom_string" if set at export

Differences to be aware of

No CUDA, CPU, or other providers

EZ RKNN Async runs exclusively on the RKNN NPU. There is no CPU fallback provider. Remove any provider selection logic and CUDA-specific code.

Outputs are always float32

The RKNN output collection is called with want_float=True unconditionally. Regardless of the quantized type stored in the model, all output tensors are returned as numpy.float32 arrays. If your post-processing code cast int8 or uint8 outputs from a quantized ORT session to float32, remove that cast; the outputs are already float32.
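
A quick sanity check after migrating (reusing the feed from the run example above):

import numpy as np

out = sess.run(None, {"input": input_array})[0]
assert out.dtype == np.float32   # holds even for quantized .rknn models
# Any leftover raw.astype(np.float32) step is now redundant and can be removed.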

run_async callback signature differs from ORT

ORT’s run_async passes (outputs, user_data). EZ RKNN Async passes (results, user_data, err) where err is an empty string on success or an error message on failure.
# ORT style (two arguments)
def ort_callback(outputs, user_data):
    process(outputs)

# EZ RKNN Async style (three arguments)
def rknn_callback(results, user_data, err):
    if err:
        print(f"Error: {err}")
        return
    process(results)
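
This page only documents the callback signature; assuming the call itself keeps ORT's positional form run_async(output_names, input_feed, callback, user_data), an invocation would look like:

sess.run_async(None, {"input": input_array}, rknn_callback, None)  # user_data is None here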

Bytes model content is not supported

ORT accepts either a filesystem path or raw model bytes. EZ RKNN Async only accepts a filesystem path (string or os.PathLike). Attempting to pass bytes raises RuntimeError:
bytes model content is not supported yet; please pass a filesystem path
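
If the model only exists as bytes in memory (for example, downloaded at runtime), write it to a file first; a minimal sketch:

import tempfile

with tempfile.NamedTemporaryFile(suffix=".rknn", delete=False) as f:
    f.write(model_bytes)   # model_bytes: your in-memory .rknn content
    model_path = f.name

sess = InferenceSession(model_path)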

input_names and output_names properties

EZ RKNN Async adds input_names and output_names as properties on InferenceSession for convenience. ORT does not have these; use get_inputs() and get_outputs() in portable code.
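
For example (the printed names are illustrative):

print(sess.input_names)    # e.g. ["input"]
print(sess.output_names)   # e.g. ["output"]

# Portable form that also works with onnxruntime:
input_names = [node.name for node in sess.get_inputs()]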

Side-by-side migration checklist

1. Replace the import

# Remove
import onnxruntime as ort

# Add
from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

2. Update the session constructor

Change the model path from .onnx to .rknn. Replace providers=["CUDAExecutionProvider"] (or any other provider list) with provider_options=make_provider_options(...).

3. Verify run() call sites

run(None, input_feed) and run(output_names, input_feed) require no changes. Single-array and list-style input_feed also work unchanged.

4. Update run_async callbacks

Add the third err argument to every callback function and add error handling.

5. Remove bytes-path code

If your code constructs a session from bytes, switch to writing the model to a temporary file or to a fixed filesystem path first.

6. Remove float-cast post-processing

Delete any .astype(np.float32) conversion applied to outputs from quantized models; outputs are already float32.
