EZ RKNN Async exposes an InferenceSession class modelled after ONNX Runtime. If you have used ORT before, the patterns here will look familiar. This page walks through loading a model, inspecting its inputs and outputs, configuring the provider, and running synchronous inference.
Import the library
Import InferenceSession, ModelMetadata, NodeArg, and the provider-options helpers RknnProviderOptions and make_provider_options.
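A minimal import sketch; the top-level module name here is assumed from the project name and may differ in your installation:

```python
# Assumed import path, taken from the project name; adjust to match
# the package's actual top-level module.
from ztu_somemodelruntime_ez_rknn_async import (
    InferenceSession,
    ModelMetadata,
    NodeArg,
    RknnProviderOptions,
    make_provider_options,
)
```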
Create an InferenceSession
Pass the path to a compiled .rknn model file. By default the session uses one worker thread on NPU core 0.
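A minimal sketch, assuming the import above and a model file named model.rknn:

```python
# Defaults: one worker thread on NPU core 0.
session = InferenceSession("model.rknn")
```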
Configure provider options
Use make_provider_options to control scheduling, layout, queue depth, and other runtime parameters. Pass the result as provider_options.
make_provider_options accepts keyword arguments only and returns a typed RknnProviderOptions dict. All parameters are optional — omit any you do not need.
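A sketch of the pattern; the schedule value shown is a hypothetical placeholder, so check the provider options reference for the values each parameter actually accepts:

```python
# "round_robin" is a hypothetical example value, not a documented constant.
opts = make_provider_options(
    schedule="round_robin",  # data-parallel multi-core distribution
)
session = InferenceSession("model.rknn", provider_options=opts)
```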
schedule and tp_mode are mutually exclusive. Setting both raises a ValueError. Use schedule for data-parallel multi-core distribution and tp_mode for tensor-parallel mode.
Inspect the model
List input and output names
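For example, assuming the session created above:

```python
# Collect the declared input and output names.
input_names = [inp.name for inp in session.get_inputs()]
output_names = [out.name for out in session.get_outputs()]
print(input_names, output_names)
```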
Inspect tensor shapes and types
get_inputs() and get_outputs() return lists of NodeArg objects. Each NodeArg has .name, .shape, and .type properties.
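A sketch of iterating the NodeArg lists, again assuming the session from earlier:

```python
# Each NodeArg exposes .name, .shape, and .type.
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)
```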
Read model metadata
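This page does not spell out the accessor, but since the API is modelled after ONNX Runtime, a get_modelmeta() call returning the imported ModelMetadata type is a reasonable assumption:

```python
# Assumes the ORT-style accessor name; verify against the API reference.
meta = session.get_modelmeta()
print(meta)
```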
Run synchronous inference
session.run(output_names, input_feed) runs inference and returns a list of NumPy arrays — one per output.
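A sketch that builds a zero-filled feed from the first input's declared shape (assumes fixed, numeric dimensions):

```python
import numpy as np

# Key the feed by input name; dtype and shape must match the model.
inp = session.get_inputs()[0]
feed = {inp.name: np.zeros(inp.shape, dtype=np.float32)}
outputs = session.run(None, feed)  # list of NumPy arrays, one per output
```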
Dict input feed
The most explicit form: a dict mapping input names to NumPy arrays.
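For example, with a hypothetical input named "images" and placeholder data:

```python
import numpy as np

# "images" and the array shape are placeholders; substitute your model's
# real input name and data.
image_batch = np.zeros((1, 3, 224, 224), dtype=np.float32)
outputs = session.run(None, {"images": image_batch})
```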
None as output_names returns all model outputs in order. Pass a list of strings to select specific outputs:
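Continuing the placeholder example, requesting only a hypothetical "logits" output:

```python
# Only the named outputs are returned, in the order requested.
(logits,) = session.run(["logits"], {"images": image_batch})
```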
List input feed
When you know the order of inputs, you can pass a plain list instead of a dict:
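Using the placeholder array from above:

```python
# Arrays are matched to inputs positionally, in get_inputs() order.
outputs = session.run(None, [image_batch])
```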
Single-array input feed
For models with exactly one input, you can pass the array directly:
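Again with the placeholder array:

```python
# Single-input models accept the bare array as the feed.
outputs = session.run(None, image_batch)
```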
Work with outputs
run always returns a list of NumPy arrays, regardless of the input form used:
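For example:

```python
# run returns a list even for single-output models; index into it.
outputs = session.run(None, image_batch)
first = outputs[0]
print(type(first), first.shape, first.dtype)
```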
Full example
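A consolidated sketch under the same assumptions as above (import path taken from the project name, a hypothetical schedule value, fixed input shapes):

```python
import numpy as np

# Assumed import path, taken from the project name.
from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

# "round_robin" is a hypothetical placeholder for a scheduling value.
opts = make_provider_options(schedule="round_robin")
session = InferenceSession("model.rknn", provider_options=opts)

# Inspect the model's inputs and outputs.
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)

# Run synchronous inference with a dict feed (assumes fixed shapes).
first_input = session.get_inputs()[0]
feed = {first_input.name: np.zeros(first_input.shape, dtype=np.float32)}
outputs = session.run(None, feed)
print("got", len(outputs), "outputs; first shape:", outputs[0].shape)
```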
Next steps
Async inference: Use run_async with callbacks to overlap inference with other work.
Pipeline inference: Use run_pipeline to stream frames through the NPU with low latency.
Multi-core scheduling: Distribute requests across multiple NPU cores with the schedule option.
Provider options reference: Full reference for all make_provider_options parameters.