Documentation Index
Fetch the complete documentation index at: https://mintlify.com/happyme531/ztu_somemodelruntime_ez_rknn_async/llms.txt
Use this file to discover all available pages before exploring further.
InferenceSession is the primary entry point for running RKNN models with EZ RKNN Async. Its constructor signature is intentionally compatible with onnxruntime.InferenceSession, so you can migrate ORT-based code by swapping the import and adding provider options — no other changes required.
Constructor
Parameters:

- Model path: filesystem path to a compiled .rknn model file. Any object accepted by os.fspath() works, including pathlib.Path. Passing raw bytes is not yet supported and raises RuntimeError.
- Session options: accepted for API compatibility with ONNX Runtime. Ignored at runtime.
- Providers: accepted for API compatibility with ONNX Runtime. Ignored at runtime.
- Provider options: runtime configuration for the RKNN session. Pass a RknnProviderOptions dict or a one-element list containing one. Use make_provider_options() to build a validated dict. When None, all options fall back to their defaults.

Example
Methods
run
Parameters:

- Output names: list of output tensor names to return. Pass None to return all outputs in model order.
- Input feed: model inputs. Three forms are accepted:
  - dict: maps each input name to a NumPy array. Keys must match session.input_names exactly.
  - list or tuple: arrays in the same order as session.input_names.
  - single ndarray: only valid when the model has exactly one input.
- Run options: optional run-time flags. Supports the key "ztu_modelrt_dispatch_batch" (bool). When True, the first dimension of each input array is treated as a batch index: the session splits the batch into individual inference requests, dispatches them concurrently across the async queue, then concatenates the results. This flag is ignored by run_async.

Examples
Batch dispatch mode requires the model's input batch dimension to equal 1. The session slices the leading axis and submits each sample as a separate task.

run_async
Submits an inference request to the async queue and invokes callback when inference completes. The call blocks only when the async queue is full (up to submit_timeout_ms).

Parameters:

- Output names: output names to include in the callback result, or None for all outputs.
- Input feed: model inputs, in the same forms accepted by run.
- Callback: callable invoked when the inference task completes. Signature:
  - results: list of float32 arrays, one per requested output. None on failure.
  - user_data: the value passed as user_data to run_async.
  - err: empty string on success; a non-empty error message on failure.
- User data: arbitrary value forwarded unchanged to callback as its second argument. Useful for correlating results with the original request.
- Run options: run-time flags. The "ztu_modelrt_dispatch_batch" key is not supported by run_async and raises RuntimeError if set to True.

Example
run_pipeline
Submits an inference request to a fixed-depth pipeline; once there are depth in-flight tasks, returns the oldest completed result. Returns None until the pipeline fills.
This mode is designed for streaming workloads — for example, running inference on a video feed — where you want to overlap data preparation on the CPU with NPU inference.
Parameters:

- Input feed: model inputs for the current frame, in the same forms accepted by run.
- Depth: pipeline depth, i.e. how many inference requests are kept in flight simultaneously. Must be greater than 0. Changing depth between calls automatically resets the pipeline.
- Reset flag: when True, clears the pipeline queue before submitting the new request. Use this when switching input streams or after an error to avoid stale results draining out.

Example
The first depth calls return None while the pipeline fills. Plan your result-handling loop accordingly, or use run / run_async when you need a result for every input frame.

get_inputs
Returns a list of NodeArg objects describing each model input tensor. Each NodeArg exposes .name, .shape, and .type properties.
Example
get_outputs
Returns a list of NodeArg objects describing each model output tensor.
Example
get_modelmeta
Returns a ModelMetadata object containing metadata embedded in the .rknn file at conversion time.
Example
Properties
- input_names: read-only list of input tensor names, in model order. Equivalent to [n.name for n in session.get_inputs()].
- output_names: read-only list of output tensor names, in model order. Equivalent to [n.name for n in session.get_outputs()].