run_async lets you submit an inference task and return to the caller immediately. The session queues the work, a background worker thread runs it on the NPU, and a dedicated callback thread fires your function when the result is ready. This decouples your main thread from NPU execution time.
How it works
When you call run_async, the session tries to place the task into the internal queue. If there is room, it returns None immediately and your callback is invoked later. If the queue is full, run_async blocks until capacity becomes available (up to submit_timeout_ms) and raises RuntimeError if the queue is still saturated when the timeout expires.
The maximum callback queue depth is 8 (MAX_CALLBACK_QUEUE_SIZE). The task queue depth is controlled by max_queue_size (default 3). Together they form a two-stage pipeline between submission and result delivery.
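The two-stage pipeline can be sketched in pure Python. The queue depths mirror the documented defaults, but the worker logic below is a stand-in for illustration, not the runtime's actual implementation:

```python
import queue
import threading

MAX_QUEUE_SIZE = 3            # max_queue_size default
MAX_CALLBACK_QUEUE_SIZE = 8   # hard-coded callback queue depth

task_q = queue.Queue(maxsize=MAX_QUEUE_SIZE)
callback_q = queue.Queue(maxsize=MAX_CALLBACK_QUEUE_SIZE)
done = []

def npu_worker():
    # Stage 1: pull tasks, "run" them, push results to the callback queue.
    while True:
        task = task_q.get()
        if task is None:              # shutdown sentinel
            callback_q.put(None)
            break
        callback_q.put(task * 2)      # pretend inference doubles the input

def callback_worker():
    # Stage 2: deliver results to user code on a dedicated thread.
    while True:
        result = callback_q.get()
        if result is None:
            break
        done.append(result)

threads = [threading.Thread(target=npu_worker),
           threading.Thread(target=callback_worker)]
for t in threads:
    t.start()
for x in range(5):
    task_q.put(x)   # blocks whenever MAX_QUEUE_SIZE tasks are already pending
task_q.put(None)
for t in threads:
    t.join()
print(done)  # [0, 2, 4, 6, 8]
```

Because both queues are bounded, a slow consumer at either stage naturally back-pressures the submitter.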
Callback signature
results is None when inference fails; check err for the reason.
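A sketch of what such a callback might look like; the (results, err) parameter order is an assumption inferred from the description above, not a verified signature:

```python
from typing import Any, Optional, Sequence

seen = []

# Hypothetical callback: parameter names and order are assumptions.
def my_callback(results: Optional[Sequence[Any]], err: Optional[Exception]) -> None:
    if err is not None:
        # results is None on failure; err carries the reason.
        seen.append(("error", str(err)))
        return
    seen.append(("ok", list(results)))

# Simulate one successful delivery and one failed one.
my_callback([1.0, 2.0], None)
my_callback(None, RuntimeError("NPU timeout"))
print(seen)  # [('ok', [1.0, 2.0]), ('error', 'NPU timeout')]
```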
Basic example
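A minimal end-to-end sketch using a stand-in session class that mimics the documented run_async contract (submit, return None immediately, callback fires on another thread). The class name, constructor, and inference logic below are all invented for illustration; consult the API reference for the real session type:

```python
import queue
import threading

class FakeSession:
    """Stand-in for the real session; mimics the run_async contract only."""

    def __init__(self, max_queue_size=3):
        self._q = queue.Queue(maxsize=max_queue_size)
        self._worker = threading.Thread(target=self._loop, daemon=True)
        self._worker.start()

    def _loop(self):
        while True:
            item = self._q.get()
            if item is None:
                break
            callback, feed = item
            # Pretend the NPU sums every input tensor.
            callback([sum(feed.values())], None)

    def run_async(self, callback, feed):
        self._q.put((callback, feed))
        return None  # returns immediately; callback fires later

    def close(self):
        self._q.put(None)
        self._worker.join()

results_box = []
finished = threading.Event()

def on_done(results, err):
    results_box.append(results)
    finished.set()

sess = FakeSession()
sess.run_async(on_done, {"a": 1, "b": 2})  # returns None right away
finished.wait(timeout=5)
sess.close()
print(results_box)  # [[3]]
```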
Queue behavior and back-pressure
run_async implements a blocking retry loop internally. When both the task queue (max_queue_size) and the callback queue (hard limit of 8) are full, it waits on a condition variable until a slot opens, then retries. If submit_timeout_ms elapses before a slot opens, it raises RuntimeError.
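The blocking-then-raise behavior can be modeled with a bounded queue and a timeout. The error message below is made up; only the timeout rule itself comes from the text above:

```python
import queue

def submit(task_q: queue.Queue, task, submit_timeout_ms: int = 10_000):
    """Model of the documented back-pressure rule, not the real implementation."""
    try:
        # Block until a slot opens, up to submit_timeout_ms.
        task_q.put(task, timeout=submit_timeout_ms / 1000.0)
    except queue.Full:
        raise RuntimeError("submit timed out: queues saturated") from None

q = queue.Queue(maxsize=3)
for i in range(3):
    submit(q, i)          # fills the task queue

try:
    submit(q, 99, submit_timeout_ms=50)   # no consumer, so this times out
except RuntimeError as e:
    print(e)  # submit timed out: queues saturated
```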
Callback ordering
Set sequential_callbacks=True (the default) to guarantee that callbacks are invoked in the exact order tasks were submitted, regardless of which NPU core finished first. The callback thread holds completed results in a map keyed by task ID and waits for the next expected ID before firing.
Set sequential_callbacks=False to fire callbacks as soon as results arrive, which can improve tail latency when core execution times vary.
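The reordering logic described above can be sketched as a dict of pending results keyed by task ID, drained whenever the next expected ID arrives (a model of the idea, not the runtime's code):

```python
def deliver_in_order(completions, fire):
    # completions: (task_id, result) pairs in arrival order, possibly shuffled
    # because different NPU cores finish at different times.
    pending = {}
    next_id = 0
    for task_id, result in completions:
        pending[task_id] = result
        # Fire every callback that is now next in submission order.
        while next_id in pending:
            fire(next_id, pending.pop(next_id))
            next_id += 1

fired = []
# A core finishes task 2 before tasks 0 and 1 complete.
deliver_in_order([(2, "c"), (0, "a"), (1, "b")],
                 lambda tid, r: fired.append((tid, r)))
print(fired)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```

With sequential_callbacks=False, the equivalent loop would simply fire on arrival, trading ordering for lower tail latency.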
run_async does not support ztu_modelrt_dispatch_batch=True in run_options. Attempting to set that flag raises a RuntimeError. Use run with ztu_modelrt_dispatch_batch=True for batch dispatch.
Selecting output names
Like run, run_async accepts an output_names list to filter which outputs are delivered to the callback. Pass None to receive all outputs.
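The selection rule amounts to this (the output names below are invented for illustration):

```python
def select_outputs(all_outputs: dict, output_names):
    # None means "deliver every output"; a list filters and orders by name.
    if output_names is None:
        return list(all_outputs.values())
    return [all_outputs[name] for name in output_names]

outputs = {"boxes": [0, 1], "scores": [0.9], "classes": [3]}
print(select_outputs(outputs, None))                 # all three outputs
print(select_outputs(outputs, ["scores", "boxes"]))  # [[0.9], [0, 1]]
```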
Capacity limits summary
| Limit | Source | Default |
|---|---|---|
| Task queue max | max_queue_size option | 3 |
| Callback queue max | MAX_CALLBACK_QUEUE_SIZE (hard-coded) | 8 |
| Submit block timeout | submit_timeout_ms option | 10000 ms |