
Documentation Index

Fetch the complete documentation index at: https://mintlify.com/happyme531/ztu_somemodelruntime_ez_rknn_async/llms.txt

Use this file to discover all available pages before exploring further.

EZ RKNN Async (ztu_somemodelruntime_ez_rknn_async) is a Python library for running RKNN models on Rockchip devices powered by the RKNPU2 neural processing unit. It exposes an InferenceSession API modelled after ONNX Runtime, so you can migrate existing ORT-based code with minimal changes while gaining access to advanced scheduling features that the official Rockchip SDK does not support.
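Because the interface mirrors ONNX Runtime, a basic synchronous run looks essentially like ORT code with the import swapped. The model path and the exact constructor arguments below are illustrative; only the InferenceSession-style interface is guaranteed by the description above.

```python
import numpy as np
import ztu_somemodelruntime_ez_rknn_async as ort  # used in place of onnxruntime

# Load a compiled RKNN model (path is illustrative).
session = ort.InferenceSession("model.rknn")

# Inspect model I/O the same way as with ONNX Runtime.
input_name = session.get_inputs()[0].name
output_names = [o.name for o in session.get_outputs()]

# Synchronous inference on dummy data.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(output_names, {input_name: x})
print([o.shape for o in outputs])
```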

Key features

  • ORT-compatible API — drop-in InferenceSession interface makes migration from ONNX Runtime straightforward.
  • True async inference — submit tasks and receive results through callbacks or futures without blocking the calling thread (see the sketch after this list).
  • Multi-core data-parallel scheduling — distribute independent inference requests across multiple NPU cores simultaneously.
  • Pipeline inference — overlap data loading, inference, and post-processing across a configurable pipeline depth.
  • Custom operator plugins — load .so plugin files at runtime to support model-specific custom ops.
  • NumPy-only dependency — no heavy runtime dependencies; just NumPy and the system librknnrt.so.
  • Open source — licensed under AGPLv3.
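
As a rough illustration of the async submission model listed above, the flow might look like the sketch below. The method name run_async and its callback keyword are illustrative placeholders, not the confirmed API; see the async inference page for the real interface.

```python
import numpy as np
import ztu_somemodelruntime_ez_rknn_async as ort

session = ort.InferenceSession("model.rknn")
input_name = session.get_inputs()[0].name

def on_done(outputs):
    # Invoked by the runtime when one inference request completes.
    print("result shapes:", [o.shape for o in outputs])

# Submit several independent requests without blocking this thread.
# NOTE: `run_async` and its callback keyword are illustrative placeholders;
# check the async inference guide for the actual method name and signature.
futures = [
    session.run_async(
        None,  # request all outputs
        {input_name: np.random.rand(1, 3, 224, 224).astype(np.float32)},
        callback=on_done,
    )
    for _ in range(8)
]

# Futures can also be collected synchronously once all work is queued.
results = [f.result() for f in futures]
```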

Supported hardware

EZ RKNN Async targets Linux devices with the RKNPU2 hardware block, including:
  • RK3588 / RK3588S
  • RK3566 / RK3568
  • Other Rockchip SoCs with RKNPU2
The library requires librknnrt.so to be installed on the target device. RKNN SDK version 2.4.1 or later is strongly recommended — older versions may produce unstable behavior.
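
Before installing, you can confirm that the runtime library is visible to the dynamic loader with a plain ctypes check (the library path varies by board image; this snippet only tests loadability, not the SDK version):

```python
import ctypes

try:
    # librknnrt.so must be resolvable by the dynamic loader (e.g. from /usr/lib).
    ctypes.CDLL("librknnrt.so")
    print("librknnrt.so loaded successfully")
except OSError as exc:
    print("librknnrt.so is missing or not loadable:", exc)
```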

Supported Python versions

Python 3.7 through 3.13 are supported. On Python 3.7, typing_extensions >= 4.0 is required in addition to NumPy.

Feature comparison

The table below compares EZ RKNN Async against the official Rockchip RKNN SDK Python bindings.
| Feature | EZ RKNN Async | Official SDK |
| --- | --- | --- |
| Model loading & basic inference | ✅ | ✅ |
| Multi-core tensor-parallel inference | ✅ | ✅ |
| Multi-core data-parallel inference | ✅ | ❌ |
| Pipeline-based async inference | ✅ | ⚠️ Limited (depth = 1) |
| True async inference (callback / future) | ✅ | ❌ |
| Multi-batch data-parallel inference | ✅ | ⚠️ Limited (fixed batch / 4-D only) |
| Custom operator plugins | ✅ | ❌ |
| API style | ORT-compatible | Proprietary |
| NumPy-only dependencies | ✅ | ❌ |
| Open source | ✅ AGPLv3 | ❌ |

License

EZ RKNN Async is released under the GNU Affero General Public License v3 (AGPLv3).

Next steps

Install the library

Build and install EZ RKNN Async from source on your Rockchip device.

Run your first model

Create an InferenceSession and run synchronous inference in minutes.

Async inference

Submit tasks with callbacks and process results without blocking.

Multi-core scheduling

Distribute inference across NPU cores for higher throughput.
