EZ RKNN Async: Fast async inference for Rockchip NPUs

EZ RKNN Async is a drop-in replacement for onnxruntime.InferenceSession that targets Rockchip RKNPU2 hardware. It goes beyond the official SDK by adding true async callbacks, configurable multi-core data-parallel scheduling, pipelined inference with configurable depth, and custom operator plugin support. Its only Python dependency is NumPy.

Installation

Install the package and set up your Rockchip NPU environment in minutes.

Quickstart

Run your first RKNN model inference with a complete working example.

Inference modes

Understand sync, async, and pipeline inference and when to use each.

API Reference

Full reference for InferenceSession, NodeArg, and provider options.

Why EZ RKNN Async?

The official RKNN SDK exposes a complex, proprietary API with limited async support and no data-parallel multi-core scheduling. EZ RKNN Async fills both gaps behind an ONNX Runtime (ORT) style interface, so migrating from onnxruntime is straightforward and every NPU core can be kept busy.

Multi-core parallelism

Use data-parallel scheduling across NPU cores to maximize throughput.

Async inference

Submit tasks with callbacks and get results without blocking your main thread.

Pipeline inference

Keep all NPU cores busy with configurable pipeline depth for streaming workloads.

Custom operators

Load custom operator plugins from .so files to extend model capabilities.
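
The async and pipeline modes described above can be pictured with a standard-library sketch. This is not the EZ RKNN Async API: `fake_infer`, `submit`, `on_done`, and `PIPELINE_DEPTH` are hypothetical names, and a thread pool stands in for the NPU. It only illustrates the two ideas: results arrive through a callback, and the number of in-flight tasks is capped at a chosen depth.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

PIPELINE_DEPTH = 3  # hypothetical: maximum number of tasks in flight

results = {}
results_lock = threading.Lock()
slots = threading.BoundedSemaphore(PIPELINE_DEPTH)


def fake_infer(frame):
    """Stand-in for an NPU inference call (hypothetical)."""
    time.sleep(0.01)
    return sum(frame)


def submit(pool, index, frame):
    """Submit one frame; the caller blocks only when the pipeline is full."""
    slots.acquire()
    future = pool.submit(fake_infer, frame)

    def on_done(fut):
        # Runs when the task finishes, outside the submitting loop.
        try:
            with results_lock:
                results[index] = fut.result()
        finally:
            slots.release()  # free a pipeline slot for the next frame

    future.add_done_callback(on_done)
    return future


frames = [[i] * 4 for i in range(8)]
with ThreadPoolExecutor(max_workers=PIPELINE_DEPTH) as pool:
    for i, frame in enumerate(frames):
        submit(pool, i, frame)
# Leaving the with-block waits for every task and its callback to finish.
```

A deeper pipeline keeps more work queued at the cost of more buffered inputs. See Inference modes for the options the library actually provides.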

Feature comparison

Feature	EZ RKNN Async	Official SDK
Model loading & basic inference	✅	✅
Multi-core tensor parallel inference	✅	✅
Multi-core data parallel inference	✅	❌
Pipeline-based async inference	✅	⚠️ Limited
True async inference (callback/future)	✅	❌
Multi-batch data parallel inference	✅	⚠️ Limited
Custom operator plugins	✅	❌
ORT-compatible API	✅	❌
Zero extra dependencies	✅ (NumPy only)	❌
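
To make the "data parallel" rows concrete: in data-parallel scheduling, whole inputs are handed to different cores, whereas tensor-parallel inference splits the work for a single input across cores. The sketch below shows one simple policy, round-robin assignment. `assign_round_robin` is a hypothetical helper, not part of the library, and the policy the library actually uses is not specified on this page.

```python
from itertools import cycle


def assign_round_robin(num_inputs, cores):
    """Map each input index to a core index, cycling through `cores`."""
    core_iter = cycle(cores)
    return {i: next(core_iter) for i in range(num_inputs)}


# Seven inputs spread over three cores (core indices are illustrative).
plan = assign_round_robin(7, [0, 1, 2])
for input_index, core in plan.items():
    print(f"input {input_index} -> core {core}")
```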

Getting started

Install the package

Build and install from source on your Rockchip device. See Installation for full instructions.

Create an InferenceSession

Point it at your .rknn model file with your preferred provider options.

from ztu_somemodelruntime_ez_rknn_async import InferenceSession, make_provider_options

session = InferenceSession(
    "model.rknn",
    provider_options=make_provider_options(schedule=[0, 1, 2])
)

Run inference

Pass NumPy arrays and get the results back as a list of NumPy arrays, using the same call as onnxruntime's InferenceSession.run.

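import numpy as np

# Dummy input shaped (batch, channels, height, width)
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
# As in onnxruntime, None requests all outputs; "input" must match your model's input name
outputs = session.run(None, {"input": input_data})
print(outputs[0].shape)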
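
In practice the input usually comes from an image rather than random data. The following is a minimal sketch of turning an HxWx3 uint8 image into an array with the same shape and dtype as the example above. `to_nchw_batch` is a hypothetical helper, and both the channel-first layout and the scaling to [0, 1] are assumptions; the layout and normalization your model expects depend on how it was converted.

```python
import numpy as np


def to_nchw_batch(image_hwc):
    """Convert an HxWx3 uint8 image to a 1x3xHxW float32 array in [0, 1]."""
    if image_hwc.ndim != 3 or image_hwc.shape[2] != 3:
        raise ValueError("expected an HxWx3 image")
    chw = np.transpose(image_hwc, (2, 0, 1))  # HWC -> CHW
    batch = chw[np.newaxis, ...].astype(np.float32) / 255.0  # add batch axis
    return np.ascontiguousarray(batch)


# Placeholder image; replace with a frame from your camera or file loader.
image = np.zeros((224, 224, 3), dtype=np.uint8)
input_data = to_nchw_batch(image)
print(input_data.shape, input_data.dtype)
```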

EZ RKNN Async requires a Rockchip device with RKNPU2 support (RK3588, RK3566, RK3568, etc.) and the librknnrt.so runtime library installed. Python 3.7+ is supported.
