Kroko-ONNX Engine: Streaming ONNX ASR for RealtimeSTT

The Kroko-ONNX engine integrates the kroko-ai/kroko-onnx runtime with Kroko/Banafo streaming .data models. It is a fully local streaming ASR engine designed for fast, accurate on-device speech recognition without any cloud dependency. The adapter is lazy-loaded, so normal RealtimeSTT installs do not require Kroko-ONNX to be present.

Engine Names

The following names all select this engine:

kroko_onnx
kroko
banafo_kroko

Hyphenated CLI forms such as kroko-onnx and banafo-kroko are also accepted by the generic engine-name normalization.

Install

The kroko-builder extra does not install Kroko-ONNX directly. It exposes the stt-install-kroko builder command, which then builds and installs the Kroko-ONNX wheel into your active environment.

The silero-onnx-cpu extra is required for recorder-based tests and live microphone use with AudioToTextRecorder. If you only need to build the Kroko-ONNX wheel without the recorder, kroko-builder alone is sufficient.

Install the builder and VAD extras

pip install "RealtimeSTT[kroko-builder,silero-onnx-cpu]"

Build and install Kroko-ONNX

stt-install-kroko --build

On Windows, Docker Desktop must be running with the WSL2 backend before you execute this command. The Docker Desktop Linux engine must be available — not just the Docker CLI. Verify your prerequisites with:

python --version
git --version
docker version

docker version should print both a Client and a Server section. If you only see client output and an engine connection error, start Docker Desktop, wait until it reports that Docker is running, and retry. (docker --version only checks that the CLI is installed; it does not verify the engine.)If the default builder cache is not writable, pass a project-local work directory:

stt-install-kroko --build --work-dir .\kroko-builder-work

The helper automatically falls back to .\kroko-builder-work when no --work-dir is set and the default location is not writable.Windows requirements:

Python 3.12 x64 (CPython)
Git
Docker Desktop running with the WSL2 backend

Linux requirements:

Git
CMake
A working C/C++ build toolchain

Download a Community model

After the builder finishes, download a public Community model from Banafo/Kroko-ASR:

New-Item -ItemType Directory -Path test-model-cache\kroko-onnx -Force
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='Banafo/Kroko-ASR', filename='Kroko-EN-Community-64-L-Streaming-001.data', local_dir='test-model-cache/kroko-onnx')"

On Linux/macOS:

mkdir -p test-model-cache/kroko-onnx
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='Banafo/Kroko-ASR', filename='Kroko-EN-Community-64-L-Streaming-001.data', local_dir='test-model-cache/kroko-onnx')"

The kroko-builder extra installs huggingface_hub automatically. If you skipped that extra, install it separately first:

pip install huggingface_hub

Community vs Pro Models

Community models are free, public .data files hosted on Banafo/Kroko-ASR. RealtimeSTT can download known Community files automatically when auto_download_model is enabled (the default). Bare Kroko filenames are cached under ~/.cache/realtimestt/kroko-onnx unless download_root points elsewhere. Currently known public Community models:

Kroko-EN-Community-64-L-Streaming-001.data
Kroko-EN-Community-128-L-Streaming-001.data

Pro models are licensed private .data files. Pass an existing file path, a model_download_url, or explicit Hugging Face repo/token options. Use --variant pro when building to enable Pro model support:

stt-install-kroko --build --variant pro

Free Community Kroko wheels cannot load Pro .data models. Attempting to do so typically produces a payload parsing or block-size mismatch error from the native library.

Basic Usage

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="kroko_onnx",
    model="Kroko-EN-Community-64-L-Streaming-001.data",
    device="cpu",
    language="en",
    transcription_engine_options={
        "provider": "cpu",
        "num_threads": 2,
    },
)

Streaming Realtime Support

Kroko-ONNX is a true streaming engine. It maintains a persistent native recognizer stream and feeds only new audio frames to it during realtime transcription. Final transcription still uses a single full-utterance call. Kroko model names encode native streaming cadence as chunk_number × 20 ms. For example, a model numbered 16 emits partials roughly every 320 ms, 32 every 640 ms, and 64 every 1 280 ms. Feeding smaller chunks does not force faster partials; it only reduces buffer and scheduling latency.

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="kroko_onnx",
    model="Kroko-EN-Community-128-L-Streaming-001.data",
    enable_realtime_transcription=True,
    realtime_transcription_engine="kroko_onnx",
    realtime_model_type="Kroko-EN-Pro-16-L-Streaming-001.data",
    realtime_transcription_engine_options={
        "provider": "cpu",
        "num_threads": 4,
        "key": "...",
        "suppress_native_output": True,
    },
)

Pro-16-L is the fastest measured partial-cadence option when Pro access is available.

Options Reference

All options are passed through transcription_engine_options:

Option	Meaning
`model_path`	Explicit `.data` model file. Overrides `model`.
`model_dir`	Directory containing a single `.data` file, or the default English Community filename.
`model_filename`	File name to use inside `model_dir`.
`auto_download_model` / `download_model`	Download missing public Community model files. Defaults to `True`.
`model_download_url`	Direct download URL for a missing `.data` file. Useful for Pro/private models.
`model_repo_id`, `model_revision`, `hf_token`	Optional Hugging Face download settings.
`key`	License key for Pro models.
`referralcode`	Optional Kroko referral code.
`provider`	`cpu`, `cuda`, or `coreml`. Defaults from the top-level `device` option.
`num_threads`	Runtime thread count. Defaults to `1`.
`sample_rate`	Kroko recognizer sample rate. Defaults to `16000`.
`feature_dim`	Feature dimension. Defaults to `80`.
`decoding_method`	`greedy_search` or `modified_beam_search`.
`max_active_paths`	Beam paths for modified beam search.
`hotwords_file`, `hotwords_score`	Optional hotword biasing inputs.
`blank_penalty`	Blank-symbol penalty during decoding.
`enable_endpoint_detection`	Enables Kroko endpoint detection.
`rule1_min_trailing_silence`, `rule2_min_trailing_silence`, `rule3_min_utterance_length`	Endpoint rule values.
`tail_padding_seconds` / `finalization_padding_seconds`	Silence padding appended before one-shot decoding. Defaults to `"auto"`, inferred from model cadence plus a small margin.
`suppress_native_output`	Redirects Kroko native stdout/stderr during recognizer calls and sets `KROKO_ONNX_SUPPRESS_LICENSE_OUTPUT=1`. Aliases: `suppress_output`, `quiet`, `silent`.
`recognizer`	Extra dictionary merged into `OnlineRecognizer.from_transducer(...)`.

suppress_native_output is a Python-side mitigation combined with an environment flag. Reliable suppression of asynchronous Pro license refresh messages (such as Remaining seconds updated: ...) requires a Kroko wheel built with RealtimeSTT’s native patch. Older or unpatched wheels may still print background license messages.

FastAPI Server Example

$model = "test-model-cache\kroko-onnx\Kroko-EN-Community-64-L-Streaming-001.data"
python example_fastapi_server\server.py `
  --host 0.0.0.0 `
  --port 8010 `
  --engine kroko_onnx `
  --model $model `
  --realtime-engine kroko_onnx `
  --realtime-model $model `
  --device cpu `
  --language en `
  --engine-options '{"provider":"cpu","num_threads":2}' `
  --realtime-engine-options '{"provider":"cpu","num_threads":1}'

Troubleshooting

Missing dependency errors mean kroko_onnx is not importable in the active environment. Install Kroko-ONNX into that same environment using the builder.
Missing model errors name the exact .data file path that RealtimeSTT tried to open. Check model, model_path, or download_root.
Free wheel + Pro model combinations fail with a payload parsing or block-size mismatch error. Build with --variant pro.
CUDA runs require CUDA-capable hardware and a Kroko-ONNX build with CUDA provider support.
On Windows, prefer the cross-platform-builds wheel workflow over a direct native source build.

Get Started

Guides

Transcription Engines

Resources

Kroko-ONNX Engine: Streaming ONNX ASR for RealtimeSTT

Engine Names

Install

Community vs Pro Models

Basic Usage

Streaming Realtime Support

Options Reference

FastAPI Server Example

Troubleshooting

Build docs developers (and LLMs) love

Get Started

Guides

Transcription Engines

Resources

Documentation Index

​Engine Names

​Install

​Community vs Pro Models

​Basic Usage

​Streaming Realtime Support

​Options Reference

​FastAPI Server Example

​Troubleshooting

Build docs developers (and LLMs) love

Engine Names

Install

Community vs Pro Models

Basic Usage

Streaming Realtime Support

Options Reference

FastAPI Server Example

Troubleshooting