The Kroko-ONNX engine integrates the kroko-ai/kroko-onnx runtime with Kroko/Banafo streaming .data models. It is a fully local streaming ASR engine designed for fast, accurate on-device speech recognition without any cloud dependency. The adapter is lazy-loaded, so normal RealtimeSTT installs do not require Kroko-ONNX to be present.
Engine Names
The following names all select this engine:
- kroko_onnx
- kroko
- banafo_kroko
kroko-onnx and banafo-kroko are also accepted by the generic engine-name normalization.
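To illustrate how hyphenated spellings can resolve to the same engine, here is a minimal sketch of a name normalizer. The function names and the lowercasing step are assumptions made for illustration; RealtimeSTT's actual normalization code may differ.

```python
# Hypothetical sketch of engine-name normalization; not RealtimeSTT's actual code.

# Names that the documentation lists as selecting the Kroko-ONNX engine.
KROKO_ENGINE_NAMES = {"kroko_onnx", "kroko", "banafo_kroko"}


def normalize_engine_name(name: str) -> str:
    """Trim, lowercase (an assumption), and map hyphens to underscores."""
    return name.strip().lower().replace("-", "_")


def selects_kroko(name: str) -> bool:
    """Return True when the normalized name is one of the Kroko aliases."""
    return normalize_engine_name(name) in KROKO_ENGINE_NAMES
```

With this sketch, selects_kroko("kroko-onnx") and selects_kroko("banafo-kroko") both resolve to the Kroko engine, while unrelated names do not.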
Install
The silero-onnx-cpu extra is required for recorder-based tests and live microphone use with AudioToTextRecorder. If you only need to build the Kroko-ONNX wheel without the recorder, kroko-builder alone is sufficient.
Build and install Kroko-ONNX
docker version should print both a Client and a Server section. If you only see client output and an engine connection error, start Docker Desktop, wait until it reports that Docker is running, and retry. (docker --version only checks that the CLI is installed; it does not verify the engine.)
If the default builder cache is not writable, pass a project-local work directory with --work-dir. When no --work-dir is set and the default location is not writable, .\kroko-builder-work is used instead.
Windows requirements:
- Python 3.12 x64 (CPython)
- Git
- Docker Desktop running with the WSL2 backend
- Git
- CMake
- A working C/C++ build toolchain
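The Docker engine check described in the build step can be scripted before starting a long build. The following is a minimal sketch, assuming that docker version prints section lines beginning with "Client:" and "Server:"; the helper names are hypothetical and are not part of the builder.

```python
import subprocess


def has_client_and_server(output: str) -> bool:
    """Return True when the text contains both a Client and a Server section."""
    lines = [line.strip() for line in output.splitlines()]
    has_client = any(line.startswith("Client:") for line in lines)
    has_server = any(line.startswith("Server:") for line in lines)
    return has_client and has_server


def docker_engine_ready() -> bool:
    """Run `docker version` and report whether the engine answered."""
    try:
        result = subprocess.run(
            ["docker", "version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # CLI not installed, or the command hung.
        return False
    return result.returncode == 0 and has_client_and_server(result.stdout)
```

If docker_engine_ready() returns False, start Docker Desktop, wait until it reports that Docker is running, and retry.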
Download a Community model
After the builder finishes, download a public Community model from Banafo/Kroko-ASR.
The kroko-builder extra installs huggingface_hub automatically. If you skipped that extra, install it separately first.
Community vs Pro Models
Community models are free, public .data files hosted on Banafo/Kroko-ASR. RealtimeSTT can download known Community files automatically when auto_download_model is enabled (the default). Bare Kroko filenames are cached under ~/.cache/realtimestt/kroko-onnx unless download_root points elsewhere.
Currently known public Community models:
- Kroko-EN-Community-64-L-Streaming-001.data
- Kroko-EN-Community-128-L-Streaming-001.data
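A Community file can also be fetched by hand with huggingface_hub. The sketch below mirrors the documented cache location and uses hf_hub_download; the helper names are hypothetical, and it assumes the .data files sit at the top level of the Banafo/Kroko-ASR repository. RealtimeSTT's own download logic may differ.

```python
from pathlib import Path
from typing import Optional

# Default cache directory named in the documentation.
DEFAULT_CACHE = Path.home() / ".cache" / "realtimestt" / "kroko-onnx"


def community_model_target(filename: str, download_root: Optional[str] = None) -> Path:
    """Return where a bare model filename would be stored."""
    root = Path(download_root) if download_root else DEFAULT_CACHE
    return root / filename


def download_community_model(filename: str, download_root: Optional[str] = None) -> Path:
    """Download one Community file into the target directory."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    target = community_model_target(filename, download_root)
    target.parent.mkdir(parents=True, exist_ok=True)
    local_path = hf_hub_download(
        repo_id="Banafo/Kroko-ASR",
        filename=filename,  # assumes the file is at the repository root
        local_dir=str(target.parent),
    )
    return Path(local_path)
```

For example, download_community_model("Kroko-EN-Community-64-L-Streaming-001.data") would place the file under the default cache directory.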
For Pro models, pass an existing .data file path, a model_download_url, or explicit Hugging Face repo/token options. Use --variant pro when building to enable Pro model support.
Basic Usage
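The following is a minimal sketch of selecting the engine from Python, not a definitive example. It assumes that AudioToTextRecorder accepts a transcription_engine argument alongside the transcription_engine_options, model, and device options mentioned on this page; check the parameter names against your installed RealtimeSTT version. The helper names are hypothetical.

```python
def build_recorder_kwargs(
    model: str = "Kroko-EN-Community-64-L-Streaming-001.data",
) -> dict:
    """Collect recorder arguments for the Kroko-ONNX engine."""
    return {
        "transcription_engine": "kroko_onnx",  # assumed parameter name
        "model": model,  # bare Community filename, downloaded if missing
        "device": "cpu",  # provider defaults from this top-level option
        "transcription_engine_options": {
            "num_threads": 1,
            "decoding_method": "greedy_search",
        },
    }


def main() -> None:
    """Record one utterance from the microphone and print the transcript."""
    # Imported here so the kwargs helper can be inspected without RealtimeSTT.
    from RealtimeSTT import AudioToTextRecorder

    with AudioToTextRecorder(**build_recorder_kwargs()) as recorder:
        print("Speak now...")
        print(recorder.text())
```

Call main() from a script guarded by if __name__ == "__main__": to try it with a live microphone.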
Streaming Realtime Support
Kroko-ONNX is a true streaming engine. It maintains a persistent native recognizer stream and feeds only new audio frames to it during realtime transcription. Final transcription still uses a single full-utterance call. Kroko model names encode native streaming cadence as chunk_number × 20 ms. For example, a model numbered 16 emits partials roughly every 320 ms, 32 every 640 ms, and 64 every 1280 ms. Feeding smaller chunks does not force faster partials; it only reduces buffer and scheduling latency.
Pro-16-L is the fastest measured partial-cadence option when Pro access is available.
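The cadence arithmetic above can be sketched as a small helper. The regular expression is an assumption based on the model names shown on this page (the chunk number follows "Community-" or "Pro-"); other naming schemes would need a different pattern.

```python
import re

# Each chunk unit corresponds to 20 ms of audio, per the documentation.
CHUNK_UNIT_MS = 20


def partial_cadence_ms(model_name: str) -> int:
    """Estimate the native partial cadence in milliseconds from a model name."""
    match = re.search(r"(?:Community|Pro)-(\d+)-", model_name)
    if match is None:
        raise ValueError(f"no chunk number found in {model_name!r}")
    return int(match.group(1)) * CHUNK_UNIT_MS
```

For Kroko-EN-Community-64-L-Streaming-001.data this gives 64 × 20 = 1280 ms, and for Pro-16-L it gives 16 × 20 = 320 ms.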
Options Reference
All options are passed through transcription_engine_options:
| Option | Meaning |
|---|---|
| model_path | Explicit .data model file. Overrides model. |
| model_dir | Directory containing a single .data file, or the default English Community filename. |
| model_filename | File name to use inside model_dir. |
| auto_download_model / download_model | Download missing public Community model files. Defaults to True. |
| model_download_url | Direct download URL for a missing .data file. Useful for Pro/private models. |
| model_repo_id, model_revision, hf_token | Optional Hugging Face download settings. |
| key | License key for Pro models. |
| referralcode | Optional Kroko referral code. |
| provider | cpu, cuda, or coreml. Defaults from the top-level device option. |
| num_threads | Runtime thread count. Defaults to 1. |
| sample_rate | Kroko recognizer sample rate. Defaults to 16000. |
| feature_dim | Feature dimension. Defaults to 80. |
| decoding_method | greedy_search or modified_beam_search. |
| max_active_paths | Beam paths for modified beam search. |
| hotwords_file, hotwords_score | Optional hotword biasing inputs. |
| blank_penalty | Blank-symbol penalty during decoding. |
| enable_endpoint_detection | Enables Kroko endpoint detection. |
| rule1_min_trailing_silence, rule2_min_trailing_silence, rule3_min_utterance_length | Endpoint rule values. |
| tail_padding_seconds / finalization_padding_seconds | Silence padding appended before one-shot decoding. Defaults to "auto", inferred from model cadence plus a small margin. |
| suppress_native_output | Redirects Kroko native stdout/stderr during recognizer calls and sets KROKO_ONNX_SUPPRESS_LICENSE_OUTPUT=1. Aliases: suppress_output, quiet, silent. |
| recognizer | Extra dictionary merged into OnlineRecognizer.from_transducer(...). |
suppress_native_output is a Python-side mitigation combined with an environment flag. Reliable suppression of asynchronous Pro license refresh messages (such as Remaining seconds updated: ...) requires a Kroko wheel built with RealtimeSTT’s native patch. Older or unpatched wheels may still print background license messages.
FastAPI Server Example
Troubleshooting
- Missing dependency errors mean kroko_onnx is not importable in the active environment. Install Kroko-ONNX into that same environment using the builder.
- Missing model errors name the exact .data file path that RealtimeSTT tried to open. Check model, model_path, or download_root.
- Free wheel + Pro model combinations fail with a payload parsing or block-size mismatch error. Build with --variant pro.
- CUDA runs require CUDA-capable hardware and a Kroko-ONNX build with CUDA provider support.
- On Windows, prefer the cross-platform-builds wheel workflow over a direct native source build.
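The first two checks in this list can be run ahead of time with the standard library. This is a minimal sketch with hypothetical helper names; it only tests whether the kroko_onnx module can be found and whether a model file exists, and it does not load the model.

```python
import importlib.util
from pathlib import Path
from typing import List


def module_available(name: str) -> bool:
    """Return True when a top-level module can be found in this environment."""
    return importlib.util.find_spec(name) is not None


def diagnose(model_path: str, module: str = "kroko_onnx") -> List[str]:
    """Collect human-readable problems for the given module and model file."""
    problems: List[str] = []
    if not module_available(module):
        problems.append(f"{module} is not importable in this environment")
    if not Path(model_path).is_file():
        problems.append(f"model file not found: {model_path}")
    return problems
```

An empty list from diagnose(...) means both basic preconditions hold; licensing, variant, and CUDA issues still need the checks described in the list.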
