The Kroko-ONNX engine integrates the kroko-ai/kroko-onnx runtime with Kroko/Banafo streaming .data models. It is a fully local streaming ASR engine designed for fast, accurate on-device speech recognition without any cloud dependency. The adapter is lazy-loaded, so normal RealtimeSTT installs do not require Kroko-ONNX to be present.

Engine Names

The following names all select this engine:
  • kroko_onnx
  • kroko
  • banafo_kroko
Hyphenated CLI forms such as kroko-onnx and banafo-kroko are also accepted by the generic engine-name normalization.
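As a rough illustration of what such normalization could look like, the sketch below lowercases a name, replaces hyphens with underscores, and checks it against the three names listed above. The function names and the lowercasing step are assumptions made for this example; they are not taken from RealtimeSTT's source.

```python
# Hypothetical sketch of engine-name normalization; not RealtimeSTT's actual code.
KROKO_ENGINE_NAMES = {"kroko_onnx", "kroko", "banafo_kroko"}


def normalize_engine_name(name: str) -> str:
    """Trim, lowercase, and turn hyphens into underscores (assumed behavior)."""
    return name.strip().lower().replace("-", "_")


def is_kroko_engine(name: str) -> bool:
    """Return True when the normalized name selects the Kroko-ONNX engine."""
    return normalize_engine_name(name) in KROKO_ENGINE_NAMES
```

With this sketch, is_kroko_engine("kroko-onnx") and is_kroko_engine("banafo-kroko") both return True.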

Install

The kroko-builder extra does not install Kroko-ONNX directly. It exposes the stt-install-kroko builder command, which then builds and installs the Kroko-ONNX wheel into your active environment.
The silero-onnx-cpu extra is required for recorder-based tests and live microphone use with AudioToTextRecorder. If you only need to build the Kroko-ONNX wheel without the recorder, kroko-builder alone is sufficient.
1. Install the builder and VAD extras

pip install "RealtimeSTT[kroko-builder,silero-onnx-cpu]"
2. Build and install Kroko-ONNX

stt-install-kroko --build
On Windows, Docker Desktop must be running with the WSL2 backend before you execute this command. The Docker Desktop Linux engine must be available — not just the Docker CLI. Verify your prerequisites with:
python --version
git --version
docker version
docker version should print both a Client and a Server section. If you only see client output and an engine connection error, start Docker Desktop, wait until it reports that Docker is running, and retry. (docker --version only checks that the CLI is installed; it does not verify the engine.)
If the default builder cache is not writable, pass a project-local work directory:
stt-install-kroko --build --work-dir .\kroko-builder-work
The helper automatically falls back to .\kroko-builder-work when no --work-dir is set and the default location is not writable.
Windows requirements:
  • Python 3.12 x64 (CPython)
  • Git
  • Docker Desktop running with the WSL2 backend
Linux requirements:
  • Git
  • CMake
  • A working C/C++ build toolchain
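The Docker prerequisite check described in this step can also be scripted. The sketch below is not part of the builder; it runs docker version and reports whether both a Client and a Server section came back. The helper names are made up for this example, and the parsing assumes the default human-readable output in which each section starts with "Client:" or "Server:".

```python
import subprocess


def has_client_and_server(version_output: str) -> bool:
    """Return True when `docker version` output lists both sections."""
    lines = [line.strip() for line in version_output.splitlines()]
    has_client = any(line.startswith("Client:") for line in lines)
    has_server = any(line.startswith("Server:") for line in lines)
    return has_client and has_server


def docker_engine_ready() -> bool:
    """Run `docker version` and check that the engine answered."""
    try:
        result = subprocess.run(
            ["docker", "version"], capture_output=True, text=True, timeout=30
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # CLI missing or not responding
    # A missing engine normally yields a non-zero exit code and no Server section.
    return result.returncode == 0 and has_client_and_server(result.stdout)
```

If docker_engine_ready() returns False, start Docker Desktop and retry before running stt-install-kroko --build.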
3. Download a Community model

After the builder finishes, download a public Community model from Banafo/Kroko-ASR. On Windows (PowerShell):
New-Item -ItemType Directory -Path test-model-cache\kroko-onnx -Force
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='Banafo/Kroko-ASR', filename='Kroko-EN-Community-64-L-Streaming-001.data', local_dir='test-model-cache/kroko-onnx')"
On Linux/macOS:
mkdir -p test-model-cache/kroko-onnx
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='Banafo/Kroko-ASR', filename='Kroko-EN-Community-64-L-Streaming-001.data', local_dir='test-model-cache/kroko-onnx')"
The kroko-builder extra installs huggingface_hub automatically. If you skipped that extra, install it separately first:
pip install huggingface_hub
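The same download can be written as a small cross-platform script instead of a shell one-liner. This sketch reuses the repository, filename, and target directory from the commands above and relies only on hf_hub_download from huggingface_hub; the function name download_community_model is chosen for this example.

```python
from pathlib import Path

REPO_ID = "Banafo/Kroko-ASR"
MODEL_FILENAME = "Kroko-EN-Community-64-L-Streaming-001.data"
TARGET_DIR = Path("test-model-cache") / "kroko-onnx"


def download_community_model(target_dir: Path = TARGET_DIR) -> Path:
    """Create the target directory and fetch the Community model into it."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    target_dir.mkdir(parents=True, exist_ok=True)
    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename=MODEL_FILENAME,
        local_dir=str(target_dir),
    )
    return Path(local_path)
```

Calling download_community_model() returns the path of the downloaded .data file, which can then be passed as model or model_path.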

Community vs Pro Models

Community models are free, public .data files hosted on Banafo/Kroko-ASR. RealtimeSTT can download known Community files automatically when auto_download_model is enabled (the default). Bare Kroko filenames are cached under ~/.cache/realtimestt/kroko-onnx unless download_root points elsewhere. Currently known public Community models:
  • Kroko-EN-Community-64-L-Streaming-001.data
  • Kroko-EN-Community-128-L-Streaming-001.data
Pro models are licensed private .data files. Pass an existing file path, a model_download_url, or explicit Hugging Face repo/token options. Use --variant pro when building to enable Pro model support:
stt-install-kroko --build --variant pro
Free Community Kroko wheels cannot load Pro .data models. Attempting to do so typically produces a payload parsing or block-size mismatch error from the native library.
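To make the cache rule concrete, here is a minimal sketch of how a bare filename might be resolved against the default cache directory or download_root. It only illustrates the behavior described above; resolve_model_path is a hypothetical helper, and the real adapter may treat paths differently.

```python
from pathlib import Path
from typing import Optional

# Default cache location named in the text above.
DEFAULT_CACHE = Path.home() / ".cache" / "realtimestt" / "kroko-onnx"


def resolve_model_path(model: str, download_root: Optional[str] = None) -> Path:
    """Map a model argument to the file path that would be opened (assumed logic)."""
    candidate = Path(model).expanduser()
    # Anything with a directory component is treated as an explicit path.
    if candidate.is_absolute() or candidate.parent != Path("."):
        return candidate
    # Bare filenames land in download_root when given, else in the default cache.
    root = Path(download_root).expanduser() if download_root else DEFAULT_CACHE
    return root / candidate.name
```

Under these assumptions, a bare Kroko-EN-Community-64-L-Streaming-001.data resolves into the default cache, while models/Kroko-EN-Community-64-L-Streaming-001.data is used as given.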

Basic Usage

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="kroko_onnx",
    model="Kroko-EN-Community-64-L-Streaming-001.data",
    device="cpu",
    language="en",
    transcription_engine_options={
        "provider": "cpu",
        "num_threads": 2,
    },
)
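The constructor call above only creates the recorder. A minimal sketch of reading one transcription from it follows, assuming the usual RealtimeSTT pattern of using the recorder as a context manager and calling text(), which blocks until an utterance has been finalized. The run_once wrapper is a name invented for this example.

```python
KROKO_OPTIONS = {"provider": "cpu", "num_threads": 2}


def run_once() -> str:
    """Record a single utterance from the microphone and return its text."""
    # Imported inside the function so the sketch can be loaded without
    # RealtimeSTT or an audio device being available.
    from RealtimeSTT import AudioToTextRecorder

    with AudioToTextRecorder(
        transcription_engine="kroko_onnx",
        model="Kroko-EN-Community-64-L-Streaming-001.data",
        device="cpu",
        language="en",
        transcription_engine_options=KROKO_OPTIONS,
    ) as recorder:
        print("Speak now...")
        return recorder.text()  # blocks until one utterance is finalized
```

Because the recorder uses worker processes, it is generally safest to call run_once() from under an if __name__ == "__main__": guard, particularly on Windows.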

Streaming Realtime Support

Kroko-ONNX is a true streaming engine. It maintains a persistent native recognizer stream and feeds only new audio frames to it during realtime transcription. Final transcription still uses a single full-utterance call.
Kroko model names encode native streaming cadence as chunk_number × 20 ms. For example, a model numbered 16 emits partials roughly every 320 ms, 32 every 640 ms, and 64 every 1280 ms. Feeding smaller chunks does not force faster partials; it only reduces buffer and scheduling latency.
from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="kroko_onnx",
    model="Kroko-EN-Community-128-L-Streaming-001.data",
    enable_realtime_transcription=True,
    realtime_transcription_engine="kroko_onnx",
    realtime_model_type="Kroko-EN-Pro-16-L-Streaming-001.data",
    realtime_transcription_engine_options={
        "provider": "cpu",
        "num_threads": 4,
        "key": "...",
        "suppress_native_output": True,
    },
)
Pro-16-L is the fastest measured partial-cadence option when Pro access is available. Loading it requires a Kroko-ONNX wheel built with --variant pro and a license key passed as key.
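The cadence rule described above (chunk_number × 20 ms) can be turned into a small helper for comparing models. This is a sketch that assumes filenames follow the pattern seen on this page, with the chunk number immediately before the size letter and "Streaming"; partial_cadence_ms is a name chosen for this example.

```python
import re
from typing import Optional

FRAME_MS = 20  # per the chunk_number × 20 ms rule


def partial_cadence_ms(model_filename: str) -> Optional[int]:
    """Estimate the native partial cadence in milliseconds from a model name."""
    # Assumes names shaped like Kroko-EN-Community-64-L-Streaming-001.data.
    match = re.search(r"-(\d+)-[A-Za-z]+-Streaming-", model_filename)
    if match is None:
        return None  # name does not follow the assumed pattern
    return int(match.group(1)) * FRAME_MS
```

For the models named on this page, the helper yields 1280 ms for the Community 64 model, 2560 ms for the Community 128 model, and 320 ms for Pro-16-L.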

Options Reference

All options are passed through transcription_engine_options (or realtime_transcription_engine_options for the realtime engine):
| Option | Meaning |
| --- | --- |
| model_path | Explicit .data model file. Overrides model. |
| model_dir | Directory containing a single .data file, or the default English Community filename. |
| model_filename | File name to use inside model_dir. |
| auto_download_model / download_model | Download missing public Community model files. Defaults to True. |
| model_download_url | Direct download URL for a missing .data file. Useful for Pro/private models. |
| model_repo_id, model_revision, hf_token | Optional Hugging Face download settings. |
| key | License key for Pro models. |
| referralcode | Optional Kroko referral code. |
| provider | cpu, cuda, or coreml. Defaults from the top-level device option. |
| num_threads | Runtime thread count. Defaults to 1. |
| sample_rate | Kroko recognizer sample rate. Defaults to 16000. |
| feature_dim | Feature dimension. Defaults to 80. |
| decoding_method | greedy_search or modified_beam_search. |
| max_active_paths | Beam paths for modified beam search. |
| hotwords_file, hotwords_score | Optional hotword biasing inputs. |
| blank_penalty | Blank-symbol penalty during decoding. |
| enable_endpoint_detection | Enables Kroko endpoint detection. |
| rule1_min_trailing_silence, rule2_min_trailing_silence, rule3_min_utterance_length | Endpoint rule values. |
| tail_padding_seconds / finalization_padding_seconds | Silence padding appended before one-shot decoding. Defaults to "auto", inferred from model cadence plus a small margin. |
| suppress_native_output | Redirects Kroko native stdout/stderr during recognizer calls and sets KROKO_ONNX_SUPPRESS_LICENSE_OUTPUT=1. Aliases: suppress_output, quiet, silent. |
| recognizer | Extra dictionary merged into OnlineRecognizer.from_transducer(...). |
suppress_native_output is a Python-side mitigation combined with an environment flag. Reliable suppression of asynchronous Pro license refresh messages (such as Remaining seconds updated: ...) requires a Kroko wheel built with RealtimeSTT’s native patch. Older or unpatched wheels may still print background license messages.
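To show why a Python-side mitigation has this limit, the sketch below implements the general technique: set the environment flag, then temporarily point file descriptors 1 and 2 at the null device for the duration of a call. It is an illustration of the approach, not RealtimeSTT's implementation, and quiet_native_output is a hypothetical name. Anything a native background thread prints after the block exits is not covered, which matches the caveat about asynchronous license messages.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def quiet_native_output():
    """Silence stdout/stderr at the file-descriptor level inside the block."""
    os.environ["KROKO_ONNX_SUPPRESS_LICENSE_OUTPUT"] = "1"
    sys.stdout.flush()
    sys.stderr.flush()
    saved = [os.dup(1), os.dup(2)]  # keep copies of the real descriptors
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, 1)
        os.dup2(devnull, 2)
        yield
    finally:
        # Restore the original descriptors and release the temporary ones.
        os.dup2(saved[0], 1)
        os.dup2(saved[1], 2)
        for fd in (*saved, devnull):
            os.close(fd)
```

A recognizer call wrapped in with quiet_native_output(): stays silent only while the block is active, so output written natively outside that window still reaches the console.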

FastAPI Server Example

In PowerShell, using the same Community model for final and realtime transcription:
$model = "test-model-cache\kroko-onnx\Kroko-EN-Community-64-L-Streaming-001.data"
python example_fastapi_server\server.py `
  --host 0.0.0.0 `
  --port 8010 `
  --engine kroko_onnx `
  --model $model `
  --realtime-engine kroko_onnx `
  --realtime-model $model `
  --device cpu `
  --language en `
  --engine-options '{"provider":"cpu","num_threads":2}' `
  --realtime-engine-options '{"provider":"cpu","num_threads":1}'
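Shell quoting of the JSON option strings differs between PowerShell, cmd, and POSIX shells. One way to sidestep that is to build the argument list in Python and serialize the options with json.dumps. The sketch below uses only the flags shown in the command above; build_server_command is a helper name invented for this example.

```python
import json
import sys
from pathlib import Path


def build_server_command(model: str, port: int = 8010) -> list:
    """Assemble the server command line with JSON-encoded engine options."""
    engine_options = json.dumps({"provider": "cpu", "num_threads": 2})
    realtime_options = json.dumps({"provider": "cpu", "num_threads": 1})
    return [
        sys.executable, str(Path("example_fastapi_server") / "server.py"),
        "--host", "0.0.0.0", "--port", str(port),
        "--engine", "kroko_onnx", "--model", model,
        "--realtime-engine", "kroko_onnx", "--realtime-model", model,
        "--device", "cpu", "--language", "en",
        "--engine-options", engine_options,
        "--realtime-engine-options", realtime_options,
    ]
```

The resulting list can be passed to subprocess.run(command, check=True) from the repository root, which avoids shell-specific escaping entirely.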

Troubleshooting

  • Missing dependency errors mean kroko_onnx is not importable in the active environment. Install Kroko-ONNX into that same environment using the builder.
  • Missing model errors name the exact .data file path that RealtimeSTT tried to open. Check model, model_path, or download_root.
  • Free wheel + Pro model combinations fail with a payload parsing or block-size mismatch error. Build with --variant pro.
  • CUDA runs require CUDA-capable hardware and a Kroko-ONNX build with CUDA provider support.
  • On Windows, prefer the cross-platform-builds wheel workflow over a direct native source build.
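The first two checks in this list can be automated before starting a recorder. The following sketch tests whether the kroko_onnx module is importable in the active environment and whether a given model file exists; diagnose is a hypothetical helper and does not reproduce RealtimeSTT's own error handling.

```python
import importlib.util
from pathlib import Path


def diagnose(model_path: str, module_name: str = "kroko_onnx") -> dict:
    """Report whether the native module and the model file can be found."""
    return {
        # find_spec returns None when a top-level module is not installed.
        "module_importable": importlib.util.find_spec(module_name) is not None,
        "model_file_exists": Path(model_path).expanduser().is_file(),
    }
```

A result with module_importable set to False points at the builder step, while model_file_exists set to False points at model, model_path, or download_root.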
