RealtimeSTT includes CPU INT8 sherpa-onnx engines for both the Moonshine and Parakeet model families. These engines are useful when you want fully offline CPU inference without loading NeMo or Transformers at runtime. All computation happens through the ONNX Runtime via the sherpa-onnx Python package, and model files are small enough to vendor locally.
sherpa-onnx models are not downloaded automatically. You must download and extract the model bundle manually before starting the recorder. See Model Download below.

Install

pip install "RealtimeSTT[sherpa-onnx]"

Available Engine Names

Two sherpa-onnx engines are available, each with a set of accepted aliases:
| Engine | Class | Aliases |
|---|---|---|
| sherpa_onnx_moonshine | SherpaOnnxMoonshineEngine | sherpa_moonshine, moonshine_sherpa_onnx |
| sherpa_onnx_parakeet | SherpaOnnxParakeetEngine | sherpa_parakeet, parakeet_sherpa_onnx |
The factory resolves every alias to the same engine class. Engine names are normalized before lookup (hyphens are replaced with underscores), so a name such as sherpa-moonshine is also accepted.
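The lookup described above can be sketched as follows. This is a hypothetical illustration, not RealtimeSTT's actual factory code: the alias table is copied from the table above, the function names are invented for this example, and only the hyphen-to-underscore normalization stated on this page is applied.

```python
# Hypothetical sketch of alias resolution; RealtimeSTT's real factory may differ.
ENGINE_ALIASES = {
    "sherpa_onnx_moonshine": "sherpa_onnx_moonshine",
    "sherpa_moonshine": "sherpa_onnx_moonshine",
    "moonshine_sherpa_onnx": "sherpa_onnx_moonshine",
    "sherpa_onnx_parakeet": "sherpa_onnx_parakeet",
    "sherpa_parakeet": "sherpa_onnx_parakeet",
    "parakeet_sherpa_onnx": "sherpa_onnx_parakeet",
}


def normalize_engine_name(name: str) -> str:
    """Apply the documented normalization: hyphens become underscores."""
    return name.replace("-", "_")


def resolve_engine_name(name: str) -> str:
    """Map any accepted alias to its canonical engine name."""
    key = normalize_engine_name(name)
    try:
        return ENGINE_ALIASES[key]
    except KeyError:
        raise ValueError(f"Unknown transcription engine: {name!r}") from None


print(resolve_engine_name("sherpa-moonshine"))      # sherpa_onnx_moonshine
print(resolve_engine_name("parakeet_sherpa_onnx"))  # sherpa_onnx_parakeet
```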

Model Download

RealtimeSTT does not download sherpa-onnx model bundles automatically. Download the .tar.bz2 archives from the sherpa-onnx ASR model releases, extract them, and pass the extracted directory path as model. Known bundle names:
  • sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
  • sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8.tar.bz2
mkdir -p test-model-cache/sherpa-onnx

# Moonshine
curl -L -o test-model-cache/sherpa-onnx/sherpa-onnx-moonshine-tiny-en-int8.tar.bz2 \
  https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-tiny-en-int8.tar.bz2
python -c "import tarfile; tarfile.open('test-model-cache/sherpa-onnx/sherpa-onnx-moonshine-tiny-en-int8.tar.bz2', 'r:bz2').extractall('test-model-cache/sherpa-onnx')"

# Parakeet
curl -L -o test-model-cache/sherpa-onnx/sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8.tar.bz2 \
  https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8.tar.bz2
python -c "import tarfile; tarfile.open('test-model-cache/sherpa-onnx/sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8.tar.bz2', 'r:bz2').extractall('test-model-cache/sherpa-onnx')"
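If you prefer a single Python script over curl plus a one-liner, the download and extraction steps above can be sketched with the standard library alone. This is a minimal sketch, not part of RealtimeSTT. It assumes each archive unpacks into a folder named after the archive (true for the two bundles listed above, and checked at runtime), and the helper names are hypothetical. Extraction uses the "data" filter on Python versions that provide it, which rejects unsafe archive members.

```python
import tarfile
import urllib.request
from pathlib import Path

RELEASE_BASE = "https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models"


def bundle_url(bundle: str) -> str:
    """Release URL for a bundle name such as 'sherpa-onnx-moonshine-tiny-en-int8'."""
    return f"{RELEASE_BASE}/{bundle}.tar.bz2"


def extract_bundle(archive: Path, dest: Path) -> Path:
    """Extract a .tar.bz2 bundle into dest and return the extracted model directory."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: block absolute paths, escaping links, etc.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # Assumption: the archive contains one top-level folder named like the archive.
    model_dir = dest / archive.name.removesuffix(".tar.bz2")
    if not model_dir.is_dir():
        raise FileNotFoundError(f"Expected extracted directory {model_dir}")
    return model_dir


def download_bundle(bundle: str, dest: Path) -> Path:
    """Download the archive if it is not already present, then extract it (needs network)."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"{bundle}.tar.bz2"
    if not archive.exists():
        urllib.request.urlretrieve(bundle_url(bundle), archive)
    return extract_bundle(archive, dest)
```

For example, download_bundle("sherpa-onnx-moonshine-tiny-en-int8", Path("test-model-cache/sherpa-onnx")) should return the directory to pass as model.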

Expected Model Files

After extraction, the Moonshine engine looks for these files inside the model directory:
sherpa-onnx-moonshine-tiny-en-int8/
├── preprocess.onnx
├── encode.int8.onnx
├── uncached_decode.int8.onnx
├── cached_decode.int8.onnx
└── tokens.txt
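To catch an incomplete extraction before the recorder starts, you can check for the files listed above yourself. This is a minimal sketch: the file names come from the Moonshine listing on this page, the function name is hypothetical, and the engine's own lookup (including any overrides passed through the files option) may differ.

```python
from pathlib import Path

# File names taken from the Moonshine listing above.
MOONSHINE_FILES = (
    "preprocess.onnx",
    "encode.int8.onnx",
    "uncached_decode.int8.onnx",
    "cached_decode.int8.onnx",
    "tokens.txt",
)


def missing_model_files(model_dir, required=MOONSHINE_FILES):
    """Return the full paths of required files that are absent from model_dir."""
    root = Path(model_dir)
    return [root / name for name in required if not (root / name).is_file()]
```

An empty list from missing_model_files("test-model-cache/sherpa-onnx/sherpa-onnx-moonshine-tiny-en-int8") means every expected file is present.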

Basic Usage

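from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    recorder = AudioToTextRecorder(
        transcription_engine="sherpa_onnx_moonshine",
        # Path to the extracted directory, not the .tar.bz2 archive
        model="test-model-cache/sherpa-onnx/sherpa-onnx-moonshine-tiny-en-int8",
        device="cpu",
        language="en",  # the Moonshine tiny INT8 engine is English-only
        transcription_engine_options={
            "num_threads": 2,
            "provider": "cpu",
        },
    )
    try:
        while True:
            # Blocks until a phrase has been recorded and transcribed
            print(recorder.text())
    finally:
        recorder.shutdown()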
When download_root is set, you can pass a known model id (such as nvidia/parakeet-tdt-0.6b-v3 or sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8) as model, and it resolves to the expected extracted directory name under that root. The bundle must still be downloaded and extracted there manually.
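The resolution rule described above can be sketched as a simple lookup. This is hypothetical: the mapping only lists ids named on this page (the Moonshine entry is an assumption), the function name is invented, and RealtimeSTT's real resolver may accept more ids or behave differently for unknown ones.

```python
from pathlib import Path

# Hypothetical table of known ids -> extracted directory names.
KNOWN_MODEL_DIRS = {
    "nvidia/parakeet-tdt-0.6b-v3": "sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8",
    "sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8": "sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8",
    "sherpa-onnx-moonshine-tiny-en-int8": "sherpa-onnx-moonshine-tiny-en-int8",
}


def resolve_model_dir(model, download_root=None):
    """Map a known id to its directory under download_root; otherwise treat model as a path."""
    if download_root is not None and model in KNOWN_MODEL_DIRS:
        return Path(download_root) / KNOWN_MODEL_DIRS[model]
    return Path(model)
```

With download_root="test-model-cache/sherpa-onnx", the id nvidia/parakeet-tdt-0.6b-v3 maps to test-model-cache/sherpa-onnx/sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8.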

Engine Options

Pass these keys inside transcription_engine_options:
| Option | Type | Meaning |
|---|---|---|
| model_dir | str | Explicit path to the extracted model directory (overrides model). |
| files | dict | Overrides for individual ONNX file names or paths. |
| num_threads | int | Number of CPU worker threads. |
| provider | str | ONNX Runtime execution provider. Default: "cpu". |
| decoding_method | str | sherpa-onnx decoding method. Default: "greedy_search". |
| debug | bool | Enables sherpa-onnx debug output. |
| rule_fsts | str | Optional text normalization FST resource path. |
| rule_fars | str | Optional text normalization FAR resource path. |
| hr_dict_dir | str | Optional heteronym replacement dictionary directory. |
| hr_rule_fsts | str | Optional heteronym replacement FST resource path. |
| hr_lexicon | str | Optional heteronym replacement lexicon path. |
| input_sample_rate | int | Input audio sample rate in Hz. Default: 16000. Alias: sample_rate. |
Parakeet-only transducer options:
| Option | Type | Default | Meaning |
|---|---|---|---|
| model_type | str | "nemo_transducer" | sherpa-onnx model type. |
| max_active_paths | int | 4 | Number of active paths kept during modified_beam_search decoding. |
| hotwords_file | str | "" | Path to a hotwords file. |
| hotwords_score | float | 1.5 | Hotword boost score. |
| blank_penalty | float | 0.0 | Penalty applied to the blank token. |
| feature_dim | int | 80 | Filterbank feature dimension. |
| lm | str | "" | Optional language model path. |
| lm_scale | float | 0.1 | Language model interpolation weight. |
| lodr_fst | str | "" | Low-order density ratio (LODR) FST path. |
| lodr_scale | float | 0.0 | LODR interpolation scale. |
| dither | float | 0.0 | Dithering value applied to filterbank features. |
| modeling_unit | str | "cjkchar" | Modeling unit for the transducer vocabulary. |
| bpe_vocab | str | "" | BPE vocabulary file path. |
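The engine applies the defaults above on its own, but spelling them out can help when you want to log or pin the exact configuration. The helper below is a hypothetical sketch, not a RealtimeSTT API: it copies the documented defaults, merges your overrides, and maps the documented sample_rate alias onto input_sample_rate.

```python
# Defaults copied from the option tables above; the helper itself is hypothetical.
PARAKEET_DEFAULTS = {
    "provider": "cpu",
    "decoding_method": "greedy_search",
    "input_sample_rate": 16000,
    "model_type": "nemo_transducer",
    "max_active_paths": 4,
    "hotwords_file": "",
    "hotwords_score": 1.5,
    "blank_penalty": 0.0,
    "feature_dim": 80,
    "lm": "",
    "lm_scale": 0.1,
    "lodr_fst": "",
    "lodr_scale": 0.0,
    "dither": 0.0,
    "modeling_unit": "cjkchar",
    "bpe_vocab": "",
}


def parakeet_options(**overrides):
    """Return a transcription_engine_options dict with documented defaults filled in."""
    if "sample_rate" in overrides:
        # sample_rate is documented as an alias of input_sample_rate.
        if "input_sample_rate" in overrides:
            raise ValueError("Pass either sample_rate or input_sample_rate, not both")
        overrides["input_sample_rate"] = overrides.pop("sample_rate")
    options = dict(PARAKEET_DEFAULTS)
    options.update(overrides)
    return options
```

The result of parakeet_options(num_threads=4) can be passed as transcription_engine_options together with transcription_engine="sherpa_onnx_parakeet".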

When to Use sherpa-onnx

Good fit

  • Fully offline CPU inference without PyTorch, NeMo, or Transformers
  • Predictable local model files that you control and version
  • INT8 ONNX models with a small disk and memory footprint
  • Environments where network access to Hugging Face Hub is restricted

Consider other engines

  • You want automatic model downloads (faster_whisper or openai_whisper)
  • You need multilingual transcription (Moonshine adapter is English-only)
  • You have a CUDA-capable GPU (faster_whisper with float16 is faster)

Troubleshooting

1. Missing file errors

The error message names the exact expected ONNX or tokens.txt path. Check that the archive was fully extracted, not just downloaded. The model path must point to the extracted directory, not the .tar.bz2 file.

2. High latency

Use a smaller model where possible, increase realtime_processing_pause so realtime transcription runs less often, and set num_threads to match your CPU core count.

3. Language errors on Moonshine

The sherpa-onnx Moonshine tiny INT8 engine supports English ("en") only. Passing any other language raises a TranscriptionEngineError.
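The three checks above can be run up front with a small preflight script. This is a hypothetical sketch, not part of RealtimeSTT: the function names are invented, engine names are normalized the way this page describes, and the thread suggestion is only a starting point based on the CPUs available to the process.

```python
import os
from pathlib import Path

MOONSHINE_NAMES = {"sherpa_onnx_moonshine", "sherpa_moonshine", "moonshine_sherpa_onnx"}


def preflight(engine, model, language):
    """Return human-readable problems found before constructing the recorder."""
    problems = []
    path = Path(model)
    if path.name.endswith(".tar.bz2"):
        problems.append(f"{model} is an archive; extract it and pass the extracted directory")
    elif not path.is_dir():
        problems.append(f"{model} is not an existing directory")
    if engine.replace("-", "_") in MOONSHINE_NAMES and language != "en":
        problems.append(f"Moonshine tiny INT8 supports only 'en', got {language!r}")
    return problems


def suggested_num_threads():
    """Starting value for num_threads: the number of CPUs this process may use."""
    if hasattr(os, "sched_getaffinity"):
        return max(1, len(os.sched_getaffinity(0)))
    return max(1, os.cpu_count() or 1)
```

An empty list from preflight(...) means none of these common mistakes was detected; missing files inside the directory are still reported by the engine itself.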

Build docs developers (and LLMs) love