Moonshine is an English-only streaming ASR model family from Useful Sensors. RealtimeSTT supports two integration paths: a Transformers backend that downloads models automatically from Hugging Face, and a sherpa-onnx backend that uses pre-extracted ONNX INT8 model files and ONNX Runtime for efficient CPU inference.
The Moonshine adapter currently supports English transcription only. Passing any non-English language code raises an error.
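Because a non-English code is rejected, it can help to validate the language before constructing a recorder. The following is a minimal sketch: `check_moonshine_language` is a hypothetical helper, not part of RealtimeSTT. The use of `ValueError`, the treatment of an unset language as English, and the acceptance of regional variants such as `en-US` are assumptions for illustration; the adapter's actual exception type and accepted spellings are not stated here.

```python
from typing import Optional


def check_moonshine_language(language: Optional[str]) -> str:
    """Return "en" for acceptable inputs, raise ValueError otherwise.

    Hypothetical pre-flight check; the real adapter may accept or reject
    a different set of spellings.
    """
    if language is None or language.strip() == "":
        # Assumption: an unset language is treated as English.
        return "en"
    normalized = language.strip().lower()
    # Assumption: regional variants such as "en-US" count as English.
    if normalized == "en" or normalized.startswith(("en-", "en_")):
        return "en"
    raise ValueError(
        f"Moonshine is English-only in this adapter; got language={language!r}"
    )
```

Running a check like this at configuration time surfaces the problem before any model is loaded.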
Engine Names
- `moonshine`
- `moonshine_streaming`
Transformers Path
The Transformers path uses `MoonshineStreamingForConditionalGeneration` and `AutoProcessor` from the `transformers` package. Models are downloaded automatically from Hugging Face on first use.
Install
The `moonshine` extra is an alias for the `transformers` extra. Install a CUDA-enabled PyTorch wheel first if you plan to run on GPU.
Basic Usage
For CPU inference, set `device="cpu"`. The default model is `UsefulSensors/moonshine-streaming-medium`.
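A basic setup might look like the sketch below. `AudioToTextRecorder`, its `model`, `language`, and `device` arguments, and the `text()` and `shutdown()` methods are existing RealtimeSTT API. The `transcription_engine` keyword used to select the engine is an assumption inferred from the `transcription_engine_options` name; check the recorder's signature in your installed version. `moonshine_recorder_kwargs` is a hypothetical helper introduced only for illustration.

```python
from typing import Any, Dict


def moonshine_recorder_kwargs(device: str = "cpu") -> Dict[str, Any]:
    """Build keyword arguments for a Moonshine-backed recorder."""
    return {
        # Assumption: the engine is selected with `transcription_engine`.
        "transcription_engine": "moonshine",
        # Default model named in this section.
        "model": "UsefulSensors/moonshine-streaming-medium",
        # The adapter is English-only.
        "language": "en",
        "device": device,
    }


def main() -> None:
    # Imported lazily so the helper above works without audio dependencies.
    from RealtimeSTT import AudioToTextRecorder

    recorder = AudioToTextRecorder(**moonshine_recorder_kwargs("cpu"))
    try:
        print("Speak now...")
        # Blocks until one utterance has been recorded and transcribed.
        print(recorder.text())
    finally:
        recorder.shutdown()
```

Call `main()` from a script on a machine with a working microphone. The first run downloads the model from Hugging Face.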
Controlling the Model Cache
The Transformers backend passes `download_root` as `cache_dir` when loading the model and processor. Set it to a writable path to control where model files land.
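One way to prepare such a path is sketched here. `with_download_root` is a hypothetical helper, not part of RealtimeSTT: it creates the directory, verifies that it is writable, and returns a copy of the recorder keyword arguments with `download_root` set. The example directory in the usage note is likewise only an illustration.

```python
import os
from pathlib import Path
from typing import Any, Dict


def with_download_root(kwargs: Dict[str, Any], cache_dir: str) -> Dict[str, Any]:
    """Return a copy of kwargs with download_root set to a writable directory."""
    root = Path(cache_dir).expanduser()
    # Create the directory (and parents) if it does not exist yet.
    root.mkdir(parents=True, exist_ok=True)
    if not os.access(root, os.W_OK):
        raise PermissionError(f"download_root is not writable: {root}")
    updated = dict(kwargs)  # leave the caller's dict untouched
    updated["download_root"] = str(root)
    return updated
```

For example, `with_download_root({"device": "cpu"}, "~/models/moonshine")` yields keyword arguments that can be unpacked into the recorder constructor.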
Options Reference
| Option bucket | Meaning |
|---|---|
| `transcription_engine_options["model"]` | Passed to `MoonshineStreamingForConditionalGeneration.from_pretrained`. |
| `transcription_engine_options["processor"]` | Passed to `AutoProcessor.from_pretrained`. |
| `transcription_engine_options["generate"]` | Passed to `model.generate(...)`. |
| `transcription_engine_options["sample_rate"]` | Input sample rate for the processor. Defaults to the processor’s native rate or 16000. |
| `compute_type` | Mapped to a torch dtype where supported (float16 on GPU, float32 on CPU). |
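The buckets in the table can be assembled into a single dictionary, as in this sketch. The bucket names come from the table. The values are illustrative assumptions: `revision` is a standard `from_pretrained` argument and `max_new_tokens` a standard `generate` argument in Transformers, but the specific settings shown are not recommendations from RealtimeSTT. `moonshine_engine_options` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def moonshine_engine_options(
    max_new_tokens: int = 128,
    sample_rate: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the option buckets for the Transformers backend."""
    options: Dict[str, Any] = {
        # Forwarded to MoonshineStreamingForConditionalGeneration.from_pretrained
        "model": {"revision": "main"},
        # Forwarded to AutoProcessor.from_pretrained
        "processor": {"revision": "main"},
        # Forwarded to model.generate(...)
        "generate": {"max_new_tokens": max_new_tokens},
    }
    if sample_rate is not None:
        # Omit the key to fall back to the processor's native rate or 16000.
        options["sample_rate"] = sample_rate
    return options
```

The result would be passed as `transcription_engine_options=moonshine_engine_options()` when constructing the recorder.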
FastAPI Server Example
sherpa-onnx Path
The sherpa-onnx path routes Moonshine through ONNX Runtime using pre-converted INT8 model files. This is typically more predictable for CPU-focused server deployments because it avoids the Transformers and PyTorch overhead entirely.
Install
Engine Names (sherpa-onnx Moonshine)
- `sherpa_onnx_moonshine`
- `sherpa_moonshine`
- `moonshine_sherpa_onnx`
Activating the sherpa-onnx Backend via the moonshine Engine
You can also keep the `moonshine` engine name and switch backends through `transcription_engine_options`.
When to Use Each Path
Transformers
Automatic model download from Hugging Face. Good for GPU inference or quick evaluation. Requires `transformers` and `torch`.
sherpa-onnx
Offline CPU INT8 inference with ONNX Runtime. Better for resource-constrained or air-gapped deployments. Requires manual model file download.
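The trade-offs above can be expressed as a small decision helper. This is only a sketch of the guidance in this section: `choose_moonshine_engine` is hypothetical, and its rule ordering (air-gapped first, then GPU or quick evaluation) is one reasonable reading rather than logic taken from RealtimeSTT. The returned strings are engine names listed earlier on this page.

```python
def choose_moonshine_engine(
    has_gpu: bool,
    air_gapped: bool,
    quick_evaluation: bool = False,
) -> str:
    """Pick an engine name following the guidance in this section."""
    if air_gapped:
        # No Hugging Face access, so automatic download is not an option.
        return "sherpa_onnx_moonshine"
    if has_gpu or quick_evaluation:
        # Transformers path: automatic download, suited to GPU or quick tests.
        return "moonshine"
    # CPU-only deployment: INT8 ONNX models through sherpa-onnx.
    return "sherpa_onnx_moonshine"
```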
Troubleshooting
- Non-English language error: Moonshine is English-only in this adapter. Remove the `language` argument or set it to `"en"`.
- Model download failures: Confirm Hugging Face network access and set `download_root` to a writable directory.
- High memory use: Switch to the sherpa-onnx INT8 path or use a smaller Moonshine model variant.
