Moonshine is an English-only streaming ASR model family from Useful Sensors. RealtimeSTT supports two integration paths: a Transformers backend that downloads models automatically from Hugging Face, and a sherpa-onnx backend that uses pre-extracted ONNX INT8 model files and ONNX Runtime for efficient CPU inference.
The Moonshine adapter currently supports English transcription only. Passing any non-English language code raises an error.
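Because a non-English language code raises an error, it can help to validate the setting before constructing the recorder. The sketch below is a hypothetical pre-check and not part of RealtimeSTT: `ensure_english` is an illustrative name, and treating an empty or missing value as acceptable is an assumption based on the troubleshooting advice on this page to remove the language argument.

```python
from typing import Optional


def ensure_english(language: Optional[str]) -> str:
    """Return a language value that should be safe for the Moonshine engine.

    Hypothetical helper: accepts only "en" (case-insensitive) or an
    empty/missing value, and rejects everything else up front.
    """
    code = (language or "").strip().lower()
    if code in ("", "en"):
        return code
    raise ValueError(f"Moonshine is English-only; got language={language!r}")
```

Calling `ensure_english(user_setting)` before building the recorder surfaces a misconfigured language early, with a message that names the offending value.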

Engine Names

  • moonshine
  • moonshine_streaming

Transformers Path

The Transformers path uses MoonshineStreamingForConditionalGeneration and AutoProcessor from the transformers package. Models are downloaded automatically from Hugging Face on first use.

Install

pip install "RealtimeSTT[moonshine]"
The moonshine extra is an alias for the transformers extra. Install a CUDA-enabled PyTorch wheel first if you plan to run on GPU.

Basic Usage

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="moonshine",
    model="UsefulSensors/moonshine-streaming-medium",
    language="en",
    device="cuda",
)
For CPU inference, use device="cpu". The default model is UsefulSensors/moonshine-streaming-medium.
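If the same script has to run on machines with and without a GPU, the device can be chosen at startup. This is a minimal sketch, assuming PyTorch is the source of truth for GPU availability; `pick_device` and `make_recorder` are illustrative names, not RealtimeSTT APIs.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without torch the Transformers path cannot run at all;
        # "cpu" is returned here only so the helper never raises.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def make_recorder():
    """Build a Moonshine recorder on whichever device is available."""
    # Imported lazily so this module loads without RealtimeSTT installed.
    from RealtimeSTT import AudioToTextRecorder

    return AudioToTextRecorder(
        transcription_engine="moonshine",
        model="UsefulSensors/moonshine-streaming-medium",
        language="en",
        device=pick_device(),
    )
```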

Controlling the Model Cache

The Transformers backend passes download_root as cache_dir when loading the model and processor. Set it to a writable path to control where model files land:
recorder = AudioToTextRecorder(
    transcription_engine="moonshine",
    model="UsefulSensors/moonshine-streaming-medium",
    download_root="models/hf",
    language="en",
)
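Since download_root must be writable, one option is to create and check the directory before handing it to the recorder. The helper below is a sketch using only the standard library; `prepare_download_root` is a hypothetical name.

```python
import os
from pathlib import Path


def prepare_download_root(path: str) -> str:
    """Create the cache directory if needed and confirm it is writable."""
    root = Path(path).expanduser().resolve()
    root.mkdir(parents=True, exist_ok=True)
    if not os.access(root, os.W_OK):
        raise PermissionError(f"download_root is not writable: {root}")
    return str(root)
```

The returned absolute path can then be passed as `download_root=prepare_download_root("models/hf")`.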

Options Reference

Option bucket | Meaning
transcription_engine_options["model"] | Passed to MoonshineStreamingForConditionalGeneration.from_pretrained.
transcription_engine_options["processor"] | Passed to AutoProcessor.from_pretrained.
transcription_engine_options["generate"] | Passed to model.generate(...).
transcription_engine_options["sample_rate"] | Input sample rate for the processor. Defaults to the processor’s native rate or 16000.
compute_type | Mapped to a torch dtype where supported (float16 on GPU, float32 on CPU).
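To make the bucket layout concrete, here is a sketch of an options dictionary. It assumes each of the "model", "processor", and "generate" buckets is a dict of keyword arguments forwarded as-is. The values are illustrative only: `revision` and `max_new_tokens` are standard Transformers arguments for from_pretrained and generate respectively, and `build_moonshine_options` is a hypothetical helper name.

```python
def build_moonshine_options(max_new_tokens: int = 128) -> dict:
    """Assemble transcription_engine_options for the Transformers path."""
    return {
        # Forwarded to MoonshineStreamingForConditionalGeneration.from_pretrained
        "model": {"revision": "main"},
        # Forwarded to AutoProcessor.from_pretrained
        "processor": {"revision": "main"},
        # Forwarded to model.generate(...)
        "generate": {"max_new_tokens": max_new_tokens},
        # Input sample rate handed to the processor
        "sample_rate": 16000,
    }
```

The result would be passed as `transcription_engine_options=build_moonshine_options()` alongside `transcription_engine="moonshine"`.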

FastAPI Server Example

python example_fastapi_server/server.py \
  --engine moonshine \
  --model UsefulSensors/moonshine-streaming-medium \
  --realtime-engine moonshine \
  --realtime-model UsefulSensors/moonshine-streaming-medium \
  --language en \
  --device cuda
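When the server is started from a Python launcher rather than a shell, the same flags can be assembled as an argument list. This sketch reuses only the flags shown in the command above; `server_command` is a hypothetical helper, and using the same model for the final and realtime engines simply mirrors that command.

```python
import sys


def server_command(
    device: str = "cuda",
    model: str = "UsefulSensors/moonshine-streaming-medium",
) -> list:
    """Build the argv list for the example FastAPI server with Moonshine."""
    return [
        sys.executable, "example_fastapi_server/server.py",
        "--engine", "moonshine",
        "--model", model,
        "--realtime-engine", "moonshine",
        "--realtime-model", model,
        "--language", "en",
        "--device", device,
    ]


# Example launch from the repository root:
#   import subprocess
#   subprocess.run(server_command(device="cpu"), check=True)
```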

sherpa-onnx Path

The sherpa-onnx path routes Moonshine through ONNX Runtime using pre-converted INT8 model files. This is typically more predictable for CPU-focused server deployments because inference runs in ONNX Runtime rather than in PyTorch and Transformers.

Install

pip install "RealtimeSTT[sherpa-onnx]"
Model files are not downloaded automatically — you must supply the path to pre-extracted ONNX files. See the sherpa-onnx engine page for model download instructions.

Engine Names (sherpa-onnx Moonshine)

  • sherpa_onnx_moonshine
  • sherpa_moonshine
  • moonshine_sherpa_onnx

Activating the sherpa-onnx Backend via the moonshine Engine

You can also keep the moonshine engine name and switch backends through transcription_engine_options:
from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="moonshine",
    transcription_engine_options={
        "backend": "sherpa_onnx",
    },
)

When to Use Each Path

Transformers

Automatic model download from Hugging Face. Good for GPU inference or quick evaluation. Requires transformers and torch.

sherpa-onnx

Offline CPU INT8 inference with ONNX Runtime. Better for resource-constrained or air-gapped deployments. Requires manual model file download.
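The trade-off above can be expressed as a small rule of thumb. The sketch below is one possible policy, not a recommendation from the library: it prefers sherpa-onnx when the machine is offline or has no GPU, and the Transformers path otherwise. `moonshine_recorder_kwargs` is a hypothetical helper, and the sherpa-onnx branch omits the model file paths, which must still be supplied as described on the sherpa-onnx engine page.

```python
def moonshine_recorder_kwargs(has_gpu: bool, offline: bool) -> dict:
    """Pick recorder keyword arguments for one of the two Moonshine paths."""
    if offline or not has_gpu:
        # sherpa-onnx: CPU INT8 inference, no automatic download.
        # Model file paths are intentionally not shown here.
        return {
            "transcription_engine": "moonshine",
            "transcription_engine_options": {"backend": "sherpa_onnx"},
        }
    # Transformers: model is fetched from Hugging Face on first use.
    return {
        "transcription_engine": "moonshine",
        "model": "UsefulSensors/moonshine-streaming-medium",
        "language": "en",
        "device": "cuda",
    }
```

The Transformers path also runs on CPU with `device="cpu"`, so a deployment with network access and no GPU could reasonably choose either branch.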

Troubleshooting

  • Non-English language error — Moonshine is English-only in this adapter. Remove the language argument or set it to "en".
  • Model download failures — Confirm Hugging Face network access and set download_root to a writable directory.
  • High memory use — Switch to the sherpa-onnx INT8 path or use a smaller Moonshine model variant.
