Moonshine Speech Recognition Engine for RealtimeSTT

Moonshine is an English-only streaming ASR model family from Useful Sensors. RealtimeSTT supports two integration paths: a Transformers backend that downloads models automatically from Hugging Face, and a sherpa-onnx backend that uses pre-extracted ONNX INT8 model files and ONNX Runtime for efficient CPU inference.

The Moonshine adapter currently supports English transcription only. Passing any non-English language code raises an error.
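A pre-flight check can surface the English-only restriction before the recorder is built. The sketch below is a hypothetical helper, not part of RealtimeSTT. It assumes that a missing language should fall back to "en" and that anything other than "en" is unsupported; which spellings the adapter itself accepts is not specified here.

```python
from typing import Optional


def require_english(language: Optional[str]) -> str:
    """Return "en" for an empty or English language code, else raise.

    Hypothetical helper: it mirrors the English-only rule described above
    but is not the adapter's own validation logic.
    """
    code = (language or "").strip().lower()
    if code in ("", "en"):
        return "en"
    raise ValueError(
        f"Moonshine is English-only in this adapter; got language={language!r}"
    )
```

The returned value can be passed as the language argument, so configuration mistakes fail early with a clear message.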

Engine Names

moonshine
moonshine_streaming

Transformers Path

The Transformers path uses MoonshineStreamingForConditionalGeneration and AutoProcessor from the transformers package. Models are downloaded automatically from Hugging Face on first use.

Install

pip install "RealtimeSTT[moonshine]"

The moonshine extra is an alias for the transformers extra. Install a CUDA-enabled PyTorch wheel first if you plan to run on GPU.

Basic Usage

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="moonshine",
    model="UsefulSensors/moonshine-streaming-medium",
    language="en",
    device="cuda",
)

For CPU inference, use device="cpu". The default model is UsefulSensors/moonshine-streaming-medium.
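When the same script has to run on machines with and without a GPU, the device can be chosen at runtime. The following is a minimal sketch, assuming PyTorch may or may not be installed; pick_device and moonshine_kwargs are hypothetical helper names, and only the keyword arguments shown in the usage above are taken from this page.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # optional dependency; absent on sherpa-onnx-only installs
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def moonshine_kwargs(prefer_gpu: bool = True) -> dict:
    """Build keyword arguments for AudioToTextRecorder (Transformers path)."""
    return {
        "transcription_engine": "moonshine",
        "model": "UsefulSensors/moonshine-streaming-medium",
        "language": "en",
        "device": pick_device(prefer_gpu),
    }
```

The resulting dictionary can be unpacked into the constructor, for example AudioToTextRecorder(**moonshine_kwargs()).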

Controlling the Model Cache

The Transformers backend passes download_root as cache_dir when loading the model and processor. Set it to a writable path to control where model files land:

recorder = AudioToTextRecorder(
    transcription_engine="moonshine",
    model="UsefulSensors/moonshine-streaming-medium",
    download_root="models/hf",
    language="en",
)
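Because download_root must be writable, it can help to create and probe the directory before constructing the recorder. This is a sketch using only the standard library; ensure_writable_dir is a hypothetical helper name, and the "models/hf" path simply repeats the example above.

```python
import tempfile
from pathlib import Path


def ensure_writable_dir(path: str) -> Path:
    """Create the directory if needed and verify that files can be written."""
    root = Path(path).expanduser().resolve()
    root.mkdir(parents=True, exist_ok=True)
    # Writing a throwaway file raises OSError (e.g. PermissionError)
    # if the location is read-only.
    with tempfile.NamedTemporaryFile(dir=str(root)):
        pass
    return root
```

Pass the result as a string, for example download_root=str(ensure_writable_dir("models/hf")).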

Options Reference

| Option bucket | Meaning |
| --- | --- |
| `transcription_engine_options["model"]` | Passed to `MoonshineStreamingForConditionalGeneration.from_pretrained`. |
| `transcription_engine_options["processor"]` | Passed to `AutoProcessor.from_pretrained`. |
| `transcription_engine_options["generate"]` | Passed to `model.generate(...)`. |
| `transcription_engine_options["sample_rate"]` | Input sample rate for the processor. Defaults to the processor’s native rate or `16000`. |
| `compute_type` | Mapped to a torch dtype where supported (`float16` on GPU, `float32` on CPU). |
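The option buckets listed above can be assembled in one place. The sketch below assumes that the "model", "processor", and "generate" buckets each take a dictionary of keyword arguments, which this page implies but does not state outright. The specific values (revision="main", max_new_tokens=128) are illustrative standard Transformers arguments, not recommended settings, and build_engine_options is a hypothetical helper name.

```python
def build_engine_options(max_new_tokens: int = 128, sample_rate: int = 16000) -> dict:
    """Assemble transcription_engine_options for the Transformers path."""
    return {
        # Assumed: forwarded as keyword arguments to from_pretrained.
        "model": {"revision": "main"},
        "processor": {"revision": "main"},
        # Assumed: forwarded as keyword arguments to model.generate(...).
        "generate": {"max_new_tokens": max_new_tokens},
        # Input sample rate in Hz handed to the processor.
        "sample_rate": sample_rate,
    }
```

The dictionary would then be passed as transcription_engine_options=build_engine_options() alongside transcription_engine="moonshine".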

FastAPI Server Example

python example_fastapi_server/server.py \
  --engine moonshine \
  --model UsefulSensors/moonshine-streaming-medium \
  --realtime-engine moonshine \
  --realtime-model UsefulSensors/moonshine-streaming-medium \
  --language en \
  --device cuda

sherpa-onnx Path

The sherpa-onnx path runs Moonshine through ONNX Runtime using pre-converted INT8 model files. This is typically more predictable for CPU-focused server deployments because inference does not depend on PyTorch or the transformers package.

Install

pip install "RealtimeSTT[sherpa-onnx]"

Model files are not downloaded automatically — you must supply the path to pre-extracted ONNX files. See the sherpa-onnx engine page for model download instructions.
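Since nothing is fetched automatically on this path, a quick filesystem check before startup can catch a missing or empty model directory. This is a standard-library sketch only; find_onnx_files is a hypothetical helper, and the exact file names a Moonshine export contains are not specified on this page, so the check looks for any .onnx file.

```python
from pathlib import Path
from typing import List


def find_onnx_files(model_dir: str) -> List[Path]:
    """Return the .onnx files under model_dir, or raise if none exist."""
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Model directory not found: {root}")
    files = sorted(root.rglob("*.onnx"))
    if not files:
        raise FileNotFoundError(f"No .onnx files found under: {root}")
    return files
```

Running this once at process start gives a clearer error than a failure deep inside engine initialization.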

Engine Names (sherpa-onnx Moonshine)

sherpa_onnx_moonshine
sherpa_moonshine
moonshine_sherpa_onnx

Activating the sherpa-onnx Backend via the moonshine Engine

You can also keep the moonshine engine name and switch backends through transcription_engine_options:

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="moonshine",
    transcription_engine_options={
        "backend": "sherpa_onnx",
    },
)

When to Use Each Path

Transformers

Automatic model download from Hugging Face. Good for GPU inference or quick evaluation. Requires transformers and torch.

sherpa-onnx

Offline CPU INT8 inference with ONNX Runtime. Better for resource-constrained or air-gapped deployments. Requires manual model file download.
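The trade-off above can be encoded as a small selection function. This sketch shows one possible policy, not a rule enforced by RealtimeSTT: it picks the sherpa-onnx backend for air-gapped or GPU-less hosts and the Transformers path otherwise. The function name and its two flags are hypothetical; the returned keys and values are the ones used elsewhere on this page.

```python
def moonshine_recorder_kwargs(has_gpu: bool, air_gapped: bool) -> dict:
    """Pick a Moonshine integration path from two deployment facts."""
    if air_gapped or not has_gpu:
        # Offline or CPU-only host: use pre-extracted ONNX INT8 files.
        return {
            "transcription_engine": "moonshine",
            "language": "en",
            "transcription_engine_options": {"backend": "sherpa_onnx"},
        }
    # GPU host with network access: let Transformers download the model.
    return {
        "transcription_engine": "moonshine",
        "model": "UsefulSensors/moonshine-streaming-medium",
        "language": "en",
        "device": "cuda",
    }
```

The Transformers path also runs on CPU with device="cpu", so a quick evaluation on a laptop does not have to follow this policy.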

Troubleshooting

Non-English language error — Moonshine is English-only in this adapter. Remove the language argument or set it to "en".
Model download failures — Confirm Hugging Face network access and set download_root to a writable directory.
High memory use — Switch to the sherpa-onnx INT8 path or use a smaller Moonshine model variant.
