Moonshine Voice requires model files to perform transcription and intent recognition. This guide covers downloading, locating, and managing these models.
Quick Start
Download a model using the Python module:
python -m moonshine_voice.download --language en
The script will:
- Download the model files
- Cache them locally
- Display the model path and architecture number
Example output:
encoder_model.ort: 100%|███████████████████| 29.9M/29.9M [00:00<00:00, 34.5MB/s]
decoder_model_merged.ort: 100%|████████████| 104M/104M [00:02<00:00, 52.6MB/s]
tokenizer.bin: 100%|████████████████████████| 244k/244k [00:00<00:00, 1.44MB/s]
Model arch: 1
Downloaded model path: /Users/username/Library/Caches/moonshine_voice/download.moonshine.ai/model/base-en/quantized/base-en
Supported Languages
Moonshine Voice supports these languages:
| Code | Language | Model Sizes |
|---|---|---|
| en | English | Tiny, Tiny-Streaming, Base, Small-Streaming, Medium-Streaming |
| es | Spanish | Base |
| ar | Arabic | Base |
| ja | Japanese | Tiny, Base |
| ko | Korean | Tiny |
| zh | Chinese | Base |
| vi | Vietnamese | Base |
| uk | Ukrainian | Base |
Download by Language Code
# English
python -m moonshine_voice.download --language en
# Spanish
python -m moonshine_voice.download --language es
# Japanese
python -m moonshine_voice.download --language ja
# Arabic
python -m moonshine_voice.download --language ar
Download by Language Name
python -m moonshine_voice.download --language English
python -m moonshine_voice.download --language Spanish
python -m moonshine_voice.download --language Japanese
Model Architectures
Available model architectures (from smallest to largest):
English Models
| Architecture | Code | Parameters | WER | Use Case |
|---|---|---|---|---|
| TINY | 0 | 26M | 12.66% | Constrained devices |
| TINY_STREAMING | 2 | 34M | 12.00% | Real-time, low-end devices |
| BASE | 1 | 58M | 10.07% | Balanced offline processing |
| SMALL_STREAMING | 3 | 123M | 7.84% | Real-time, good accuracy |
| MEDIUM_STREAMING | 4 | 245M | 6.65% | Real-time, best accuracy |
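When you need to pick an architecture programmatically, the trade-offs in the table above can be encoded directly, e.g. choosing the most accurate model that fits a parameter budget. This is an illustrative sketch: the figures are copied from the table, and `pick_arch` is a hypothetical helper, not part of the moonshine_voice API.

```python
# Illustrative snapshot of the English model table
# (arch code -> parameters in millions, word error rate %).
ENGLISH_MODELS = {
    0: {"name": "TINY", "params_m": 26, "wer": 12.66},
    2: {"name": "TINY_STREAMING", "params_m": 34, "wer": 12.00},
    1: {"name": "BASE", "params_m": 58, "wer": 10.07},
    3: {"name": "SMALL_STREAMING", "params_m": 123, "wer": 7.84},
    4: {"name": "MEDIUM_STREAMING", "params_m": 245, "wer": 6.65},
}

def pick_arch(max_params_m: float) -> int:
    """Return the arch code with the lowest WER among models within the budget."""
    candidates = {
        code: info for code, info in ENGLISH_MODELS.items()
        if info["params_m"] <= max_params_m
    }
    if not candidates:
        raise ValueError(f"No model fits within {max_params_m}M parameters")
    return min(candidates, key=lambda code: candidates[code]["wer"])

print(pick_arch(130))  # -> 3 (SMALL_STREAMING is the most accurate model under 130M)
```

The selected code can then be passed to `--model-arch` on the command line.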
Download Specific Architecture
# Download English Medium Streaming (highest accuracy)
python -m moonshine_voice.download --language en --model-arch 4
# Download English Tiny (smallest)
python -m moonshine_voice.download --language en --model-arch 0
If you don’t specify --model-arch, the highest quality model for that language is downloaded by default.
Model Components
Streaming Model Components
Streaming models (Tiny-Streaming, Small-Streaming, Medium-Streaming) contain:
- encoder.ort - Audio encoder
- adapter.ort - Adapter layer
- cross_kv.ort - Cross-attention key-value cache
- decoder_kv.ort - Decoder key-value cache
- frontend.ort - Audio preprocessing
- streaming_config.json - Model configuration
- tokenizer.bin - Text tokenizer
Non-Streaming Model Components
Non-streaming models (Tiny, Base) contain:
- encoder_model.ort - Audio encoder
- decoder_model_merged.ort - Merged decoder
- tokenizer.bin - Text tokenizer
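Since the two model families ship different component files, a directory's contents are enough to tell them apart. A minimal sketch based on the component lists above; `classify_model_dir` is a hypothetical helper, not part of the package API:

```python
# Component sets copied from the lists above.
STREAMING_COMPONENTS = {
    "encoder.ort", "adapter.ort", "cross_kv.ort",
    "decoder_kv.ort", "frontend.ort",
    "streaming_config.json", "tokenizer.bin",
}
NON_STREAMING_COMPONENTS = {
    "encoder_model.ort", "decoder_model_merged.ort", "tokenizer.bin",
}

def classify_model_dir(filenames) -> str:
    """Classify a collection of model files as streaming, non-streaming, or unknown."""
    files = set(filenames)
    if STREAMING_COMPONENTS <= files:      # all streaming components present
        return "streaming"
    if NON_STREAMING_COMPONENTS <= files:  # all non-streaming components present
        return "non-streaming"
    return "unknown"
```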
Using Models in Code
Automatic Download and Load
from moonshine_voice import get_model_for_language, Transcriber
# Download (if needed) and get path
model_path, model_arch = get_model_for_language("en")
# Create transcriber
transcriber = Transcriber(
model_path=model_path,
model_arch=model_arch
)
Specify Architecture
from moonshine_voice import get_model_for_language, ModelArch
# Get Medium Streaming model
model_path, model_arch = get_model_for_language(
wanted_language="en",
wanted_model_arch=ModelArch.MEDIUM_STREAMING
)
# Get Tiny model
model_path, model_arch = get_model_for_language(
wanted_language="en",
wanted_model_arch=ModelArch.TINY
)
Manual Path Specification
If you already have models downloaded:
from moonshine_voice import Transcriber, ModelArch
transcriber = Transcriber(
model_path="/path/to/model/directory",
model_arch=ModelArch.BASE
)
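Before handing a manually managed directory to Transcriber, it can help to verify the expected files are actually there. A sketch, assuming the non-streaming component names listed earlier; `check_model_dir` is a hypothetical helper, not part of the moonshine_voice API:

```python
from pathlib import Path

# Non-streaming components, as listed in "Model Components" above.
REQUIRED_FILES = ["encoder_model.ort", "decoder_model_merged.ort", "tokenizer.bin"]

def check_model_dir(model_dir: str) -> None:
    """Raise FileNotFoundError if any expected non-streaming component is missing."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
```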
Cache Location
Models are cached in:
- macOS: ~/Library/Caches/moonshine_voice/
- Linux: ~/.cache/moonshine_voice/
- Windows: %LOCALAPPDATA%\moonshine_voice\Cache\
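If you need to locate the cache yourself (for example, to bundle models with an installer), the documented defaults can be reproduced in code. A sketch mirroring the paths above; `default_cache_dir` is an illustrative helper, not part of the package:

```python
import os
import sys
from pathlib import Path

def default_cache_dir() -> Path:
    """Return the documented default moonshine_voice cache directory per platform."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Caches" / "moonshine_voice"
    if sys.platform.startswith("win"):
        local = os.environ.get("LOCALAPPDATA", str(Path.home()))
        return Path(local) / "moonshine_voice" / "Cache"
    # Linux and other Unix-likes
    return Path.home() / ".cache" / "moonshine_voice"

print(default_cache_dir())
```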
Custom Cache Location
Set the MOONSHINE_VOICE_CACHE environment variable:
# Linux/macOS
export MOONSHINE_VOICE_CACHE=/my/custom/cache
python -m moonshine_voice.download --language en
# Windows
set MOONSHINE_VOICE_CACHE=C:\my\custom\cache
python -m moonshine_voice.download --language en
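The same override works from Python by setting the variable before the download runs, assuming (as the shell examples above imply) that the downloader reads MOONSHINE_VOICE_CACHE at call time:

```python
import os

# Point the cache at a custom location; do this before importing or
# invoking the downloader so it picks up the override.
os.environ["MOONSHINE_VOICE_CACHE"] = "/my/custom/cache"

# Subsequent calls such as get_model_for_language("en") would then
# download into and read from /my/custom/cache.
```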
Embedding Models (for Intent Recognition)
Download embedding models for intent recognition:
from moonshine_voice import get_embedding_model
# Download with default variant (q4)
embedding_path, embedding_arch = get_embedding_model(
"embeddinggemma-300m"
)
# Download with specific variant
embedding_path, embedding_arch = get_embedding_model(
"embeddinggemma-300m",
variant="q8" # Options: q4, q8, fp16, fp32, q4f16
)
Embedding Model Variants
| Variant | Size | Quality | Use Case |
|---|---|---|---|
| q4 | Smallest | Good | Default, edge devices |
| q8 | Small | Better | Balanced |
| fp16 | Medium | High | GPU inference |
| fp32 | Large | Highest | Research, accuracy-critical |
| q4f16 | Medium | High | Mixed precision |
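If variant choice is driven by deployment constraints, the table can be encoded directly. An illustrative sketch: the variant names come from the table above, while the scenario labels and `choose_variant` are hypothetical, not part of moonshine_voice.

```python
# Map deployment scenarios to the embedding variants from the table above.
VARIANT_FOR_TARGET = {
    "edge": "q4",        # smallest, good quality (the default)
    "balanced": "q8",
    "gpu": "fp16",
    "research": "fp32",  # largest, highest quality
    "mixed": "q4f16",
}

def choose_variant(target: str) -> str:
    """Return the embedding variant for a deployment target, defaulting to q4."""
    return VARIANT_FOR_TARGET.get(target, "q4")
```

The result can be passed as the `variant` argument to `get_embedding_model`.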
# Command line download
python -c "from moonshine_voice import get_embedding_model; get_embedding_model('embeddinggemma-300m', 'q4')"
Model Information
Get detailed model information:
from moonshine_voice.download import log_model_info
# Print detailed information
log_model_info(wanted_language="en")
Output:
Model download url: https://download.moonshine.ai/model/medium-streaming-en/quantized
Model components: ['adapter.ort', 'cross_kv.ort', 'decoder_kv.ort', 'encoder.ort', 'frontend.ort', 'streaming_config.json', 'tokenizer.bin']
Model arch: 4
Downloaded model path: /Users/username/Library/Caches/moonshine_voice/...
Checking Available Languages
from moonshine_voice.download import (
supported_languages,
supported_languages_friendly
)
# Get list of language codes
languages = supported_languages()
print(languages) # ['ar', 'es', 'en', 'ja', 'ko', 'vi', 'uk', 'zh']
# Get friendly description
print(supported_languages_friendly())
# Output: ar (Arabic), es (Spanish), en (English), ja (Japanese), ...
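Because the downloader accepts both codes and English names (as shown in the download sections above), it can be convenient to normalize user input before calling it. A sketch using a local snapshot of the supported-language table; `normalize_language` is a hypothetical helper, not part of the package:

```python
# Snapshot of the supported-languages table above (code -> English name).
LANGUAGES = {
    "en": "English", "es": "Spanish", "ar": "Arabic", "ja": "Japanese",
    "ko": "Korean", "zh": "Chinese", "vi": "Vietnamese", "uk": "Ukrainian",
}

def normalize_language(value: str) -> str:
    """Accept a language code or English name and return the language code."""
    lowered = value.strip().lower()
    if lowered in LANGUAGES:
        return lowered
    for code, name in LANGUAGES.items():
        if name.lower() == lowered:
            return code
    raise ValueError(f"Unsupported language: {value!r}")
```

In real code you would build the mapping from `supported_languages()` rather than hard-coding it.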
Error Handling
try:
model_path, model_arch = get_model_for_language("invalid_lang")
except ValueError as e:
print(f"Error: {e}")
# Handle invalid language
try:
model_path, model_arch = get_model_for_language(
"en",
wanted_model_arch=999 # Invalid architecture
)
except ValueError as e:
print(f"Error: {e}")
# Handle invalid architecture
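Since both failure modes above raise ValueError, one option is to try a preferred architecture and fall back to the language default. A sketch under the assumption that the loader behaves like `get_model_for_language` as shown above; `get_model_with_fallback` is an illustrative helper, not part of the moonshine_voice API:

```python
def get_model_with_fallback(loader, language, wanted_model_arch=None):
    """Try the requested architecture; on ValueError, retry with the default.

    `loader` is any callable with get_model_for_language's signature.
    """
    if wanted_model_arch is not None:
        try:
            return loader(language, wanted_model_arch=wanted_model_arch)
        except ValueError:
            pass  # fall through to the language default
    return loader(language)
```

Usage would look like `get_model_with_fallback(get_model_for_language, "en", wanted_model_arch=ModelArch.MEDIUM_STREAMING)`.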
Model Files Structure
moonshine_voice/
└── download.moonshine.ai/
└── model/
├── base-en/
│ └── quantized/
│ └── base-en/
│ ├── encoder_model.ort
│ ├── decoder_model_merged.ort
│ └── tokenizer.bin
├── medium-streaming-en/
│ └── quantized/
│ ├── encoder.ort
│ ├── adapter.ort
│ ├── cross_kv.ort
│ ├── decoder_kv.ort
│ ├── frontend.ort
│ ├── streaming_config.json
│ └── tokenizer.bin
└── embeddinggemma-300m/
├── model.onnx
├── model.onnx_data
├── model_q4.onnx
├── model_q4.onnx_data
└── tokenizer.bin
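To see which models are already cached, you can walk the tree above looking for tokenizer.bin, which every model directory in the layout contains. An illustrative sketch; `list_cached_models` is not part of the package:

```python
from pathlib import Path

def list_cached_models(cache_root: str):
    """Return directories under the cache that contain a tokenizer.bin file."""
    root = Path(cache_root)
    return sorted(p.parent for p in root.rglob("tokenizer.bin"))
```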
Offline Usage
Once models are downloaded, they work offline:
1. Download models while online:
python -m moonshine_voice.download --language en
2. Copy the cache directory to the offline machine:
# On source machine
tar -czf moonshine_models.tar.gz ~/.cache/moonshine_voice
# On target machine
tar -xzf moonshine_models.tar.gz -C ~/
3. Use normally (no internet connection needed):
model_path, model_arch = get_model_for_language("en")
Windows
For Windows C++ applications:
# Install Python package first
pip install moonshine-voice
# Download models
python -m moonshine_voice.download --language en
# Note the model path from output
# Use in your C++ application
Raspberry Pi
# Install with system packages flag
sudo pip install --break-system-packages moonshine-voice
# Download model
python -m moonshine_voice.download --language en
On Raspberry Pi, prefer smaller models (Tiny, Tiny-Streaming) for better performance.
Licensing
Non-English models are released under the Moonshine Community License (non-commercial use). English models are available under Apache 2.0. See https://www.moonshine.ai/license for details.
See Also