
Voice Interfaces for Everyone

Moonshine Voice is an open source AI toolkit for developers building real-time voice applications. Everything runs on-device with cutting-edge accuracy and ultra-low latency.

Quickstart

Get transcribing in under 2 minutes with Python

Installation

Install for Python, iOS, Android, and more

Python Guide

Complete Python API reference and examples

API Reference

Full API documentation for all platforms

Why Moonshine Voice?

On-Device & Private

Everything runs locally on your device. Fast, private, and no account, credit card, or API keys needed. Your users’ voice data never leaves their device.

Optimized for Live Speech

Built specifically for real-time streaming applications with low-latency responses. The framework processes audio while the user is still talking, delivering sub-200ms response times.

Higher Accuracy Than Whisper

Our Medium Streaming model achieves 6.65% WER on the HuggingFace OpenASR Leaderboard, outperforming Whisper Large V3 (7.44% WER) while using only 245M parameters vs 1.5B.

107ms

MacBook Pro latency

5-10x faster

Than Whisper in live speech

26MB

Smallest model size

Cross-Platform Support

The same library runs everywhere with one consistent API:

Python

pip install moonshine-voice

iOS & macOS

Swift Package Manager

Android

Maven package

Windows

Visual Studio support

Linux

Native C++ library

Edge Devices

Raspberry Pi, IoT, wearables

Key Features

Supply audio of any length up to ~30 seconds, and the model spends compute only on that input. No wasted computation on zero-padding like Whisper’s fixed 30-second window.
Models cache input encoding and decoder state for incremental audio addition. This dramatically reduces latency by skipping redundant computation on audio that’s already been processed.
Supports English, Spanish, Mandarin, Japanese, Korean, Vietnamese, Ukrainian, and Arabic. Language-specific models deliver much higher accuracy than multilingual alternatives.
Batteries included with microphone capture, voice activity detection, speech-to-text, speaker identification (diarization), and command recognition - all in one library.
Built-in command recognition using semantic matching. Users can say commands naturally: “Let there be light” triggers “Turn on the lights” with 76% confidence.
High-level APIs with event listeners for line started, text changed, and line completed events. Focus on your application logic, not audio processing details.
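The caching behaviour described above can be sketched with a toy example. The class and method names below are illustrative only, not the Moonshine API: the point is that audio already encoded is never re-processed, so each incremental update only pays for the new tail.

```python
class IncrementalEncoder:
    """Toy illustration of incremental caching (not the Moonshine API):
    samples that were already encoded are never re-encoded."""

    def __init__(self):
        self.processed = 0  # number of samples already encoded
        self.state = []     # cached "encoder state", one entry per sample

    def add_audio(self, samples):
        # Only the tail beyond the cached prefix costs any compute.
        new = samples[self.processed:]
        self.state.extend(s * 2 for s in new)  # stand-in for real encoding work
        self.processed = len(samples)
        return len(new)  # how much fresh work was done

enc = IncrementalEncoder()
first = enc.add_audio([1, 2, 3])         # encodes 3 samples
second = enc.add_audio([1, 2, 3, 4, 5])  # encodes only the 2 new samples
```

In a real streaming transcriber the cached state is the encoder output and decoder state rather than a list, but the latency win is the same: work grows with the new audio, not the total buffer.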
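Command matching of the kind described (“Let there be light” triggering “Turn on the lights”) depends on scoring an utterance against each registered command. As a purely illustrative stand-in for Moonshine’s embedding-based semantic matcher, the sketch below scores commands by token overlap with crude stemming; all names here are hypothetical.

```python
def normalize(phrase):
    # Crude stemming: lowercase and strip a trailing "s" from each word.
    return {word.rstrip("s") for word in phrase.lower().split()}

def best_command(utterance, commands):
    """Return the (command, score) pair with the highest Jaccard token
    overlap. A toy stand-in: a real semantic matcher compares sentence
    embeddings, so paraphrases score far higher than surface overlap."""
    tokens = normalize(utterance)

    def score(cmd):
        cmd_tokens = normalize(cmd)
        return len(tokens & cmd_tokens) / len(tokens | cmd_tokens)

    best = max(commands, key=score)
    return best, score(best)

commands = ["Turn on the lights", "Play music", "Set a timer"]
cmd, confidence = best_command("let there be light", commands)
```

Here the only shared token after stemming is “light”, which is still enough to rank the right command first; an embedding-based matcher assigns a much stronger score to such paraphrases.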

Performance Comparison

Moonshine dramatically outperforms Whisper for live speech applications:
| Model | WER | Parameters | MacBook Pro | Linux x86 | R. Pi 5 |
| --- | --- | --- | --- | --- | --- |
| Moonshine Medium Streaming | 6.65% | 245M | 107ms | 269ms | 802ms |
| Whisper Large v3 | 7.44% | 1.5B | 11,286ms | 16,919ms | N/A |
| Moonshine Small Streaming | 7.84% | 123M | 73ms | 165ms | 527ms |
| Whisper Small | 8.59% | 244M | 1,940ms | 3,425ms | 10,397ms |
| Moonshine Tiny Streaming | 12.00% | 34M | 34ms | 69ms | 237ms |
| Whisper Tiny | 12.81% | 39M | 277ms | 1,141ms | 5,863ms |
Moonshine achieves 5-40x lower latency than Whisper while maintaining competitive or superior accuracy. This makes it ideal for interactive voice interfaces where responsiveness is critical.
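The WER figures above follow the standard definition: word-level edit distance (substitutions, insertions, and deletions) divided by the number of reference words. A minimal implementation, for readers who want to score their own transcripts:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[-1][-1] / len(ref)

error = wer("turn on the lights", "turn on the light")  # 1 sub / 4 words = 0.25
```

Leaderboard results average this ratio over a large test corpus, so a 6.65% WER means roughly one word error per fifteen reference words.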

Research Foundation

Moonshine Voice is based on research published by the Moonshine AI team.

Get Started

Ready to add voice to your application? Start with our quickstart guide:

Quickstart Guide

Transcribe audio in under 2 minutes

Community & Support

Join Discord

Get live support from the community

GitHub Issues

Report bugs and request features
