This page covers common RealtimeSTT setup and runtime issues, organized by category. Engine-specific notes live in the individual engine pages under the Engines section.
Installation Issues
PyAudio or PortAudio fails to build
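PyAudio compiles against PortAudio, so install the PortAudio development headers before running `pip install`.
Linux: install the headers with your package manager (on Debian or Ubuntu, `sudo apt-get install python3-dev portaudio19-dev`), then run `pip install` from a normal terminal with the intended Python environment active.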
Optional engine import fails
Each optional engine needs extra packages. If importing an engine fails, install the packages listed for it:

| Engine | Required package |
|---|---|
| `whisper_cpp` | `pywhispercpp` |
| `openai_whisper` | `openai-whisper` |
| `sherpa_onnx_*` | `sherpa-onnx` |
| `parakeet` | `nemo_toolkit[asr]`, `soundfile` |
| `granite_speech`, `moonshine`, `cohere_transcribe` | `transformers`, `torch` |
| `qwen3_asr` | `qwen-asr` |
| `omnilingual_asr` | `RealtimeSTT[omnilingual]` on Linux/WSL2 with Python 3.11.x, plus a compatible PyTorch stack |
| `kroko_onnx` | `RealtimeSTT[kroko-builder,silero-onnx-cpu]`, then `stt-install-kroko --build` for recorder-based smoke tests and live microphone use |
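You can check which optional packages are missing before you construct a recorder by probing for their import names. This is a minimal sketch. The import names below are assumptions and do not come from the table: for example, `openai-whisper` installs the `whisper` module and `nemo_toolkit` installs `nemo`. `ENGINE_MODULES` and `missing_modules` are hypothetical helpers and are not part of RealtimeSTT.

```python
import importlib.util

# Hypothetical mapping from engine name to the top-level modules its packages
# install. Import names are assumptions and can differ from pip package names.
ENGINE_MODULES = {
    "whisper_cpp": ["pywhispercpp"],
    "openai_whisper": ["whisper"],
    "sherpa_onnx": ["sherpa_onnx"],
    "parakeet": ["nemo", "soundfile"],
    "moonshine": ["transformers", "torch"],
}


def missing_modules(engine: str, table: dict = ENGINE_MODULES) -> list:
    """Return the modules required by `engine` that cannot be found."""
    # find_spec returns None for a top-level module that is not installed,
    # without importing (and initializing) the package itself.
    return [name for name in table[engine] if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for engine in ENGINE_MODULES:
        missing = missing_modules(engine)
        print(f"{engine}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")
```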
Audio and Microphone Issues
No microphone input detected
- Confirm the OS granted microphone permission to the terminal or Python application.
- Check the default input device in your system’s sound settings.
- Pass `input_device_index` explicitly if the wrong device is being selected.
- Run a small PyAudio device-list script or the recorder demo to enumerate available devices.
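To find the right index, a small script can list the recording devices that PyAudio sees. This is a minimal sketch. `input_devices` and `print_input_devices` are hypothetical helper names, while `get_device_count()` and `get_device_info_by_index()` are standard PyAudio calls. The `PyAudio` object is passed in so you can check the filtering logic without audio hardware.

```python
def input_devices(pa) -> list:
    """Return (index, name, max_input_channels) for every device that can record.

    `pa` is a pyaudio.PyAudio instance (or any object with the same two methods).
    """
    devices = []
    for index in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(index)
        channels = int(info.get("maxInputChannels", 0))
        if channels > 0:  # skip output-only devices such as speakers
            devices.append((index, info["name"], channels))
    return devices


def print_input_devices() -> None:
    """Print the recording devices PyAudio can see."""
    import pyaudio  # imported here so input_devices() works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        for index, name, channels in input_devices(pa):
            print(f"{index}: {name} ({channels} input channels)")
    finally:
        pa.terminate()
```

Call `print_input_devices()`, then pass the chosen number as `input_device_index` when you create the recorder, for example `AudioToTextRecorder(input_device_index=2)`, where `2` is only a placeholder.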
Input overflow warnings
- Switch to a smaller or faster model.
- Set `device="cuda"` when a GPU is available.
- Increase `realtime_processing_pause`.
- Lower the realtime beam size (`beam_size_realtime`).
- Increase queue or capacity settings in the FastAPI server.
External audio sounds garbled or has recognition errors
When passing audio to `feed_audio()`, make sure your chunks are:
- 16-bit signed PCM bytes
- Mono channel
- Labeled with the correct `original_sample_rate`
- In the correct byte order
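If your source delivers float samples or several channels, convert them before feeding. This is a minimal sketch that uses only the standard library. It assumes interleaved float samples in [-1.0, 1.0] and little-endian output, which is the native byte order on x86 and most ARM systems. `to_pcm16_mono` is a hypothetical helper.

```python
import struct
from typing import Sequence


def to_pcm16_mono(samples: Sequence[float], channels: int = 1) -> bytes:
    """Convert interleaved float samples in [-1.0, 1.0] to 16-bit mono PCM.

    Each frame is downmixed by averaging its channels, clipped to [-1.0, 1.0],
    scaled to int16, and packed little-endian ("<h").
    """
    if channels < 1 or len(samples) % channels:
        raise ValueError("sample count must be a multiple of the channel count")
    mono = []
    for start in range(0, len(samples), channels):
        frame = samples[start:start + channels]
        value = sum(frame) / channels              # average channels to mono
        value = max(-1.0, min(1.0, value))         # clip before scaling
        mono.append(int(round(value * 32767)))     # scale to the int16 range
    return struct.pack(f"<{len(mono)}h", *mono)
```

Pass the result to `feed_audio()` together with the source rate as `original_sample_rate`, as the checklist above describes.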
CUDA and GPU Issues
cuDNN or CUDA library cannot be loaded
As a fallback, use `device="cpu"` with a small model.
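For a script that falls back automatically, you can probe CUDA first and choose settings from the result. This is a minimal sketch, and it assumes the caller supplies the probe. For faster-whisper-based engines, a probe could be `lambda: ctranslate2.get_cuda_device_count() > 0`. `choose_device_settings` is a hypothetical helper and the model names are illustrative. A device-count probe can succeed even when cuDNN only fails at the first inference, so treat it as a first check rather than a guarantee.

```python
from typing import Callable


def choose_device_settings(cuda_probe: Callable[[], bool]) -> dict:
    """Pick recorder keyword arguments, falling back to CPU if CUDA is unusable.

    `cuda_probe` returns True when a CUDA device is usable. Missing CUDA or cuDNN
    libraries typically surface as OSError or RuntimeError, treated as "no GPU".
    """
    try:
        cuda_ok = bool(cuda_probe())
    except (OSError, RuntimeError):
        cuda_ok = False
    if cuda_ok:
        return {"device": "cuda", "model": "small.en"}
    # CPU fallback: a small model with int8 weights keeps latency tolerable.
    return {"device": "cpu", "model": "tiny.en", "compute_type": "int8"}
```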
GPU memory is exhausted
- Use a smaller model.
- Use a smaller realtime model than the final model.
- Set `use_main_model_for_realtime=True` to avoid loading two models, accepting some inference contention.
- Reduce batch sizes.
- Use CPU INT8 engines such as `sherpa_onnx_moonshine` for CPU-only deployments.
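The shared-model and split-model layouts above differ only in a few recorder arguments. This sketch shows both. It assumes the `AudioToTextRecorder` keyword arguments named on this page plus `enable_realtime_transcription`. `memory_saving_kwargs` is a hypothetical helper and the model names are illustrative.

```python
def memory_saving_kwargs(share_model: bool) -> dict:
    """Recorder keyword arguments that limit how many models occupy GPU memory."""
    kwargs = {
        "model": "small.en",  # final model (illustrative choice)
        "device": "cuda",
        "enable_realtime_transcription": True,
    }
    if share_model:
        # One model serves both passes: less memory, but realtime and final
        # inference compete for it.
        kwargs["use_main_model_for_realtime"] = True
    else:
        # Two models, with the realtime one kept smaller than the final one.
        kwargs["realtime_model_type"] = "tiny.en"
    return kwargs
```

Usage: `AudioToTextRecorder(**memory_saving_kwargs(share_model=True))`.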
Model Issues
Hugging Face model download fails
- Check network access from the machine running the script.
- Set `download_root` to a writable directory.
- Authenticate if the model is gated: run `huggingface-cli login` or set the `HF_TOKEN` environment variable.
- Pre-download models in the same environment that runs the application.
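This sketch creates `download_root`, checks that it is writable before the recorder starts, and offers an optional pre-download step. `prepare_download_root` and `predownload` are hypothetical helpers. `predownload` needs the `huggingface_hub` package. Its `cache_dir` layout only matches `download_root` for engines that use that directory as a Hugging Face cache, as faster-whisper does.

```python
from pathlib import Path


def prepare_download_root(path: str) -> Path:
    """Create `path` if needed and verify that this process can write to it."""
    root = Path(path).expanduser().resolve()
    root.mkdir(parents=True, exist_ok=True)
    probe = root / ".write_test"
    probe.write_text("ok")  # raises PermissionError if the directory is read-only
    probe.unlink()
    return root


def predownload(repo_id: str, root: Path) -> str:
    """Fetch a Hugging Face repo into `root` ahead of time (needs huggingface_hub)."""
    from huggingface_hub import snapshot_download

    # huggingface_hub picks up HF_TOKEN or a token saved by `huggingface-cli login`.
    return snapshot_download(repo_id, cache_dir=str(root))
```

Usage: `root = prepare_download_root("~/models/realtimestt")`, then `AudioToTextRecorder(download_root=str(root))`. Run `predownload(...)` in that same environment, for example with a faster-whisper checkpoint such as `Systran/faster-whisper-small.en`.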
sherpa-onnx model files are missing
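sherpa-onnx models are commonly distributed as `.tar.bz2` archives, which you must unpack before use. This is a minimal sketch that uses Python's standard `tarfile` module. `extract_model_archive` is a hypothetical helper. Which option receives the extracted directory depends on the engine.

```python
import tarfile
from pathlib import Path


def extract_model_archive(archive: str, dest: str) -> Path:
    """Extract a .tar.bz2 model archive into `dest` and return the model directory."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        # First path component of every member, ignoring bare "." entries.
        top_level = {Path(m.name).parts[0] for m in tar.getmembers() if Path(m.name).parts}
        if hasattr(tarfile, "data_filter"):
            # Extraction filters exist in Python 3.12+ and recent 3.8-3.11 patch
            # releases; "data" rejects absolute paths and links escaping dest_dir.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    # Most model archives hold a single top-level directory; return it if so.
    if len(top_level) == 1:
        candidate = dest_dir / top_level.pop()
        if candidate.is_dir():
            return candidate
    return dest_dir
```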
Point the engine at the extracted model directory, not at the `.tar.bz2` archive file.
Runtime Behavior
text() never returns
If `text()` hangs:
- Feed trailing silence when using external audio via `feed_audio()`.
- Lower `post_speech_silence_duration` for faster turn finalization.
- Check that microphone input is being received and that VAD sensitivity is tuned appropriately.
- Confirm the speech lasts longer than `min_length_of_recording`.
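When audio comes from a file or network stream, the recorder only closes a turn after it sees enough silence. A stream that stops right after speech can therefore leave `text()` waiting. This sketch generates trailing silence as 16-bit mono PCM and assumes 16 kHz input. `silence_pcm16` and `silence_chunks` are hypothetical helpers, and the 1024-sample chunk size is arbitrary.

```python
def silence_pcm16(seconds: float, sample_rate: int = 16000) -> bytes:
    """Return `seconds` of digital silence as 16-bit mono PCM bytes."""
    num_samples = int(seconds * sample_rate)
    return bytes(2 * num_samples)  # each 16-bit sample is two zero bytes


def silence_chunks(seconds: float, sample_rate: int = 16000, chunk_samples: int = 1024):
    """Yield the silence in chunks of `chunk_samples` samples, like a live stream."""
    data = silence_pcm16(seconds, sample_rate)
    step = 2 * chunk_samples  # bytes per chunk
    for start in range(0, len(data), step):
        yield data[start:start + step]
```

After your last audio chunk, feed the silence with `for chunk in silence_chunks(1.0): recorder.feed_audio(chunk)`. Choose a duration longer than `post_speech_silence_duration`, then call `recorder.text()`.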
Final transcription is slow
- Use a smaller final model.
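- Enable CUDA with `device="cuda"`.
- Reduce `beam_size`.
- Enable `early_transcription_on_silence` carefully: it starts transcribing during the closing silence, and that work is discarded if speech resumes.
- For realtime UX, keep the final model accurate and use a smaller, separate realtime model.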
Realtime text lags
- Use `realtime_model_type="tiny.en"` or another small realtime model.
- Set `beam_size_realtime=1`.
- Increase `realtime_processing_pause`.
- Enable syllable-boundary scheduling and tune follow-up delays.
- Use a separate, lighter realtime engine and model if the final model is heavy.
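The realtime settings above combine into one set of recorder arguments. This sketch assumes the `AudioToTextRecorder` keyword arguments named on this page plus `enable_realtime_transcription`. `low_latency_realtime_kwargs` is a hypothetical helper, and the pause value is illustrative only: raise it if updates still lag.

```python
def low_latency_realtime_kwargs(final_model: str = "small.en") -> dict:
    """Recorder keyword arguments that keep the realtime pass cheap."""
    return {
        "model": final_model,                # final pass keeps its accuracy
        "enable_realtime_transcription": True,
        "realtime_model_type": "tiny.en",    # small, separate realtime model
        "beam_size_realtime": 1,             # beam size 1 is greedy decoding
        "realtime_processing_pause": 0.3,    # seconds between realtime passes
    }
```

Usage: `AudioToTextRecorder(**low_latency_realtime_kwargs("medium.en"))` keeps a heavier final model while realtime updates stay on `tiny.en`.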
Wake word does not trigger
- Confirm that `wake_words` or `wakeword_backend` is set.
- For Porcupine, use one of the supported built-in keyword names.
- For OpenWakeWord, pass valid model files and the matching inference framework (`openwakeword_model_paths`, `openwakeword_inference_framework`).
- Tune `wake_words_sensitivity`.
- Test first in a quiet room with the microphone close to your mouth.
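For Porcupine, a quick check catches misspelled keyword names before the recorder starts. This sketch assumes `wake_words` is a comma-separated string. It also assumes the available names come from `pvporcupine.KEYWORDS`, the set of built-in keywords that the `pvporcupine` package exposes. `unknown_porcupine_keywords` is a hypothetical helper.

```python
def unknown_porcupine_keywords(wake_words: str, available) -> list:
    """Return entries of a comma-separated wake_words string that are not built in."""
    requested = [w.strip().lower() for w in wake_words.split(",") if w.strip()]
    known = {k.lower() for k in available}
    return [w for w in requested if w not in known]
```

Usage: `unknown_porcupine_keywords("jarvis, computer", pvporcupine.KEYWORDS)` returns an empty list when both names are built-in keywords. Any name it returns needs fixing before you pass it as `wake_words`.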
Wake word text appears in transcript
Increase `wake_word_buffer_duration` so more of the wake word audio is excluded from the following recording window.
FastAPI Server Issues
Browser cannot connect to the server
- Confirm the server is running and listening on the expected host and port.
- Open `http://localhost:8010` from the same machine first to rule out a networking issue.
- Check that the browser has microphone permission. Browsers only expose the microphone to pages served over HTTPS or from `localhost`.
- Check the `/health` endpoint for startup errors.
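You can also script the `/health` check, which helps when the server runs on a headless machine. This is a standard-library sketch. `check_health` is a hypothetical helper. This page does not document the response format, so the helper returns the raw status and body instead of parsing them. The default URL matches the `localhost:8010` address above.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def check_health(url: str = "http://localhost:8010/health",
                 timeout: float = 2.0) -> Optional[Tuple[int, str]]:
    """Return (status_code, body_text) from the health endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # The server answered with an error status; its body may explain why.
        return err.code, err.read().decode("utf-8", "replace")
    except OSError:
        # URLError, refused connections, and timeouts are all OSError subclasses:
        # the server is not running, or the host or port is wrong.
        return None
```

A `None` result points to the host, the port, or a server that never started. A non-200 status with a body points to a startup error inside the server.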
New sessions are rejected
The server rejects new sessions once it reaches its `--max-sessions` limit. Increase the limit only if the selected engine and hardware can handle the additional load.
Realtime events drop under load
Review these settings:
- `--max-realtime-queue-age-ms`
- `--realtime-processing-pause`
- `--max-global-inference-queue-depth`
- Model size and engine choice
