Troubleshoot Common RealtimeSTT Install and Runtime Errors

This page covers common RealtimeSTT setup and runtime issues organized by category. Engine-specific notes live in the individual engine pages under the Engines section.

Installation Issues

PyAudio or PortAudio fails to build

Install the PortAudio system packages before running pip install.

Linux:

sudo apt-get update
sudo apt-get install python3-dev portaudio19-dev
pip install "RealtimeSTT[faster-whisper]"

macOS:

brew install portaudio
pip install "RealtimeSTT[faster-whisper]"

Windows: Prefer pre-built wheels from PyPI where available. Run pip install from a normal terminal with the intended Python environment active.

Optional engine import fails

RealtimeSTT imports optional engines lazily. If an engine import fails at runtime, install that backend’s package:

Engine	Required package
`whisper_cpp`	`pywhispercpp`
`openai_whisper`	`openai-whisper`
`sherpa_onnx_*`	`sherpa-onnx`
`parakeet`	`nemo_toolkit[asr]`, `soundfile`
`granite_speech`, `moonshine`, `cohere_transcribe`	`transformers`, `torch`
`qwen3_asr`	`qwen-asr`
`omnilingual_asr`	`RealtimeSTT[omnilingual]` on Linux/WSL2 with Python 3.11.x, plus a compatible PyTorch stack
`kroko_onnx`	`RealtimeSTT[kroko-builder,silero-onnx-cpu]`, then `stt-install-kroko --build` for recorder-based smoke tests and live microphone use

Audio and Microphone Issues

No microphone input detected

Confirm the OS granted microphone permission to the terminal or Python application.
Check the default input device in your system’s sound settings.
Pass input_device_index explicitly if the wrong device is being selected.
Run a small PyAudio device-list script or the recorder demo to enumerate available devices.
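The device-list script mentioned above could look like the following minimal sketch. It assumes PyAudio is installed and relies on the `index`, `name`, and `maxInputChannels` keys that PyAudio returns in its device info dictionaries. The helper names `input_devices` and `print_input_devices` are illustrative and not part of RealtimeSTT.

```python
def input_devices(device_infos):
    """Return (index, name) pairs for devices that can capture audio."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def list_pyaudio_devices():
    """Ask PyAudio for the info dictionary of every known device."""
    import pyaudio  # imported lazily so input_devices() works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()


def print_input_devices():
    """Print one line per capture-capable device."""
    for index, name in input_devices(list_pyaudio_devices()):
        print(f"{index}: {name}")
```

Call `print_input_devices()` from the environment that runs the recorder, then pass the index of the microphone you want as input_device_index.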

Input overflow warnings

Overflow warnings mean audio is arriving faster than it is being consumed. Try one or more of the following:

Switch to a smaller or faster model.
Set device="cuda" when a GPU is available.
Increase realtime_processing_pause.
Lower the realtime beam size.
Increase queue or capacity settings in the FastAPI server.

External audio sounds garbled or has recognition errors

When feeding audio via feed_audio(), ensure your chunks are:

16-bit signed PCM bytes
Mono channel
Labeled with the correct original_sample_rate
In the correct byte order

See the external audio documentation for a complete guide.
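As a rough illustration of the checklist above, the sketch below converts interleaved float samples into 16-bit signed mono PCM bytes using only the standard library. The function name `to_pcm16_mono` is hypothetical, and little-endian output is an assumption (it is the native byte order on most desktop hardware); adjust it if your pipeline differs.

```python
import struct


def to_pcm16_mono(samples, channels=1):
    """Convert interleaved floats in [-1.0, 1.0] to 16-bit signed
    little-endian mono PCM bytes by averaging the channels of each frame."""
    if channels < 1 or len(samples) % channels:
        raise ValueError("sample count must be a multiple of the channel count")
    mono = []
    for start in range(0, len(samples), channels):
        frame = samples[start:start + channels]
        value = sum(frame) / channels          # downmix to mono
        value = max(-1.0, min(1.0, value))     # clip to the valid range
        mono.append(int(round(value * 32767)))
    return struct.pack(f"<{len(mono)}h", *mono)
```

The resulting bytes can then be passed to feed_audio(), with original_sample_rate set to the rate at which the samples were actually captured.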

CUDA and GPU Issues

cuDNN or CUDA library cannot be loaded

Install a PyTorch build that matches your CUDA runtime and driver version. Use the PyTorch install selector for your exact CUDA version. For example, for CUDA 12.1:

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121

If GPU setup is not required, use device="cpu" with a small model.
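One way to choose between the two device values at startup is sketched below. It assumes PyTorch is the relevant GPU stack and falls back to the CPU when PyTorch is missing, fails to load its native libraries, or reports no usable CUDA device. The helper name `pick_device` is illustrative.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is not installed, or its native libraries could not be loaded.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string can be passed as the device argument. If this returns "cpu" on a machine that has a GPU, the installed PyTorch build most likely does not match the CUDA runtime or driver.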

GPU memory is exhausted

Use a smaller model.
Use a smaller realtime model than the final model.
Set use_main_model_for_realtime=True to avoid loading two models, accepting some inference contention.
Reduce batch sizes.
Use CPU INT8 engines such as sherpa_onnx_moonshine for CPU-only deployments.

Model Issues

Hugging Face model download fails

Check network access from the machine running the script.
Set download_root to a writable directory.
Authenticate if the model is gated: use huggingface-cli login or set the HF_TOKEN environment variable.
Pre-download models in the same environment that runs the application.
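Before digging into network problems, it can help to rule out the two local causes from the list above. The sketch below uses only the standard library; the helper names are hypothetical, and it checks only whether the directory accepts new files and whether HF_TOKEN is set, not whether the token is valid.

```python
import os
import tempfile
from pathlib import Path


def is_writable_dir(download_root):
    """Return True if download_root exists (or can be created) and accepts new files."""
    try:
        root = Path(download_root)
        root.mkdir(parents=True, exist_ok=True)
        # Creating and removing a temporary file proves write access.
        with tempfile.NamedTemporaryFile(dir=root):
            pass
    except OSError:
        return False
    return True


def has_hf_token():
    """Return True if the HF_TOKEN environment variable is set and non-empty."""
    return bool(os.environ.get("HF_TOKEN"))
```

Run both checks as the same user and in the same environment as the application, since permissions and environment variables often differ between an interactive shell and a service.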

sherpa-onnx model files are missing

sherpa-onnx engines do not download model bundles automatically. Download and extract the required archive, then pass the extracted directory path:

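from RealtimeSTT import AudioToTextRecorder

# Point model at the extracted bundle directory, not the downloaded archive.
recorder = AudioToTextRecorder(
    transcription_engine="sherpa_onnx_moonshine",
    model="test-model-cache/sherpa-onnx/sherpa-onnx-moonshine-tiny-en-int8",
    device="cpu",
    language="en",
)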

The error message names the missing file. Confirm you are pointing at the extracted directory, not the .tar.bz2 archive file.
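A quick pre-flight check along these lines can catch the archive-versus-directory mistake before the recorder starts. This is a sketch with a hypothetical helper name; it does not know which files a given sherpa-onnx bundle requires, so the engine's own error message remains the authority on what is missing.

```python
from pathlib import Path

ARCHIVE_SUFFIXES = (".tar.bz2", ".tar.gz", ".tgz", ".zip")


def check_model_dir(model_path):
    """Return None if model_path looks usable, otherwise a short problem description."""
    path = Path(model_path)
    if path.name.endswith(ARCHIVE_SUFFIXES):
        return f"{path} looks like an archive; extract it and pass the directory"
    if not path.is_dir():
        return f"{path} is not an existing directory"
    if not any(path.iterdir()):
        return f"{path} is empty"
    return None
```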

Runtime Behavior

text() never returns

The recorder waits for VAD to detect speech end before returning. If text() hangs:

Feed trailing silence when using external audio via feed_audio().
Lower post_speech_silence_duration for faster turn finalization.
Check that microphone input is being received and VAD sensitivity is tuned appropriately.
Confirm speech duration exceeds min_length_of_recording.
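For the trailing-silence case, a buffer of zero-valued samples is enough. The sketch below assumes 16-bit mono PCM and a default rate of 16 kHz; the helper name `silence_pcm16` is illustrative, so match the sample rate to whatever you report when feeding audio.

```python
def silence_pcm16(duration_s, sample_rate=16000):
    """Return duration_s seconds of 16-bit mono PCM silence as bytes."""
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    # Each 16-bit sample is two zero bytes.
    return b"\x00\x00" * int(duration_s * sample_rate)
```

Passing the result to feed_audio() after the last speech chunk gives VAD a silent tail to detect. The silence presumably needs to last longer than post_speech_silence_duration for the turn to finalize.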

Final transcription is slow

Use a smaller final model.
Enable CUDA with device="cuda".
Reduce beam_size.
Enable early_transcription_on_silence carefully.
For realtime UX, keep final model accuracy high but use a smaller separate realtime model.

Realtime text lags

Use realtime_model_type="tiny.en" or another small realtime model.
Set beam_size_realtime=1.
Increase realtime_processing_pause.
Enable syllable-boundary scheduling and tune follow-up delays.
Use a separate, lighter realtime engine and model if the final model is heavy.

Wake word does not trigger

Confirm wake_words or wakeword_backend is set.
For Porcupine, use one of the supported built-in keyword names.
For OpenWakeWord, pass valid model files and the matching framework.
Tune wake_words_sensitivity.
Test in a quiet room with the microphone close to your mouth first.

Wake word text appears in transcript

Increase wake_word_buffer_duration so more of the wake word audio is excluded from the following recording window.

FastAPI Server Issues

Browser cannot connect to the server

Confirm the server is running and listening on the expected host and port.
Open http://localhost:8010 from the same machine first to rule out a networking issue.
Check that the browser has microphone permission.
Check the /health endpoint for startup errors.
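The /health check can also be scripted, which is useful on headless machines. The sketch below uses only the standard library and assumes the default address shown above. The response format of /health is not specified here, so the body is returned as raw text rather than parsed, and the function name `check_health` is hypothetical.

```python
import urllib.error
import urllib.request


def check_health(base_url="http://localhost:8010", timeout=3.0):
    """Return (ok, detail) for a GET request to the server's /health endpoint."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return True, f"HTTP {response.status}: {body}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}: {exc.reason}"
    except (urllib.error.URLError, OSError) as exc:
        # Nothing is listening, the host is unreachable, or the request timed out.
        return False, f"connection failed: {exc}"
```

A connection failure points at the host, port, or firewall, while an HTTP error status means the server is reachable and the problem is more likely in its startup or configuration.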

New sessions are rejected

The server has reached the --max-sessions limit. Increase the limit only if the selected engine and hardware can handle the additional load.

Realtime events drop under load

The server intentionally coalesces stale realtime work to preserve final transcription accuracy. Tune these parameters to improve throughput:

--max-realtime-queue-age-ms
--realtime-processing-pause
--max-global-inference-queue-depth
model size and engine choice

Windows Notes

Some multiprocessing and model-loading tests may need to run from a normal terminal rather than a restricted sandbox environment. Parakeet/NeMo and Qwen vLLM are Linux-oriented; use WSL2 for those real-model paths on a Windows workstation.
