

RealtimeSTT separates its tests into two categories: fast unit and contract tests that run without downloading any speech models, and opt-in golden transcription tests that run real models against small audio fixtures. This keeps the default test run fast and CI-friendly while still allowing real-model validation when needed. Audio fixtures live in tests/unit/audio/ and are based on public-domain LJ Speech samples. Manual demos, regression harnesses, and legacy experiments live directly under tests/ and are documented in the Test Scripts section below.
Real-model (golden) tests are opt-in and require actual ASR models, optional package dependencies, and sometimes network access on the first run. Without the relevant environment variables set, those tests are skipped automatically, so a passing run that reports skipped tests means the fast tests passed and the opt-in tests did not run.
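The skip behavior described above can be pictured with a small sketch. It assumes the guard is a plain check that the variable equals "1"; the helper golden_enabled and the class ExampleGoldenTests are hypothetical names for illustration, and the repository's actual guard may be written differently.

```python
import os
import unittest


def golden_enabled(var_name, environ=os.environ):
    """Return True only when the opt-in variable is set to "1".

    Hypothetical helper: the real suite may use a different check.
    """
    return environ.get(var_name, "") == "1"


@unittest.skipUnless(
    golden_enabled("REALTIMESTT_RUN_GOLDEN_TRANSCRIPTION"),
    "set REALTIMESTT_RUN_GOLDEN_TRANSCRIPTION=1 to run real-model tests",
)
class ExampleGoldenTests(unittest.TestCase):
    def test_placeholder(self):
        # A real golden test would load a model and transcribe a fixture here.
        self.assertTrue(True)
```

With the variable unset, unittest reports this class as skipped rather than failed, which matches the behavior described for the opt-in groups.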

Running Unit Tests

Run all fast unit and contract tests from the repository root using your active virtual environment’s Python executable:
python -m unittest -v \
  tests.unit.test_audio_fixtures \
  tests.unit.test_whisper_cpp_engine \
  tests.unit.test_openai_whisper_engine \
  tests.unit.test_additional_transcription_engines \
  tests.unit.test_cohere_transcribe_engine \
  tests.unit.test_granite_speech_engine \
  tests.unit.test_moonshine_engine \
  tests.unit.test_sherpa_onnx_engine \
  tests.unit.test_kroko_onnx_engine \
  tests.unit.test_omnilingual_asr_engine \
  tests.unit.test_realtime_streaming_transcription \
  tests.unit.test_fastapi_server_protocol \
  tests.unit.test_fastapi_server_multi_user
These tests use mocked runtime objects and do not download models or require GPU drivers.
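As a rough illustration of what "mocked runtime objects" means here, the following sketch tests a wrapper against a unittest.mock.Mock backend. FakeEngine and its transcribe method are hypothetical and do not mirror any RealtimeSTT class; the point is only that no model or GPU is touched.

```python
import unittest
from unittest import mock


class FakeEngine:
    """Hypothetical engine wrapper used only for this illustration."""

    def __init__(self, backend):
        self._backend = backend

    def transcribe(self, audio):
        # Delegate to the backend and join the segment texts into one string.
        segments = self._backend.transcribe(audio)
        return " ".join(segment.strip() for segment in segments)


class FakeEngineContractTests(unittest.TestCase):
    def test_joins_backend_segments(self):
        backend = mock.Mock()
        backend.transcribe.return_value = [" hello", "world "]
        engine = FakeEngine(backend)

        self.assertEqual(engine.transcribe(b"\x00\x00"), "hello world")
        # The wrapper must pass the audio through unchanged, exactly once.
        backend.transcribe.assert_called_once_with(b"\x00\x00")
```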

Opt-in Real-Model Tests

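Golden tests download or load real speech models and compare a fixture transcription against expected text. Enable each test group by setting the corresponding environment variable. The commands in this section use PowerShell syntax ($env:NAME = "value"); in a POSIX shell, use export NAME=value instead.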
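Comparing a transcription against expected text usually needs some tolerance for case and punctuation. The sketch below shows one possible normalization; normalize_transcript and transcripts_match are hypothetical helpers, and the comparison the golden tests actually perform may be stricter or looser.

```python
import re


def normalize_transcript(text):
    """Lowercase, drop punctuation (keeping apostrophes), collapse whitespace."""
    lowered = text.lower()
    without_punctuation = re.sub(r"[^\w\s']", " ", lowered)
    return " ".join(without_punctuation.split())


def transcripts_match(actual, expected):
    """Compare two transcripts after normalization."""
    return normalize_transcript(actual) == normalize_transcript(expected)
```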
1. Set up the model cache directory

Create a test-model-cache/ directory in the repository root. This directory is ignored by Git and can safely hold downloaded local test models:
mkdir test-model-cache
2. Run the faster-whisper golden test

$env:REALTIMESTT_RUN_GOLDEN_TRANSCRIPTION = "1"
$env:REALTIMESTT_TEST_MODEL = "tiny"
$env:REALTIMESTT_TEST_DEVICE = "cpu"
$env:REALTIMESTT_TEST_COMPUTE_TYPE = "int8"

python -m unittest -v tests.unit.test_audio_fixtures.GoldenTranscriptionTests
3. Run the whisper.cpp golden test

pip install "RealtimeSTT[whisper-cpp]"

$env:REALTIMESTT_RUN_WHISPER_CPP = "1"
$env:REALTIMESTT_WHISPER_CPP_MODEL = "tiny.en"
$env:REALTIMESTT_WHISPER_CPP_MODEL_DIR = Join-Path (Get-Location) "test-model-cache\pywhispercpp"

python -m unittest -v tests.unit.test_whisper_cpp_engine.WhisperCppGoldenTranscriptionTests
4. Run the OpenAI Whisper golden test

pip install openai-whisper

$env:REALTIMESTT_RUN_OPENAI_WHISPER = "1"
$env:REALTIMESTT_OPENAI_WHISPER_MODEL = "tiny.en"
$env:REALTIMESTT_OPENAI_WHISPER_DEVICE = "cpu"
$env:REALTIMESTT_OPENAI_WHISPER_COMPUTE_TYPE = "float32"
$env:REALTIMESTT_OPENAI_WHISPER_MODEL_DIR = Join-Path (Get-Location) "test-model-cache\openai-whisper"

python -m unittest -v tests.unit.test_openai_whisper_engine.OpenAIWhisperGoldenTranscriptionTests
5. Run opt-in smoke tests for newer engine families

Enable only the engines you want to validate:
# Set only the variables for the engines you want to test
$env:REALTIMESTT_RUN_PARAKEET = "1"
$env:REALTIMESTT_RUN_COHERE_TRANSCRIBE = "1"
$env:REALTIMESTT_RUN_GRANITE_SPEECH = "1"
$env:REALTIMESTT_RUN_QWEN3_ASR = "1"
$env:REALTIMESTT_RUN_MOONSHINE = "1"
$env:REALTIMESTT_HF_MODEL_DIR = Join-Path (Get-Location) "test-model-cache\hf"

python -m unittest -v tests.unit.test_additional_transcription_engines.AdditionalEngineGoldenTranscriptionTests
These smoke tests require numpy plus each backend’s optional dependencies and model access. Cohere Transcribe currently requires accepting gated Hugging Face access before weights can be downloaded.
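The environment variables in the steps above are shown in PowerShell syntax. If a shell-independent launcher is preferred, the same faster-whisper golden run can be sketched in Python. The variable names and test target are taken from the step above; build_env, build_command, and run_golden are hypothetical helpers, not part of the repository.

```python
import os
import subprocess
import sys

# Values copied from the faster-whisper golden test step.
GOLDEN_ENV = {
    "REALTIMESTT_RUN_GOLDEN_TRANSCRIPTION": "1",
    "REALTIMESTT_TEST_MODEL": "tiny",
    "REALTIMESTT_TEST_DEVICE": "cpu",
    "REALTIMESTT_TEST_COMPUTE_TYPE": "int8",
}
TEST_TARGET = "tests.unit.test_audio_fixtures.GoldenTranscriptionTests"


def build_env(overrides, base=None):
    """Copy the base environment and apply the opt-in variables on top."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def build_command(target):
    """Use the current interpreter so the active virtual environment is kept."""
    return [sys.executable, "-m", "unittest", "-v", target]


def run_golden():
    """Run the golden test in a child process; returns its exit code."""
    completed = subprocess.run(build_command(TEST_TARGET), env=build_env(GOLDEN_ENV))
    return completed.returncode
```

Call run_golden() from the repository root. Because the variables are passed only to the child process, they do not linger in the parent shell and cannot accidentally enable golden tests in a later run.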

sherpa-onnx Golden Tests

The fast sherpa-onnx tests mock the runtime and do not download models. To compare real-time factor (RTF) on real models, download and extract the model bundles under test-model-cache\sherpa-onnx, then run the opt-in golden tests:
$env:REALTIMESTT_RUN_SHERPA_ONNX_PARAKEET = "1"
$env:REALTIMESTT_SHERPA_ONNX_PARAKEET_MODEL = Join-Path (Get-Location) "test-model-cache\sherpa-onnx\sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8"
$env:REALTIMESTT_SHERPA_ONNX_NUM_THREADS = "2"

python -m unittest -v tests.unit.test_sherpa_onnx_engine.SherpaOnnxGoldenTranscriptionTests.test_transcribes_fixture_with_real_sherpa_parakeet_backend
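RTF is conventionally computed as processing time divided by audio duration, so values below 1.0 mean faster than real time. The sketch below shows that arithmetic for a WAV file; the transcribe argument is a hypothetical callable standing in for whichever backend is being measured, and the golden tests may collect their timings differently.

```python
import time
import wave


def wav_duration_seconds(path):
    """Duration of a WAV file, computed from frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe, path):
    """Time one call to a hypothetical transcribe(path) callable."""
    start = time.perf_counter()
    transcribe(path)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, wav_duration_seconds(path))
```

For example, 2 seconds of processing for 8 seconds of audio gives an RTF of 0.25.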

Kroko-ONNX Tests

The fast Kroko tests use fake runtime objects and do not install or import Kroko-ONNX. For a real-model smoke test with a Community model:
$env:REALTIMESTT_RUN_KROKO_ONNX = "1"
$env:REALTIMESTT_KROKO_ONNX_MODEL = "test-model-cache\kroko-onnx\Kroko-EN-Community-64-L-Streaming-001.data"
$env:REALTIMESTT_KROKO_ONNX_PROVIDER = "cpu"
$env:REALTIMESTT_KROKO_ONNX_NUM_THREADS = "1"

python -m unittest -v tests.unit.test_kroko_onnx_engine.KrokoOnnxGoldenTranscriptionTests
REALTIMESTT_KROKO_ONNX_KEY, KROKO_ONNX_KEY, or KROKO_KEY can be set for licensed Pro models. Do not store keys in command history, documentation, generated reports, or committed files.
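One way to keep a key out of command history is to prompt for it and pass it only to the test process. The sketch below uses getpass from the standard library; env_with_key and run_kroko_tests_with_prompted_key are hypothetical helpers, and the sketch assumes the other Kroko variables shown above are already set in the calling environment.

```python
import getpass
import os
import subprocess
import sys

KROKO_TARGET = "tests.unit.test_kroko_onnx_engine.KrokoOnnxGoldenTranscriptionTests"


def env_with_key(key, base=None):
    """Return a copy of the environment with the key added; base is not mutated."""
    env = dict(os.environ if base is None else base)
    env["REALTIMESTT_KROKO_ONNX_KEY"] = key
    return env


def run_kroko_tests_with_prompted_key():
    """Prompt without echo, then run the golden tests in a child process."""
    key = getpass.getpass("Kroko-ONNX license key: ")
    command = [sys.executable, "-m", "unittest", "-v", KROKO_TARGET]
    return subprocess.run(command, env=env_with_key(key)).returncode
```

The key exists only in memory and in the child process environment, so it is never written to shell history or to a file.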

Omnilingual ASR Tests

The fast Omnilingual tests use fake runtime objects and do not install or import Meta’s Omnilingual ASR package. The command below runs against a source checkout and is not expected to work from a clean pip install unless the source tree is present:
python -m unittest -v tests.unit.test_omnilingual_asr_engine
For real-model validation, run the standalone smoke test script from Linux or WSL2 with Python 3.11.x. See the Omnilingual ASR engine page for the full workflow.

FastAPI Multi-User Load Test

The FastAPI browser server has fast fake-scheduler tests for session isolation, fair scheduling, realtime coalescing, stale realtime discard, admission limits, and clear/reset behavior:
python -m unittest -v tests.unit.test_fastapi_server_protocol tests.unit.test_fastapi_server_multi_user
The opt-in real-engine load and performance test streams tests\unit\audio\asr-reference.wav through multiple parallel sessions, compares final text with expected sentences, checks per-session latency skew, and prints a timing report:
$env:REALTIMESTT_RUN_FASTAPI_MULTI_USER_PERF = "1"
$env:REALTIMESTT_FASTAPI_ASR_CLIENTS = "2"
$env:REALTIMESTT_FASTAPI_ASR_ENGINE = "faster_whisper"
$env:REALTIMESTT_FASTAPI_ASR_MODEL = "small.en"

python -m unittest -v tests.unit.test_fastapi_server_multi_user_asr_integration
Save the report as JSON for comparing runs:
$env:REALTIMESTT_FASTAPI_ASR_METRICS_JSON = "test-results\fastapi-multi-user-perf.json"
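Two saved reports can then be compared numerically. The report schema is not described here, so the sketch below makes no assumption about field names: it flattens whatever numeric values both files contain and reports the difference per key. flatten_numbers, diff_reports, and diff_report_files are hypothetical helpers.

```python
import json


def flatten_numbers(obj, prefix=""):
    """Collect numeric leaves of nested dicts/lists under dotted keys."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten_numbers(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten_numbers(value, f"{prefix}{index}."))
    elif isinstance(obj, (int, float)) and not isinstance(obj, bool):
        # Booleans are ints in Python, so they are excluded explicitly.
        flat[prefix.rstrip(".")] = float(obj)
    return flat


def diff_reports(before, after):
    """Return after - before for every numeric key present in both reports."""
    old, new = flatten_numbers(before), flatten_numbers(after)
    return {key: new[key] - old[key] for key in sorted(old.keys() & new.keys())}


def diff_report_files(before_path, after_path):
    """Load two JSON reports from disk and diff their numeric fields."""
    with open(before_path, encoding="utf-8") as handle:
        before = json.load(handle)
    with open(after_path, encoding="utf-8") as handle:
        after = json.load(handle)
    return diff_reports(before, after)
```

Positive values mean the number grew between runs; whether that is an improvement depends on the metric.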
For a sherpa-onnx Moonshine CPU run, use the included helper script:
example_fastapi_server\run_multi_user_perf.cmd
Override specific variables before running it:
set REALTIMESTT_FASTAPI_ASR_CLIENTS=8
set REALTIMESTT_FASTAPI_ASR_METRICS_JSON=test-results\fastapi-8-user-perf.json
example_fastapi_server\run_multi_user_perf.cmd

Test Scripts

Manual demos, regression harnesses, and legacy experiments live directly under tests/. Run them from the repository root so relative imports and model paths resolve correctly.

Maintained Regression and Benchmark Harnesses

| Script | Purpose |
| --- | --- |
| tests/final_transcription_gap_regression.py | Streams a WAV file while AudioToTextRecorder.text() runs in parallel to reproduce slow final-transcription gaps. Can generate expected JSON and compare CPU output. |
| tests/realtime_transcription_count_comparison.py | Compares timer-based realtime transcription with syllable-boundary scheduling on deterministic WAV input. Reports realtime model-call counts and validates final text. |
| tests/realtime_boundary_detector_live_test.py | Lightweight live check for the realtime boundary detector. |
| tests/realtime_boundary_detector_microphone.py | Microphone visualizer for syllable/speech boundary detection. Useful when tuning boundary sensitivity. |
# Final transcription gap regression
python tests/final_transcription_gap_regression.py --mode both

# Realtime scheduling comparison
python tests/realtime_transcription_count_comparison.py --mode both

# Boundary detector visualizer
python tests/realtime_boundary_detector_microphone.py --sensitivity 0.6

Core Demo Scripts

| Script | Purpose |
| --- | --- |
| tests/simple_test.py | Smallest microphone transcription smoke script. |
| tests/realtimestt_test.py | Rich console demo with realtime transcription, final text, and optional keyboard typing. |
| tests/realtimestt_test_whispercpp.py | whisper.cpp interactive demo with CPU profiles. |
| tests/realtimestt_omnilingual_test.py | Linux/WSL2 Omnilingual ASR script with deterministic file smoke, init-only check, and interactive microphone mode. |
| tests/feed_audio.py | Opens a PyAudio stream manually and feeds chunks through feed_audio() with use_microphone=False. |
| tests/openwakeword_test.py | OpenWakeWord demo using local sample wake word models. |
| tests/realtime_loop_test.py | Exercises realtime transcription in a loop. |
| tests/realtimestt_chinese.py | Demonstrates Chinese transcription settings. |
| tests/vad_test.py | Manual VAD behavior check. |
# whisper.cpp interactive demo
pip install "RealtimeSTT[whisper-cpp]" rich pyautogui colorama
python tests/realtimestt_test_whispercpp.py --profile balanced

Application Experiments

| Script | Purpose |
| --- | --- |
| tests/advanced_talk.py | Combines RealtimeSTT with RealtimeTTS and LLM calls. Requires API keys and TTS dependencies. |
| tests/minimalistic_talkbot.py | Small talkbot example using speech input and generated responses. |
| tests/openai_voice_interface.py | Voice interface experiment using OpenAI-compatible client setup. |
| tests/translator.py | Speech translation workflow experiment. |
| tests/type_into_textbox.py | Types recognized text into the focused text box. |
| tests/recorder_client.py | Uses the packaged recorder client/server path. |
Treat these as examples, not correctness tests. Check credentials, local model servers, and package imports before running them.
Scripts that use pyautogui, keyboard, or hotkey support can type into the active application. Scripts using real engines may download large models or require CUDA. Microphone scripts require OS audio permissions.

Adding Tests

For new transcription engines, follow this convention: add fast contract tests first, then add opt-in golden tests only after the contract tests are stable. Fast contract tests should cover:
  • Factory selection and lazy import behavior
  • Missing optional dependency error messages
  • Parameter mapping from TranscriptionEngineConfig to the backend binding
  • Audio validation and normalization behavior
  • Conversion from backend segments into TranscriptionResult
Keep every real-model golden test opt-in behind a dedicated environment variable guard.
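Two of the contract-test topics listed above, lazy import behavior and missing optional dependency messages, can be sketched without any real backend. load_backend is a hypothetical helper and the module and extra names are placeholders; the install hint follows the pip install "RealtimeSTT[...]" pattern used elsewhere on this page.

```python
import importlib
import unittest


def load_backend(module_name, extra):
    """Import a backend only when requested; explain how to install it if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Backend '{module_name}' is not installed. "
            f'Install it with: pip install "RealtimeSTT[{extra}]"'
        ) from exc


class MissingDependencyContractTests(unittest.TestCase):
    def test_missing_backend_error_names_install_command(self):
        # The module name is a placeholder that is assumed not to exist.
        with self.assertRaises(ImportError) as ctx:
            load_backend("realtimestt_hypothetical_missing_backend", "hypothetical-extra")
        self.assertIn('pip install "RealtimeSTT[hypothetical-extra]"', str(ctx.exception))

    def test_present_backend_is_returned(self):
        # A standard-library module stands in for an installed backend.
        module = load_backend("json", "unused")
        self.assertTrue(hasattr(module, "loads"))
```

Tests of this shape run in milliseconds and fail with a readable message when an error string or import path regresses, which is the reason for stabilizing them before adding a golden test.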

Windows Notes

Some recorder tests use multiprocessing pipes. On Windows, those tests may need to run from a normal terminal rather than a restricted sandbox. If a golden test fails with a PermissionError while creating multiprocessing queues or pipes, rerun it in a normal terminal with the same environment variables set.
The Parakeet/NeMo and Qwen vLLM paths are Linux-oriented. For real-model validation on a Windows workstation, use WSL2 with a CUDA-enabled Linux environment, mount or clone the repository inside the WSL filesystem, and run the same python -m unittest commands from there.
