The
`example_fastapi_server` directory in the RealtimeSTT repository contains a
reference FastAPI application that streams microphone audio from a browser
into per-session RealtimeSTT recorder instances. It serves a polished browser
UI, exposes a WebSocket endpoint for audio and commands, and handles concurrent
sessions on top of shared inference workers, with HTTP endpoints for health
checks and operational metrics.
Installation
Install a transcription engine
For pip-only installs without cloning the repository, use the Python
recorder examples in the `examples/` directory instead. The FastAPI server
is intentionally kept source-only to avoid adding web-server dependencies
to the core wheel.
Starting the Server
Engine Selection
Pass `--engine` and `--model` (plus `--realtime-engine` and `--realtime-model`
for interim transcription) to select a different backend.
- faster-whisper (default)
- whisper.cpp (CPU)
- sherpa-onnx Moonshine (CPU)
- Parakeet (CUDA)
Use `--use-main-model-for-realtime` to share a single inference lane for both
final and realtime work, which reduces GPU memory usage.
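The engine-selection flags compose into a single command line. The sketch below assembles one in Python; the entry-point path `example_fastapi_server/server.py` and the engine and model values in the example are hypothetical placeholders, so check the example directory for the real script name and the accepted `--engine` values.

```python
from typing import List, Optional

# Hypothetical entry point: check the example_fastapi_server directory for the real script.
SERVER_ENTRY = ["python", "example_fastapi_server/server.py"]


def build_server_command(
    engine: str,
    model: str,
    realtime_engine: Optional[str] = None,
    realtime_model: Optional[str] = None,
    share_main_model: bool = False,
) -> List[str]:
    """Assemble argv from the engine-selection flags described above."""
    argv = [*SERVER_ENTRY, "--engine", engine, "--model", model]
    if share_main_model:
        # One inference lane handles final and realtime work; this sketch
        # therefore skips the separate realtime engine/model flags.
        argv.append("--use-main-model-for-realtime")
    else:
        if realtime_engine is not None:
            argv += ["--realtime-engine", realtime_engine]
        if realtime_model is not None:
            argv += ["--realtime-model", realtime_model]
    return argv


if __name__ == "__main__":
    import shlex

    # "faster_whisper" and "large-v2" are placeholder values, not confirmed CLI identifiers.
    print(shlex.join(build_server_command("faster_whisper", "large-v2", share_main_model=True)))
```

Dropping the realtime flags when a single lane is shared is a choice of this sketch, not documented server behavior.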
Multi-User Session Isolation
The server is designed for concurrent browser clients. Each WebSocket connection receives a unique `sessionId` and owns its own lightweight state machine:
- Per-session: audio buffer, VAD state (WebRTC + Silero), transcript segment IDs, realtime text, final text, warnings, and error state.
- Shared: heavy ASR inference workers, namely one final model lane and one
  realtime model lane (or a single shared lane with
  `--use-main-model-for-realtime`).
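To make the split concrete, here is a minimal Python model of which state lives per session and which is shared. All class and field names are hypothetical; the real server tracks WebRTC and Silero VAD state separately, which this sketch collapses into a single flag.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InferenceLane:
    """Stand-in for a heavy ASR worker shared by all sessions (hypothetical)."""
    name: str


@dataclass
class SharedWorkers:
    final_lane: InferenceLane
    realtime_lane: InferenceLane

    @classmethod
    def create(cls, use_main_model_for_realtime: bool) -> "SharedWorkers":
        final = InferenceLane("final")
        # With --use-main-model-for-realtime, both roles point at the same lane object.
        realtime = final if use_main_model_for_realtime else InferenceLane("realtime")
        return cls(final_lane=final, realtime_lane=realtime)


@dataclass
class SessionState:
    """Per-connection state; no field here is shared between sessions."""
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    audio_buffer: bytearray = field(default_factory=bytearray)
    speech_active: bool = False  # collapses WebRTC + Silero VAD state into one flag
    next_segment_id: int = 0
    realtime_text: str = ""
    final_text: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    error: Optional[str] = None

    def new_segment_id(self) -> int:
        # Segment IDs are session-local, so two sessions can both have segment 0.
        segment_id = self.next_segment_id
        self.next_segment_id += 1
        return segment_id
```

Every session would reference the same `SharedWorkers` instance, so adding a session costs a few buffers rather than another model load.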
When `--max-sessions` is reached, new WebSocket clients receive an admission
error and close code 1013. When active speaker capacity is reached, accepted
sessions receive a warning while existing final work is preserved.
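The two capacity rules can be expressed as small pure functions. This is a sketch under assumptions: `max_speakers` is a hypothetical name for the speaker limit (this page names only `--max-sessions`), and the warning text is illustrative.

```python
from typing import Optional

TRY_AGAIN_LATER = 1013  # close code this page gives for admission errors


def admit_connection(active_sessions: int, max_sessions: int) -> Optional[int]:
    """Return a close code if a new connection must be refused, else None."""
    if active_sessions >= max_sessions:
        return TRY_AGAIN_LATER
    return None


def speaker_warning(active_speakers: int, max_speakers: int) -> Optional[str]:
    """Return a warning for an accepted session when speaker capacity is full.

    `max_speakers` is a hypothetical name. Nothing here cancels or drops
    queued final transcriptions, so existing final work is preserved.
    """
    if active_speakers >= max_speakers:
        return "active speaker capacity reached"
    return None
```

In a FastAPI handler, a non-`None` result from `admit_connection` would presumably lead to accepting the socket, sending the `error` event, and then calling `await websocket.close(code=...)`.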
WebSocket Protocol
The browser streams binary audio packets to `/ws/transcribe` and sends control
commands over the same connection.
Client commands: `start`, `stop`, `clear`, `ping`, `metrics`.
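A client-side helper for the command channel might look like this. The command names come from this page; the JSON envelope `{"type": ...}` and the command-to-reply pairings (inferred from the server event table) are assumptions to check against the server source.

```python
import json
from typing import Optional

CLIENT_COMMANDS = frozenset({"start", "stop", "clear", "ping", "metrics"})

# Reply pairings inferred from the server event table (assumption).
EXPECTED_REPLY = {"ping": "pong", "metrics": "metrics", "clear": "clear"}


def encode_command(name: str) -> str:
    """Serialize a command; the {"type": ...} envelope is a hypothetical format."""
    if name not in CLIENT_COMMANDS:
        raise ValueError(f"unknown command: {name!r}")
    return json.dumps({"type": name})


def expected_reply(name: str) -> Optional[str]:
    """Event type a client might wait for after sending `name`, if any."""
    return EXPECTED_REPLY.get(name)
```

With the `websockets` package, `await ws.send(encode_command("start"))` sends a text frame while `await ws.send(audio_bytes)` sends a binary frame; that the server reads commands from text frames is likewise an assumption.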
Server event types:

| Event type | Description |
|---|---|
| `hello` | Assigns `clientId` and `sessionId` to the new connection. |
| `ready` | Model lanes are initialized and the session can begin streaming. |
| `timeline` | Segment timing and wake word state transitions. |
| `realtime` | Interim transcript text for a session-local `segmentId`. |
| `final` | Final transcript text for the same `segmentId`; replaces the interim block. |
| `status` | Session or server state update. |
| `warning` | Recoverable issue (e.g., approaching capacity limits). |
| `error` | Command, packet, admission, or runtime error. |
| `clear` | Session transcript reset acknowledgement. |
| `pong` | Response to a `ping` command. |
| `metrics` | Per-session metrics in response to a `metrics` command. |

Session-scoped events such as `realtime` and `final` include `sessionId` and are
routed only to the session that produced them. They may also include a `segment`
object containing recording start/end timestamps, duration, pre-recording buffer
range, and wake word timing when available.
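Because `final` replaces the interim block for the same `segmentId`, a client can keep one slot per segment. The sketch below does that for a single session; the `text` field name is an assumption, and segments are rendered in first-seen order so the code does not depend on the type of `segmentId`.

```python
from typing import Any, Dict, List


class TranscriptView:
    """Client-side transcript for one session (hypothetical helper)."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.order: List[Any] = []          # segmentIds in first-seen order
        self.interim: Dict[Any, str] = {}   # latest realtime text per segment
        self.final: Dict[Any, str] = {}     # final text per segment

    def _track(self, segment_id: Any) -> None:
        if segment_id not in self.interim and segment_id not in self.final:
            self.order.append(segment_id)

    def handle(self, event: Dict[str, Any]) -> None:
        # Defensive check: the server should only route this session's events here.
        if event.get("sessionId") != self.session_id:
            return
        kind, seg = event.get("type"), event.get("segmentId")
        if kind == "realtime":
            if seg in self.final:
                return  # a late interim update must not overwrite a finished segment
            self._track(seg)
            self.interim[seg] = event.get("text", "")
        elif kind == "final":
            self._track(seg)
            self.interim.pop(seg, None)  # final replaces the interim block
            self.final[seg] = event.get("text", "")
        elif kind == "clear":
            self.order.clear()
            self.interim.clear()
            self.final.clear()

    def render(self) -> List[str]:
        return [self.final.get(s, self.interim.get(s, "")) for s in self.order]
```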
Health and Metrics
Use the `/health` endpoint for readiness checks and basic load information, and
`/api/metrics` for detailed operational data, including queue depth, latency
percentiles, coalescing counters, drop counters, and worker utilization.

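A deployment script can poll `/health` before routing traffic and then read `/api/metrics`. This standard-library sketch assumes a boolean `ready` field in the `/health` JSON (a hypothetical name; inspect the real response first) and takes the fetch function as a parameter so it can be tested without a running server.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

Fetch = Callable[[str], Dict[str, Any]]


def fetch_json(url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(base_url: str, fetch: Fetch = fetch_json,
                     attempts: int = 30, delay: float = 1.0) -> Dict[str, Any]:
    """Poll /health until it reports ready; "ready" is a hypothetical field name."""
    url = base_url.rstrip("/") + "/health"
    for _ in range(attempts):
        try:
            health = fetch(url)
            if health.get("ready"):
                return health
        except OSError:
            # Connection refused or non-2xx status: urllib's URLError and
            # HTTPError both subclass OSError, so keep polling.
            pass
        time.sleep(delay)
    raise TimeoutError(f"{url} did not report ready after {attempts} attempts")
```

For example, `wait_until_ready("http://localhost:8000")` followed by `fetch_json("http://localhost:8000/api/metrics")` covers both endpoints; the host and port are placeholders.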
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Browser UI. |
| `/health` | GET | Readiness, active sessions/speakers, startup errors, scheduler state. |
| `/api/config` | GET | Public settings, limits, supported engines, and the runtime settings contract. |
| `/api/config` | PATCH | Update runtime-safe settings without restarting. |
| `/api/metrics` | GET | Counters, queue depth, p50/p95 latency, coalescing, drops, worker busy ratio. |
| `/ws/transcribe` | WebSocket | Browser audio stream and command channel. |

The `GET /api/config` response separates settings into `activeSessionSafe`,
`newSessionOnly`, and `startupOnly` buckets. Engine and model paths are
`startupOnly`: changing them after startup is rejected because the shared
inference workers are already initialized.
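Since `startupOnly` changes are rejected, a client can sort a proposed patch into buckets before sending it. The sketch assumes each bucket in the `GET /api/config` response is a list of setting names (the actual response shape may differ), and any setting names used with it are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List

BUCKETS = ("activeSessionSafe", "newSessionOnly", "startupOnly")


def classify_patch(patch: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, List[str]]:
    """Split a proposed settings patch by bucket (assumes list-valued buckets)."""
    result: Dict[str, List[str]] = {b: [] for b in BUCKETS}
    result["unknown"] = []
    for key in patch:
        for bucket in BUCKETS:
            if key in config.get(bucket, []):
                result[bucket].append(key)
                break
        else:
            # No bucket lists this key; the server would likely reject it.
            result["unknown"].append(key)
    return result


def build_patch_request(base_url: str, patch: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) a PATCH /api/config request with a JSON body."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/config",
        data=json.dumps(patch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
```

Send the request with `urllib.request.urlopen(req)` only after confirming that the `startupOnly` and `unknown` lists returned by `classify_patch` are empty.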
Wake Word Mode
Pass wake word flags to enable wake word activation for all browser sessions. Wake word state transitions are reported through `timeline` WebSocket events and visualised in the browser UI.
