The Parakeet engine uses NVIDIA NeMo’s ASRModel to load and run Parakeet models — NVIDIA’s high-accuracy English ASR model family. It is best suited for Linux or WSL2 environments with CUDA available, where NeMo’s dependencies resolve cleanly and GPU acceleration is possible. For CPU INT8 inference without NeMo, see the sherpa-onnx engine.
NeMo has heavy system dependencies and is primarily supported on Linux. Running real Parakeet models on Windows is not recommended — use WSL2 or a Linux environment instead.

Engine Names

  • parakeet
  • nvidia_parakeet

Install

pip install -U "RealtimeSTT[parakeet]"
This installs nemo_toolkit[asr] and soundfile. NeMo ASR requires a compatible CUDA/PyTorch stack. If you need a specific CUDA version, install the matching PyTorch build manually before installing the extra, for example for CUDA 12.1:
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
NeMo, PyTorch, CUDA, and Python version compatibility is stricter than the default faster-whisper path. If you hit package conflicts, start from a fresh Linux virtual environment.
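Before installing NeMo into an environment, it can help to confirm which PyTorch build is active and whether it can see a GPU. The following is a minimal sketch; the helper name describe_torch_stack is hypothetical and not part of RealtimeSTT or NeMo. It relies only on torch.__version__, torch.version.cuda, and torch.cuda.is_available().

```python
def describe_torch_stack():
    """Report the active PyTorch build and whether CUDA is usable.

    Hypothetical helper for illustration; not part of RealtimeSTT or NeMo.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {
            "torch_installed": False,
            "torch_version": None,
            "cuda_build": None,
            "cuda_available": False,
        }
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


for key, value in describe_torch_stack().items():
    print(f"{key}: {value}")
```

If cuda_build is None or cuda_available is False, the installed wheel is probably a CPU-only build or does not match the local driver, and reinstalling PyTorch from the matching index URL is a reasonable first step.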

Basic Usage

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="parakeet",
    model="nvidia/parakeet-tdt-0.6b-v3",
    device="cuda",
    language="en",
)
The default model is nvidia/parakeet-tdt-0.6b-v3. NeMo handles model downloading and caching for known model IDs automatically via ASRModel.from_pretrained.

CPU Alternative

For CPU-only deployments or systems without NeMo, use the sherpa-onnx Parakeet engine (sherpa_onnx_parakeet). It uses pre-extracted ONNX INT8 model files and runs without any NeMo installation. You can also switch backends through transcription_engine_options while keeping the parakeet engine name:
recorder = AudioToTextRecorder(
    transcription_engine="parakeet",
    transcription_engine_options={
        "backend": "sherpa_onnx",
    },
)
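The choice between the NeMo path and the sherpa-onnx backend can be made at startup. The sketch below assumes a hypothetical helper, parakeet_recorder_kwargs, that returns the keyword arguments from the two examples on this page depending on whether CUDA is available. The CPU path may also need model files prepared as described for the sherpa-onnx engine.

```python
def parakeet_recorder_kwargs(cuda_available):
    """Return AudioToTextRecorder kwargs for the GPU or CPU path.

    Hypothetical helper; the kwargs mirror the examples on this page.
    """
    if cuda_available:
        # NeMo backend on GPU, as in Basic Usage.
        return {
            "transcription_engine": "parakeet",
            "model": "nvidia/parakeet-tdt-0.6b-v3",
            "device": "cuda",
            "language": "en",
        }
    # No CUDA: keep the engine name but switch to the sherpa-onnx backend.
    return {
        "transcription_engine": "parakeet",
        "transcription_engine_options": {"backend": "sherpa_onnx"},
    }


# Usage (requires RealtimeSTT to be installed):
#     from RealtimeSTT import AudioToTextRecorder
#     recorder = AudioToTextRecorder(**parakeet_recorder_kwargs(cuda_available=False))
```

Keeping the selection in one function means the rest of the application does not need to know which backend is active.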

Model Options and Cache

Backend loader options are passed through transcription_engine_options:
recorder = AudioToTextRecorder(
    transcription_engine="parakeet",
    model="nvidia/parakeet-tdt-0.6b-v3",
    transcription_engine_options={
        "model": {},
        "transcribe": {"batch_size": 1},
    },
)

Options Reference

| Option | Meaning |
| --- | --- |
| transcription_engine_options["model"] | Dictionary of kwargs passed to ASRModel.from_pretrained. |
| transcription_engine_options["transcribe"] | Dictionary merged into model.transcribe(...). |
| transcription_engine_options["sample_rate"] | Sample rate used when writing in-memory audio to a temporary WAV file. Defaults to 16000. |
| transcription_engine_options["timestamps"] | Enables or disables timestamp requests when supported by NeMo. |
| batch_size | Passed to transcribe when greater than 0. |
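As an illustration of how the documented keys fit together, the sketch below assembles a transcription_engine_options dictionary. The helper build_parakeet_options is hypothetical; only the dictionary keys and the 16000 default come from the reference above, and how the engine treats a missing "timestamps" key is an assumption.

```python
def build_parakeet_options(model_kwargs=None, transcribe_kwargs=None,
                           sample_rate=16000, timestamps=None):
    """Assemble a transcription_engine_options dict from the documented keys.

    Hypothetical helper for illustration; not part of RealtimeSTT.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive integer")
    options = {
        "model": dict(model_kwargs or {}),            # kwargs for ASRModel.from_pretrained
        "transcribe": dict(transcribe_kwargs or {}),  # merged into model.transcribe(...)
        "sample_rate": sample_rate,                   # used for the temporary WAV file
    }
    # Only set "timestamps" when the caller chose a value (assumption: the
    # engine falls back to its own default when the key is absent).
    if timestamps is not None:
        options["timestamps"] = bool(timestamps)
    return options


options = build_parakeet_options(transcribe_kwargs={"batch_size": 1})
print(options)
```

The resulting dictionary can be passed as transcription_engine_options to AudioToTextRecorder alongside transcription_engine="parakeet".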

Resource Usage

Parakeet v3 is a large ASR model compared with the default tiny Whisper demo. Expect:
  • Higher memory use — a dedicated GPU with sufficient VRAM is recommended
  • Longer startup time — NeMo downloads and initializes a large model on first run
  • Better accuracy — especially on clean English speech
For multi-user server deployments, prefer a shared-model server lane rather than loading one model per browser session.

FastAPI Server Example

python example_fastapi_server/server.py \
  --engine parakeet \
  --model nvidia/parakeet-tdt-0.6b-v3 \
  --realtime-engine faster_whisper \
  --realtime-model tiny.en \
  --device cuda \
  --language en
Use --use-main-model-for-realtime only when you want a single shared model lane and can accept contention between realtime and final transcription calls.

Troubleshooting

  • nemo.collections.asr import error — install "nemo_toolkit[asr]" in the active environment.
  • soundfile errors — install soundfile and ensure the system libsndfile library is present.
  • Windows package resolution failures — move real Parakeet testing to WSL2 or Linux.
  • Slow startup — NeMo is downloading and initializing a large model; this is normal on first use.
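The import-related items above can be checked quickly before starting a recorder. This is a minimal sketch using the standard library's importlib.util.find_spec; the helper name missing_modules is hypothetical, and the default list assumes the top-level import names nemo, soundfile, and torch. It does not verify that the system libsndfile library is present.

```python
import importlib.util


def missing_modules(names=("nemo", "soundfile", "torch")):
    """Return the top-level module names that cannot be found.

    Hypothetical helper; it locates modules without importing them, so it
    does not catch errors that only appear at import time.
    """
    missing = []
    for name in names:
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


print("Missing modules:", missing_modules() or "none")
```

An empty result only shows that the packages are installed in the active environment; version conflicts between NeMo, PyTorch, and CUDA can still surface when the model is loaded.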
