The Parakeet engine uses NVIDIA NeMo’s ASRModel to load and run Parakeet models, NVIDIA’s high-accuracy English ASR model family. It is best suited for Linux or WSL2 environments with CUDA available, where NeMo’s dependencies resolve cleanly and GPU acceleration is possible. For CPU INT8 inference without NeMo, see the sherpa-onnx engine.
Engine Names
parakeet, nvidia_parakeet
Install
The engine requires nemo_toolkit[asr] and soundfile. NeMo ASR requires a compatible CUDA/PyTorch stack. If you need a specific CUDA version, install the matching CUDA/PyTorch stack manually before installing NeMo.
Basic Usage
The Parakeet v3 model ID is nvidia/parakeet-tdt-0.6b-v3. NeMo downloads and caches known model IDs automatically via ASRModel.from_pretrained.
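For orientation, the NeMo calls named above can be exercised directly. The following is a minimal sketch, not RealtimeSTT's own code: the helper names load_parakeet and transcribe_file are hypothetical, and the handling of return types is an assumption meant to cover both newer NeMo releases (hypothesis objects with a text attribute) and older ones (plain strings, sometimes wrapped in a tuple).

```python
def load_parakeet(model_name: str = "nvidia/parakeet-tdt-0.6b-v3", **model_kwargs):
    """Load a Parakeet checkpoint through NeMo (downloads and caches on first call)."""
    # Imported lazily so this module can be imported without NeMo installed.
    import nemo.collections.asr as nemo_asr

    return nemo_asr.models.ASRModel.from_pretrained(model_name=model_name, **model_kwargs)


def transcribe_file(model, wav_path: str) -> str:
    """Transcribe one audio file and return the recognized text."""
    results = model.transcribe([wav_path])
    # Assumption: some older NeMo versions return (best_hypotheses, all_hypotheses).
    if isinstance(results, tuple):
        results = results[0]
    first = results[0]
    # Assumption: newer versions return objects with .text, older ones plain strings.
    return first if isinstance(first, str) else first.text
```

A call such as transcribe_file(load_parakeet(), "sample.wav") would then return the text for a hypothetical sample.wav, provided NeMo and a compatible PyTorch build are installed.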
CPU Alternative
For CPU-only deployments or systems without NeMo, use the sherpa-onnx Parakeet engine (sherpa_onnx_parakeet). It uses pre-extracted ONNX INT8 model files and runs without any NeMo installation.
You can also switch backends through transcription_engine_options while keeping the parakeet engine name.
Model Options and Cache
Backend loader options are passed through transcription_engine_options.
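As an illustration, an options dictionary using the keys documented in this section might look like the sketch below. The variable name parakeet_options is hypothetical. The nested values are examples only: map_location and verbose are arguments of NeMo's from_pretrained and transcribe respectively, chosen here for illustration rather than taken from this page. How the dictionary is handed to the recorder is not shown, since the constructor signature is not part of this section.

```python
# Hypothetical options dictionary; the top-level keys come from the
# Options Reference, the nested values are illustrative assumptions.
parakeet_options = {
    # kwargs forwarded to ASRModel.from_pretrained
    "model": {"map_location": "cuda"},
    # kwargs merged into model.transcribe(...)
    "transcribe": {"verbose": False},
    # sample rate used when in-memory audio is written to a temporary WAV
    "sample_rate": 16000,
    # request timestamps only when the NeMo model supports them
    "timestamps": False,
}
```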
Options Reference
| Option | Meaning |
|---|---|
| transcription_engine_options["model"] | Dictionary of kwargs passed to ASRModel.from_pretrained. |
| transcription_engine_options["transcribe"] | Dictionary merged into model.transcribe(...). |
| transcription_engine_options["sample_rate"] | Sample rate used when writing in-memory audio to a temporary WAV file. Defaults to 16000. |
| transcription_engine_options["timestamps"] | Enables or disables timestamp requests when supported by NeMo. |
| batch_size | Passed to transcribe when greater than 0. |
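The behavior described in the table can be sketched in a few lines. This is a hedged illustration, not the engine's actual implementation: the function names build_transcribe_kwargs and write_temp_wav are hypothetical, the order in which options override each other is an assumption, and the standard-library wave module stands in for soundfile so the sketch has no third-party dependencies.

```python
import array
import tempfile
import wave


def build_transcribe_kwargs(options: dict, batch_size: int = 0) -> dict:
    """Assemble kwargs for model.transcribe(...) from the documented options."""
    kwargs = dict(options.get("transcribe", {}))  # copy so the caller's dict is untouched
    if "timestamps" in options:
        kwargs["timestamps"] = bool(options["timestamps"])
    if batch_size > 0:  # batch_size is forwarded only when greater than 0
        kwargs["batch_size"] = batch_size
    return kwargs


def write_temp_wav(samples, sample_rate: int = 16000) -> str:
    """Write mono float samples in [-1, 1] to a temporary 16-bit PCM WAV file."""
    # Clamp each sample, then scale to the signed 16-bit range.
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    tmp.close()  # reopen by name so this also works on Windows
    with wave.open(tmp.name, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return tmp.name
```

The caller is responsible for deleting the temporary file once transcription has finished.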
Resource Usage
Parakeet v3 is a large ASR model compared with the default tiny Whisper demo. Expect:
- Higher memory use: a dedicated GPU with sufficient VRAM is recommended
- Longer startup time: NeMo downloads and initializes a large model on first run
- Better accuracy: especially on clean English speech
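Before loading the model, it can help to confirm that PyTorch sees a CUDA device and how much VRAM is free. The helper below is a minimal sketch; describe_gpu is a hypothetical name, and the check reports only what PyTorch exposes, not how much memory Parakeet will actually need.

```python
def describe_gpu() -> dict:
    """Report CUDA availability and free/total VRAM as seen by PyTorch."""
    try:
        import torch  # imported lazily so the check degrades gracefully
    except ImportError:
        return {"cuda": False, "reason": "PyTorch is not installed"}
    if not torch.cuda.is_available():
        return {"cuda": False, "reason": "no CUDA device visible to PyTorch"}
    # mem_get_info returns (free_bytes, total_bytes) for the current device.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    return {
        "cuda": True,
        "device": torch.cuda.get_device_name(0),
        "free_gib": round(free_bytes / 2**30, 2),
        "total_gib": round(total_bytes / 2**30, 2),
    }


if __name__ == "__main__":
    print(describe_gpu())
```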
FastAPI Server Example
Use --use-main-model-for-realtime only when you want a single shared model lane and can accept contention between realtime and final transcription calls.
Troubleshooting
- nemo.collections.asr import error: install "nemo_toolkit[asr]" in the active environment.
- soundfile errors: install soundfile and ensure the system libsndfile library is present.
- Windows package resolution failures: move real Parakeet testing to WSL2 or Linux.
- Slow startup: NeMo is downloading and initializing a large model; this is normal on first use.
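The first two checks above can be automated with the standard library. This is a rough diagnostic sketch: check_parakeet_dependencies is a hypothetical helper, finding the top-level nemo package does not prove the ASR extras are installed, and a missing system libsndfile is not always fatal because some soundfile wheels bundle their own copy.

```python
import ctypes.util
import importlib.util


def check_parakeet_dependencies() -> dict:
    """Return which Parakeet-related dependencies are discoverable."""
    return {
        # nemo_toolkit installs the top-level "nemo" package
        "nemo": importlib.util.find_spec("nemo") is not None,
        "soundfile": importlib.util.find_spec("soundfile") is not None,
        # system-wide libsndfile; may be None when the wheel bundles its own copy
        "libsndfile": ctypes.util.find_library("sndfile") is not None,
    }


if __name__ == "__main__":
    for name, found in check_parakeet_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```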
