NVIDIA Parakeet NeMo Speech Engine for RealtimeSTT

The Parakeet engine uses NVIDIA NeMo’s ASRModel to load and run Parakeet models, NVIDIA’s high-accuracy English ASR model family. It is best suited to Linux or WSL2 environments with CUDA, where NeMo’s dependencies resolve cleanly and inference can run on the GPU. For CPU INT8 inference without NeMo, see the sherpa-onnx engine.

NeMo has heavy system dependencies and is primarily supported on Linux. Running real Parakeet models on Windows is not recommended — use WSL2 or a Linux environment instead.

Engine Names

parakeet
nvidia_parakeet

Install

pip install -U "RealtimeSTT[parakeet]"

This installs nemo_toolkit[asr] and soundfile. NeMo ASR requires a compatible CUDA/PyTorch stack. If you need a specific CUDA version, install the matching PyTorch build manually before installing the extra. For example, for CUDA 12.1:

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121

NeMo, PyTorch, CUDA, and Python version compatibility is stricter than the default faster-whisper path. If you hit package conflicts, start from a fresh Linux virtual environment.
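Before debugging package conflicts, it can help to confirm which CUDA build the installed PyTorch wheel targets and whether a GPU is visible. The following is a minimal sketch, not part of RealtimeSTT; `cuda_report` is a hypothetical helper name, and the code assumes only that PyTorch may or may not be installed.

```python
import importlib.util


def cuda_report():
    """Return a small dict describing the local PyTorch/CUDA setup."""
    # find_spec returns None for a missing top-level package instead of raising.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only wheels.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(cuda_report())
```

If `cuda_build` does not match the index URL you installed from (for example `12.1` for `cu121`), a different wheel was probably pulled in by another dependency.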

Basic Usage

from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    transcription_engine="parakeet",
    model="nvidia/parakeet-tdt-0.6b-v3",
    device="cuda",
    language="en",
)

The default model is nvidia/parakeet-tdt-0.6b-v3. NeMo handles model downloading and caching for known model IDs automatically via ASRModel.from_pretrained.
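To turn the constructor call above into a script that prints one utterance, something like the following sketch may work. It assumes the recorder supports use as a context manager and exposes `text()`, as in upstream RealtimeSTT; `parakeet_kwargs` and `main` are hypothetical names introduced here for illustration.

```python
def parakeet_kwargs():
    """Constructor arguments taken from the Basic Usage example."""
    return {
        "transcription_engine": "parakeet",
        "model": "nvidia/parakeet-tdt-0.6b-v3",
        "device": "cuda",
        "language": "en",
    }


def main():
    # Imported lazily so this file can be imported without RealtimeSTT installed.
    from RealtimeSTT import AudioToTextRecorder

    # Assumption: the recorder cleans up its resources on context exit.
    with AudioToTextRecorder(**parakeet_kwargs()) as recorder:
        print("Speak now...")
        # text() blocks until one utterance has been recorded and transcribed.
        print(recorder.text())
```

Call `main()` from under an `if __name__ == "__main__":` guard, since the recorder may start worker processes.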

CPU Alternative

For CPU-only deployments or systems without NeMo, use the sherpa-onnx Parakeet engine (sherpa_onnx_parakeet). It uses pre-extracted ONNX INT8 model files and runs without any NeMo installation. You can also switch backends through transcription_engine_options while keeping the parakeet engine name:

recorder = AudioToTextRecorder(
    transcription_engine="parakeet",
    transcription_engine_options={
        "backend": "sherpa_onnx",
    },
)
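One way to pick between the NeMo path and the sherpa-onnx backend at startup is sketched below. `parakeet_recorder_kwargs` is a hypothetical helper, not a RealtimeSTT function; it only assembles the keyword arguments shown in this page, and the caller decides how to detect CUDA.

```python
def parakeet_recorder_kwargs(use_cuda):
    """Build AudioToTextRecorder kwargs for the GPU or the CPU path."""
    kwargs = {"transcription_engine": "parakeet"}
    if use_cuda:
        # NeMo backend on the GPU.
        kwargs["device"] = "cuda"
    else:
        # Keep the engine name but switch to the sherpa-onnx backend.
        kwargs["transcription_engine_options"] = {"backend": "sherpa_onnx"}
    return kwargs
```

A caller with PyTorch installed could pass `torch.cuda.is_available()` as `use_cuda` and then construct the recorder with `AudioToTextRecorder(**parakeet_recorder_kwargs(use_cuda))`.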

Model Options and Cache

Backend loader options are passed through transcription_engine_options:

recorder = AudioToTextRecorder(
    transcription_engine="parakeet",
    model="nvidia/parakeet-tdt-0.6b-v3",
    transcription_engine_options={
        "model": {},
        "transcribe": {"batch_size": 1},
    },
)

Options Reference

Option	Meaning
`transcription_engine_options["model"]`	Dictionary of kwargs passed to `ASRModel.from_pretrained`.
`transcription_engine_options["transcribe"]`	Dictionary merged into `model.transcribe(...)`.
`transcription_engine_options["sample_rate"]`	Sample rate used when writing in-memory audio to a temporary WAV file. Defaults to `16000`.
`transcription_engine_options["timestamps"]`	Enables or disables timestamp requests when supported by NeMo.
`batch_size`	Passed to `transcribe` when greater than `0`.
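The four `transcription_engine_options` keys from the table can be combined in one dictionary. This is an illustrative sketch: the values are examples rather than recommendations, treating `timestamps` as a boolean is an assumption based on the "enables or disables" wording, and `build_recorder` is a hypothetical helper name.

```python
engine_options = {
    "model": {},                      # extra kwargs for ASRModel.from_pretrained
    "transcribe": {"batch_size": 1},  # merged into model.transcribe(...)
    "sample_rate": 16000,             # rate used for the temporary WAV file
    "timestamps": False,              # assumption: boolean on/off switch
}


def build_recorder():
    # Imported lazily so the options dict can be inspected without RealtimeSTT.
    from RealtimeSTT import AudioToTextRecorder

    return AudioToTextRecorder(
        transcription_engine="parakeet",
        model="nvidia/parakeet-tdt-0.6b-v3",
        transcription_engine_options=engine_options,
    )
```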

Resource Usage

Parakeet v3 is a large ASR model compared with the default tiny Whisper demo. Expect:

Higher memory use — a dedicated GPU with sufficient VRAM is recommended
Longer startup time — NeMo downloads and initializes a large model on first run
Better accuracy — especially on clean English speech

For multi-user server deployments, prefer a shared-model server lane rather than loading one model per browser session.
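The idea behind a shared-model lane can be sketched as one loaded model whose calls are serialized with a lock. This is a generic illustration and not the example server's actual implementation; `SharedModelLane` and `transcribe_fn` are hypothetical names.

```python
import threading


class SharedModelLane:
    """Serialize access to a single loaded model across many sessions."""

    def __init__(self, transcribe_fn):
        # transcribe_fn wraps the one model instance that was loaded at startup.
        self._transcribe = transcribe_fn
        self._lock = threading.Lock()

    def transcribe(self, audio):
        # Only one session runs inference at a time; other callers wait here.
        with self._lock:
            return self._transcribe(audio)
```

Every session calls `lane.transcribe(audio)` on the same instance, so memory use stays at one model regardless of the number of connected clients, at the cost of queueing under load.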

FastAPI Server Example

python example_fastapi_server/server.py \
  --engine parakeet \
  --model nvidia/parakeet-tdt-0.6b-v3 \
  --realtime-engine faster_whisper \
  --realtime-model tiny.en \
  --device cuda \
  --language en

Use --use-main-model-for-realtime only when you want a single shared model lane and can accept contention between realtime and final transcription calls.

Troubleshooting

nemo.collections.asr import error — install "nemo_toolkit[asr]" in the active environment.
soundfile errors — install soundfile and ensure the system libsndfile library is present.
Windows package resolution failures — move real Parakeet testing to WSL2 or Linux.
Slow startup — NeMo is downloading and initializing a large model; this is normal on first use.
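To narrow down which of the import-related problems above applies, a short diagnostic such as the following may help. It is a sketch using only the standard library; `check_imports` is a hypothetical helper name.

```python
import importlib


def check_imports(names=("nemo.collections.asr", "soundfile")):
    """Try importing each module and report 'ok' or the error raised."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:
            # soundfile can raise OSError when libsndfile is missing,
            # so catch more than ImportError.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

Running `print(check_imports())` in the active environment shows whether the failure is a missing Python package or a missing system library.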
