
Overview

OpenWhispr supports two local transcription engines for completely private, offline speech-to-text:
  • Whisper (via whisper.cpp) - Multiple model sizes with quality/speed tradeoffs
  • Parakeet (via sherpa-onnx) - NVIDIA’s multilingual ASR model with fast CPU inference
Local models keep your voice data completely private - audio never leaves your device.

Whisper Models

Whisper models use the GGML format and are downloaded from HuggingFace.

Available Whisper Models

Size: ~75MB
Speed: Fastest
Quality: Lower accuracy, good for simple dictation
Recommended for: Quick notes, low-end hardware
File: ggml-tiny.bin

Model Characteristics

Model   Size   Quality   Speed   Use Case
tiny    75MB   ⭐⭐        ⚡⚡⚡⚡⚡   Quick notes
base    142MB  ⭐⭐⭐       ⚡⚡⚡⚡    General use
small   466MB  ⭐⭐⭐⭐      ⚡⚡⚡     Professional
medium  1.5GB  ⭐⭐⭐⭐      ⚡⚡      High quality
large   3GB    ⭐⭐⭐⭐⭐     ⚡       Best quality
turbo   1.6GB  ⭐⭐⭐⭐⭐     ⚡⚡⚡     Quality + speed
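For scripting, the model labels above map onto GGML file names. The sketch below follows the file listing in the Model Directory Structure section of this page; `whisper_file` is a hypothetical helper, not part of OpenWhispr:

```shell
# Map a model label from the table above to its GGML file name.
# File names follow the directory listing on this page; large/turbo
# correspond to the large-v3 / large-v3-turbo checkpoints.
whisper_file() {
  case "$1" in
    tiny|base|small|medium) echo "ggml-$1.bin" ;;
    large)                  echo "ggml-large-v3.bin" ;;
    turbo)                  echo "ggml-large-v3-turbo.bin" ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

whisper_file turbo   # prints: ggml-large-v3-turbo.bin
```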

NVIDIA Parakeet Models

Parakeet models use ONNX format with INT8 quantization for efficient CPU inference.

Available Parakeet Models

Size: ~680MB
Languages: 25 languages with auto-detection
Speed: Very fast on CPU
Quality: Comparable to Whisper small/base
Supported Languages:
  • Bulgarian, Croatian, Czech, Danish, Dutch
  • English, Estonian, Finnish, French, German
  • Greek, Hungarian, Italian, Latvian, Lithuanian
  • Maltese, Polish, Portuguese, Romanian, Slovak
  • Slovenian, Spanish, Swedish, Russian, Ukrainian
Runtime: sherpa-onnx (bundled with app)
Model Files:
  • encoder.int8.onnx
  • decoder.int8.onnx
  • joiner.int8.onnx
  • tokens.txt
Parakeet is ideal for multilingual users or those who want fast transcription without GPU acceleration.

Downloading Models Through the UI

1. Open Control Panel

Right-click the OpenWhispr tray icon and select Control Panel.

2. Navigate to Settings

Go to Settings → Transcription.

3. Select Local Processing

Choose Local Processing from the transcription mode options.

4. Choose Your Engine

Select either:
  • Whisper (whisper.cpp)
  • NVIDIA Parakeet (sherpa-onnx)

5. Select and Download Model

For Whisper:
  • Choose a model size (tiny, base, small, medium, large, turbo)
  • Click Download
  • Wait for the download to complete (a progress bar shows status)
For Parakeet:
  • Select parakeet-tdt-0.6b-v3
  • Click Download
  • Wait for the download and extraction to finish

6. Start Using

Once downloaded, the model is immediately available for transcription.
Large models (medium, large, turbo) require significant disk space. Ensure you have enough free space before downloading.

Storage Locations

Models are stored in the OpenWhispr cache directory:
Whisper models:
~/.cache/openwhispr/whisper-models/
Parakeet models:
~/.cache/openwhispr/parakeet-models/

Model Directory Structure

Whisper:
~/.cache/openwhispr/whisper-models/
├── ggml-tiny.bin
├── ggml-base.bin
├── ggml-small.bin
├── ggml-medium.bin
├── ggml-large-v3.bin
└── ggml-large-v3-turbo.bin
Parakeet:
~/.cache/openwhispr/parakeet-models/
└── parakeet-tdt-0.6b-v3/
    ├── encoder.int8.onnx
    ├── decoder.int8.onnx
    ├── joiner.int8.onnx
    └── tokens.txt
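To see how much space these directories are currently using, a quick check from a terminal (a convenience sketch using the paths above; `model_usage` is not an OpenWhispr command):

```shell
# Report disk usage for each local-model cache directory.
model_usage() {
  for dir in "$HOME/.cache/openwhispr/whisper-models" \
             "$HOME/.cache/openwhispr/parakeet-models"; do
    if [ -d "$dir" ]; then
      du -sh "$dir"
    else
      echo "not downloaded: $dir"
    fi
  done
}

model_usage
```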

Model Cleanup and Disk Space Management

Deleting Individual Models

1. Open Model Management

Go to Control Panel → Settings → Transcription → Manage Local Models.

2. Select Model to Delete

Click the Delete button next to the model you want to remove.

3. Confirm Deletion

Confirm the deletion to free up disk space. The UI will show how much space you’ll reclaim.

Removing All Models

1. Go to Storage Settings

Navigate to Settings → General → Local Model Storage.

2. Click Remove All Models

Click the Remove Downloaded Models button.

3. Confirm Bulk Deletion

Confirm to delete all Whisper and Parakeet models at once.
This action cannot be undone. You’ll need to re-download models to use local transcription again.

Manual Cleanup

You can also manually delete model files from the cache directory:
# Remove all Whisper models
rm -rf ~/.cache/openwhispr/whisper-models/

# Remove all Parakeet models
rm -rf ~/.cache/openwhispr/parakeet-models/

# Remove specific Whisper model
rm ~/.cache/openwhispr/whisper-models/ggml-large-v3.bin

Choosing Between Whisper and Parakeet

Use Whisper When:

  • You need multiple model sizes to balance quality and speed
  • You’re transcribing English primarily
  • You want the best possible accuracy (large/turbo models)
  • You have a GPU and want CUDA acceleration (if enabled)

Use Parakeet When:

  • You need multilingual support (25 languages)
  • You want fast CPU inference without GPU
  • You’re on lower-end hardware
  • You prefer a single, well-optimized model

Performance Optimization

Hardware Acceleration

Whisper CUDA Support (Experimental): If you have an NVIDIA GPU, you can enable CUDA acceleration:
# In .env file
WHISPER_CUDA_ENABLED=true
CUDA support requires compatible NVIDIA drivers and CUDA toolkit installation.
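Before enabling the flag, it is worth confirming an NVIDIA driver is actually present. A quick probe (`nvidia-smi` ships with the NVIDIA driver; this check is a generic sketch, not an OpenWhispr feature):

```shell
# Probe for an NVIDIA GPU before enabling WHISPER_CUDA_ENABLED.
cuda_check() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
    echo "GPU detected: CUDA acceleration may work"
  else
    echo "no NVIDIA driver found: leave WHISPER_CUDA_ENABLED unset"
  fi
}

cuda_check
```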
Parakeet CPU Optimization: Parakeet models are optimized for CPU inference using INT8 quantization. No additional configuration needed.

Pre-warming Models

OpenWhispr pre-warms models on startup to eliminate cold-start delays:
  • Whisper: Server starts when you select a model
  • Parakeet: Server pre-warms if LOCAL_TRANSCRIPTION_PROVIDER=nvidia
Configuration:
# .env file
LOCAL_TRANSCRIPTION_PROVIDER=nvidia  # or "whisper"
PARAKEET_MODEL=parakeet-tdt-0.6b-v3

Troubleshooting

Symptoms: Download stops partway through or the model doesn’t work
Solutions:
  1. Check your internet connection
  2. Ensure you have enough disk space (model size × 2.5 for extraction)
  3. Try downloading again - OpenWhispr resumes interrupted downloads
  4. Check the logs for specific error messages
  5. Manually delete partial files and retry:
    # Look for .tmp or incomplete files
    ls -la ~/.cache/openwhispr/whisper-models/
    
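Step 5 can be scripted. The `*.tmp` pattern below is an assumption about how partial files are named; inspect the `ls` output first and adjust the pattern to whatever leftover files you actually see:

```shell
# Delete leftover partial downloads, then retry in the UI.
# The *.tmp pattern is an assumption; adjust it to match the
# partial files you actually find in the cache directory.
clean_partials() {
  dir="${1:-$HOME/.cache/openwhispr/whisper-models}"
  if [ -d "$dir" ]; then
    find "$dir" -maxdepth 1 -name '*.tmp' -print -delete
  else
    echo "no cache directory: $dir"
  fi
}

clean_partials
```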
Symptoms: Error saying the model isn’t downloaded when it should be
Solutions:
  1. Verify the model file exists in the cache directory
  2. Check file permissions (should be readable by your user)
  3. For Parakeet, ensure all 4 required files exist:
    • encoder.int8.onnx
    • decoder.int8.onnx
    • joiner.int8.onnx
    • tokens.txt
  4. Try re-downloading the model
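Step 3 can be checked from a terminal (directory and file names are taken from this page; `check_parakeet` is a hypothetical helper):

```shell
# Verify all four Parakeet model files exist.
check_parakeet() {
  dir="${1:-$HOME/.cache/openwhispr/parakeet-models/parakeet-tdt-0.6b-v3}"
  status=0
  for f in encoder.int8.onnx decoder.int8.onnx joiner.int8.onnx tokens.txt; do
    if [ -f "$dir/$f" ]; then
      echo "ok: $f"
    else
      echo "missing: $f"
      status=1
    fi
  done
  return $status
}

check_parakeet || echo "re-download the model"
```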
Symptoms: Transcription takes a long time or appears to freeze
Solutions:
  1. For Whisper: Try a smaller model (base instead of large)
  2. For Parakeet: Ensure sherpa-onnx binary is accessible
  3. Check CPU usage - high background activity may slow processing
  4. For very long recordings, consider splitting into shorter clips
  5. Check the logs for server timeout errors
Symptoms: Download fails with a disk space error
Solutions:
  1. Free up disk space (models need 2.5× their size for extraction)
  2. Use the model cleanup feature to remove unused models
  3. Choose a smaller model:
    • Try base instead of large (save ~2.8GB)
    • Use Parakeet instead of multiple Whisper models
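The 2.5× rule of thumb above can be turned into a quick preflight check before downloading (a `df`/`awk` sketch; the multiplier comes from this page, the rest is a generic free-space check):

```shell
# How much free space does a model need? (~2.5x its size, per the rule above)
need_mb() {
  echo $(( $1 * 25 / 10 ))
}

free_mb=$(df -Pm "$HOME" | awk 'NR==2 {print $4}')
need=$(need_mb 1500)   # medium model, ~1.5GB
if [ "$free_mb" -ge "$need" ]; then
  echo "OK: ${free_mb}MB free, need ~${need}MB"
else
  echo "low space: ${free_mb}MB free, need ~${need}MB"
fi
```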

Model Registry Data

All model metadata is defined in src/models/modelRegistryData.json:
{
  "whisperModels": {
    "base": {
      "name": "Base",
      "description": "Good balance",
      "size": "142MB",
      "sizeMb": 142,
      "expectedSizeBytes": 148000000,
      "fileName": "ggml-base.bin",
      "downloadUrl": "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin",
      "recommended": true
    }
  },
  "parakeetModels": {
    "parakeet-tdt-0.6b-v3": {
      "name": "Parakeet TDT 0.6B",
      "description": "Fast multilingual ASR with auto language detection (25 languages)",
      "size": "680MB",
      "downloadUrl": "https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8.tar.bz2",
      "extractDir": "sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8"
    }
  }
}
This centralized registry ensures consistency across the app and makes it easy to add new models in the future.
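If you are working in the repository, the registry can be queried directly. For example, listing every model download URL (a `jq` sketch; requires `jq`, and assumes you run it from the repo root so the path above resolves):

```shell
# Print every model download URL defined in the registry.
registry_urls() {
  registry="${1:-src/models/modelRegistryData.json}"
  if [ -f "$registry" ]; then
    jq -r '.whisperModels[].downloadUrl, .parakeetModels[].downloadUrl' "$registry"
  else
    echo "registry not found: $registry"
  fi
}

registry_urls
```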
