OpenWhispr is built on Electron 36 with a privacy-first, dual-window architecture designed for efficient local and cloud speech-to-text transcription.

Core Technologies

Frontend

React 19, TypeScript, Tailwind CSS v4, Vite

Desktop Framework

Electron 36 with context isolation

Database

better-sqlite3 for local history

Speech Processing

whisper.cpp + NVIDIA Parakeet + OpenAI API

Dual Window System

OpenWhispr uses two independent windows for optimal UX:
1. Main Window (Dictation Overlay)

Minimal, always-on-top floating panel for recording control
  • Draggable: Positioned anywhere on screen
  • Always on top: Never obscured by other windows
  • Frameless: Minimal chrome for distraction-free use
  • Click-through: Main process handles interactivity via IPC
2. Control Panel Window

Full-featured settings and management interface
  • Normal window: Standard maximize/minimize/close
  • Settings UI: API keys, model selection, hotkeys
  • History: SQLite-backed transcription archive
  • Model management: Download/delete Whisper models
Both windows load the same React codebase and select their view from a URL query parameter (?panel=control or ?panel=main). Sharing one build avoids shipping a second bundle and keeps the two UIs consistent.
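As a rough illustration of the URL-based routing described here, a shared entry point could pick the view from the query string. The helper name `resolvePanel` and the fallback to the dictation overlay are assumptions made for this sketch, not OpenWhispr's actual code.

```javascript
// Hypothetical helper: choose a view from a location.search string.
function resolvePanel(search) {
  const panel = new URLSearchParams(search).get('panel');
  // Assumption: anything other than "control" falls back to the dictation overlay.
  return panel === 'control' ? 'control' : 'main';
}

console.log(resolvePanel('?panel=control')); // "control"
console.log(resolvePanel('?panel=main'));    // "main"
```

In the renderer, the result would typically decide which top-level React component to mount.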

Process Architecture

Main Process (main.js)

The Electron main process handles:
  • Window lifecycle: Create, show, hide, destroy windows
  • IPC handlers: 100+ secure channels (see src/helpers/ipcHandlers.js)
  • Database operations: SQLite reads/writes via better-sqlite3
  • Native integrations: Swift (macOS Globe key), C (Windows key listener)
  • Global hotkeys: Cross-platform keyboard shortcut registration
Manager Pattern: All main process logic is organized into manager classes:
// Initialized in startApp() after app.whenReady()
const managers = {
  environment: new EnvironmentManager(),      // .env file management
  window: new WindowManager(),                // Window creation/positioning
  database: new DatabaseManager(),            // SQLite operations
  clipboard: new ClipboardManager(),          // Cross-platform paste
  whisper: new WhisperManager(),              // Local whisper.cpp
  parakeet: new ParakeetManager(),            // NVIDIA Parakeet ASR
  tray: new TrayManager(),                    // System tray icon/menu
  update: new UpdateManager(),                // Auto-update (electron-updater)
  hotkey: new HotkeyManager(),                // Global shortcuts
  globeKey: new GlobeKeyManager(),            // macOS Fn/Globe key
  windowsKey: new WindowsKeyManager(),        // Windows push-to-talk
};

Renderer Process (React)

The renderer runs the UI with context isolation enabled:
  • No direct Node.js access: Security boundary enforced
  • IPC via contextBridge: window.electronAPI only
  • Separate Vite build: Fast HMR during development
  • Shared components: shadcn/ui + Radix primitives

Preload Script (preload.js)

Secure bridge between main and renderer:
// Exposes safe IPC methods
const { contextBridge, ipcRenderer } = require('electron');
contextBridge.exposeInMainWorld('electronAPI', {
  // Database
  saveTranscription: (text) => ipcRenderer.invoke('db-save-transcription', text),
  getTranscriptions: (limit) => ipcRenderer.invoke('db-get-transcriptions', limit),
  
  // Whisper
  transcribeLocalWhisper: (audioBlob, options) => 
    ipcRenderer.invoke('transcribe-local-whisper', audioBlob, options),
  downloadWhisperModel: (modelName) => 
    ipcRenderer.invoke('download-whisper-model', modelName),
  
  // Event listeners (with cleanup)
  onToggleDictation: (callback) => {
    const listener = () => callback();
    ipcRenderer.on('toggle-dictation', listener);
    return () => ipcRenderer.removeListener('toggle-dictation', listener);
  },
});
Never expose ipcRenderer directly. Always use invoke() for async calls and register listeners with cleanup functions to prevent memory leaks.

Audio Pipeline

The complete flow from microphone to text:
1. Recording (Renderer)

MediaRecorder API captures audio in WebM format
const chunks = []; // filled as the recorder emits data
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
recorder.ondataavailable = (e) => chunks.push(e.data);
2. Blob Creation (Renderer)

When recording stops, chunks are combined into a Blob:
const blob = new Blob(chunks, { type: 'audio/webm' });
const arrayBuffer = await blob.arrayBuffer();
3. IPC Transfer (Renderer → Main)

ArrayBuffer is sent via IPC (max 10MB):
const result = await window.electronAPI.transcribeLocalWhisper(
  arrayBuffer,
  { model: 'base', language: 'en' }
);
4. Temporary File (Main)

Main process writes to temp directory:
const tempFile = path.join(tmpdir(), `whisper-${Date.now()}.webm`);
fs.writeFileSync(tempFile, Buffer.from(arrayBuffer));
5. FFmpeg Conversion (Main)

ffmpeg-static converts WebM → WAV (16kHz mono):
ffmpeg -i input.webm -ar 16000 -ac 1 -f wav output.wav
FFmpeg is bundled and unpacked from ASAR to app.asar.unpacked/node_modules/ffmpeg-static/
6. whisper.cpp Transcription (Main)

Native binary processes WAV file:
./whisper-cpp -m models/ggml-base.bin -f audio.wav -l en
Output is parsed from stdout and returned via IPC.
7. Cleanup (Main)

Temporary files are deleted:
fs.unlinkSync(tempFile);
fs.unlinkSync(wavFile);
8. Display (Renderer)

Transcription text is shown and optionally pasted at cursor.
Performance: The entire pipeline completes in 1-5 seconds for 30-second recordings using the base model.
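The main-process half of the pipeline (temp file, conversion, transcription, cleanup) can be sketched as below. This is a minimal outline under stated assumptions: the helper names `buildFfmpegArgs`, `buildWhisperArgs`, and `withTempAudio` are hypothetical, and the actual handler may be organized differently. The argument lists mirror the commands shown in the steps, and the try/finally block ensures temp files are removed even when conversion or transcription fails.

```javascript
const os = require('os');
const path = require('path');
const fs = require('fs');

// Hypothetical helper: arguments for the WebM -> 16 kHz mono WAV conversion.
function buildFfmpegArgs(inputPath, outputPath) {
  return ['-i', inputPath, '-ar', '16000', '-ac', '1', '-f', 'wav', outputPath];
}

// Hypothetical helper: arguments for the whisper.cpp invocation.
function buildWhisperArgs(modelPath, wavPath, language) {
  return ['-m', modelPath, '-f', wavPath, '-l', language];
}

// Writes the audio to a temp file, runs `work`, and always cleans up afterwards.
async function withTempAudio(arrayBuffer, work) {
  const base = path.join(os.tmpdir(), `whisper-${Date.now()}`);
  const webmFile = `${base}.webm`;
  const wavFile = `${base}.wav`;
  fs.writeFileSync(webmFile, Buffer.from(arrayBuffer));
  try {
    return await work(webmFile, wavFile);
  } finally {
    // force: true ignores files that were never created (e.g. ffmpeg failed early).
    fs.rmSync(webmFile, { force: true });
    fs.rmSync(wavFile, { force: true });
  }
}
```

In a real handler, `work` would presumably spawn FFmpeg and then whisper.cpp (for example with `child_process.execFile`) using these argument arrays, and return the text parsed from stdout.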

Database Schema

SQLite database at ~/.config/OpenWhispr/transcriptions.db (Linux/macOS) or %APPDATA%/OpenWhispr/transcriptions.db (Windows):
CREATE TABLE transcriptions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
  original_text TEXT NOT NULL,
  processed_text TEXT,
  is_processed BOOLEAN DEFAULT 0,
  processing_method TEXT DEFAULT 'none',
  agent_name TEXT,
  error TEXT
);

CREATE INDEX idx_timestamp ON transcriptions(timestamp DESC);
Key features:
  • Automatic timestamps: No manual date handling
  • AI processing tracking: is_processed, processing_method, agent_name
  • Error logging: Failed transcriptions stored with error messages
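For illustration, queries against this schema might look like the following sketch. It assumes `db` is a better-sqlite3 `Database` instance (using its `prepare`, `run`, and `all` methods); the function names are hypothetical and are not taken from OpenWhispr's DatabaseManager.

```javascript
// `db` is expected to behave like a better-sqlite3 Database instance.
function saveTranscription(db, text) {
  const info = db
    .prepare('INSERT INTO transcriptions (original_text) VALUES (?)')
    .run(text);
  // better-sqlite3 reports the new row's id on the run() result.
  return info.lastInsertRowid;
}

function getTranscriptions(db, limit) {
  // Newest first, which is the order idx_timestamp is built for.
  return db
    .prepare('SELECT * FROM transcriptions ORDER BY timestamp DESC LIMIT ?')
    .all(limit);
}
```

Because `timestamp` has a default, the insert only needs to supply `original_text`; the remaining columns fall back to their defaults or NULL.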

Native Integrations

macOS Globe Key Listener

File: resources/globe-listener.swift
// Detects Fn/Globe key presses with a Core Graphics event tap
let eventMask = (1 << CGEventType.flagsChanged.rawValue)
let eventTap = CGEvent.tapCreate(
  tap: .cgSessionEventTap,
  place: .headInsertEventTap,
  options: .defaultTap,
  eventsOfInterest: CGEventMask(eventMask),
  callback: eventCallback,
  userInfo: nil
)
Compiled at build time by scripts/build-globe-listener.js.
Compiling the Swift source requires Xcode Command Line Tools (xcode-select --install).

Windows Push-to-Talk Listener

File: resources/windows-key-listener.c
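// Low-level keyboard hook for compound hotkeys
// The callback is defined first so it is declared before SetWindowsHookEx uses it.
LRESULT CALLBACK KeyboardProc(int nCode, WPARAM wParam, LPARAM lParam) {
  // Only inspect the event when nCode is HC_ACTION; otherwise pass it straight on
  if (nCode == HC_ACTION && wParam == WM_KEYDOWN) {
    printf("KEY_DOWN\n");
    fflush(stdout);
  }
  return CallNextHookEx(NULL, nCode, wParam, lParam);
}

// Installed at startup; the installing thread must run a message loop
HHOOK hook = SetWindowsHookEx(
  WH_KEYBOARD_LL,
  KeyboardProc,
  NULL,
  0
);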
Distribution: Prebuilt binary downloaded from GitHub releases
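On the Electron side, the listener's stdout has to be split into lines before events such as KEY_DOWN can be acted on, because a chunk from a child process may contain a partial line or several lines at once. The sketch below shows one way to do that; `createLineParser` is a hypothetical helper, not the project's actual code.

```javascript
// Hypothetical helper: turns arbitrary stdout chunks into complete lines.
function createLineParser(onLine) {
  let buffered = '';
  return (chunk) => {
    buffered += chunk.toString();
    const lines = buffered.split('\n');
    buffered = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      const trimmed = line.trim(); // also drops "\r" from Windows line endings
      if (trimmed) onLine(trimmed);
    }
  };
}

const events = [];
const feed = createLineParser((line) => events.push(line));
feed('KEY_');
feed('DOWN\nKEY_DOWN\n');
console.log(events); // [ 'KEY_DOWN', 'KEY_DOWN' ]
```

The returned function can be attached directly to a child process, for example as the handler for its `stdout` `data` event.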

Linux Fast Paste (X11/Wayland)

File: resources/linux-fast-paste.c
// XTest extension for X11
XTestFakeKeyEvent(display, XKeysymToKeycode(display, XK_Control_L), True, 0);
XTestFakeKeyEvent(display, XKeysymToKeycode(display, XK_v), True, 0);
XTestFakeKeyEvent(display, XKeysymToKeycode(display, XK_v), False, 0);
XTestFakeKeyEvent(display, XKeysymToKeycode(display, XK_Control_L), False, 0);
XFlush(display);
Fallback chain: native binary → wtype → ydotool → xdotool → manual paste
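A fallback chain like this one can be modeled as an ordered list of strategies that are tried until one succeeds. The sketch below is illustrative only: `pasteWithFallback` and the strategy objects are hypothetical, and treating a thrown error as "tool missing or failed" is an assumption.

```javascript
// Tries each strategy in order and returns the name of the first that succeeds.
async function pasteWithFallback(strategies) {
  for (const { name, run } of strategies) {
    try {
      await run();
      return name;
    } catch (err) {
      // Assumption: a thrown error means the tool is missing or failed; try the next one.
    }
  }
  // Nothing worked: leave the text on the clipboard for a manual paste.
  return 'manual';
}
```

Each `run` function would wrap one tool (the native binary, wtype, ydotool, or xdotool), so reordering or extending the chain only means editing the array.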

IPC Communication Patterns

Invoke (Request/Response)

For operations returning data:
// Renderer
const result = await window.electronAPI.getTranscriptions(100);

// Main (ipcHandlers.js)
ipcMain.handle('db-get-transcriptions', async (event, limit) => {
  return databaseManager.getTranscriptions(limit);
});

Send (Fire-and-Forget)

For notifications:
// Renderer
window.electronAPI.notifyActivationModeChanged('push');

// Main
ipcMain.on('activation-mode-changed', (event, mode) => {
  windowManager.setActivationModeCache(mode);
});

Events (Main → Renderer)

For real-time updates:
// Renderer (with cleanup)
const cleanup = window.electronAPI.onUpdateAvailable((info) => {
  console.log('Update available:', info.version);
});

// Main
windowManager.controlPanelWindow.webContents.send('update-available', {
  version: '1.5.5',
  releaseNotes: '...',
});
Memory leaks: Always return cleanup functions from event listeners and call them in React’s useEffect cleanup.
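The subscribe-and-cleanup pattern can be tried outside Electron with Node's EventEmitter, which ipcRenderer also extends. In this sketch, `fakeIpc` and `onChannel` are stand-ins invented for the example; only the shape (register a listener, return a function that removes it) is taken from the preload script.

```javascript
const { EventEmitter } = require('events');

// Stand-in for ipcRenderer, which is also an EventEmitter.
const fakeIpc = new EventEmitter();

// Same shape as the preload helpers: subscribe and return a cleanup function.
function onChannel(emitter, channel, callback) {
  const listener = (_event, payload) => callback(payload);
  emitter.on(channel, listener);
  return () => emitter.removeListener(channel, listener);
}

const cleanup = onChannel(fakeIpc, 'update-available', (info) => {
  console.log('Update available:', info.version);
});

fakeIpc.emit('update-available', {}, { version: '1.5.5' }); // logs the version
cleanup();
console.log(fakeIpc.listenerCount('update-available')); // 0
```

In a React component, the cleanup function returned by a helper such as `onUpdateAvailable` can be returned directly from the `useEffect` callback, so the listener is removed when the component unmounts.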

Platform-Specific Considerations

macOS

  • Accessibility permissions: Required for AppleScript-based paste
  • Microphone permission: System dialog on first use
  • Notarization: Required for distribution (Apple Developer account)
  • Universal binary: Supports both arm64 (M1/M2/M3) and x64 (Intel)
  • Dock behavior: LSUIElement: false shows app in dock with indicator dot

Build Process

The build pipeline combines Vite (frontend) and electron-builder (packaging):
# 1. Compile native binaries
npm run compile:native  # Swift, C sources

# 2. Download runtime binaries
npm run download:whisper-cpp
npm run download:llama-server
npm run download:sherpa-onnx

# 3. Build React app
cd src && vite build  # → src/dist/

# 4. Package with electron-builder
electron-builder --mac  # or --win, --linux
  • ASAR archive: Main process code packed into app.asar
  • ASAR unpacking: FFmpeg and better-sqlite3 extracted for native access
  • Code signing: macOS requires Apple Developer certificate
  • Notarization: Automated via @electron/notarize
  • Update server: GitHub Releases as update provider
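Because binaries cannot be executed from inside an ASAR archive, code that spawns an unpacked binary usually rewrites its path to the app.asar.unpacked directory first. The helper below is a hypothetical sketch of that rewrite, not OpenWhispr's actual implementation, and the example path is made up.

```javascript
// Hypothetical helper: map a path inside app.asar to its unpacked location.
function toUnpackedPath(binaryPath) {
  // Only an "app.asar" segment followed by a path separator is rewritten, so
  // development paths and already-unpacked paths are returned unchanged.
  return binaryPath.replace(/app\.asar(?=[\\\/])/, 'app.asar.unpacked');
}

console.log(toUnpackedPath('/opt/OpenWhispr/resources/app.asar/node_modules/ffmpeg-static/ffmpeg'));
// "/opt/OpenWhispr/resources/app.asar.unpacked/node_modules/ffmpeg-static/ffmpeg"
```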

Performance Characteristics

  • Startup time: 1-2 seconds (cold), less than 500ms (warm)
  • Memory footprint: 150-300 MB (varies by model size)
  • Transcription speed: 0.5-5x realtime (model-dependent)
  • IPC overhead: less than 10ms for typical operations
  • Database queries: less than 5ms for history fetch (100 records)
Local transcription with the base model achieves near real-time performance on M1 Macs and modern Intel/AMD CPUs.

Security Model

1. Context Isolation

Renderer has no direct Node.js/Electron API access
2. Preload Script

Only whitelisted IPC channels exposed via contextBridge
3. Input Validation

All IPC handlers validate parameters before processing
4. File System Sandboxing

Temp files are written to the system temp directory under unique per-recording filenames and deleted after transcription
5. API Key Storage

Keys stored in .env file with restrictive permissions (600)
6. No Remote Code

All application code is bundled at build time; runtime downloads are limited to model files and app updates delivered through electron-updater
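To make the input-validation step concrete, the sketch below shows checks an IPC handler might run before touching the database or the file system. Everything here is an assumption for illustration: the function names, the cap of 1000 history rows, and reading the 10MB transfer limit as 10 MiB.

```javascript
const MAX_AUDIO_BYTES = 10 * 1024 * 1024; // assumption: "10MB" means 10 MiB

// Hypothetical validator for the history limit passed over IPC.
function validateLimit(limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new TypeError('limit must be a positive integer');
  }
  return Math.min(limit, 1000); // assumption: cap chosen for illustration
}

// Hypothetical validator for audio payloads.
function validateAudio(buffer) {
  if (!(buffer instanceof ArrayBuffer) && !ArrayBuffer.isView(buffer)) {
    throw new TypeError('audio must be an ArrayBuffer or typed array');
  }
  if (buffer.byteLength === 0 || buffer.byteLength > MAX_AUDIO_BYTES) {
    throw new RangeError('audio size out of range');
  }
  return buffer;
}
```

A handler would call these at the top of its body, so malformed input from the renderer is rejected before any work is done.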

Next Steps

Model Registry

Learn how AI models are managed and downloaded

Building from Source

Compile OpenWhispr for your platform

Troubleshooting

Debug common issues and errors
