Architecture

OpenWhispr is built on Electron 36 with a privacy-first, dual-window architecture designed for efficient local and cloud speech-to-text transcription.

Core Technologies

Frontend

React 19, TypeScript, Tailwind CSS v4, Vite

Desktop Framework

Electron 36 with context isolation

Database

better-sqlite3 for local history

Speech Processing

whisper.cpp + NVIDIA Parakeet + OpenAI API

Dual Window System

OpenWhispr uses two independent windows for optimal UX:

Main Window (Dictation Overlay)

Minimal, always-on-top floating panel for recording control

Draggable: Positioned anywhere on screen
Always on top: Never obscured by other windows
Frameless: Minimal chrome for distraction-free use
Click-through: Main process handles interactivity via IPC

Control Panel Window

Full-featured settings and management interface

Normal window: Standard maximize/minimize/close
Settings UI: API keys, model selection, hotkeys
History: SQLite-backed transcription archive
Model management: Download/delete Whisper models

Both windows use the same React codebase with URL-based routing (?panel=control vs. ?panel=main). This reduces bundle size and maintains consistency.

Process Architecture

Main Process (`main.js`)

The Electron main process handles:

Window lifecycle: Create, show, hide, destroy windows
IPC handlers: 100+ secure channels (see src/helpers/ipcHandlers.js)
Database operations: SQLite reads/writes via better-sqlite3
Native integrations: Swift (macOS Globe key), C (Windows key listener)
Global hotkeys: Cross-platform keyboard shortcut registration

Manager Pattern: All main process logic is organized into manager classes:

// Initialized in startApp() after app.whenReady()
const managers = {
  environment: new EnvironmentManager(),      // .env file management
  window: new WindowManager(),                // Window creation/positioning
  database: new DatabaseManager(),            // SQLite operations
  clipboard: new ClipboardManager(),          // Cross-platform paste
  whisper: new WhisperManager(),              // Local whisper.cpp
  parakeet: new ParakeetManager(),            // NVIDIA Parakeet ASR
  tray: new TrayManager(),                    // System tray icon/menu
  update: new UpdateManager(),                // Auto-update (electron-updater)
  hotkey: new HotkeyManager(),                // Global shortcuts
  globeKey: new GlobeKeyManager(),            // macOS Fn/Globe key
  windowsKey: new WindowsKeyManager(),        // Windows push-to-talk
};

Renderer Process (React)

The renderer runs the UI with context isolation enabled:

No direct Node.js access: Security boundary enforced
IPC via contextBridge: window.electronAPI only
Separate Vite build: Fast HMR during development
Shared components: shadcn/ui + Radix primitives

Preload Script (`preload.js`)

Secure bridge between main and renderer:

// Exposes safe IPC methods
contextBridge.exposeInMainWorld('electronAPI', {
  // Database
  saveTranscription: (text) => ipcRenderer.invoke('db-save-transcription', text),
  getTranscriptions: (limit) => ipcRenderer.invoke('db-get-transcriptions', limit),
  
  // Whisper
  transcribeLocalWhisper: (audioBlob, options) => 
    ipcRenderer.invoke('transcribe-local-whisper', audioBlob, options),
  downloadWhisperModel: (modelName) => 
    ipcRenderer.invoke('download-whisper-model', modelName),
  
  // Event listeners (with cleanup)
  onToggleDictation: (callback) => {
    const listener = () => callback();
    ipcRenderer.on('toggle-dictation', listener);
    return () => ipcRenderer.removeListener('toggle-dictation', listener);
  },
});

Never expose ipcRenderer directly. Always use invoke() for async calls and register listeners with cleanup functions to prevent memory leaks.

Audio Pipeline

The complete flow from microphone to text:

Recording (Renderer)

MediaRecorder API captures audio in WebM format

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
recorder.ondataavailable = (e) => chunks.push(e.data);

Blob Creation (Renderer)

When recording stops, chunks are combined into a Blob:

const blob = new Blob(chunks, { type: 'audio/webm' });
const arrayBuffer = await blob.arrayBuffer();

IPC Transfer (Renderer → Main)

ArrayBuffer is sent via IPC (max 10MB):

const result = await window.electronAPI.transcribeLocalWhisper(
  arrayBuffer,
  { model: 'base', language: 'en' }
);

Temporary File (Main)

Main process writes to temp directory:

const tempFile = path.join(tmpdir(), `whisper-${Date.now()}.webm`);
fs.writeFileSync(tempFile, Buffer.from(arrayBuffer));

FFmpeg Conversion (Main)

ffmpeg-static converts WebM → WAV (16kHz mono):

ffmpeg -i input.webm -ar 16000 -ac 1 -f wav output.wav

FFmpeg is bundled and unpacked from ASAR to app.asar.unpacked/node_modules/ffmpeg-static/

whisper.cpp Transcription (Main)

Native binary processes WAV file:

./whisper-cpp -m models/ggml-base.bin -f audio.wav -l en

Output is parsed from stdout and returned via IPC.

Cleanup (Main)

Temporary files are deleted:

fs.unlinkSync(tempFile);
fs.unlinkSync(wavFile);

Display (Renderer)

Transcription text is shown and optionally pasted at cursor.

Performance: The entire pipeline completes in 1-5 seconds for 30-second recordings using the base model.

Database Schema

SQLite database at ~/.config/OpenWhispr/transcriptions.db (Linux/macOS) or %APPDATA%/OpenWhispr/transcriptions.db (Windows):

CREATE TABLE transcriptions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
  original_text TEXT NOT NULL,
  processed_text TEXT,
  is_processed BOOLEAN DEFAULT 0,
  processing_method TEXT DEFAULT 'none',
  agent_name TEXT,
  error TEXT
);

CREATE INDEX idx_timestamp ON transcriptions(timestamp DESC);

Key features:

Automatic timestamps: No manual date handling
AI processing tracking: is_processed, processing_method, agent_name
Error logging: Failed transcriptions stored with error messages

Native Integrations

macOS Globe Key Listener

File: resources/globe-listener.swift

// Detects Fn/Globe key presses using IOKit
let eventMask = (1 << kCGEventFlagsChanged)
let eventTap = CGEvent.tapCreate(
  tap: .cgSessionEventTap,
  place: .headInsertEventTap,
  options: .defaultTap,
  eventsOfInterest: CGEventMask(eventMask),
  callback: eventCallback,
  userInfo: nil
)

Compiled at build: scripts/build-globe-listener.js

Requires Xcode Command Line Tools (xcode-select --install) to compile Swift source.

Windows Push-to-Talk Listener

File: resources/windows-key-listener.c

// Low-level keyboard hook for compound hotkeys
HHOOK hook = SetWindowsHookEx(
  WH_KEYBOARD_LL,
  KeyboardProc,
  NULL,
  0
);

LRESULT CALLBACK KeyboardProc(int nCode, WPARAM wParam, LPARAM lParam) {
  if (wParam == WM_KEYDOWN) {
    printf("KEY_DOWN\n");
    fflush(stdout);
  }
  return CallNextHookEx(NULL, nCode, wParam, lParam);
}

Distribution: Prebuilt binary downloaded from GitHub releases

Linux Fast Paste (X11/Wayland)

File: resources/linux-fast-paste.c

// XTest extension for X11
XTestFakeKeyEvent(display, XKeysymToKeycode(display, XK_Control_L), True, 0);
XTestFakeKeyEvent(display, XKeysymToKeycode(display, XK_v), True, 0);
XTestFakeKeyEvent(display, XKeysymToKeycode(display, XK_v), False, 0);
XTestFakeKeyEvent(display, XKeysymToKeycode(display, XK_Control_L), False, 0);
XFlush(display);

Fallback chain: native binary → wtype → ydotool → xdotool → manual paste

IPC Communication Patterns

Invoke (Request/Response)

For operations returning data:

// Renderer
const result = await window.electronAPI.getTranscriptions(100);

// Main (ipcHandlers.js)
ipcMain.handle('db-get-transcriptions', async (event, limit) => {
  return databaseManager.getTranscriptions(limit);
});

Send (Fire-and-Forget)

For notifications:

// Renderer
window.electronAPI.notifyActivationModeChanged('push');

// Main
ipcMain.on('activation-mode-changed', (event, mode) => {
  windowManager.setActivationModeCache(mode);
});

Events (Main → Renderer)

For real-time updates:

// Renderer (with cleanup)
const cleanup = window.electronAPI.onUpdateAvailable((info) => {
  console.log('Update available:', info.version);
});

// Main
windowManager.controlPanelWindow.webContents.send('update-available', {
  version: '1.5.5',
  releaseNotes: '...',
});

Memory leaks: Always return cleanup functions from event listeners and call them in React’s useEffect cleanup.

Platform-Specific Considerations

macOS
Windows
Linux

Accessibility permissions: Required for AppleScript-based paste
Microphone permission: System dialog on first use
Notarization: Required for distribution (Apple Developer account)
Universal binary: Supports both arm64 (M1/M2/M3) and x64 (Intel)
Dock behavior: LSUIElement: false shows app in dock with indicator dot

No special permissions: Push-to-talk works without admin rights
NSIS installer: Creates Start Menu shortcuts and uninstaller
Code signing: Optional but recommended (prevents SmartScreen warnings)
App ID: com.herotools.openwispr (groups all windows in taskbar)
Paste tools: Native windows-fast-paste.exe → PowerShell SendKeys → nircmd.exe

Build Process

The build pipeline combines Vite (frontend) and electron-builder (packaging):

# 1. Compile native binaries
npm run compile:native  # Swift, C sources

# 2. Download runtime binaries
npm run download:whisper-cpp
npm run download:llama-server
npm run download:sherpa-onnx

# 3. Build React app
cd src && vite build  # → src/dist/

# 4. Package with electron-builder
electron-builder --mac  # or --win, --linux

Production optimizations

ASAR archive: Main process code packed into app.asar
ASAR unpacking: FFmpeg and better-sqlite3 extracted for native access
Code signing: macOS requires Apple Developer certificate
Notarization: Automated via @electron/notarize
Update server: GitHub Releases as update provider

Performance Characteristics

Startup time: 1-2 seconds (cold), less than 500ms (warm)
Memory footprint: 150-300 MB (varies by model size)
Transcription speed: 0.5-5x realtime (model-dependent)
IPC overhead: less than 10ms for typical operations
Database queries: less than 5ms for history fetch (100 records)

Local transcription with the base model achieves near real-time performance on M1 Macs and modern Intel/AMD CPUs.

Security Model

Context Isolation

Renderer has no direct Node.js/Electron API access

Preload Script

Only whitelisted IPC channels exposed via contextBridge

Input Validation

All IPC handlers validate parameters before processing

File System Sandboxing

Temp files use system temp directory with random filenames

API Key Storage

Keys stored in .env file with restrictive permissions (600)

No Remote Code

All code bundled at build time; no runtime downloads

Next Steps

Model Registry

Learn how AI models are managed and downloaded

Building from Source

Compile OpenWhispr for your platform

Troubleshooting

Debug common issues and errors

Get Started

Core Features

Configuration

Platform Guides

Advanced

Core Technologies

Frontend

Desktop Framework

Database

Speech Processing

Dual Window System

Process Architecture

Main Process (`main.js`)

Renderer Process (React)

Preload Script (`preload.js`)

Audio Pipeline

Database Schema

Native Integrations

macOS Globe Key Listener

Windows Push-to-Talk Listener

Linux Fast Paste (X11/Wayland)

IPC Communication Patterns

Invoke (Request/Response)

Send (Fire-and-Forget)

Events (Main → Renderer)

Platform-Specific Considerations

Build Process

Performance Characteristics

Security Model

Next Steps

Model Registry

Building from Source

Troubleshooting

Build docs developers (and LLMs) love

Get Started

Core Features

Configuration

Platform Guides

Advanced

​Core Technologies

Frontend

Desktop Framework

Database

Speech Processing

​Dual Window System

​Process Architecture

​Main Process (main.js)

​Renderer Process (React)

​Preload Script (preload.js)

​Audio Pipeline

​Database Schema

​Native Integrations

​macOS Globe Key Listener

​Windows Push-to-Talk Listener

​Linux Fast Paste (X11/Wayland)

​IPC Communication Patterns

​Invoke (Request/Response)

​Send (Fire-and-Forget)

​Events (Main → Renderer)

​Platform-Specific Considerations

​Build Process

​Performance Characteristics

​Security Model

​Next Steps

Model Registry

Building from Source

Troubleshooting

Build docs developers (and LLMs) love

Core Technologies

Dual Window System

Process Architecture

Main Process (`main.js`)

Renderer Process (React)

Preload Script (`preload.js`)

Audio Pipeline

Database Schema

Native Integrations

macOS Globe Key Listener

Windows Push-to-Talk Listener

Linux Fast Paste (X11/Wayland)

IPC Communication Patterns

Invoke (Request/Response)

Send (Fire-and-Forget)

Events (Main → Renderer)

Platform-Specific Considerations

Build Process

Performance Characteristics

Security Model

Next Steps