Text-to-Speech (Read)

SuperCmd’s text-to-speech feature (SuperCmd Read) converts selected text into natural-sounding speech, perfect for proofreading, accessibility, or consuming content hands-free.

Overview

SuperCmd Read supports multiple TTS engines:

Edge TTS: Microsoft’s neural voices (free, 400+ voices)
ElevenLabs: Ultra-realistic AI voices (requires API key)
System TTS: Native macOS voices

Edge TTS is the default engine and requires no configuration. It provides excellent quality at no cost.

Quick Start

Select Text

Highlight any text in any application

Trigger Read

Press Cmd+Shift+R (or your configured Read hotkey)

Control Playback

A floating control panel appears in the top-right corner with play/pause/stop controls

Adjust Settings

Change voice, speed, and other options from the control panel

TTS Engines

Edge TTS (Recommended)

Microsoft’s Edge TTS provides high-quality neural voices for free.

Features:

400+ voices across 100+ languages
Natural prosody and intonation
No API key required
No usage limits
Excellent quality

Popular Voices:

en-US-EricNeural (Male, American English)
en-US-JennyNeural (Female, American English)
en-GB-RyanNeural (Male, British English)
en-AU-NatashaNeural (Female, Australian English)

ElevenLabs

Ultra-realistic AI voices for premium quality.

Features:

Studio-quality voice synthesis
Emotional expression
Custom voice cloning (paid tiers)
Multiple languages

Configuration:

Sign up at elevenlabs.io
Get your API key from the dashboard
Paste the key into Settings > AI > ElevenLabs API Key
In Settings > AI > Text-to-Speech Model, select an ElevenLabs voice

Built-in Voices:

Rachel (Female, warm)
Antoni (Male, calm)
Bella (Female, friendly)
Josh (Male, energetic)

ElevenLabs is a paid service with a free tier that includes 10,000 characters/month. Check pricing at elevenlabs.io/pricing.

System TTS

Uses macOS built-in voices.

Features:

Works offline
No external dependencies
Lower quality than neural voices
Limited voice options

Using SuperCmd Read

Read Overlay

The Read overlay is a floating control panel (src/renderer/src/hooks/useSpeakManager.ts).

Window Specs:

Position: Top-right corner
Size: 520×112 pixels
Always on top: Yes
Auto-hide: Closes when playback completes

Controls:

Play/Pause button
Stop button
Progress indicator
Voice selector
Speed control
Close button

Voice Selection

From the Read overlay, click the voice dropdown to choose from available voices:

// Voice options are built dynamically (src/renderer/src/hooks/useSpeakManager.ts:271)
readVoiceOptions = buildReadVoiceOptions(
  edgeTtsVoices,      // 400+ Edge voices
  currentVoice,       // Selected voice
  configuredVoice     // User preference
)

Playback Speed

Adjust speaking rate:

Slower: -50% to 0%
Normal: +0%
Faster: +10% to +100%

Slow (-50%): Good for language learning or difficult content
Normal (+0%): Natural speaking pace
Fast (+50%): Quick review or familiar content
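The speed values above are written as signed percentages. As a minimal sketch of how a numeric speed could be clamped to the documented -50% to +100% range and formatted that way, consider the helper below. The name `formatRate` and both constants are hypothetical and are not taken from SuperCmd's source.

```typescript
// Hypothetical helper for illustration; not part of SuperCmd's codebase.
const MIN_RATE_PERCENT = -50; // slowest speed listed above
const MAX_RATE_PERCENT = 100; // fastest speed listed above

function formatRate(percent: number): string {
  // Treat NaN or Infinity as the normal speaking pace.
  const safe = isFinite(percent) ? Math.round(percent) : 0;
  // Clamp to the documented range.
  const clamped = Math.min(MAX_RATE_PERCENT, Math.max(MIN_RATE_PERCENT, safe));
  // Zero and positive values carry an explicit "+" sign, matching "+0%".
  const sign = clamped >= 0 ? "+" : "-";
  return `${sign}${Math.abs(clamped)}%`;
}
```

Values outside the range are pulled back to the nearest limit rather than rejected, which keeps a speed control usable even if a stored preference is out of date.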

Settings

Default Voice

Set your preferred voice:

Settings > AI tab
Text-to-Speech Model: Select engine
Edge TTS Voice: Choose specific voice (if using Edge TTS)
ElevenLabs Voice: Choose voice (if using ElevenLabs)

Keyboard Shortcut

Customize the Read hotkey:

Settings > Hotkeys
Read Selected Text: Set custom shortcut (default: Cmd+Shift+R)

Auto-Resume

Configure behavior when switching voices mid-playback (src/renderer/src/hooks/useSpeakManager.ts:261):

speakUpdateOptions({
  voice: newVoice,
  restartCurrent: true  // Resume from current position
})

Advanced Features

Word Highlighting

SuperCmd Read tracks the current word being spoken:

export interface SpeakStatus {
  state: 'idle' | 'loading' | 'speaking' | 'done' | 'error';
  text: string;       // Full text being read
  index: number;      // Current chunk index
  total: number;      // Total chunks
  wordIndex?: number; // Current word position
}

The wordIndex field lays the groundwork for visually highlighting the current word in a future update.
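To illustrate how a word position could drive highlighting, here is a small sketch that maps a word index to a character range in the full text. It assumes that `wordIndex` counts whitespace-separated words from zero, which the source does not confirm; `wordRange` is a hypothetical name.

```typescript
// Hypothetical helper; assumes wordIndex is a zero-based count of
// whitespace-separated words in the full text.
function wordRange(
  text: string,
  wordIndex: number
): { start: number; end: number } | null {
  const matcher = /\S+/g;
  let match: RegExpExecArray | null;
  let current = 0;
  // Walk the words in order until the requested one is reached.
  while ((match = matcher.exec(text)) !== null) {
    if (current === wordIndex) {
      return { start: match.index, end: match.index + match[0].length };
    }
    current++;
  }
  return null; // index is past the last word
}
```

A renderer could wrap the returned range in a highlight element and leave the rest of the text untouched.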

Text Chunking

Long text is automatically split into manageable chunks:

Text divided into sentences or paragraphs
Each chunk processed separately
Seamless playback across chunks
Progress indicator shows overall position
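The chunking steps above can be sketched roughly as follows. This is an illustration only: the function name `chunkText`, the 200-character default, and the exact splitting rules are assumptions, and SuperCmd's real chunker may behave differently.

```typescript
// Hypothetical sketch of sentence-based chunking; not SuperCmd's actual code.
function chunkText(text: string, maxChars: number = 200): string[] {
  // 1. Split into paragraphs at blank lines, then into sentences.
  const sentences: string[] = [];
  for (const paragraph of text.split(/\n{2,}/)) {
    const parts = paragraph.match(/[^.!?]+[.!?]*/g) || [];
    for (const part of parts) {
      const trimmed = part.trim();
      if (trimmed.length > 0) sentences.push(trimmed);
    }
  }

  // 2. Pack consecutive sentences into chunks of at most maxChars.
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxChars || current === "") {
      // A single over-long sentence still becomes its own chunk.
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Splitting at sentence boundaries keeps pauses natural at chunk edges, which is one way to get playback that sounds seamless across chunks.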

Error Handling

Graceful fallbacks for API issues:

// State machine (src/renderer/src/hooks/useSpeakManager.ts:59)
idle → loading → speaking → done
  ↓
error (with message)

Errors display in the overlay with actionable messages.
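One way to reason about the state flow above is as a table of allowed transitions. The sketch below reuses the state names from SpeakStatus, but the table itself is an assumption: the source diagram does not say exactly which states can lead to error or how playback restarts, and `canTransition` is a hypothetical helper.

```typescript
type SpeakState = "idle" | "loading" | "speaking" | "done" | "error";

// Assumed transition table for illustration; the real hook may allow other moves.
const TRANSITIONS: Record<SpeakState, readonly SpeakState[]> = {
  idle: ["loading"],
  loading: ["speaking", "error"], // assumption: synthesis can fail while loading
  speaking: ["done", "error"],    // assumption: playback can fail mid-stream
  done: ["idle", "loading"],      // assumption: a new read can start after finishing
  error: ["idle", "loading"],     // assumption: the user can dismiss or retry
};

function canTransition(from: SpeakState, to: SpeakState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}
```

Checking transitions against a table like this makes unexpected jumps, such as going from idle straight to speaking, easy to detect during development.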

Language Support

Edge TTS Languages

Supports 100+ languages including:

English (US, UK, AU, CA, IE, IN, NZ, ZA)
Spanish (ES, MX, AR, CO)
French (FR, CA)
German (DE, AT, CH)
Chinese (Mandarin, Cantonese)
Japanese, Korean, Arabic, Hindi, and more

ElevenLabs Languages

Supports:

English, Spanish, French, German
Portuguese, Italian, Polish
And expanding

Edge TTS automatically detects language from text, so you can read multilingual content without changing settings.

Performance

Edge TTS Performance

Latency: ~500ms initial
Streaming: Real-time chunk playback
Network: ~5 KB/s audio stream
Offline: Not available (requires internet)

ElevenLabs Performance

Latency: ~1-2s initial
Quality: Highest available
Network: ~10 KB/s audio stream
Caching: Frequently used phrases cached

Integration with Workflow

Use Cases

Proofreading

Catch errors by hearing your writing read aloud

Accessibility

Listen to web pages, documents, and emails hands-free

Language Learning

Hear correct pronunciation in foreign languages

Multitasking

Listen to articles while working on other tasks

Reading Long Documents

Select all text (Cmd+A)
Trigger Read (Cmd+Shift+R)
Use speed controls to adjust pace
Pause/resume as needed

Reading Web Content

Select article text
Press Read hotkey
Continue browsing while listening
Overlay stays on top

Troubleshooting

No audio output

Check system volume
Verify output device in System Settings > Sound
Try a different voice
Restart SuperCmd

Choppy playback

Check internet connection stability
Try reducing playback speed
Close bandwidth-heavy applications
Switch to a different TTS engine

Wrong language spoken

Edge TTS auto-detects language from text
Ensure text is in a supported language
Try manually selecting a language-specific voice

ElevenLabs not working

Verify API key in Settings > AI
Check ElevenLabs quota/billing
Ensure API key has correct permissions
Test API key on ElevenLabs website

Privacy & Costs

Edge TTS

Edge TTS is free and has no usage limits. Text is sent to Microsoft servers for synthesis. Microsoft may log requests for service improvement.

ElevenLabs

Pricing (as of 2024):

Free: 10,000 characters/month
Starter: $5/month (30,000 characters)
Creator: $22/month (100,000 characters)
Pro: $99/month (500,000 characters)

Monitor your ElevenLabs usage at elevenlabs.io/usage to avoid unexpected charges.
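As a rough illustration of the arithmetic behind these quotas, the sketch below checks whether a piece of text still fits in a plan's monthly allowance. The quota figures are copied from the list above (as of 2024); `fitsInQuota` is a hypothetical helper, and it counts characters with `text.length`, which may not match how ElevenLabs meters usage.

```typescript
// Monthly character quotas copied from the pricing list above (as of 2024).
const MONTHLY_QUOTA = {
  free: 10_000,
  starter: 30_000,
  creator: 100_000,
  pro: 500_000,
} as const;

type Plan = keyof typeof MONTHLY_QUOTA;

// Hypothetical helper: true if reading `text` stays within the plan's quota.
// Assumes one UTF-16 code unit counts as one character.
function fitsInQuota(plan: Plan, usedThisMonth: number, text: string): boolean {
  return usedThisMonth + text.length <= MONTHLY_QUOTA[plan];
}
```

For example, with 9,990 characters already used on the free plan, a 10-character selection still fits, while an 11-character one does not.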

Keyboard Shortcuts

Action	Shortcut
Read Selected Text	`Cmd+Shift+R`
Pause/Resume	`Space` (in overlay)
Stop Reading	`Escape`
Close Overlay	`Cmd+W`
Increase Speed	`]`
Decrease Speed	`[`

Technical Details

Voice Cache Management

ElevenLabs voices are cached to reduce API calls (src/renderer/src/utils/voice-cache.ts):

function getCachedElevenLabsVoices(): ElevenLabsVoice[] | null {
  // Shared cache between speak and settings views
  // 24-hour TTL
  // Cleared on API errors
}
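The caching behavior described in those comments (a shared cache, a 24-hour TTL, clearing on API errors) could look roughly like the sketch below. It is not the contents of voice-cache.ts: the `CachedVoice` shape and all function names here are assumptions made for illustration.

```typescript
// Hypothetical TTL cache sketch; names and shapes are assumptions.
interface CachedVoice {
  id: string;
  label: string;
}

const VOICE_CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24-hour TTL

let voiceCache: { voices: CachedVoice[]; storedAt: number } | null = null;

function setCachedVoices(voices: CachedVoice[], now: number = Date.now()): void {
  voiceCache = { voices, storedAt: now };
}

function getCachedVoices(now: number = Date.now()): CachedVoice[] | null {
  if (voiceCache === null) return null;
  if (now - voiceCache.storedAt >= VOICE_CACHE_TTL_MS) {
    voiceCache = null; // entry expired, force a fresh API call
    return null;
  }
  return voiceCache.voices;
}

function clearCachedVoices(): void {
  voiceCache = null; // call this when the API reports an error
}
```

Passing `now` as a parameter keeps the expiry logic easy to test without waiting for real time to pass.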

Audio Streaming

TTS audio is streamed in real-time:

Text sent to TTS API
Audio chunks received progressively
Playback begins immediately
Remaining chunks buffered in background

This provides near-instant playback start, even for long text.
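The streaming steps above can be modeled as a small queue that plays the first audio chunk as soon as it arrives and buffers later ones. This is a sketch under assumptions: `StreamingPlaybackQueue`, its method names, and the `play` callback are hypothetical and do not describe SuperCmd's actual audio pipeline.

```typescript
// Hypothetical sketch: start playback on the first chunk, buffer the rest.
class StreamingPlaybackQueue {
  private buffered: Uint8Array[] = [];
  private playing = false;
  private play: (chunk: Uint8Array) => void;

  constructor(play: (chunk: Uint8Array) => void) {
    this.play = play;
  }

  // Called whenever the TTS API delivers another audio chunk.
  push(chunk: Uint8Array): void {
    if (this.playing) {
      this.buffered.push(chunk); // buffer in the background
    } else {
      this.playing = true;
      this.play(chunk); // nothing is playing, so start immediately
    }
  }

  // Called when the player finishes the current chunk.
  onChunkEnded(): void {
    const next = this.buffered.shift();
    if (next !== undefined) {
      this.play(next);
    } else {
      this.playing = false;
    }
  }

  pendingCount(): number {
    return this.buffered.length;
  }
}
```

In this model the time to first sound depends only on the first chunk, which is why long text does not delay the start of playback.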
