Overview
Voice input in SuperCmd uses the Whisper API to transcribe your speech in real time. The transcribed text is automatically inserted at your cursor position or used as a command in SuperCmd.
Whisper voice input requires an OpenAI API key. You’ll be prompted to configure it on first use.
Getting Started
Activate Voice Input
Press the Fn key (or your configured voice input hotkey) to open the Whisper overlay.
How It Works
The Whisper integration is managed by the useWhisperManager hook (src/renderer/src/hooks/useWhisperManager.ts).
Architecture
Recording Flow
- Activation: Pressing Fn opens a detached overlay window (620×88 px, bottom-center)
- Audio Capture: The browser MediaRecorder API captures audio from the microphone
- Transcription: The audio buffer is sent to the OpenAI Whisper API (src/main/ai-provider.ts:539)
- Text Insertion: The transcribed text is inserted at the cursor or into SuperCmd
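The four stages above can be read as one async pipeline. The sketch below is illustrative only. `VoiceDeps` and `runVoiceSession` are hypothetical names, not SuperCmd's code. Each stage is injected as a callback, so the flow can be exercised without a microphone or network access.

```typescript
// Hypothetical stage callbacks for one voice session; SuperCmd's real
// implementation is split across useWhisperManager.ts and ai-provider.ts.
interface VoiceDeps {
  record: () => Promise<Uint8Array>;                  // capture audio (e.g. via MediaRecorder)
  transcribe: (audio: Uint8Array) => Promise<string>; // send audio to Whisper
  insert: (text: string) => Promise<void>;            // insert at cursor or into SuperCmd
}

// Runs capture -> transcription -> insertion in order and returns the text.
// Whitespace-only transcripts are returned as "" and never inserted.
async function runVoiceSession(deps: VoiceDeps): Promise<string> {
  const audio = await deps.record();
  const text = (await deps.transcribe(audio)).trim();
  if (text.length > 0) {
    await deps.insert(text);
  }
  return text;
}
```

In SuperCmd, capture happens in the renderer and the API call happens in the main process. A real implementation would therefore likely cross an IPC boundary between `record` and `transcribe`.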
Whisper API Integration
The transcription process is implemented in src/main/ai-provider.ts (line 539).
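Below is a minimal sketch of a transcription request. It assumes Node 18+ (global `fetch`, `FormData`, `Blob`) and OpenAI's documented `POST /v1/audio/transcriptions` endpoint with the `whisper-1` model. The helper names `extensionForMime`, `buildTranscriptionForm`, and `transcribe` are hypothetical and are not taken from `ai-provider.ts`. The upload is given a filename whose extension matches its MIME type, because the endpoint may reject a file whose format it cannot identify.

```typescript
// Hypothetical helpers; not SuperCmd's actual ai-provider.ts code.
// Assumes Node 18+ (global fetch, FormData, Blob).
const WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions";

// Map a MediaRecorder MIME type (possibly with parameters such as
// ";codecs=opus") to a file extension the endpoint accepts.
function extensionForMime(mimeType: string): string {
  const base = mimeType.split(";")[0].trim().toLowerCase();
  const table: Record<string, string> = {
    "audio/webm": "webm",
    "audio/ogg": "ogg",
    "audio/wav": "wav",
    "audio/mpeg": "mp3",
    "audio/mp4": "m4a",
    "audio/flac": "flac",
  };
  return table[base] ?? "webm"; // assumption: fall back to webm, MediaRecorder's usual output
}

// Build the multipart body: audio file, model name, optional language hint.
function buildTranscriptionForm(audio: Blob, language?: string): FormData {
  const form = new FormData();
  form.append("file", audio, `recording.${extensionForMime(audio.type)}`);
  form.append("model", "whisper-1");
  if (language) {
    form.append("language", language);
  }
  return form;
}

// Send the request and return the transcribed text. fetch sets the
// multipart Content-Type (with boundary) automatically.
async function transcribe(apiKey: string, audio: Blob, language?: string): Promise<string> {
  const response = await fetch(WHISPER_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audio, language),
  });
  if (!response.ok) {
    throw new Error(`Whisper request failed: ${response.status} ${await response.text()}`);
  }
  const data = (await response.json()) as { text: string };
  return data.text;
}
```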
Audio is sent directly to OpenAI and is not stored locally. Recording sessions are ephemeral.
Using Voice Input
Dictation Mode
Insert text anywhere:
- Position your cursor in any text field
- Press and hold Fn
- Speak your text
- Release Fn to transcribe and insert
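Hold-to-dictate amounts to starting capture on key-down and stopping on key-up. The sketch below is a hypothetical illustration. `KeyEventLike` and `PushToTalk` are invented names, and detecting the Fn key itself happens at the OS level rather than through DOM key events. The class ignores auto-repeated key-down events so that holding the key does not restart recording.

```typescript
// Hypothetical minimal key event shape; real Fn detection is OS-level.
interface KeyEventLike {
  type: "down" | "up";
  repeat: boolean; // true for auto-repeated key-down events while held
}

// Push-to-talk: press starts recording, release stops and transcribes.
class PushToTalk {
  private recording = false;
  private readonly start: () => void;
  private readonly stopAndTranscribe: () => void;

  constructor(start: () => void, stopAndTranscribe: () => void) {
    this.start = start;
    this.stopAndTranscribe = stopAndTranscribe;
  }

  handle(event: KeyEventLike): void {
    if (event.type === "down") {
      // Ignore auto-repeat and duplicate presses while already recording.
      if (event.repeat || this.recording) return;
      this.recording = true;
      this.start();
    } else if (this.recording) {
      this.recording = false;
      this.stopAndTranscribe();
    }
  }

  get isRecording(): boolean {
    return this.recording;
  }
}
```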
Command Mode
Use voice to run SuperCmd commands:
- Press your SuperCmd hotkey to open the launcher
- Press Fn to activate voice input
- Speak a command name (e.g., “Open Spotify”)
- SuperCmd will search and execute the matching command
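Whisper typically returns punctuated, capitalized text (for example "Open Spotify."). A spoken command therefore usually needs normalizing before it is matched against command names. The matcher below is a hypothetical sketch that uses simple word-overlap scoring. SuperCmd's actual launcher search is not described here and may work differently.

```typescript
// Hypothetical command shape; SuperCmd's real command model is not shown here.
interface Command {
  id: string;
  title: string;
}

// Lowercase, replace punctuation with spaces, and split into words.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Score each command by the fraction of its title words found in the
// transcript. Return the best match, or undefined if nothing overlaps.
// Ties keep the earlier command.
function matchCommand(transcript: string, commands: Command[]): Command | undefined {
  const spoken = new Set(tokenize(transcript));
  let best: Command | undefined;
  let bestScore = 0;
  for (const command of commands) {
    const words = tokenize(command.title);
    if (words.length === 0) continue;
    const score = words.filter((w) => spoken.has(w)).length / words.length;
    if (score > bestScore) {
      best = command;
      bestScore = score;
    }
  }
  return best;
}
```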
Continuous Recording
For longer dictation:
- Press Fn to start
- Click the overlay to toggle hold mode
- Speak continuously
- Click Stop when finished
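Hold mode can be modeled as a latch layered on top of push-to-talk. Once the overlay is clicked, releasing Fn no longer stops the recording. Only Stop (transcribe) or Escape (cancel) ends it. The reducer below is a hypothetical sketch of that behavior, and the event names and `RecorderState` shape are invented for illustration.

```typescript
// Hypothetical recorder state for push-to-talk with a hold-mode latch.
interface RecorderState {
  recording: boolean;
  latched: boolean; // true once hold mode is toggled on via the overlay
}

type RecorderEvent = "fnDown" | "fnUp" | "overlayClick" | "stopClick" | "escape";

// What the caller should do after a transition, if anything.
type Effect = "start" | "transcribe" | "discard" | null;

const IDLE: RecorderState = { recording: false, latched: false };

function reduceRecorder(state: RecorderState, event: RecorderEvent): [RecorderState, Effect] {
  switch (event) {
    case "fnDown":
      return state.recording ? [state, null] : [{ recording: true, latched: false }, "start"];
    case "fnUp":
      // Releasing Fn stops recording only when hold mode is not latched.
      return state.recording && !state.latched ? [IDLE, "transcribe"] : [state, null];
    case "overlayClick":
      // Toggle hold mode while recording.
      return state.recording ? [{ ...state, latched: !state.latched }, null] : [state, null];
    case "stopClick":
      return state.recording ? [IDLE, "transcribe"] : [state, null];
    case "escape":
      // Escape cancels: the audio is discarded rather than transcribed.
      return state.recording ? [IDLE, "discard"] : [state, null];
  }
}
```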
Settings
Voice Input Hotkey
Customize the activation key:
- Open Settings > General
- Set Voice Input Hotkey (default: Fn)
- Options include: Fn, Right Cmd, Right Option, Right Shift
Whisper Model
Configure the transcription model used for Whisper requests.
Language Settings
Optionally specify a language for better accuracy:
- Open Settings > AI
- Set Whisper Language (optional)
- Use ISO 639-1 codes (e.g., en, es, fr, de)
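Whisper's `language` parameter expects an ISO 639-1 code. When the parameter is omitted, Whisper detects the language itself. A settings field can therefore normalize the user's input and send nothing when the field is empty. The helper below is a hypothetical sketch. It checks only the two-letter shape, not membership in the full ISO 639-1 list.

```typescript
// Hypothetical settings helper: normalize an optional Whisper language hint.
// Returns undefined for empty input (let Whisper auto-detect), the lowercase
// two-letter code for well-formed input, and throws otherwise.
function normalizeWhisperLanguage(input: string | undefined): string | undefined {
  const value = (input ?? "").trim().toLowerCase();
  if (value === "") return undefined;
  if (!/^[a-z]{2}$/.test(value)) {
    throw new Error(`Expected an ISO 639-1 code such as "en", got "${input}"`);
  }
  return value;
}
```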
Overlay Window
The Whisper overlay is a detached window managed by useDetachedPortalWindow (src/renderer/src/useDetachedPortalWindow.ts).
Window Specifications
- Position: Bottom-center of screen
- Size: 620×88 pixels
- Style: Transparent, frameless
- Behavior: Auto-closes on blur or Escape
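Placing a 620×88 window at the bottom center of the screen is a small geometry calculation against the display's work area, such as the rectangle returned by Electron's `screen.getPrimaryDisplay().workArea`. The function below is a hypothetical sketch. The 24-pixel bottom margin is an assumed value, not one taken from `useDetachedPortalWindow.ts`.

```typescript
// Same shape as Electron's Rectangle ({ x, y, width, height }).
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

const OVERLAY_WIDTH = 620;
const OVERLAY_HEIGHT = 88;
const BOTTOM_MARGIN = 24; // assumed gap above the bottom of the work area

// Center the overlay horizontally within the work area and place it
// BOTTOM_MARGIN pixels above the work area's bottom edge. The work area's
// x/y offsets keep this correct on secondary displays.
function overlayBounds(workArea: Rect): Rect {
  return {
    x: Math.round(workArea.x + (workArea.width - OVERLAY_WIDTH) / 2),
    y: Math.round(workArea.y + workArea.height - OVERLAY_HEIGHT - BOTTOM_MARGIN),
    width: OVERLAY_WIDTH,
    height: OVERLAY_HEIGHT,
  };
}
```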
Visual States
- Listening
- Processing
- Complete
- Error
An animated waveform indicates active recording.
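The four visual states form a small state machine. The transition table below is a hypothetical sketch of how they might connect and is not taken from SuperCmd's source. Any transition not listed is treated as invalid and leaves the state unchanged.

```typescript
type OverlayState = "listening" | "processing" | "complete" | "error";
type OverlayEvent = "stop" | "success" | "failure" | "restart";

// Allowed transitions; anything not listed keeps the current state.
const TRANSITIONS: Record<OverlayState, Partial<Record<OverlayEvent, OverlayState>>> = {
  listening: { stop: "processing", failure: "error" },   // failure: e.g. microphone unavailable
  processing: { success: "complete", failure: "error" }, // failure: e.g. API error
  complete: { restart: "listening" },
  error: { restart: "listening" },
};

function nextOverlayState(state: OverlayState, event: OverlayEvent): OverlayState {
  return TRANSITIONS[state][event] ?? state;
}

// Only the listening state shows the animated waveform.
function showsWaveform(state: OverlayState): boolean {
  return state === "listening";
}
```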
Audio Format Support
Whisper accepts multiple audio formats (src/main/ai-provider.ts:529).
Best Practices
Speak Clearly
Enunciate words clearly for better transcription accuracy
Use Quiet Environment
Reduce background noise for cleaner audio
Short Segments
Keep recordings under 30 seconds for faster transcription
Review First
Check transcribed text before sending or saving
Keyboard Shortcuts
| Action | Shortcut |
|---|---|
| Start/Stop Recording | Fn (hold) |
| Cancel Recording | Escape |
| Toggle Hold Mode | Click overlay |
Onboarding Practice
First-time users are guided through an onboarding flow.
Review Transcription
See your practice text transcribed in real-time (src/renderer/src/hooks/useWhisperManager.ts:91)
Troubleshooting
No microphone detected
- Grant microphone permission in System Settings > Privacy & Security > Microphone
- Ensure SuperCmd is checked in the list
- Restart SuperCmd after granting permission
Poor transcription quality
- Check microphone input level in System Settings > Sound
- Reduce background noise
- Speak more slowly and clearly
- Try adjusting microphone position
Transcription is slow
- API response time varies based on audio length
- Check your internet connection
- Verify OpenAI API key is valid
API errors
- Verify OpenAI API key in Settings > AI
- Check API quota and billing status
- Ensure the API key has access to the Whisper API
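The HTTP status code of a failed request usually points to the cause. The mapping below is a hypothetical sketch based on OpenAI's documented status codes: 401 for an invalid key and 429 for rate limits or exhausted quota. The function name and messages are illustrative and may not match what SuperCmd displays.

```typescript
// Hypothetical helper: turn a failed Whisper API status code into a
// troubleshooting hint that mirrors the checklist above.
function whisperErrorHint(status: number): string {
  if (status === 401) return "Invalid API key: verify it in Settings > AI.";
  if (status === 429) return "Rate limit or quota exceeded: check API quota and billing status.";
  if (status === 400) return "Request rejected: the audio may be in an unsupported format.";
  if (status >= 500) return "OpenAI server error: try again shortly.";
  return `Unexpected error (HTTP ${status}).`;
}
```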
Privacy & Security
What’s Sent
- Raw audio recording (duration varies)
- Language hint (if configured)
- Model selection (whisper-1)
What’s Not Sent
- No personal identifiers
- No app context or metadata
- No previous recordings
Data Retention
According to OpenAI’s policy:
- API requests may be retained for abuse monitoring
- Audio is not used for model training (as of March 2024)
- See OpenAI Privacy Policy for details
Advanced Usage
Text Accumulation
The appendWhisperOnboardingPracticeText function (src/renderer/src/hooks/useWhisperManager.ts:91) merges successive transcription chunks into a single practice text.
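Joining transcription chunks naively can leave doubled or missing spaces at the seams. The function below is a hypothetical sketch of one way to merge them. It is not the actual body of `appendWhisperOnboardingPracticeText`, whose exact rules are not documented here. It trims each chunk, puts a single space between chunks, and omits that space when the new chunk starts with punctuation that attaches to the previous word.

```typescript
// Hypothetical chunk merger, not SuperCmd's appendWhisperOnboardingPracticeText.
function appendTranscriptChunk(existing: string, chunk: string): string {
  const next = chunk.trim();
  if (next === "") return existing; // ignore empty or whitespace-only chunks
  const base = existing.trimEnd();
  if (base === "") return next;
  // No space before punctuation that attaches to the previous word.
  const attaches = /^[.,!?;:)]/.test(next);
  return attaches ? base + next : `${base} ${next}`;
}

// Usage: fold a sequence of chunks into one string.
const practice = ["Hello", " world.", "How are you?"].reduce(appendTranscriptChunk, "");
// practice === "Hello world. How are you?"
```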
Session Management
Voice sessions are tracked to prevent launcher interference.
Cost Considerations
Whisper API pricing (as of 2024):
- $0.006 per minute of audio
- Average 10-second recording: ~$0.001
- Monthly heavy usage (500 recordings of ~10 seconds each): ~$0.50
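Cost scales linearly with audio duration, so it can be estimated directly from the per-minute rate. The helper below is a sketch that assumes the 2024 rate of $0.006 per minute and billing proportional to audio length. Check OpenAI's pricing page for current rates.

```typescript
// Assumed 2024 Whisper rate in USD per minute of audio.
const WHISPER_USD_PER_MINUTE = 0.006;

// Estimated cost in USD for `count` recordings of `seconds` each.
function estimateWhisperCost(seconds: number, count = 1): number {
  return (seconds / 60) * WHISPER_USD_PER_MINUTE * count;
}

console.log(estimateWhisperCost(10).toFixed(3));      // prints 0.001
console.log(estimateWhisperCost(10, 500).toFixed(2)); // prints 0.50
```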
Monitor your OpenAI usage dashboard to track Whisper API costs.