## Activation Hotkey
The global hotkey Option+D (⌥D) activates the voice interface from anywhere on macOS. The first time you press Option+D, Rubber Duck automatically attaches to the workspace you have open in the terminal where you ran `duck [path]`.

### Hotkey Behavior
The activation hotkey is a toggle:

- Press when idle: Connects to the OpenAI Realtime API, starts audio streaming, and begins listening for speech
- Press when active: Immediately disconnects the session and stops all audio streaming
You can change the activation hotkey in Settings > Hotkeys. The default settings hotkey is Option+Shift+D (⌥⇧D).
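The toggle can be sketched as a two-state machine. This is a minimal, illustrative sketch — `VoiceSessionState` and `HotkeyToggle` are hypothetical names, not Rubber Duck's actual types:

```swift
import Foundation

// Illustrative sketch of the hotkey toggle; names are hypothetical.
enum VoiceSessionState {
    case idle
    case active
}

final class HotkeyToggle {
    private(set) var state: VoiceSessionState = .idle

    /// Called whenever the activation hotkey (⌥D) fires.
    /// Returns a description of the action taken.
    func hotkeyPressed() -> String {
        switch state {
        case .idle:
            state = .active
            return "connect: start audio streaming and listen"
        case .active:
            state = .idle
            return "disconnect: stop all audio streaming"
        }
    }
}
```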
## Speech Detection
Rubber Duck uses the OpenAI Realtime API’s server-side voice activity detection (VAD) to detect speech:

- Speech started: The server detects you’ve begun speaking (Rubber Duck shows the “Listening” state)
- Speech stopped: The server detects a natural pause (Rubber Duck transitions to “Thinking”)
- Assistant response: The model processes your input and responds with audio and text
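The mapping from server VAD events to UI states can be sketched as follows. The enum names here are illustrative (the real coordinator lives in VoiceSessionCoordinator.swift); the Realtime API event names noted in the comments are the standard `input_audio_buffer.*` events:

```swift
import Foundation

// Hypothetical event and state names, for illustration only.
enum ServerEvent {
    case speechStarted   // input_audio_buffer.speech_started
    case speechStopped   // input_audio_buffer.speech_stopped
    case responseAudio   // response audio/text deltas arriving
}

enum UIState: String {
    case listening = "Listening"
    case thinking  = "Thinking"
    case speaking  = "Speaking"
}

func uiState(for event: ServerEvent) -> UIState {
    switch event {
    case .speechStarted: return .listening
    case .speechStopped: return .thinking
    case .responseAudio: return .speaking
    }
}
```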
## Noise Gate
Rubber Duck applies a client-side noise gate to prevent ambient noise from triggering false speech detection:

- Quiet audio below `-46 dBFS` is replaced with silence
- Without echo cancellation, the threshold is raised to `-38 dBFS` to reduce speaker bleed
- A hold time of 250ms (or 350ms without AEC) keeps the gate open during natural inter-word pauses
The noise gate is implemented in `AudioManager.swift:471-503`.
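The gating logic described above can be sketched as follows. This is a minimal, illustrative version — `NoiseGate` and its method names are hypothetical, not the actual implementation — using the documented thresholds and hold times:

```swift
import Foundation

// Sketch of a client-side noise gate with a hold time; names are illustrative.
struct NoiseGate {
    let thresholdDBFS: Double      // -46 with AEC, -38 without
    let holdTime: TimeInterval     // 0.25 s with AEC, 0.35 s without
    var lastOpenTime: TimeInterval = -.infinity

    /// RMS level of a PCM buffer in dBFS (0 dBFS = full scale).
    static func levelDBFS(of samples: [Float]) -> Double {
        guard !samples.isEmpty else { return -.infinity }
        let meanSquare = samples.reduce(0.0) { $0 + Double($1) * Double($1) } / Double(samples.count)
        return 10.0 * log10(max(meanSquare, 1e-12))
    }

    /// Returns the buffer unchanged if the gate is open, silence otherwise.
    mutating func process(_ samples: [Float], at time: TimeInterval) -> [Float] {
        if NoiseGate.levelDBFS(of: samples) >= thresholdDBFS {
            lastOpenTime = time   // loud enough: (re)open the gate
        }
        // Keep the gate open for `holdTime` after the last loud buffer,
        // so natural inter-word pauses are not replaced with silence.
        let open = (time - lastOpenTime) <= holdTime
        return open ? samples : [Float](repeating: 0, count: samples.count)
    }
}
```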
## Audio Streaming
Rubber Duck streams audio at 24 kHz PCM16 mono to the OpenAI Realtime API. The capture path (`AudioManager.swift`) supports two modes:
### Voice Processing Mode (Preferred)
Enables VoiceProcessingIO on the AVAudioEngine input node, which provides:

- Acoustic Echo Cancellation (AEC): Hardware-level echo cancellation at the driver level
- Noise Suppression: Additional background noise reduction
- AGC: Automatic gain control for consistent input levels
On some multi-channel devices (e.g., MacBook Pro with 9-channel mic array), Rubber Duck automatically downmixes to mono for compatibility with VoiceProcessingIO.
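Enabling voice processing on an `AVAudioEngine` input node uses the standard AVFoundation call shown below. This is a simplified sketch, not Rubber Duck's actual setup (which lives in AudioManager.swift):

```swift
import AVFoundation

// Sketch: enabling Apple's voice processing (AEC, noise suppression, AGC)
// on an AVAudioEngine input node. Error handling is simplified.
let engine = AVAudioEngine()
do {
    // Must be called before the engine starts; throws if the current
    // hardware or audio routing doesn't support voice processing.
    try engine.inputNode.setVoiceProcessingEnabled(true)
} catch {
    // Fall back to standard mode (software AEC) on failure.
    print("VoiceProcessingIO unavailable: \(error)")
}
```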
### Standard Mode (Fallback)
If VoiceProcessingIO fails to initialize (e.g., due to incompatible hardware or audio routing), Rubber Duck falls back to standard audio mode:

- No hardware AEC
- Software AEC is applied: the playback reference signal is subtracted from the captured microphone signal to cancel echo
- Input muting during assistant playback (unmuted after audio drains) to prevent residual echo
Software AEC uses a playback reference buffer (`PlaybackReferenceBuffer`) populated by the playback manager, with adaptive delay estimation and gain calibration.
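The core subtraction step can be sketched as below. This is a deliberately minimal illustration — real software AEC (including Rubber Duck's) adapts the delay and gain continuously; the function name and fixed parameters here are hypothetical:

```swift
import Foundation

// Minimal sketch of software echo cancellation: subtract a delayed,
// gain-scaled copy of the playback reference from the mic signal.
func cancelEcho(mic: [Float], reference: [Float], delaySamples: Int, gain: Float) -> [Float] {
    var out = mic
    for i in 0..<mic.count {
        let refIndex = i - delaySamples   // playback reaches the mic `delaySamples` late
        if refIndex >= 0 && refIndex < reference.count {
            out[i] -= gain * reference[refIndex]
        }
    }
    return out
}
```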
## Text-to-Speech
Assistant responses are streamed as audio deltas from the OpenAI Realtime API and played through the system speaker. The playback path is implemented in `AudioPlaybackManager.swift`.
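A typical way to play streamed PCM16 deltas through `AVAudioPlayerNode` is sketched below. This is an assumption-laden simplification (format conversion, buffering, and error handling are reduced to the minimum), not the actual contents of AudioPlaybackManager.swift:

```swift
import AVFoundation

// Sketch: converting PCM16 deltas to Float32 and queueing them on a player node.
let engine = AVAudioEngine()
let player = AVAudioPlayerNode()
let format = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                           sampleRate: 24_000, channels: 1, interleaved: false)!
engine.attach(player)
engine.connect(player, to: engine.mainMixerNode, format: format)
// Before playback: try engine.start(); player.play()

func play(pcm16 data: Data) {
    let frameCount = AVAudioFrameCount(data.count / 2)   // 2 bytes per PCM16 sample
    guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frameCount) else { return }
    buffer.frameLength = frameCount
    data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) in
        let samples = raw.bindMemory(to: Int16.self)
        let out = buffer.floatChannelData![0]
        for i in 0..<Int(frameCount) {
            out[i] = Float(samples[i]) / 32768.0   // PCM16 → Float32 in [-1, 1)
        }
    }
    player.scheduleBuffer(buffer, completionHandler: nil)  // queued buffers play back-to-back
}
```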
## Smart Truncation for Voice
Rubber Duck filters assistant responses before speaking to avoid reading long code blocks or diffs verbatim:

- Short responses: Spoken verbatim
- Responses with long code blocks: A summary is spoken, and the assistant says “details are in the terminal”
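One way to implement this filtering is sketched below. The 10-line cutoff, function name, and summary phrasing are illustrative assumptions, not Rubber Duck's exact values:

```swift
import Foundation

// Sketch: replace long fenced code blocks with a spoken summary.
func speakableText(for response: String, maxCodeLines: Int = 10) -> String {
    var spoken: [String] = []
    var inCode = false
    var codeLines: [String] = []
    for line in response.components(separatedBy: "\n") {
        if line.hasPrefix("```") {
            if inCode {
                // Closing fence: speak short blocks verbatim, summarize long ones.
                if codeLines.count > maxCodeLines {
                    spoken.append("I wrote a code block; details are in the terminal.")
                } else {
                    spoken.append(contentsOf: codeLines)
                }
                codeLines = []
            }
            inCode.toggle()
        } else if inCode {
            codeLines.append(line)
        } else {
            spoken.append(line)
        }
    }
    return spoken.joined(separator: "\n")
}
```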
## Barge-In (Interruptions)
Rubber Duck supports immediate barge-in: you can interrupt the assistant at any time while it’s speaking.

### How Barge-In Works
- Speech detection during playback: If the server detects `speech_started` while the assistant is speaking, Rubber Duck schedules a confirmation delay (default: 350ms)
- Confirmation window: If speech continues for the full delay, barge-in is confirmed
- Stop playback immediately: `AVAudioPlayerNode.stop()` is called to halt audio output
- Truncate response: Rubber Duck sends `conversation.item.truncate` to the server with the exact playback position (in milliseconds)
- Resume listening: The session transitions back to the “Listening” state
The barge-in logic is implemented in `VoiceSessionCoordinator.swift:286-389`.
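The confirmation window can be sketched as a small timing check: `speech_started` arms it, `speech_stopped` cancels it, and barge-in fires only once speech has continued for the full delay. `BargeInDetector` and its methods are illustrative names, not the actual coordinator API:

```swift
import Foundation

// Sketch of the barge-in confirmation window; names are illustrative.
final class BargeInDetector {
    let confirmationDelay: TimeInterval   // e.g. 0.35 s with hardware AEC
    private var speechStartedAt: TimeInterval?

    init(confirmationDelay: TimeInterval = 0.35) {
        self.confirmationDelay = confirmationDelay
    }

    /// Server reported speech_started while the assistant is speaking.
    func speechStarted(at time: TimeInterval) {
        speechStartedAt = time
    }

    /// Server reported speech_stopped before the window elapsed: cancel.
    func speechStopped() {
        speechStartedAt = nil
    }

    /// Has speech continued for the full confirmation window?
    func isConfirmed(at time: TimeInterval) -> Bool {
        guard let start = speechStartedAt else { return false }
        return time - start >= confirmationDelay
    }
}
```

On confirmation, the coordinator stops the player node and sends `conversation.item.truncate` with the playback position, as described in the steps above.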
## Abort vs. Steer Modes
Rubber Duck offers two interruption modes (configured in Settings):

### Auto-Abort (Default)
When you interrupt, Rubber Duck:

- Stops playback and truncates the assistant’s response at the exact playback position
- The server discards any planned tool calls that haven’t started yet
- Your new speech is treated as a fresh user turn
### Steer Mode (Auto-Abort Disabled)
When you interrupt in steer mode, Rubber Duck:

- Stops playback immediately
- Sends your new speech as a steering message to the server
- The server delivers the steer message after the current tool completes
- Any remaining planned tools are skipped
The CLI always prints a line indicating whether the run was aborted or steered so you have full visibility into what happened.
## Echo Cancellation and Barge-In
Barge-in reliability depends on echo cancellation:

| Mode | Microphone During Playback | Confirmation Delay | Echo Risk |
|---|---|---|---|
| VoiceProcessingIO (Hardware AEC) | Always open | 350ms (default) | Very low |
| Software AEC | Always open | 450ms minimum | Low |
| No AEC | Muted, then unmuted after playback | 550ms minimum | Medium (echo during unmute transition) |
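The minimum delays in the table can be expressed as a small lookup. The enum and function names below are illustrative:

```swift
import Foundation

// Sketch: minimum barge-in confirmation delay per echo-cancellation mode,
// matching the table above. Names are illustrative.
enum AECMode {
    case hardwareAEC   // VoiceProcessingIO
    case softwareAEC
    case noAEC
}

func minimumConfirmationDelay(for mode: AECMode) -> TimeInterval {
    switch mode {
    case .hardwareAEC: return 0.350
    case .softwareAEC: return 0.450
    case .noAEC:       return 0.550
    }
}
```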
## Session State Visualization
The menu bar icon and overlay reflect the current voice session state:

| State | Icon | Overlay |
|---|---|---|
| Idle | Default | Hidden |
| Connecting | Animated | “Thinking…” |
| Listening | Microphone | “Listening” |
| Thinking | Brain | “Thinking…” |
| Speaking | Speaker | “Speaking” |
| Tool Running | Gear | Tool name (e.g., “bash”) |
The state enum is defined in `VoiceSessionCoordinator.swift:6-13`, and the overlay in `OverlayPresenter.swift`.
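The state-to-overlay mapping from the table can be sketched as below. The enum here is illustrative (the real one is in VoiceSessionCoordinator.swift:6-13):

```swift
import Foundation

// Sketch of the session-state-to-overlay mapping; names are illustrative.
enum SessionState {
    case idle, connecting, listening, thinking, speaking
    case toolRunning(name: String)
}

func overlayText(for state: SessionState) -> String? {
    switch state {
    case .idle:                  return nil           // overlay hidden
    case .connecting, .thinking: return "Thinking…"
    case .listening:             return "Listening"
    case .speaking:              return "Speaking"
    case .toolRunning(let name): return name          // e.g. "bash"
    }
}
```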
## Microphone Permissions
Rubber Duck requires microphone access to capture audio. On first launch, macOS will prompt for permission. If you deny permission or revoke it later:

- The voice interface will not activate
- `duck doctor` will show a warning
- You can grant permission in System Settings > Privacy & Security > Microphone
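The standard AVFoundation flow for checking and requesting microphone access is sketched below; Rubber Duck's actual handling may differ in detail:

```swift
import AVFoundation

// Sketch: check microphone authorization, prompting on first use.
func ensureMicrophoneAccess(completion: @escaping (Bool) -> Void) {
    switch AVCaptureDevice.authorizationStatus(for: .audio) {
    case .authorized:
        completion(true)
    case .notDetermined:
        // Triggers the system permission prompt on first use.
        AVCaptureDevice.requestAccess(for: .audio, completionHandler: completion)
    default:
        // .denied or .restricted: the user must re-enable access in
        // System Settings > Privacy & Security > Microphone.
        completion(false)
    }
}
```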
## API Key Management
Rubber Duck stores your OpenAI API key securely in the macOS Keychain. You can set or update the key in Settings > API Keys.

## Related
- Interruptions - Deep dive into barge-in behavior and abort vs. steer modes
- Sessions - Session management and multi-workspace support
- CLI Commands - Using the `duck` CLI to attach workspaces and send messages