OpenClicky is designed from the ground up as a voice-first assistant. Every part of the voice pipeline (activation shortcut, wake-word listening, speech-to-text transcription, AI reasoning, and spoken response) is pluggable, local-first, and built to stay out of your way until you need it. You hold a key, speak naturally, release, and Clicky answers, pointing at your screen when that helps.
How Activation Works
OpenClicky supports three distinct voice activation modes, selectable in Settings → Voice:
Push to Talk
Hold the activation shortcut while speaking. Release to submit. The default and most reliable mode.
Toggle + Wake Word
Press the shortcut once to arm wake-word detection, then say Hey Clicky to start recording.
Always Wake Word
Keeps the wake-word listener armed at all times. Say Hey Clicky from anywhere to start a voice turn.
Push-to-Talk Shortcut
The default shortcut is Control + Option (hold both, speak, release). OpenClicky also supports Shift + Fn, Shift + Control, Control + Option + Space, and Shift + Control + Space, all configurable from Settings.
Activation is handled by GlobalPushToTalkShortcutMonitor, which installs a listen-only CGEvent tap on the session event stream. Because it is listen-only, the tap never suppresses keystrokes, so the shortcut works in any app without stealing input. The tap monitors .flagsChanged, .keyDown, and .keyUp events, re-enables itself automatically if macOS disables it because of a timeout or user input, and publishes transitions to the dictation manager via a PassthroughSubject.
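The press and release transitions that the monitor publishes can be modelled separately from the event tap itself. The following is a minimal sketch, not OpenClicky's implementation: `Modifiers`, `ShortcutTransition`, and `ShortcutTracker` are hypothetical names, and the exact-match rule for the chord is an assumption. In the app, the flags would come from the `.flagsChanged` events delivered by the tap.

```swift
// Hypothetical stand-in for the modifier flags carried by a .flagsChanged event.
struct Modifiers: OptionSet {
    let rawValue: Int
    static let control = Modifiers(rawValue: 1 << 0)
    static let option  = Modifiers(rawValue: 1 << 1)
    static let shift   = Modifiers(rawValue: 1 << 2)
}

enum ShortcutTransition: Equatable {
    case pressed, released
}

// Tracks whether the push-to-talk chord is held and reports only changes.
struct ShortcutTracker {
    let chord: Modifiers
    private(set) var isHeld = false

    init(chord: Modifiers = [.control, .option]) {
        self.chord = chord
    }

    // Returns a transition when the chord state changes, otherwise nil.
    // Assumption: the chord must match exactly, so Control + Option + Shift
    // does not count as Control + Option.
    mutating func handle(flags: Modifiers) -> ShortcutTransition? {
        let nowHeld = (flags == chord)
        guard nowHeld != isHeld else { return nil }
        isHeld = nowHeld
        return nowHeld ? .pressed : .released
    }
}
```

Reporting only changes keeps repeated flag events for an already-held chord from restarting dictation.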
A double-tap Shift gesture is also detected: two rapid standalone Shift presses (within 420 ms) open the OpenClicky notch panel at the current mouse location without entering voice mode.
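The double-tap rule comes down to comparing the timestamps of two standalone Shift presses. Here is a sketch under stated assumptions: `DoubleTapDetector` and its methods are hypothetical names, timestamps are plain seconds, and the caller decides what counts as a "standalone" press.

```swift
// Detects two standalone Shift presses that land within a fixed window.
struct DoubleTapDetector {
    let window: Double            // seconds; 0.420 matches the 420 ms in the text
    private var lastTap: Double?  // timestamp of the previous standalone tap

    init(window: Double = 0.420) {
        self.window = window
    }

    // Call once per standalone Shift press; returns true on the second tap.
    mutating func registerTap(at time: Double) -> Bool {
        if let previous = lastTap, time - previous <= window {
            lastTap = nil   // consume the pair so a third tap starts over
            return true
        }
        lastTap = time
        return false
    }

    // Call when any other key is pressed, so Shift + letter never counts.
    mutating func cancel() {
        lastTap = nil
    }
}
```

Clearing the stored timestamp after a match means three quick presses produce one gesture and one pending tap, not two gestures.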
Wake-Word Detection
When a wake-word mode is active, OpenClickyWakeWordManager runs Apple’s on-device SFSpeechRecognizer in a continuous low-power loop. It listens only locally: no audio is sent anywhere until the wake phrase triggers full dictation.
The wake phrase is “Hey Clicky” (case-insensitive, diacritic-insensitive). The detector also recognises common mishearings:
| Accepted phrase | Notes |
|---|---|
| hey clicky | Primary phrase |
| hay clicky | Common mishearing |
| hey cliquey | Common mishearing |
| hay cliquey | Common mishearing |
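Case-insensitive and diacritic-insensitive matching against the accepted phrases can be sketched with Foundation's string folding. The function names below are hypothetical, and stripping punctuation and matching on whole words are assumptions, not documented behaviour of the detector.

```swift
import Foundation

// Accepted variants from the table above, stored already normalised.
let acceptedWakePhrases = ["hey clicky", "hay clicky", "hey cliquey", "hay cliquey"]

// Lowercases, strips diacritics, and collapses punctuation and whitespace.
func normalise(_ text: String) -> String {
    let folded = text.folding(options: [.caseInsensitive, .diacriticInsensitive], locale: nil)
    let words = folded
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty }
    return words.joined(separator: " ")
}

// True when the transcript contains any accepted wake phrase as whole words.
func containsWakePhrase(_ transcript: String) -> Bool {
    let padded = " " + normalise(transcript) + " "
    return acceptedWakePhrases.contains { padded.contains(" " + $0 + " ") }
}
```

Padding with spaces keeps a phrase such as "they clicky" from matching on a partial word.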
Wake-word listening requires SFSpeechRecognizer.supportsOnDeviceRecognition to be true on your Mac. OpenClicky will not fall back to a remote speech gate for always-listening mode, because sending ambient audio to a cloud service would be a privacy problem. If on-device recognition is unavailable, use push-to-talk instead.
Pluggable Transcription Providers
BuddyDictationManager captures microphone audio with AVAudioEngine and routes it through the active provider. Providers are swapped at runtime without restarting the app — changing the provider in Settings takes effect on the next voice press.
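One way to get "takes effect on the next voice press" is to read the selected provider once, at the moment a press begins. This is a sketch only: `TranscriptionProvider` and `ProviderSelection` are hypothetical names, and the assumption that a press already in flight keeps its provider is not confirmed by the text.

```swift
// Hypothetical provider identifiers mirroring the options listed in Settings.
enum TranscriptionProvider: String {
    case appleSpeech, openAIWhisper, assemblyAI, deepgram
}

// Holds the user's current selection and snapshots it per voice press.
final class ProviderSelection {
    var selected: TranscriptionProvider = .appleSpeech
    private(set) var activeForCurrentPress: TranscriptionProvider?

    // Called when the shortcut goes down: the provider is fixed for this press.
    func beginPress() -> TranscriptionProvider {
        let provider = selected
        activeForCurrentPress = provider
        return provider
    }

    // Called on release; the next press re-reads `selected`.
    func endPress() {
        activeForCurrentPress = nil
    }
}
```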
Apple Speech
Local, no API key required. Uses SFSpeechRecognizer for streaming on-device recognition. Free and private, but accuracy varies by accent and ambient noise. Requires both Microphone and Speech Recognition permissions.
OpenAI Whisper
Cloud-based streaming. Routes audio to OpenAI’s transcription API. High accuracy and strong support for technical vocabulary. Requires OPENAI_API_KEY.
AssemblyAI
Streaming cloud transcription. Connects over a WebSocket for low-latency partial results. Requires ASSEMBLYAI_API_KEY in Settings or secrets.env.
Deepgram
Streaming cloud transcription. Also WebSocket-based; strong on technical and developer vocabulary. Requires DEEPGRAM_API_KEY.
How the audio pipeline works
- AVAudioEngine taps the microphone input node with a buffer size of 256 frames, a small buffer deliberately chosen to minimise capture-to-provider handoff latency.
- Each AVAudioPCMBuffer is forwarded to the active BuddyStreamingTranscriptionSession via appendAudioBuffer(_:).
- The session calls back onTranscriptUpdate with partial results, which are rendered live in the input bar as you speak.
- When you release the shortcut, requestFinalTranscript() is called. A fallback timer (default 2.4 seconds) submits the best available partial if the final transcript callback hasn’t fired yet.
- If the session reports a “no speech detected” error and the transcript buffer is empty, the interaction is quietly discarded rather than submitted as an empty message.
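The last two steps amount to a small "first resolution wins" decision. Below is a sketch with the timing left to the caller: `TurnOutcome` and `TranscriptFinalizer` are hypothetical names. Two behaviours are assumptions, not documented: an empty final transcript is discarded, and a "no speech" error with a non-empty buffer submits the partial.

```swift
import Foundation

// Outcome of a voice turn once the shortcut has been released.
enum TurnOutcome: Equatable {
    case submit(String)
    case discard
}

// Collects partials and decides what to submit; timers are driven by the caller.
struct TranscriptFinalizer {
    static let fallbackDelay = 2.4          // seconds, the default quoted above
    private(set) var bestPartial = ""
    private(set) var outcome: TurnOutcome?

    // Mirrors onTranscriptUpdate: remember the latest partial result.
    mutating func partial(_ text: String) {
        guard outcome == nil else { return }
        bestPartial = text
    }

    // The provider delivered its final transcript before the timer fired.
    mutating func finalTranscript(_ text: String) {
        resolve(with: text)
    }

    // The fallback timer fired first: use the best available partial.
    mutating func fallbackTimerFired() {
        resolve(with: bestPartial)
    }

    // A "no speech detected" error discards only when nothing was captured.
    mutating func noSpeechDetected() {
        resolve(with: bestPartial)
    }

    private mutating func resolve(with text: String) {
        guard outcome == nil else { return }   // first resolution wins
        let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
        outcome = trimmed.isEmpty ? .discard : .submit(trimmed)
    }
}
```

A caller would schedule `fallbackTimerFired()` for `fallbackDelay` seconds after release. If the final transcript arrives late, it is ignored because the outcome is already set.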
Context-Aware Key Terms
The dictation manager builds a list of contextual key terms that are forwarded to the transcription provider when it supports hint weighting. The built-in list includes technical terms like SwiftUI, Xcode, Vercel, Next.js, Claude, Anthropic, and Codex. You can extend this list programmatically for project-specific vocabulary.
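Extending the list mostly means merging project terms without introducing duplicates. The sketch below is illustrative only: `builtInKeyTerms` holds just the terms named above, and `mergedKeyTerms(custom:limit:)` and its `limit` parameter are hypothetical, not part of OpenClicky's API.

```swift
import Foundation

// Built-in terms named in the text above; the real list may be longer.
let builtInKeyTerms = ["SwiftUI", "Xcode", "Vercel", "Next.js", "Claude", "Anthropic", "Codex"]

// Merges project-specific terms into the built-in list, skipping blanks and
// case-insensitive duplicates while preserving first-seen order and spelling.
func mergedKeyTerms(custom: [String], limit: Int = 100) -> [String] {
    var seen = Set<String>()
    var result: [String] = []
    for term in builtInKeyTerms + custom {
        let trimmed = term.trimmingCharacters(in: .whitespaces)
        guard !trimmed.isEmpty, seen.insert(trimmed.lowercased()).inserted else { continue }
        result.append(trimmed)
    }
    // Assumption: providers cap the number of hints, so the list is truncated.
    return Array(result.prefix(limit))
}
```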
TTS Providers: How Clicky Speaks
After Claude generates a response, OpenClicky reads it aloud through one of five text-to-speech providers, selected in Settings → Voice → Speech:
| Provider | Key Required | Notes |
|---|---|---|
| GPT Realtime (default) | OPENAI_API_KEY | OpenAI’s realtime speech model. Low latency, natural pacing. |
| ElevenLabs | ELEVENLABS_API_KEY + Voice ID | High-quality, expressive voices. Configure voice ID in Settings. |
| Cartesia | CARTESIA_API_KEY + Voice ID | Fast streaming TTS. |
| Deepgram Aura | DEEPGRAM_API_KEY | Reuses the STT key. Defaults to Aura 2 Thalia voice. |
| Microsoft Edge | None | Free fallback using Edge TTS. |
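The credential requirements in the table can be expressed as data, which makes it easy to check whether a provider is usable. This is a sketch, not OpenClicky's code: `TTSProvider` and `resolveProvider` are hypothetical names, and automatically falling back to Microsoft Edge when credentials are missing is an assumption based on the table's "free fallback" note.

```swift
// Providers and credential requirements as listed in the table above.
enum TTSProvider {
    case gptRealtime, elevenLabs, cartesia, deepgramAura, microsoftEdge

    // Key names each provider needs; an empty array means no key.
    var requiredKeys: [String] {
        switch self {
        case .gptRealtime:   return ["OPENAI_API_KEY"]
        case .elevenLabs:    return ["ELEVENLABS_API_KEY"]
        case .cartesia:      return ["CARTESIA_API_KEY"]
        case .deepgramAura:  return ["DEEPGRAM_API_KEY"]
        case .microsoftEdge: return []
        }
    }

    // ElevenLabs and Cartesia additionally need a voice ID.
    var needsVoiceID: Bool { self == .elevenLabs || self == .cartesia }
}

// Returns the preferred provider if fully configured, else the key-free fallback.
func resolveProvider(preferred: TTSProvider,
                     secrets: [String: String],
                     voiceID: String?) -> TTSProvider {
    let hasKeys = preferred.requiredKeys.allSatisfy { !(secrets[$0] ?? "").isEmpty }
    let hasVoice = !preferred.needsVoiceID || !(voiceID ?? "").isEmpty
    return hasKeys && hasVoice ? preferred : .microsoftEdge
}
```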
Notch Panel Visual States
The OpenClicky notch panel, the compact surface that appears at the top of your screen, reflects the current voice state through colour and iconography:
Ready
Accent colour (blue by default). Clicky is idle and waiting for input. Icon: bolt.fill.
Listening
Green. Microphone is active and audio is being captured. A live waveform visualisation shows audio power levels. Icon: waveform.
Thinking
Orange. Audio has been submitted and Claude is generating a response. Icon: sparkles.
Speaking
Purple. TTS is playing back the response. Icon: speaker.wave.2.fill.
These states correspond to the voiceState enum in CompanionManager. The live waveform is driven by recordedAudioPowerHistory, a rolling 44-sample history of RMS audio levels, sampled every 30 ms and smoothed to prevent jitter.
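A rolling, smoothed RMS history of this kind can be sketched in a few lines. Everything here is illustrative: the case names of `VoiceState` are guesses (the actual `voiceState` cases are not shown in this page), `PowerHistory` is a hypothetical type, and the 0.5 smoothing factor is an assumption. Only the 44-sample capacity and the SF Symbol names come from the text.

```swift
// Hypothetical case names; the symbols are the ones listed for each state.
enum VoiceState {
    case ready, listening, thinking, speaking

    var symbolName: String {
        switch self {
        case .ready:     return "bolt.fill"
        case .listening: return "waveform"
        case .thinking:  return "sparkles"
        case .speaking:  return "speaker.wave.2.fill"
        }
    }
}

// Rolling window of smoothed RMS levels for the waveform.
struct PowerHistory {
    let capacity = 44              // samples kept, per the text above
    let smoothing: Float = 0.5     // assumed blend factor between 0 and 1
    private(set) var samples: [Float] = []

    // Computes the RMS of one buffer and appends an exponentially smoothed value.
    mutating func append(buffer: [Float]) {
        guard !buffer.isEmpty else { return }
        let meanSquare = buffer.reduce(Float(0)) { $0 + $1 * $1 } / Float(buffer.count)
        let rms = meanSquare.squareRoot()
        let previous = samples.last ?? rms
        samples.append(previous + smoothing * (rms - previous))
        if samples.count > capacity { samples.removeFirst() }
    }
}
```

Calling `append(buffer:)` from a timer that fires every 30 ms would give a window of roughly 1.3 seconds (44 × 30 ms).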
Permissions Required
Microphone
Required for all voice modes. OpenClicky requests this the first time you press the shortcut. If denied, open System Settings → Privacy & Security → Microphone.
Speech Recognition
Required only for Apple Speech and Wake Word modes, which use SFSpeechRecognizer. If denied, go to System Settings → Privacy & Security → Speech Recognition.
The permission status can still read .notDetermined even right after the user taps Allow.