Once LocalVoiceAI is installed and running as a background service, using it is frictionless — no window to switch to, no button to click. Just focus the app or text field where you want your words to appear, hold Fn+F10, speak, and release. Your transcribed text is pasted at the cursor automatically, system-wide.

Basic Usage

1. Focus the target app or text field

Click into any text field, chat input, code editor, or browser: anywhere you want the transcribed text to land. LocalVoiceAI works in any macOS app, including Claude, Cursor, VS Code, Safari, Notes, and Terminal.

2. Hold Fn+F10 to start recording

Press and hold Fn+F10. Recording begins immediately, and a [REC] entry appears in the log. There is no beep or visual indicator in the app itself; check the log if you’re unsure.

3. Speak clearly

Talk naturally toward your Mac’s microphone. LocalVoiceAI captures audio at your device’s native sample rate (typically 48 kHz or 44.1 kHz) to avoid resampling artifacts.

4. Release Fn+F10 to stop recording

Let go of the key. LocalVoiceAI immediately stops the microphone and hands the audio to the local Whisper model, which runs on the GPU via Apple Metal. Transcribing a short phrase takes approximately 1–2 seconds.

5. Transcribed text appears at your cursor

The resulting text is copied to your clipboard and pasted into the focused window via a simulated Cmd+V keystroke. The log shows [OK] Pasted: <your text> to confirm success.
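The steps above amount to a small press, record, release, paste cycle. The sketch below models that cycle in plain Python with no audio or keyboard access, so it illustrates the control flow only. The class name `PushToTalkSession` and the injected `transcribe` and `paste` callables are hypothetical and are not taken from LocalVoiceAI’s source; in the real service those two steps are handled by Whisper and a simulated Cmd+V.

```python
from typing import Callable, List


class PushToTalkSession:
    """Toy model of one hold-to-talk cycle (hypothetical, not LocalVoiceAI's code)."""

    def __init__(self, sample_rate: int,
                 transcribe: Callable[[List[float], int], str],
                 paste: Callable[[str], None]) -> None:
        self.sample_rate = sample_rate
        self.transcribe = transcribe  # stand-in for the local Whisper model
        self.paste = paste            # stand-in for clipboard + Cmd+V
        self.recording = False
        self.samples: List[float] = []
        self.log: List[str] = []

    def key_down(self) -> None:
        # Holding the key starts a fresh recording immediately.
        self.recording = True
        self.samples = []
        self.log.append(f"[REC] Recording at {self.sample_rate}Hz...")

    def feed(self, chunk: List[float]) -> None:
        # Audio is only collected while the key is held.
        if self.recording:
            self.samples.extend(chunk)

    def key_up(self) -> float:
        # Releasing the key stops recording and triggers transcription.
        self.recording = False
        duration = len(self.samples) / self.sample_rate
        text = self.transcribe(self.samples, self.sample_rate).strip()
        # The transcribed text is pasted into whatever has focus.
        self.paste(text)
        self.log.append(f"[OK] Pasted: {text}")
        return duration
```

Passing `transcribe` and `paste` in as callables keeps the sketch runnable on any machine: a test can supply a fake transcriber and collect pasted text in a list instead of touching the clipboard.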

Tips for Best Results

Speak for at least 250 ms before releasing the key. Clips shorter than roughly a quarter of a second are discarded automatically and logged as [SKIP] Too short, ignored. This prevents accidental taps from triggering transcription.
LocalVoiceAI works in any macOS application without additional setup. Because it runs as a LaunchAgent and pastes via CGEventPost, it operates independently of whichever terminal or IDE you used to install it — so it works equally well whether you’re typing in Claude, editing code in Cursor or VS Code, filling a form in Safari, or jotting notes in Apple Notes.
Background noise is filtered automatically. Whisper’s non-speech annotations, such as (music) and (phone buzzing), are detected by a regex filter and discarded rather than pasted. If you see [SKIP] Non-speech audio ignored: (music) in your log, the filter is working as intended.
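The tips above describe two filters that run before anything is pasted: a minimum clip length and a non-speech check. A minimal sketch of that decision follows. The quarter-second threshold and the log messages come from this page, but the function name `classify_clip` and the regular expression are assumptions, since the actual pattern LocalVoiceAI uses is not shown here.

```python
import re

MIN_DURATION_S = 0.25  # "roughly a quarter of a second", per the tip above

# Assumed pattern: the whole transcript consists only of parenthesized
# annotations such as "(music)". The real filter may differ.
NON_SPEECH_RE = re.compile(r"^\s*(?:\([^()]*\)\s*)+$")


def classify_clip(duration_s: float, transcript: str) -> str:
    """Return the log line this clip would produce under the assumptions above."""
    if duration_s < MIN_DURATION_S:
        return "[SKIP] Too short, ignored."
    text = transcript.strip()
    if not text:
        return "[SKIP] Nothing detected."
    if NON_SPEECH_RE.match(text):
        return f"[SKIP] Non-speech audio ignored: {text}"
    return f"[OK] Pasted: {text}"
```

Matching only when the entire transcript is made of annotations means a sentence that merely contains a parenthetical, such as "Play (music) now", is still treated as speech in this sketch.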

Reading the Logs

LocalVoiceAI writes all activity to /tmp/localvoice.log. Follow it in real time with:
tail -f /tmp/localvoice.log
Every line is prefixed with a status tag so you can see exactly what the service is doing at a glance:
| Prefix | Meaning |
| --- | --- |
| [REC] Recording at NNNHz... | Microphone opened; recording in progress at the given sample rate |
| [REC] Captured X.Xs — transcribing... | Key released; audio handed off to Whisper for transcription |
| [OK] Pasted: <text> | Transcription succeeded and text was pasted into the focused window |
| [SKIP] Too short, ignored. | Recording was shorter than ~250ms and was discarded |
| [SKIP] Nothing detected. | Whisper returned an empty result; no text was pasted |
| [SKIP] Non-speech audio ignored: <annotation> | Audio contained only music, noise, or other non-speech sounds |
| [ERROR] ... | Something went wrong; check the message for details |
If [OK] Pasted: appears but nothing shows up in your app, the target window may not have had keyboard focus at the moment of the simulated Cmd+V. Click directly into the text field before holding Fn+F10.
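Because every log line starts with one of the bracketed tags listed above, the log is easy to summarize with a short script. The sketch below counts lines per tag. It assumes each entry is a single line beginning with one of the four tags, and the helper names `parse_line` and `summarize` are illustrative only.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Tags taken from the list of prefixes above; anything else is left unparsed.
LINE_RE = re.compile(r"^\[(REC|OK|SKIP|ERROR)\]\s*(.*)$")


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Split a log line into (tag, message), or return None if it has no known tag."""
    match = LINE_RE.match(line.rstrip("\n"))
    return (match.group(1), match.group(2)) if match else None


def summarize(lines: Iterable[str]) -> Counter:
    """Count log lines per status tag, ignoring lines that do not parse."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            counts[parsed[0]] += 1
    return counts
```

To run it against the real log, pass an open file handle for /tmp/localvoice.log to `summarize`. A high [SKIP] count relative to [OK] may point to accidental key taps or a noisy environment.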

Explore Further

Configuration

Override the default F10 push-to-talk key using the WHISPER_KEYCODE environment variable.

Models

Swap in a different Whisper model for a different accuracy and speed tradeoff.

Troubleshooting

Fix common issues like missing permissions, event tap failures, and gibberish transcriptions.

Service Management

Start, stop, reload, and update the LocalVoiceAI LaunchAgent.
