Vosk is an offline speech recognition library that runs entirely on-device: no network call, no API key, and no latency from a round trip to a cloud service. This page covers build setup, model bundling, recognizer configuration, the audio loop, JSON parsing, and lifecycle management for a fixed-phrase command set that mirrors the Rokid touchpad: select, back, next, and previous.
Build Dependencies
Add the Vosk and JNA artifacts to your app module and configure the ABI filters.
Add android.permission.RECORD_AUDIO to the manifest and request it at runtime before opening AudioRecord.
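As a rough illustration of the build setup described above, a Kotlin-DSL module file might look like the following. The version numbers and the ABI list are assumptions, not values taken from this page; pin whatever versions and ABIs your project actually targets.

```kotlin
// app/build.gradle.kts (sketch; versions and ABI list are assumptions)
android {
    defaultConfig {
        ndk {
            // Ship only the native libraries your target devices need.
            abiFilters += listOf("arm64-v8a", "armeabi-v7a")
        }
    }
}

dependencies {
    // Vosk Android bindings and the JNA runtime they load native code through.
    implementation("com.alphacephei:vosk-android:0.3.47")
    implementation("net.java.dev.jna:jna:5.13.0@aar")
}
```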
Model Setup
Bundle the English small model at app/src/main/assets/model-en-us/. Use the download script below to pull and stage the model:
Recognizer Setup
Configure the recognizer with a fixed grammar: only the four command words plus [unk] for anything out-of-grammar.
Always include [unk] in the grammar so out-of-grammar speech does not force a false command match. Without it, Vosk will pick the closest in-vocabulary word even when the user said something unrelated.
Normalize recognized text with trim().lowercase(Locale.US) to ensure consistent matching.
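A minimal sketch of the grammar and normalization steps, assuming a hypothetical `Command` enum and helper names (`buildGrammar`, `toCommand`) that are not part of Vosk. The grammar is a JSON array string, which is the form Vosk's `Recognizer(model, sampleRate, grammar)` constructor takes.

```kotlin
import java.util.Locale

// Hypothetical command set mirroring the four touchpad actions.
enum class Command(val word: String) {
    SELECT("select"), BACK("back"), NEXT("next"), PREVIOUS("previous")
}

const val UNKNOWN_TOKEN = "[unk]"

// Builds the JSON array string used as the recognizer grammar:
// the four command words followed by the [unk] catch-all.
fun buildGrammar(): String =
    (Command.values().map { it.word } + UNKNOWN_TOKEN)
        .joinToString(prefix = "[", postfix = "]", separator = ",") { "\"$it\"" }

// Normalizes recognized text and maps it to a command. Returns null for
// [unk], empty strings, or anything else outside the command set.
fun toCommand(recognized: String): Command? {
    val normalized = recognized.trim().lowercase(Locale.US)
    return Command.values().firstOrNull { it.word == normalized }
}
```

In an app, the string from `buildGrammar()` would be passed as the third constructor argument of the recognizer, and `toCommand` would run on each final result before dispatching.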
Audio Loop
Feed the recognizer 16 kHz mono PCM16 from a dedicated worker thread. Use sample counts, not byte counts, when passing a ShortArray to acceptWaveForm.
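The loop can be sketched without Android or Vosk on the classpath by hiding both behind small interfaces. `PcmSource` and `SpeechSink` are hypothetical stand-ins: the first mirrors `AudioRecord.read(short[], int, int)`, which returns a count of shorts, and the second mirrors Vosk's `acceptWaveForm(short[], int)`. In a real app you would run this on the worker thread and wire the two interfaces to the actual objects.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for AudioRecord.read(short[], int, int).
fun interface PcmSource {
    fun read(buffer: ShortArray, offset: Int, size: Int): Int
}

// Hypothetical stand-in for Recognizer.acceptWaveForm(short[], int).
fun interface SpeechSink {
    fun acceptWaveForm(data: ShortArray, len: Int): Boolean
}

const val BYTES_PER_SAMPLE = 2 // PCM16 mono

// AudioRecord.getMinBufferSize reports bytes; a ShortArray is sized in samples.
fun samplesForBytes(minBufferBytes: Int): Int {
    require(minBufferBytes > 0) { "Invalid buffer size: $minBufferBytes" }
    return minBufferBytes / BYTES_PER_SAMPLE
}

// Runs until the stop flag is set or the source returns a negative count.
// Returns how many times the sink signalled a final result.
fun runAudioLoop(
    source: PcmSource,
    sink: SpeechSink,
    stop: AtomicBoolean,
    minBufferBytes: Int,
): Int {
    val buffer = ShortArray(samplesForBytes(minBufferBytes))
    var finals = 0
    while (!stop.get()) {
        val samplesRead = source.read(buffer, 0, buffer.size)
        if (samplesRead < 0) break     // error code from the source: leave the loop
        if (samplesRead == 0) continue // nothing captured on this pass
        // Pass the sample count exactly as read, never a byte count.
        if (sink.acceptWaveForm(buffer, samplesRead)) finals++
    }
    return finals
}
```

When the sink returns true a final result is ready to fetch; otherwise only a partial is available.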
Parsing Vosk JSON
Final results use the "text" key and partial results use the "partial" key:
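A dependency-free sketch of that distinction follows. `VoskUpdate`, `stringField`, and `parseVoskJson` are hypothetical names, and the regex only handles the flat string fields described above; it is not a general JSON parser. On Android, `org.json.JSONObject` is likely the sturdier choice.

```kotlin
// One parsed recognizer update: the text plus whether it came from a final result.
data class VoskUpdate(val text: String, val isFinal: Boolean)

// Pulls a flat string field out of a JSON object. Assumes the value contains
// no escaped quotes, which holds for plain word sequences.
fun stringField(json: String, key: String): String? {
    val pattern = Regex("\"" + Regex.escape(key) + "\"\\s*:\\s*\"([^\"]*)\"")
    return pattern.find(json)?.groupValues?.get(1)
}

// Returns null when the payload has neither key or the text is blank.
fun parseVoskJson(json: String): VoskUpdate? {
    val finalText = stringField(json, "text")
    if (finalText != null) {
        return finalText.trim().takeIf { it.isNotEmpty() }
            ?.let { VoskUpdate(it, isFinal = true) }
    }
    return stringField(json, "partial")?.trim()?.takeIf { it.isNotEmpty() }
        ?.let { VoskUpdate(it, isFinal = false) }
}
```

Blank finals are common between utterances, so returning null for them keeps empty strings out of the command matcher.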
Lifecycle
Start after prerequisites are met
Start the audio loop only after the model is fully unpacked by StorageService and RECORD_AUDIO permission is granted. Starting earlier will fail silently or crash.
Reset before each session
Call recognizer.reset() before each new listening session to clear internal state from the previous run.
Stop cleanly
On stop: set the stop flag, stop AudioRecord, briefly interrupt/join the worker thread, release AudioRecord, clear partial UI state, and reset any audio meter to zero.
Suppress duplicates
Suppress duplicate final commands within a ~400 ms window because endpointing can produce repeated finals for a single utterance.
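One way to implement the duplicate window is a small filter that remembers the last dispatched command and its timestamp. `DuplicateFilter` is a hypothetical helper. The clock value is passed in so the logic stays testable; in an app it would presumably come from a monotonic source such as `SystemClock.elapsedRealtime()`.

```kotlin
// Drops a command that repeats the previously dispatched one inside the window.
class DuplicateFilter(private val windowMs: Long = 400) {
    private var lastCommand: String? = null
    private var lastAtMs: Long = 0

    // Returns true if the command should be dispatched. The window is measured
    // from the last dispatched command, so suppressed repeats do not extend it.
    fun shouldDispatch(command: String, nowMs: Long): Boolean {
        val isDuplicate = command == lastCommand && nowMs - lastAtMs < windowMs
        if (!isDuplicate) {
            lastCommand = command
            lastAtMs = nowMs
        }
        return !isDuplicate
    }
}
```

Only identical commands are suppressed, so a quick "next" followed by "back" still dispatches both.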
Failure cases to handle:
- Missing model asset or unpack failure
- Missing RECORD_AUDIO permission
- Invalid buffer size (minBufferBytes <= 0)
- AudioRecord init or start failure
- Negative read count from AudioRecord.read
- Runtime exceptions in the recognition loop
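The failure cases above can be given explicit names so they are reported instead of failing silently. This is a sketch under assumptions: `VoiceFailure`, `preflight`, and `classifyRead` are hypothetical, and only the checks that need no Android objects are shown.

```kotlin
// Hypothetical failure categories, one per case in the list above.
enum class VoiceFailure {
    MODEL_UNAVAILABLE, PERMISSION_MISSING, INVALID_BUFFER_SIZE,
    RECORDER_FAILED, READ_ERROR, LOOP_EXCEPTION
}

// Checks that can run before the recorder is created. Returns the first
// failure found, or null when it is safe to start the audio loop.
fun preflight(modelReady: Boolean, micGranted: Boolean, minBufferBytes: Int): VoiceFailure? =
    when {
        !modelReady -> VoiceFailure.MODEL_UNAVAILABLE
        !micGranted -> VoiceFailure.PERMISSION_MISSING
        minBufferBytes <= 0 -> VoiceFailure.INVALID_BUFFER_SIZE
        else -> null
    }

// Classifies the result of one read call: negative counts are errors.
fun classifyRead(samplesRead: Int): VoiceFailure? =
    if (samplesRead < 0) VoiceFailure.READ_ERROR else null
```

`RECORDER_FAILED` and `LOOP_EXCEPTION` would be raised from the recorder setup and from a try/catch around the loop body, which this sketch leaves out.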