The IntentRecognizer class uses sentence embeddings to match user utterances against registered command phrases, enabling natural-language voice command recognition.
Class Definition
Constructor Parameters
Path to the directory containing embedding model files (gemma-300m model and tokenizer.bin)
Embedding model architecture. Currently only GEMMA_300M is supported.
Model quantization variant:
- "fp32" - Full precision (largest, most accurate)
- "fp16" - Half precision
- "q8" - 8-bit quantized
- "q4" - 4-bit quantized (recommended, best performance/accuracy)
- "q4f16" - 4-bit weights, fp16 activations
Minimum similarity score (0.0-1.0) to trigger an intent. Higher values are more restrictive.
- 0.6 - Very permissive, more false positives
- 0.7 - Balanced (recommended)
- 0.8 - Strict, fewer false positives
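The threshold acts as a cutoff on the similarity score. The following is a minimal sketch of that gating, assuming the score is a cosine similarity between embedding vectors and that a score equal to the threshold counts as a match (neither detail is confirmed by this page). `cosine_similarity` and `passes_threshold` are hypothetical helpers, not part of the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def passes_threshold(similarity, threshold=0.7):
    """Return True when the score is high enough to trigger an intent.

    The inclusive comparison (>=) is an assumption for illustration.
    """
    return similarity >= threshold
```

With this gating, a score of 0.65 would trigger an intent at threshold 0.6 but not at 0.7, which is why lower thresholds produce more false positives.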
Methods
register_intent
Register a command phrase and its handler.
The canonical command phrase (e.g., “turn on the lights”)
Function called when the intent is triggered. Receives:
- trigger_phrase (str) - The registered command phrase
- utterance (str) - The user’s actual words
- similarity (float) - Confidence score 0.0-1.0
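A handler is an ordinary callable that accepts the three values listed above. The sketch below shows one possible shape. The function name `on_lights_on` and its return value are illustrative assumptions; this page does not say whether the recognizer uses a handler's return value.

```python
def on_lights_on(trigger_phrase: str, utterance: str, similarity: float) -> str:
    """Hypothetical handler for the "turn on the lights" intent."""
    # Report which registered phrase matched, what was said, and how closely.
    message = f"[{similarity:.2f}] '{utterance}' -> '{trigger_phrase}'"
    print(message)
    # Returned only so the sketch is easy to test; likely ignored in practice.
    return message
```

Keeping handlers small and free of blocking work is a reasonable default, since they may be called while audio is still being processed.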
unregister_intent
Remove a registered intent.
The command phrase to remove (must match exactly as registered)
process_utterance
Manually process an utterance against all registered intents.
The text to match against registered intents
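To illustrate the matching flow conceptually, here is a self-contained stand-in, not the library's implementation. It assumes the best-scoring registered phrase wins and that its handler runs only when the score reaches the threshold. `ToyIntentMatcher` and `toy_similarity` are hypothetical names, the word-overlap score is a crude substitute for real sentence embeddings, and the return value is an assumption.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str, str, float], None]


def toy_similarity(a: str, b: str) -> float:
    """Word-overlap (Jaccard) score in 0.0-1.0; a stand-in for embeddings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


class ToyIntentMatcher:
    """Hypothetical stand-in that mimics the register/process flow."""

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold
        self.intents: Dict[str, Handler] = {}

    def register_intent(self, phrase: str, handler: Handler) -> None:
        self.intents[phrase] = handler

    def process_utterance(self, utterance: str) -> Optional[str]:
        # Find the registered phrase with the highest score.
        best_phrase, best_score = None, 0.0
        for phrase in self.intents:
            score = toy_similarity(phrase, utterance)
            if score > best_score:
                best_phrase, best_score = phrase, score
        # Fire the handler only if the best score clears the threshold.
        if best_phrase is None or best_score < self.threshold:
            return None
        self.intents[best_phrase](best_phrase, utterance, best_score)
        return best_phrase
```

A real embedding model would also score paraphrases that share few words, which this word-overlap stand-in cannot do.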
set_threshold
Change the similarity threshold.
New threshold value (0.0-1.0)
get_threshold
Get the current threshold.
get_intent_count
Get the number of registered intents.
clear_intents
Remove all registered intents.
Usage as Event Listener
IntentRecognizer implements TranscriptEventListener, so it can be attached to a transcriber as a listener and will process transcribed utterances as they arrive.
Example: Smart Home Control
- “Please turn on the lights” → Matches “turn on the lights”
- “Switch off the lights” → Matches “turn off the lights”
- “Make it 72 degrees” → Matches “set temperature to 72”
Example: Dynamic Commands
Semantic Matching
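Unlike exact keyword matching, the intent recognizer matches utterances by semantic similarity, so a paraphrase such as “Switch off the lights” can still match the registered phrase “turn off the lights”.
Threshold Tuning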
How to choose the right threshold
Threshold 0.6: Very permissive
- More false positives
- Good for casual, varied language
- Use when: Users speak naturally, informally
Threshold 0.7: Balanced (recommended)
- Good accuracy with flexibility
- Handles most variations
- Use when: General voice command systems
Threshold 0.8: Strict
- Fewer false positives
- Requires closer matches
- Use when: Safety-critical commands, confirmation required
Command-Line Usage
Command-line options
- --intents - Comma-separated list of command phrases
- --embedding-model - Path to embedding model
- --quantization - Model variant (fp32, fp16, q8, q4, q4f16)
- --threshold - Similarity threshold (0.0-1.0)
- --language - ASR language for transcription
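For readers wiring the same options into their own script, the flags can be mirrored with `argparse`. This is a hypothetical sketch, not the actual command-line tool. The defaults (`q4`, `0.7`, `en`) are assumptions chosen for illustration, and `build_parser` and `parse_intents` are made-up helper names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (defaults assumed)."""
    parser = argparse.ArgumentParser(description="Intent recognition demo")
    parser.add_argument("--intents", required=True,
                        help="Comma-separated list of command phrases")
    parser.add_argument("--embedding-model",
                        help="Path to embedding model")
    parser.add_argument("--quantization", default="q4",
                        choices=["fp32", "fp16", "q8", "q4", "q4f16"],
                        help="Model variant")
    parser.add_argument("--threshold", type=float, default=0.7,
                        help="Similarity threshold (0.0-1.0)")
    parser.add_argument("--language", default="en",
                        help="ASR language for transcription")
    return parser


def parse_intents(raw: str) -> list:
    """Split the comma-separated value, dropping blanks and extra spaces."""
    return [phrase.strip() for phrase in raw.split(",") if phrase.strip()]
```

Phrases are split on commas, so a command phrase that itself contains a comma cannot be passed this way.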
Performance
| Model Variant | Size | Latency | Accuracy |
|---|---|---|---|
| fp32 | 1.2 GB | 50ms | Best |
| fp16 | 600 MB | 40ms | Excellent |
| q8 | 300 MB | 30ms | Very good |
| q4 (recommended) | 150 MB | 25ms | Good |
| q4f16 | 200 MB | 28ms | Very good |
Limitations
Parameter extraction is not currently supported. Future versions will support “slot filling” to extract parameters like numbers, times, and names from utterances.
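Until slot filling is available, a handler can pull simple values out of the utterance text it receives. The sketch below is a workaround under stated assumptions, not a library feature. `extract_first_number` and `on_set_temperature` are hypothetical names, and the regular expression only handles plain digits such as "72" or "68.5", not spelled-out numbers.

```python
import re
from typing import Optional


def extract_first_number(utterance: str) -> Optional[float]:
    """Return the first integer or decimal number in the text, if any."""
    match = re.search(r"-?\d+(?:\.\d+)?", utterance)
    return float(match.group()) if match else None


def on_set_temperature(trigger_phrase: str, utterance: str,
                       similarity: float) -> Optional[float]:
    """Hypothetical handler that reads the target value from the utterance."""
    target = extract_first_number(utterance)
    if target is None:
        print("Matched the intent, but no temperature was found.")
    else:
        print(f"Setting temperature to {target:g} degrees")
    return target
```

Because the recognizer matches on meaning rather than exact values, the handler should read the number from `utterance`, not from `trigger_phrase`.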
See Also
- Command Recognition Guide - Complete guide
- Data Structures - IntentMatch structure
- Transcriber - For integration with speech recognition