Intent recognition allows applications to detect when users request specific actions, using natural language variations rather than exact phrase matching. Moonshine Voice uses semantic embeddings to match user speech to registered commands with fuzzy matching.
The previous generation of voice interfaces could only recognize speech phrased exactly as expected. “Alexa, turn on living-room lights” might work, but “Alexa, lights on in the living room please” might not.
Users naturally express the same intent in different ways:
“Turn on the lights” → “Switch on the lights” → “Lights on” → “Let there be light”
“What’s the weather” → “Weather forecast” → “Tell me the weather”
“Play some music” → “Start playing music” → “I want to hear music”
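To see why embedding-based matching handles such variations, here is a minimal, illustrative sketch of cosine similarity over toy vectors. The hand-made three-dimensional "embeddings" below are stand-ins for the high-dimensional vectors a real embedding model produces; the point is only that paraphrases end up close together while unrelated phrases do not:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real sentence embeddings
embeddings = {
    "turn on the lights":   [0.90, 0.10, 0.05],
    "switch on the lights": [0.85, 0.20, 0.05],
    "play some music":      [0.10, 0.90, 0.30],
}

trigger = embeddings["turn on the lights"]
for phrase, vector in embeddings.items():
    # The paraphrase scores near 1.0; the unrelated phrase scores low
    print(f"{phrase}: {cosine_similarity(trigger, vector):.2f}")
```

A real embedding model maps semantically similar sentences to nearby vectors automatically, which is what makes the fuzzy matching work without enumerating every phrasing.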
```python
from moonshine_voice import IntentRecognizer, get_embedding_model

# Download and load embedding model
model_path, model_arch = get_embedding_model(
    model_name="embeddinggemma-300m",
    quantization="fp32"
)

# Create recognizer
recognizer = IntentRecognizer(
    model_path=model_path,
    model_arch=model_arch,
    model_variant="fp32",
    threshold=0.7
)

# Register intents with handlers
def on_lights_on(trigger, utterance, similarity):
    print(f"Turning lights on (confidence: {similarity:.0%})")
    # Your light control code here

recognizer.register_intent("turn on the lights", on_lights_on)

# Process utterances
recognizer.process_utterance("switch on the lights")  # Triggers handler
recognizer.process_utterance("illuminate the room")   # Triggers handler
recognizer.process_utterance("play some music")       # No trigger
```
From python/src/moonshine_voice/intent_recognizer.py:45-62:
```python
class IntentRecognizer(TranscriptEventListener):
    """Intent recognizer that uses semantic embeddings to match utterances.

    This class can be used standalone by calling process_utterance(), or as
    a TranscriptEventListener to automatically process completed transcript
    lines.
    """
```
Automatic intent detection from transcription:
```python
from moonshine_voice import (
    MicTranscriber,
    IntentRecognizer,
    get_model_for_language,
    get_embedding_model,
)

# Load models
model_path, model_arch = get_model_for_language("en")
embed_path, embed_arch = get_embedding_model("embeddinggemma-300m", "fp32")

# Create transcriber
transcriber = MicTranscriber(
    model_path=model_path,
    model_arch=model_arch
)

# Create intent recognizer
recognizer = IntentRecognizer(
    model_path=embed_path,
    model_arch=embed_arch,
    threshold=0.7
)

# Register intents
def handle_lights_on(trigger, utterance, similarity):
    print(f"💡 Lights on! ({similarity:.0%} match)")

def handle_lights_off(trigger, utterance, similarity):
    print(f"💡 Lights off! ({similarity:.0%} match)")

recognizer.register_intent("turn on the lights", handle_lights_on)
recognizer.register_intent("turn off the lights", handle_lights_off)

# Connect recognizer to transcriber
transcriber.add_listener(recognizer)

# Start listening
transcriber.start()

# Now any completed speech automatically triggers intent matching
```
From python/src/moonshine_voice/intent_recognizer.py:190-231:
```python
def register_intent(
    self,
    trigger_phrase: str,    # Canonical command phrase
    handler: IntentHandler  # Callback function
) -> None:
    """
    Register an intent with a trigger phrase and handler.

    When an utterance is processed that is similar enough to the trigger
    phrase (above the threshold), the handler will be invoked.
    """
```
From python/src/moonshine_voice/intent_recognizer.py:255-274:
```python
def process_utterance(self, utterance: str) -> bool:
    """
    Process an utterance and invoke the handler of the most similar intent.

    Returns:
        True if an intent was recognized and handler invoked, False otherwise.
    """
```
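Conceptually, process_utterance embeds the utterance and selects the registered trigger with the highest similarity, invoking its handler only when the score clears the threshold. A stand-alone sketch of that selection logic follows; the vectors and no-argument handlers here are illustrative stand-ins, not the library's internals:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def match_intent(utterance_vec, intents, threshold):
    """Return the handler of the most similar trigger above threshold, else None.

    `intents` maps trigger phrase -> (embedding, handler).
    """
    best_handler, best_score = None, threshold
    for phrase, (vec, handler) in intents.items():
        score = cosine_similarity(utterance_vec, vec)
        if score >= best_score:
            best_handler, best_score = handler, score
    return best_handler

# Illustrative usage with toy vectors
intents = {
    "turn on the lights": ([0.9, 0.1], lambda: "lights on"),
    "play some music":    ([0.1, 0.9], lambda: "music"),
}
handler = match_intent([0.8, 0.2], intents, threshold=0.7)
if handler is not None:
    print(handler())  # prints "lights on"
```

Taking the best match above the threshold (rather than the first match) is what makes a single utterance resolve deterministically even when several triggers are somewhat similar.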
```python
# Get/set threshold dynamically
recognizer.threshold = 0.8
current = recognizer.threshold

# Get number of registered intents
count = recognizer.intent_count

# Remove specific intent
recognizer.unregister_intent("turn on the lights")

# Remove all intents
recognizer.clear_intents()
```
From python/src/moonshine_voice/intent_recognizer.py:324-349:
```python
def on_line_completed(self, event: LineCompleted) -> None:
    """
    Called when a transcription line is completed.

    This implements the TranscriptEventListener interface, allowing the
    IntentRecognizer to automatically process completed transcript lines.
    """
    if event.line and event.line.text:
        # Strip whitespace and process non-empty utterances
        utterance = event.line.text.strip()
        if utterance:
            self.process_utterance(utterance)
```
When used as a listener:
```
User speaks → Transcriber → LineCompleted event → IntentRecognizer
                                                        ↓
                                              process_utterance()
                                                        ↓
                                              Compare embeddings
                                                        ↓
                                           Trigger handler if match
```
From python/src/moonshine_voice/intent_recognizer.py:276-289:
```python
@property
def threshold(self) -> float:
    """Get the current similarity threshold."""
    return self._lib.moonshine_get_intent_threshold(self._handle)

@threshold.setter
def threshold(self, value: float) -> None:
    """Set the similarity threshold."""
    error = self._lib.moonshine_set_intent_threshold(self._handle, value)
```
Experiment with different values:
```python
recognizer = IntentRecognizer(model_path, model_arch, threshold=0.7)

# Test different thresholds
for threshold in [0.5, 0.6, 0.7, 0.8, 0.9]:
    recognizer.threshold = threshold
    print(f"\nThreshold: {threshold}")

    test_phrases = [
        "turn on the lights",
        "switch on lights",
        "illuminate the room",
        "let there be light",
        "play some music",  # Should not match
    ]

    for phrase in test_phrases:
        matched = recognizer.process_utterance(phrase)
        print(f"  '{phrase}': {'✓' if matched else '✗'}")
```
Typical results:
0.5: Many false positives, matches unrelated phrases
0.6: Good for diverse expressions, some false positives
0.7: Balanced, recommended default
0.8: Conservative, may miss valid variations
0.9: Very strict, almost exact semantic match required
```python
utterances = [
    "turn on the lights",
    "what's the weather",
    "play some music",
    # ... many more
]

for utterance in utterances:
    recognized = recognizer.process_utterance(utterance)
    if recognized:
        print(f"Matched: {utterance}")
```
The current intent recognition is designed for full-sentence matching. Future versions will add "slot filling" techniques to extract structured values from within an utterance, such as the quantity in "I want ten bananas".
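As a preview of the concept (this is not Moonshine Voice API, just an illustration of what slot filling means), a hypothetical quantity extractor might look like:

```python
import re

# Illustrative word-to-number table -- a real slot filler would be far richer
WORD_NUMBERS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

def extract_quantity(utterance: str):
    """Return the first quantity found in the utterance, or None."""
    digits = re.search(r"\b(\d+)\b", utterance)
    if digits:
        return int(digits.group(1))
    for word in utterance.lower().split():
        if word in WORD_NUMBERS:
            return WORD_NUMBERS[word]
    return None

print(extract_quantity("I want ten bananas"))  # 10
print(extract_quantity("order 3 pizzas"))      # 3
```

Combined with intent matching, this would let a single "order food" intent carry a quantity slot alongside the recognized command.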