Overview
The Voice Actions module provides server-side functions for processing audio input, transcribing speech to text, and parsing voice commands using Google’s Gemini AI models.
transcribeAudio
Transcribes audio files using the Gemini Flash Lite model, with specialized prompting to clean timestamps and filler words.
Parameters
Base64-encoded audio string (data:audio/…)
MIME type of the audio file
Response
Transcribed and cleaned text from the audio
Whether the transcription succeeded
Error message if transcription failed
Features
- Size Validation: Enforces maximum audio size limits (configured in `MAX_AUDIO_SIZE_MB`)
- Automatic Cleaning: Removes timestamps (00:00, 01:23, etc.) and excessive line breaks
- Error Handling: Returns structured error responses with logging
Example
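A minimal sketch of the size validation and cleaning steps listed under Features, not the module's actual implementation. The `TranscriptionResult` field names, the 10 MB limit, and the helpers `estimateBase64Bytes`, `validateAudioSize`, and `cleanTranscript` are all assumptions made for illustration.

```typescript
// Hypothetical response shape: the field names are assumptions for illustration.
interface TranscriptionResult {
  success: boolean;
  text?: string;
  error?: string;
}

// Assumed value; the real limit is whatever MAX_AUDIO_SIZE_MB is set to in config/limits.
const MAX_AUDIO_SIZE_MB = 10;

// Estimate the decoded byte size of a base64 string or data URL without decoding it.
function estimateBase64Bytes(audio: string): number {
  const comma = audio.indexOf(",");
  const base64 =
    audio.startsWith("data:") && comma !== -1 ? audio.slice(comma + 1) : audio;
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return Math.floor((base64.length * 3) / 4) - padding;
}

// Return a structured failure if the audio exceeds the limit, otherwise null.
function validateAudioSize(
  audio: string,
  maxMb: number = MAX_AUDIO_SIZE_MB
): TranscriptionResult | null {
  const sizeMb = estimateBase64Bytes(audio) / (1024 * 1024);
  if (sizeMb > maxMb) {
    return { success: false, error: `Audio exceeds ${maxMb} MB limit` };
  }
  return null;
}

// Strip mm:ss / hh:mm:ss timestamps and collapse excessive line breaks.
// Note: this simple pattern would also remove clock times spoken in the audio.
function cleanTranscript(raw: string): string {
  return raw
    .replace(/\b\d{1,2}:\d{2}(?::\d{2})?\b/g, "") // drop timestamps
    .replace(/[ \t]+/g, " ") // collapse runs of spaces and tabs
    .replace(/[ \t]*\n[ \t]*/g, "\n") // trim spaces around line breaks
    .replace(/\n{3,}/g, "\n\n") // allow at most one blank line
    .trim();
}
```

Checking the size from the base64 length avoids decoding a payload that is about to be rejected anyway.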
executeVoiceCommand
Parses a voice transcription into a structured command using AI-powered natural language understanding.
Parameters
Transcribed text from voice input
Response (Success)
Returns `true` on successful parsing
Parsed command object with action type and parameters
Response (Failure)
Returns `false` on parsing failure
Human-readable error message
Error code: `MISSING_API_KEY`, `PARSING_FAILED`, or `EXECUTION_ERROR`
Whether the error is recoverable (e.g., user can retry)
Features
- API Key Validation: Checks for `GOOGLE_GENERATIVE_AI_API_KEY` before processing
- Structured Validation: Uses Zod schemas for command validation
- Language Support: Configured for Spanish (es-ES) commands
- Confidence Scoring: Filters low-confidence interpretations
Example
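A minimal sketch of how a caller might handle the success and failure responses described above. The type names and field names (`success`, `command`, `error`, `code`, `recoverable`, `action`, `params`) are assumptions inferred from the response descriptions, and `describeResult` is a hypothetical helper, not part of the module.

```typescript
// Hypothetical types: the field names are assumptions, not the module's real types.
type VoiceErrorCode = "MISSING_API_KEY" | "PARSING_FAILED" | "EXECUTION_ERROR";

interface ParsedCommand {
  action: string;
  params: Record<string, unknown>;
}

type VoiceCommandResult =
  | { success: true; command: ParsedCommand }
  | { success: false; error: string; code: VoiceErrorCode; recoverable: boolean };

// Turn a result into a user-facing message, suggesting a retry only when recoverable.
function describeResult(result: VoiceCommandResult): string {
  if (result.success) {
    // TypeScript narrows to the success branch here, so `command` is available.
    return `Running action: ${result.command.action}`;
  }
  const hint = result.recoverable ? " Please try again." : "";
  return `${result.error} (${result.code}).${hint}`;
}
```

Modelling the result as a discriminated union on `success` lets the compiler ensure that `command` is only read after a successful parse and that the error fields are only read after a failure.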
Error Codes
| Code | Description | Recoverable |
|---|---|---|
| `MISSING_API_KEY` | Google AI API key not configured | No |
| `PARSING_FAILED` | Could not parse command from transcript | Yes |
| `EXECUTION_ERROR` | Unexpected error during processing | No |
Configuration
Environment Variables
- `GOOGLE_GENERATIVE_AI_API_KEY`: Required for all voice operations
- `MAX_AUDIO_SIZE_MB`: Maximum audio file size (defined in `config/limits`)
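A minimal sketch of an up-front check for the required API key, returning the `MISSING_API_KEY` error before any model call is attempted. The `checkApiKey` helper, the `ConfigError` shape, and the error message wording are assumptions; the environment is passed in as a plain object so the sketch does not depend on a particular runtime.

```typescript
// A plain map of environment variables; in Node.js this could be process.env.
type Env = Record<string, string | undefined>;

// Hypothetical error shape; field names are assumptions for illustration.
interface ConfigError {
  success: false;
  error: string;
  code: "MISSING_API_KEY";
  recoverable: false;
}

// Return a structured error if the key is absent or blank, otherwise null.
function checkApiKey(env: Env): ConfigError | null {
  const key = env["GOOGLE_GENERATIVE_AI_API_KEY"];
  if (key === undefined || key.trim() === "") {
    return {
      success: false,
      error: "Google AI API key not configured",
      code: "MISSING_API_KEY",
      recoverable: false,
    };
  }
  return null;
}
```

The error is marked as not recoverable because retrying the same request cannot succeed until the deployment is reconfigured.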
Dependencies
- `@ai-sdk/google`: Google AI SDK for Gemini models
- `ai`: Vercel AI SDK for text generation
- `VoiceCommandParserService`: Internal service for command parsing
Models Used
- Transcription: `gemini-2.5-flash-lite` (optimized for speed)
- Command Parsing: Configured via `VoiceCommandParserService`