Installation
Authentication
Set your Deepgram API key in the DEEPGRAM_API_KEY environment variable.
Components
STT - Speech-to-Text
High-quality speech recognition using Deepgram’s Flux model. Parameters:
- Model: the Deepgram model to use for transcription. See Flux models.
- Eager end-of-turn detection: enable it for faster response times.
- API key: optional. Defaults to the DEEPGRAM_API_KEY environment variable.
TTS - Text-to-Speech
Low-latency text-to-speech using Deepgram’s Aura model. Parameters:
- Voice model: the Deepgram Aura voice model. See Available Voices.
- Sample rate: the audio sample rate in Hz.
- API key: optional. Defaults to the DEEPGRAM_API_KEY environment variable.
Available Voices
Deepgram offers several Aura voice models:
| Voice | Description | Language |
|---|---|---|
| aura-2-thalia-en | Default female voice | English |
| aura-2-orion-en | Male voice | English |
| aura-2-asteria-en | Female voice | English |
| aura-2-perseus-en | Male voice | English |
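The voice IDs in the table share a visible aura-2-<name>-<language> pattern. The sketch below is a minimal illustration of checking a voice string in your own code before handing it to the TTS component. It assumes that pattern holds for the voices you use. The names parse_aura_voice and AuraVoice are hypothetical and are not part of the plugin.

```python
import re
from typing import NamedTuple

# Hypothetical helper. The pattern is inferred from the voice table above,
# not taken from a Deepgram specification.
_AURA_VOICE = re.compile(r"^aura-(?P<version>\d+)-(?P<name>[a-z]+)-(?P<lang>[a-z]{2})$")


class AuraVoice(NamedTuple):
    version: int
    name: str
    language: str


def parse_aura_voice(voice_id: str) -> AuraVoice:
    """Split an ID such as 'aura-2-thalia-en' into version, name, and language."""
    match = _AURA_VOICE.match(voice_id)
    if match is None:
        raise ValueError(f"not an Aura voice ID: {voice_id!r}")
    return AuraVoice(int(match["version"]), match["name"], match["lang"])


# The voices listed in the table above.
KNOWN_VOICES = ["aura-2-thalia-en", "aura-2-orion-en", "aura-2-asteria-en", "aura-2-perseus-en"]

if __name__ == "__main__":
    for voice in KNOWN_VOICES:
        print(parse_aura_voice(voice))
```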
Usage Example
Combine STT and TTS in a voice agent.
Configuration Tips
For Fastest Response
For Best Quality
Features
Speech-to-Text (STT)
- Built-in turn detection
- Real-time streaming transcription
- High accuracy with Flux models
- Automatic punctuation and formatting
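Real-time streaming transcription means audio is sent in small frames as it is captured, not as a single finished file. The sketch below shows only the framing arithmetic. It assumes 16-bit mono PCM at 16 kHz and 20 ms frames, which are illustrative values and not settings confirmed for this plugin. frame_size_bytes and iter_frames are hypothetical helpers.

```python
from typing import Iterator


def frame_size_bytes(sample_rate_hz: int, frame_ms: int,
                     sample_width: int = 2, channels: int = 1) -> int:
    """Bytes of raw PCM in one frame lasting frame_ms milliseconds."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * sample_width * channels


def iter_frames(pcm: bytes, frame_bytes: int) -> Iterator[bytes]:
    """Yield consecutive fixed-size frames; a trailing partial frame is yielded as-is."""
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


if __name__ == "__main__":
    size = frame_size_bytes(16_000, 20)
    print(size)  # 640 bytes: 320 samples x 2 bytes each
    one_second = bytes(32_000)  # 1 s of silent 16-bit mono audio at 16 kHz
    print(len(list(iter_frames(one_second, size))))  # 50 frames of 20 ms
```

Smaller frames reduce the delay before the service sees new audio, at the cost of more messages per second.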
Text-to-Speech (TTS)
- Low-latency WebSocket streaming
- Natural-sounding voices
- Multiple voice options
- Configurable sample rates
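The TTS component takes the sample rate as a parameter, and the TTS Implementation notes give a range of 8000-48000 Hz. The sketch below shows how the chosen rate determines how long a block of audio bytes plays. It assumes 16-bit mono PCM output. The helper names are hypothetical. The service may accept only specific rates within the range, so check the Deepgram TTS docs before picking one.

```python
# Range taken from the TTS Implementation notes (8000-48000 Hz).
# The service may accept only particular rates inside this range.
MIN_SAMPLE_RATE_HZ = 8_000
MAX_SAMPLE_RATE_HZ = 48_000


def validate_sample_rate(sample_rate_hz: int) -> int:
    """Reject sample rates outside the documented range (hypothetical helper)."""
    if not MIN_SAMPLE_RATE_HZ <= sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
        raise ValueError(
            f"sample rate {sample_rate_hz} Hz is outside "
            f"{MIN_SAMPLE_RATE_HZ}-{MAX_SAMPLE_RATE_HZ} Hz"
        )
    return sample_rate_hz


def pcm_duration_seconds(num_bytes: int, sample_rate_hz: int,
                         sample_width: int = 2, channels: int = 1) -> float:
    """Playback time of raw PCM: bytes / (rate * bytes per sample * channels)."""
    bytes_per_second = validate_sample_rate(sample_rate_hz) * sample_width * channels
    return num_bytes / bytes_per_second


if __name__ == "__main__":
    # The same 48,000 bytes of 16-bit mono audio lasts twice as long at 24 kHz as at 48 kHz.
    print(pcm_duration_seconds(48_000, 24_000))  # 1.0
    print(pcm_duration_seconds(48_000, 48_000))  # 0.5
```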
Environment Variables
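Both components read DEEPGRAM_API_KEY when no API key is passed explicitly. The sketch below mirrors that documented fallback: an explicit key wins, otherwise the environment variable is used, and a missing key fails early. It is an illustration of the behavior, not the plugin's own code. resolve_api_key is a hypothetical name.

```python
import os
from typing import Optional

API_KEY_ENV_VAR = "DEEPGRAM_API_KEY"


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else DEEPGRAM_API_KEY from the environment."""
    key = api_key or os.environ.get(API_KEY_ENV_VAR)
    if not key:
        # Failing at startup is clearer than a rejected WebSocket handshake later.
        raise RuntimeError(f"Set {API_KEY_ENV_VAR} or pass an API key explicitly")
    return key
```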
Technical Details
STT Implementation
- Uses Deepgram’s real-time streaming API
- WebSocket connection for low latency
- Automatic reconnection handling
- Built-in silence detection for turn taking
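These notes do not say which reconnection strategy the plugin uses. A common approach for long-lived WebSocket connections is capped exponential backoff with jitter, sketched below. The connect callable, the retry limits, and the name connect_with_backoff are assumptions for illustration. The sleep function is injectable so the retry schedule can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each ConnectionError."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt == max_attempts - 1:
                break  # no point sleeping after the final attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from many clients after a shared outage.
            sleep(delay * random.uniform(0.5, 1.0))
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```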
TTS Implementation
- WebSocket streaming for fast audio delivery
- Chunked audio output for immediate playback
- Configurable sample rates (8000-48000 Hz)
- Automatic audio format conversion
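The exact conversions the plugin performs are not documented here. One typical step is turning floating-point samples in [-1.0, 1.0] into 16-bit little-endian PCM (often called linear16) and back, sketched below with the standard array module. The function names are hypothetical, and the assumption is that your audio pipeline works with 16-bit PCM.

```python
import array
import sys
from typing import Iterable, List


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM, clipping out-of-range values."""
    pcm = array.array("h", (round(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # store samples little-endian regardless of host byte order
    return pcm.tobytes()


def pcm16_to_float(data: bytes) -> List[float]:
    """Inverse of float_to_pcm16: little-endian 16-bit PCM back to floats."""
    pcm = array.array("h")
    pcm.frombytes(data)
    if sys.byteorder == "big":
        pcm.byteswap()
    return [s / 32767 for s in pcm]
```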
References
- Deepgram STT Docs
- Deepgram TTS Docs
- Python SDK Examples
- Plugin Source: plugins/deepgram/vision_agents/plugins/deepgram/__init__.py