How It Works
The transcription pipeline uses a WebSocket connection to stream audio directly from the user’s microphone to Deepgram’s API:
- Token Generation: A temporary Deepgram access token is minted server-side (5-minute TTL) for secure client connections
- Audio Capture: Browser MediaRecorder API captures audio with echo cancellation and noise suppression
- WebSocket Streaming: Audio chunks (250ms intervals) are sent to Deepgram via WebSocket
- Real-time Processing: Deepgram returns both interim and final transcripts with speaker labels
- Database Storage: Final transcripts are saved to Convex with timestamps and speaker attribution
The system uses utterance-end detection (a 1.5-second pause) to trigger claim extraction automatically when a speaker finishes a statement.
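The capture and streaming settings above could be collected in one place. The following is a minimal sketch, not the project's actual code: `buildListenUrl`, `AUDIO_CONSTRAINTS`, and the constant names are hypothetical, while `diarize`, `interim_results`, and `utterance_end_ms` are query parameters of Deepgram's live-streaming endpoint.

```typescript
// Hypothetical names; the values mirror the numbers quoted in this section.
const DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen";
const CHUNK_INTERVAL_MS = 250; // MediaRecorder timeslice between audio chunks
const UTTERANCE_END_MS = 1500; // pause length that ends an utterance

// Constraints intended for navigator.mediaDevices.getUserMedia().
const AUDIO_CONSTRAINTS = {
  audio: { echoCancellation: true, noiseSuppression: true },
};

// Builds the streaming URL with diarization, interim results and
// utterance-end detection switched on.
function buildListenUrl(utteranceEndMs: number = UTTERANCE_END_MS): string {
  const params = new URLSearchParams({
    diarize: "true",
    interim_results: "true",
    utterance_end_ms: String(utteranceEndMs),
  });
  return `${DEEPGRAM_LISTEN_URL}?${params.toString()}`;
}
```

In a browser, the stream obtained with `AUDIO_CONSTRAINTS` would typically be passed to a `MediaRecorder` started with `start(CHUNK_INTERVAL_MS)`, so that a `dataavailable` event fires roughly every 250 ms.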
Speaker Diarization
Deepgram’s diarization feature (diarize: true) automatically distinguishes between speakers without requiring voice training.
Each word in the response carries a speaker field (0 or 1), allowing Stanzo to attribute claims to the correct debate participant. The first word’s speaker label determines the speaker for the entire transcript chunk.
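The first-word rule can be illustrated with a small helper. This is a sketch under assumptions: `DiarizedWord` and `speakerOfChunk` are hypothetical names, and only the `speaker` field described above is relied on.

```typescript
// Minimal shape of a diarized word; only `speaker` matters for attribution.
interface DiarizedWord {
  word: string;
  speaker?: number; // 0 or 1 in a two-person debate
}

// Returns the first word's speaker label, or null when the chunk is empty
// or the first word carries no label.
function speakerOfChunk(words: DiarizedWord[]): number | null {
  const first = words[0];
  return first?.speaker ?? null;
}
```

One consequence of this rule is that a chunk in which the speaker changes partway through is still attributed entirely to whoever spoke its first word.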
Transcript Storage
Final transcripts are stored with rich metadata, which enables:
- Precise timeline reconstruction
- Speaker-specific filtering
- Context-aware claim extraction
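The three uses listed above can be sketched against a simplified record. The field names (`text`, `speaker`, `startTimeMs`) and the helper functions are assumptions made for illustration; the actual Convex schema may differ.

```typescript
// Hypothetical record shape; the real stored fields may use other names.
interface TranscriptChunk {
  text: string;
  speaker: number; // diarization label, 0 or 1
  startTimeMs: number; // offset from the start of the debate
}

// Timeline reconstruction: order chunks by start time without mutating input.
function toTimeline(chunks: TranscriptChunk[]): TranscriptChunk[] {
  return [...chunks].sort((a, b) => a.startTimeMs - b.startTimeMs);
}

// Speaker-specific filtering.
function bySpeaker(chunks: TranscriptChunk[], speaker: number): TranscriptChunk[] {
  return chunks.filter((chunk) => chunk.speaker === speaker);
}

// Context for claim extraction: chunks that started within `windowMs`
// before (or exactly at) `atMs`, in chronological order.
function contextBefore(
  chunks: TranscriptChunk[],
  atMs: number,
  windowMs: number,
): TranscriptChunk[] {
  return toTimeline(chunks).filter(
    (chunk) => chunk.startTimeMs <= atMs && chunk.startTimeMs >= atMs - windowMs,
  );
}
```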
Connection Management
The WebSocket connection requires active maintenance:
Keep-Alive: A ping is sent every 8 seconds to prevent connection timeout.
Cleanup: Ending a session involves the following steps:
- Clear keep-alive interval
- Stop MediaRecorder
- Close WebSocket connection
- Release microphone access
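The keep-alive timer and the teardown steps above might look like the sketch below. It assumes the ping is Deepgram's `KeepAlive` JSON text message; the interface and function names are hypothetical, and the narrow interfaces exist only so the example does not depend on browser globals (real `WebSocket`, `MediaRecorder`, and `MediaStream` objects satisfy them structurally).

```typescript
const KEEP_ALIVE_INTERVAL_MS = 8000;
const SOCKET_OPEN = 1; // numeric value of WebSocket.OPEN

interface SocketLike {
  readyState: number;
  send(data: string): void;
  close(): void;
}
interface RecorderLike {
  state: string;
  stop(): void;
}
interface StreamLike {
  getTracks(): { stop(): void }[];
}

// Sends a KeepAlive message on a fixed interval while the socket is open.
function startKeepAlive(
  socket: SocketLike,
  intervalMs: number = KEEP_ALIVE_INTERVAL_MS,
): ReturnType<typeof setInterval> {
  return setInterval(() => {
    if (socket.readyState === SOCKET_OPEN) {
      socket.send(JSON.stringify({ type: "KeepAlive" }));
    }
  }, intervalMs);
}

// Performs the four teardown steps in the order listed above.
function cleanup(
  timer: ReturnType<typeof setInterval>,
  recorder: RecorderLike,
  socket: SocketLike,
  stream: StreamLike,
): void {
  clearInterval(timer); // 1. clear keep-alive interval
  if (recorder.state !== "inactive") recorder.stop(); // 2. stop MediaRecorder
  socket.close(); // 3. close WebSocket connection
  stream.getTracks().forEach((track) => track.stop()); // 4. release microphone
}
```

Clearing the interval first avoids a late ping being sent on a socket that is already closing.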
Interim Results
While Deepgram processes audio, interim (non-final) transcripts provide immediate visual feedback.
Error Handling
The system monitors connection health and provides user-facing error messages:
- Connection Errors: Network issues or API failures
- Microphone Access: Permission denied or device unavailable
- Token Expiration: Automatic detection when 5-minute token expires
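One way to map these three categories to user-facing messages is sketched below. The function names, the message strings, and the classification order are assumptions for illustration; `NotAllowedError`, `NotFoundError`, and `NotReadableError` are the standard error names with which `getUserMedia` can reject.

```typescript
const TOKEN_TTL_MS = 5 * 60 * 1000; // 5-minute token lifetime

type ErrorKind = "connection" | "microphone" | "token-expired";

// Hypothetical message strings shown to the user.
const USER_MESSAGES: Record<ErrorKind, string> = {
  connection: "Connection to the transcription service was lost.",
  microphone: "Microphone access was denied or no device is available.",
  "token-expired": "The session token expired. Please start a new session.",
};

// True once the token's age reaches the TTL.
function isTokenExpired(mintedAtMs: number, nowMs: number = Date.now()): boolean {
  return nowMs - mintedAtMs >= TOKEN_TTL_MS;
}

// Classifies a failure: microphone errors first, then token expiry,
// and everything else as a connection error.
function classifyError(
  err: { name?: string },
  tokenMintedAtMs: number,
  nowMs: number = Date.now(),
): ErrorKind {
  const micErrors = ["NotAllowedError", "NotFoundError", "NotReadableError"];
  if (err.name !== undefined && micErrors.includes(err.name)) return "microphone";
  if (isTokenExpired(tokenMintedAtMs, nowMs)) return "token-expired";
  return "connection";
}
```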
All audio processing happens in real time, with no server-side recording. Audio is streamed directly from the browser to Deepgram’s API.
Implementation Reference
Key files:
- src/hooks/useDeepgram.ts:33-126 - WebSocket connection and audio streaming logic
- convex/deepgramToken.ts:5-31 - Secure token minting with TTL
- convex/transcriptChunks.ts:21-36 - Database persistence