Configuration
Speech-to-Text
ElevenLabs provides speech-to-text through its Scribe model, with support for speaker diarization and audio event tagging.
Basic Usage
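This page doesn't show how the library's provider interface is called. The sketch below therefore calls the ElevenLabs HTTP API directly, using only the Python standard library. The endpoint URL, the `xi-api-key` header, the `model_id` value `scribe_v1`, and the `text` field in the response reflect our understanding of the public API reference and are assumptions to verify there. `encode_multipart` and `transcribe` are hypothetical helper names.

```python
import json
import os
import urllib.request
import uuid

# Assumed endpoint, auth header, and model ID; verify against the ElevenLabs API reference.
STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def encode_multipart(fields: dict[str, str], file_name: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode plain text fields plus one 'file' part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + value.encode() + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(api_key: str, path: str, model_id: str = "scribe_v1") -> dict:
    """Upload one audio file and return the parsed JSON transcription response."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = encode_multipart({"model_id": model_id}, os.path.basename(path), audio)
    request = urllib.request.Request(
        STT_URL,
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

A call might look like `result = transcribe(os.environ["ELEVENLABS_API_KEY"], "meeting.mp3")`, followed by `print(result["text"])`. The environment variable name is only an illustration.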
Provider-Specific Options
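As we understand the ElevenLabs HTTP API, the options covered below map to multipart form fields named `language_code`, `diarize`, `num_speakers`, and `tag_audio_events`. This page doesn't show how the library exposes them, so the hypothetical helper `build_stt_fields` only assembles the raw fields. The field names and the defaults chosen here are assumptions; check them against the current API reference.

```python
from typing import Optional


def build_stt_fields(
    model_id: str = "scribe_v1",
    language_code: Optional[str] = None,
    diarize: bool = False,
    num_speakers: Optional[int] = None,
    tag_audio_events: bool = True,
) -> dict[str, str]:
    """Assemble speech-to-text form fields; every value is sent as a string."""
    fields = {
        "model_id": model_id,
        # Multipart form values are text, so booleans are sent as "true"/"false".
        "diarize": "true" if diarize else "false",
        "tag_audio_events": "true" if tag_audio_events else "false",
    }
    if language_code is not None:
        # Omitting the field is assumed to let the service detect the language itself.
        fields["language_code"] = language_code
    if num_speakers is not None:
        if num_speakers < 1:
            raise ValueError("num_speakers must be at least 1")
        # A hint for diarization about how many distinct voices to expect.
        fields["num_speakers"] = str(num_speakers)
    return fields
```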
Language Detection
Specify the language code of the audio to improve transcription accuracy.
Speaker Diarization
ElevenLabs can identify and separate the different speakers in the audio.
Audio Event Tagging
Detect non-speech audio events such as laughter, applause, or background noise.
Use Cases
Meeting Transcription with Speaker Identification
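With diarization on, a meeting transcript is most useful when it is grouped into speaker turns. The sketch below assumes the response format described in the ElevenLabs API reference as we understand it: a `words` list whose tokens have `text`, `type` (`"word"`, `"spacing"`, or `"audio_event"`), and `speaker_id`. Treat those names as assumptions. `speaker_turns` is a hypothetical helper.

```python
from typing import Any


def speaker_turns(response: dict[str, Any]) -> list[tuple[str, str]]:
    """Merge consecutive tokens from the same speaker into (speaker_id, text) turns."""
    turns: list[tuple[str, list[str]]] = []
    for token in response.get("words", []):
        kind = token.get("type", "word")
        if kind == "audio_event":
            continue  # drop tags like "(laughter)" from a plain meeting transcript
        if kind == "spacing":
            # Whitespace belongs to whichever turn is in progress.
            if turns:
                turns[-1][1].append(token["text"])
            continue
        speaker = token.get("speaker_id") or "unknown"
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(token["text"])
        else:
            turns.append((speaker, [token["text"]]))
    return [(speaker, "".join(parts).strip()) for speaker, parts in turns]
```

Printing `f"{speaker}: {text}"` for each turn gives a readable minutes-style transcript.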
Podcast Transcription
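In podcasts, tagged events such as laughter or applause can serve as markers for show notes or editing. This sketch assumes that events arrive in the response's `words` list with `type == "audio_event"` and with `start`/`end` times in seconds. Those field names reflect our reading of the API reference and are not confirmed by this page. `audio_events` and `format_timestamp` are hypothetical helpers.

```python
from typing import Any


def audio_events(response: dict[str, Any]) -> list[tuple[float, float, str]]:
    """Return (start, end, label) for every tagged non-speech event."""
    events = []
    for token in response.get("words", []):
        if token.get("type") == "audio_event":
            events.append((float(token["start"]), float(token["end"]), token["text"]))
    return events


def format_timestamp(seconds: float) -> str:
    """Format a time offset as HH:MM:SS, truncating fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

For example, an event starting at 3725.9 s is listed as `01:02:05`.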
Interview Transcription
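Diarization returns anonymous speaker IDs, but in an interview the participants are usually known. The sketch below relabels already-grouped `(speaker_id, text)` turns with names, assigned in order of first appearance. That ordering is a heuristic: it assumes, for example, that the interviewer speaks first, so check the result against the recording. `label_speakers` is a hypothetical helper.

```python
def label_speakers(turns: list[tuple[str, str]], names: list[str]) -> str:
    """Replace raw speaker IDs with names, assigned in order of first appearance."""
    mapping: dict[str, str] = {}
    lines = []
    for speaker_id, text in turns:
        if speaker_id not in mapping:
            index = len(mapping)
            # Fall back to a numbered label when there are more voices than names.
            mapping[speaker_id] = names[index] if index < len(names) else f"Speaker {index + 1}"
        lines.append(f"{mapping[speaker_id]}: {text}")
    return "\n".join(lines)
```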
Audio File Handling
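Before uploading, it can help to confirm what kind of file you actually have, because extensions are sometimes wrong. This sketch checks well-known container signatures in the first bytes of a file. It says nothing about which formats Scribe accepts. The MPEG frame-sync check also matches some raw AAC (ADTS) streams. `sniff_audio_format` and `sniff_file` are hypothetical helpers.

```python
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Guess the audio container from its leading bytes; None if unrecognized."""
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    # MP3: an ID3 tag, or an MPEG audio frame sync (11 set bits).
    if header[:3] == b"ID3" or (len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0):
        return "mp3"
    # ISO base media files (MP4/M4A) carry "ftyp" at byte offset 4.
    if len(header) >= 8 and header[4:8] == b"ftyp":
        return "mp4/m4a"
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read just enough of a file to identify its container."""
    with open(path, "rb") as f:
        return sniff_audio_format(f.read(12))
```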
Supported Formats
ElevenLabs Scribe supports a range of audio formats.
Features
- ✅ Speech-to-Text with high accuracy
- ✅ Speaker Diarization (identify multiple speakers)
- ✅ Audio Event Tagging (detect non-speech sounds)
- ✅ Multi-language support
- ❌ Text-to-Speech (not yet implemented)
Best Practices
For Best Diarization Results
- Ensure clear audio quality
- Minimize background noise
- Specify the correct number of speakers
- Use a sample rate of at least 16 kHz
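The 16 kHz guideline is easy to check before uploading. For uncompressed PCM WAV files, Python's standard `wave` module reports the sample rate. Other formats such as MP3 or FLAC would need a third-party decoder, which this sketch doesn't cover. `wav_sample_rate` and `meets_sample_rate` are hypothetical helper names.

```python
import io
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # the 16 kHz guideline above


def wav_sample_rate(data: bytes) -> int:
    """Return the sample rate (Hz) of in-memory PCM WAV data."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getframerate()


def meets_sample_rate(data: bytes, minimum: int = MIN_SAMPLE_RATE_HZ) -> bool:
    """True if the WAV's sample rate is at least `minimum` Hz."""
    return wav_sample_rate(data) >= minimum
```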
For Accurate Transcription
- Use the correct language code
- Ensure good audio quality (clear speech, minimal noise)
- Use an appropriate audio format (WAV or high-quality MP3)
- For long recordings, consider splitting into segments
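One way to follow the splitting advice, sketched here for PCM WAV input with the standard `wave` module, is to cut the audio into fixed-length segments. Each segment is a standalone WAV file that can be transcribed on its own. The segment length is your choice, since this page doesn't recommend one. `split_wav` is a hypothetical helper.

```python
import io
import wave


def split_wav(data: bytes, segment_seconds: float) -> list[bytes]:
    """Split PCM WAV bytes into consecutive WAV segments of at most segment_seconds."""
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    segments = []
    with wave.open(io.BytesIO(data), "rb") as source:
        params = source.getparams()
        frames_per_segment = max(1, int(params.framerate * segment_seconds))
        while True:
            frames = source.readframes(frames_per_segment)
            if not frames:  # end of audio
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as target:
                # Copy the audio format so each segment is a valid WAV file.
                target.setnchannels(params.nchannels)
                target.setsampwidth(params.sampwidth)
                target.setframerate(params.framerate)
                target.writeframes(frames)
            segments.append(out.getvalue())
    return segments
```

Fixed-length cuts can fall in the middle of a word, so cutting at pauses or adding a small overlap may give cleaner results. Diarization labels are assigned per request, so `speaker_0` in one segment may not be the same person as `speaker_0` in the next.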