Overview
The Speech-to-Text Playground exposes the transcription options described on this page. All of them are fields of the TranscriptOptions type defined in speech-to-text-types.ts:
export type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  numSpeakers?: number;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  diarizationThreshold?: number;
  temperature?: number;
  seed?: number;
  useMultiChannel: boolean;
  keyterms?: string[];
  entityDetection?: string;
};
Default Configuration
The playground uses these default values:
const defaultTranscriptOptions: TranscriptOptions = {
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "character",
  diarize: false,
  useMultiChannel: false,
};
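As a minimal sketch of how these defaults might be combined with user-selected values, the helper below spreads a partial override object over the defaults. The name `withDefaults` is hypothetical and does not come from the playground source; the type and defaults are repeated only so the snippet stands alone.

```typescript
// Local copy of the option type so the sketch is self-contained.
type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  numSpeakers?: number;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  diarizationThreshold?: number;
  temperature?: number;
  seed?: number;
  useMultiChannel: boolean;
  keyterms?: string[];
  entityDetection?: string;
};

const defaultTranscriptOptions: TranscriptOptions = {
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "character",
  diarize: false,
  useMultiChannel: false,
};

// Hypothetical helper: later spreads win, so overrides replace defaults.
function withDefaults(
  overrides: Partial<TranscriptOptions> = {},
): TranscriptOptions {
  return { ...defaultTranscriptOptions, ...overrides };
}

const podcast = withDefaults({ diarize: true, numSpeakers: 2 });
console.log(podcast.modelId, podcast.diarize, podcast.numSpeakers);
```

Fields left out of the override object keep their default values, while optional fields such as `languageCode` stay undefined unless they are set explicitly.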
Core Options
Model Selection
modelId
'scribe_v1' | 'scribe_v2'
default: "scribe_v2"
required
The Scribe model version to use for transcription.
scribe_v1: First-generation model, stable and reliable
scribe_v2: Latest model with improved accuracy and features (recommended)
Usage in UI:
<Select value={options.modelId} onValueChange={handleModelChange}>
  <SelectTrigger id="model">
    <SelectValue />
  </SelectTrigger>
  <SelectContent>
    <SelectItem value="scribe_v1">Scribe V1</SelectItem>
    <SelectItem value="scribe_v2">Scribe V2</SelectItem>
  </SelectContent>
</Select>
API Call:
await browserClient.speechToText.convert({
  modelId: options.modelId || "scribe_v2",
  // ... other options
});
Language Code
languageCode
string
default: undefined
Optional ISO language code to improve transcription accuracy for specific languages. Examples: "en", "es", "fr", "de", "ja", "zh". When not specified, the model attempts to auto-detect the language.
Usage in UI:
<Input
  id="language"
  placeholder="e.g., en, es, fr"
  value={options.languageCode || ""}
  onChange={handleLanguageChange}
/>
Implementation:
function handleLanguageChange(event: ChangeEvent<HTMLInputElement>) {
  const value = event.target.value || undefined;
  onOptionsChange({ ...options, languageCode: value });
}
API Call:
await browserClient.speechToText.convert({
  languageCode: options.languageCode || undefined,
  // ... other options
});
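The handler above passes the raw input through unchanged. As a hedged sketch, the input could be normalized before it reaches the API call. The helper name `normalizeLanguageCode` is hypothetical, and the two- or three-letter check is an assumption based on ISO 639-1 and ISO 639-3 code lengths, not something taken from the playground source.

```typescript
// Hypothetical helper: returns a cleaned code, or undefined to fall back
// to language auto-detection.
function normalizeLanguageCode(raw: string): string | undefined {
  const code = raw.trim().toLowerCase();
  if (code === "") return undefined;
  // Assumption: valid codes are two or three ASCII letters.
  return /^[a-z]{2,3}$/.test(code) ? code : undefined;
}

console.log(normalizeLanguageCode(" EN "));
console.log(normalizeLanguageCode(""));
console.log(normalizeLanguageCode("english"));
```

Returning undefined for malformed input silently falls back to auto-detection. A real form might prefer to show a validation message instead.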
Tag Audio Events
When enabled, the transcript will include tags for non-speech audio events such as laughter, applause, music, or background noise.
Usage in UI:
<Checkbox
  id="tagAudio"
  checked={options.tagAudioEvents}
  onCheckedChange={handleTagAudioChange}
/>
<Label htmlFor="tagAudio">Tag Audio Events</Label>
API Call:
await browserClient.speechToText.convert({
  tagAudioEvents: options.tagAudioEvents || false,
  // ... other options
});
Timestamp Options
Timestamps Granularity
timestampsGranularity
'none' | 'word' | 'character'
default: "character"
required
Controls the level of detail for timestamp information in the transcription.
none: No timestamps included
word: Timestamps for each word
character: Timestamps for each character (most detailed)
Usage in UI:
<Select
  value={options.timestampsGranularity}
  onValueChange={handleTimestampsChange}
>
  <SelectTrigger id="timestamps">
    <SelectValue />
  </SelectTrigger>
  <SelectContent>
    <SelectItem value="none">None</SelectItem>
    <SelectItem value="word">Word</SelectItem>
    <SelectItem value="character">Character</SelectItem>
  </SelectContent>
</Select>
API Call:
await browserClient.speechToText.convert({
  timestampsGranularity: options.timestampsGranularity || "character",
  // ... other options
});
Character-level timestamps enable precise synchronization with audio playback and detailed alignment visualization in the transcript viewer.
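To illustrate how timestamps can drive playback synchronization, here is a minimal sketch that looks up the timed item (word or character) under the current playback position. The `TimedSegment` shape and the `findActiveIndex` helper are assumptions made for illustration; the actual response fields and the transcript viewer's implementation may differ.

```typescript
// Assumed shape for illustration only; times are in seconds.
interface TimedSegment {
  text: string;
  start: number;
  end: number;
}

// Binary search for the segment whose [start, end) interval contains `time`.
// Assumes `segments` is sorted by start time and intervals do not overlap.
function findActiveIndex(segments: TimedSegment[], time: number): number {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const segment = segments[mid];
    if (time < segment.start) {
      hi = mid - 1;
    } else if (time >= segment.end) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1; // position falls in a gap or outside the transcript
}

const segments: TimedSegment[] = [
  { text: "Hello", start: 0.0, end: 0.4 },
  { text: "world", start: 0.5, end: 0.9 },
];
console.log(findActiveIndex(segments, 0.6)); // 1
console.log(findActiveIndex(segments, 0.45)); // -1
```

A viewer could call such a lookup on each audio time update and highlight the returned index. With character-level granularity the same lookup works on a longer, finer-grained list.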
Speaker Detection (Diarization)
Diarize
Enable speaker diarization to identify and separate different speakers in the audio. When enabled, the transcript will include speaker labels (e.g., Speaker 1, Speaker 2) to distinguish between different voices.
Usage in UI:
<Checkbox
  id="diarize"
  checked={options.diarize}
  onCheckedChange={handleDiarizeChange}
/>
<Label htmlFor="diarize">Diarize (Speaker Detection)</Label>
API Call:
await browserClient.speechToText.convert({
  diarize: options.diarize || false,
  // ... other options
});
Number of Speakers
numSpeakers
number
default: undefined
Specify the expected number of speakers in the audio (1-32). When not specified, the model will attempt to auto-detect the number of speakers. Providing an accurate count can improve diarization accuracy.
Usage in UI:
<Input
  id="speakers"
  type="number"
  min="1"
  max="32"
  placeholder="Auto-detect"
  value={options.numSpeakers || ""}
  onChange={handleNumSpeakersChange}
/>
Implementation:
function handleNumSpeakersChange(event: ChangeEvent<HTMLInputElement>) {
  const value = event.target.value;
  const numSpeakers = value ? parseInt(value, 10) : undefined;
  onOptionsChange({ ...options, numSpeakers });
}
API Call:
await browserClient.speechToText.convert({
  numSpeakers: options.numSpeakers || undefined,
  // ... other options
});
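The `min` and `max` attributes on the input only constrain the browser's stepper, so typed values can still fall outside 1-32. A minimal sketch of a stricter parser, assuming clamping is the desired behavior (the helper `parseNumSpeakers` and the two constants are hypothetical, not part of the playground source):

```typescript
const MIN_SPEAKERS = 1;
const MAX_SPEAKERS = 32;

// Hypothetical helper: empty or non-numeric input means "auto-detect";
// numeric input is clamped into the documented 1-32 range.
function parseNumSpeakers(raw: string): number | undefined {
  const parsed = Number.parseInt(raw, 10);
  if (Number.isNaN(parsed)) return undefined;
  return Math.min(MAX_SPEAKERS, Math.max(MIN_SPEAKERS, parsed));
}

console.log(parseNumSpeakers("3"));
console.log(parseNumSpeakers("40"));
console.log(parseNumSpeakers(""));
```

Clamping is only one possible policy. Rejecting out-of-range input with an error message would be an equally reasonable choice.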
Diarization Threshold
diarizationThreshold
number
default: undefined
Fine-tune the sensitivity of speaker detection (0.0-1.0).
Lower values (closer to 0): More sensitive, may create more speaker segments
Higher values (closer to 1): Less sensitive, may merge speakers together
Only applies when diarize is true and numSpeakers is not specified.
Usage in UI:
{options.diarize && !options.numSpeakers && (
  <div className="space-y-2">
    <Label htmlFor="diarization-threshold">
      Diarization Threshold (0.0-1.0)
    </Label>
    <Input
      id="diarization-threshold"
      type="number"
      step="0.01"
      min="0"
      max="1"
      placeholder="Auto"
      value={options.diarizationThreshold || ""}
      onChange={handleDiarizationThresholdChange}
    />
  </div>
)}
Implementation:
function handleDiarizationThresholdChange(event: ChangeEvent<HTMLInputElement>) {
  const value = event.target.value;
  const diarizationThreshold = value ? parseFloat(value) : undefined;
  onOptionsChange({ ...options, diarizationThreshold });
}
API Call:
await browserClient.speechToText.convert({
  diarizationThreshold: options.diarizationThreshold || undefined,
  // ... other options
});
The diarization threshold field only appears in the UI when diarization is enabled and the number of speakers is not explicitly set.
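The rule that the threshold applies only when diarization is on and no speaker count is given can be sketched as a small pure function. The `DiarizationInput` type and the `diarizationParams` helper are hypothetical. The sketch also passes the threshold through without `||`, because `value || undefined` turns an explicit 0 into undefined.

```typescript
// Only the fields relevant to diarization, for a self-contained sketch.
type DiarizationInput = {
  diarize: boolean;
  numSpeakers?: number;
  diarizationThreshold?: number;
};

type DiarizationParams = {
  diarize: boolean;
  numSpeakers: number | undefined;
  diarizationThreshold: number | undefined;
};

// Hypothetical helper: builds the diarization-related request fields.
function diarizationParams(options: DiarizationInput): DiarizationParams {
  // The threshold is meaningful only with diarization on and no fixed count.
  const thresholdApplies =
    options.diarize && options.numSpeakers === undefined;
  return {
    diarize: options.diarize,
    numSpeakers: options.numSpeakers,
    // Passed through as-is, so an explicit 0 survives.
    diarizationThreshold: thresholdApplies
      ? options.diarizationThreshold
      : undefined,
  };
}

console.log(diarizationParams({ diarize: true, diarizationThreshold: 0 }));
console.log(
  diarizationParams({ diarize: true, numSpeakers: 2, diarizationThreshold: 0.5 }),
);
```

Keeping this logic in one place means the UI visibility check and the request payload cannot drift apart.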
Multi-Channel Audio
Use Multi-Channel
Enable multi-channel processing for audio files with multiple channels (e.g., stereo recordings where each speaker is on a separate channel). When enabled, each audio channel is processed separately, which can improve accuracy for multi-channel recordings.
Usage in UI:
<Checkbox
  id="multichannel"
  checked={options.useMultiChannel}
  onCheckedChange={handleMultiChannelChange}
/>
<Label htmlFor="multichannel">Multi-channel Audio</Label>
API Call:
await browserClient.speechToText.convert({
  useMultiChannel: options.useMultiChannel || false,
  // ... other options
});
Use multi-channel processing when you have recordings where each speaker is isolated to a specific audio channel, such as professional podcast recordings or call center recordings.
Common Configurations
Recommended settings for podcast transcription:
{
  modelId: "scribe_v2",
  languageCode: "en",
  tagAudioEvents: true,
  numSpeakers: 2, // or the actual number of hosts
  timestampsGranularity: "word",
  diarize: true,
  useMultiChannel: true // if each host is on a separate channel
}
Recommended settings for interview transcription:
{
  modelId: "scribe_v2",
  languageCode: "en",
  tagAudioEvents: false,
  numSpeakers: 2,
  timestampsGranularity: "character",
  diarize: true,
  useMultiChannel: false
}
Recommended settings for meeting transcription:
{
  modelId: "scribe_v2",
  languageCode: "en",
  tagAudioEvents: true,
  // numSpeakers: undefined (auto-detect)
  timestampsGranularity: "word",
  diarize: true,
  diarizationThreshold: 0.5,
  useMultiChannel: false
}
Quick Transcription (No Speaker Info)
Fastest transcription without speaker detection:
{
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "none",
  diarize: false,
  useMultiChannel: false
}
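The four configurations above could be collected into a typed lookup table so that a preset can be applied with one call. This is a sketch only: the `PresetName` type, the `presets` table, and the `getPreset` helper are hypothetical names, and the option type is trimmed to the fields these presets use.

```typescript
// Trimmed local copy of the option type, limited to fields used below.
type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  numSpeakers?: number;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  diarizationThreshold?: number;
  useMultiChannel: boolean;
};

type PresetName = "podcast" | "interview" | "meeting" | "quick";

// Values mirror the configurations listed in this section.
const presets: Record<PresetName, TranscriptOptions> = {
  podcast: {
    modelId: "scribe_v2",
    languageCode: "en",
    tagAudioEvents: true,
    numSpeakers: 2,
    timestampsGranularity: "word",
    diarize: true,
    useMultiChannel: true,
  },
  interview: {
    modelId: "scribe_v2",
    languageCode: "en",
    tagAudioEvents: false,
    numSpeakers: 2,
    timestampsGranularity: "character",
    diarize: true,
    useMultiChannel: false,
  },
  meeting: {
    modelId: "scribe_v2",
    languageCode: "en",
    tagAudioEvents: true,
    timestampsGranularity: "word",
    diarize: true,
    diarizationThreshold: 0.5,
    useMultiChannel: false,
  },
  quick: {
    modelId: "scribe_v2",
    tagAudioEvents: false,
    timestampsGranularity: "none",
    diarize: false,
    useMultiChannel: false,
  },
};

// Returns a shallow copy so callers can adjust it without mutating the table.
function getPreset(name: PresetName): TranscriptOptions {
  return { ...presets[name] };
}

const meeting = getPreset("meeting");
console.log(meeting.diarizationThreshold, meeting.numSpeakers);
```

Because `getPreset` returns a copy, a caller can override a single field, for example the language code, without affecting later lookups.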
Next Steps
Advanced Settings: Configure keyterms, entity detection, temperature, and seed
Using the Transcript: Learn how to view and interact with your transcriptions