Overview

The Speech-to-Text Playground exposes the transcription settings described on this page. Each setting is a field of the TranscriptOptions type defined in speech-to-text-types.ts:
export type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  numSpeakers?: number;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  diarizationThreshold?: number;
  temperature?: number;
  seed?: number;
  useMultiChannel: boolean;
  keyterms?: string[];
  entityDetection?: string;
};

Default Configuration

The playground uses these default values:
const defaultTranscriptOptions: TranscriptOptions = {
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "character",
  diarize: false,
  useMultiChannel: false,
};
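One way to apply these defaults is to merge caller overrides on top of them. The sketch below is illustrative only: `withDefaults` is a hypothetical helper, not something taken from the playground source, and the type is repeated so the snippet stands alone.

```typescript
type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  numSpeakers?: number;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  diarizationThreshold?: number;
  temperature?: number;
  seed?: number;
  useMultiChannel: boolean;
  keyterms?: string[];
  entityDetection?: string;
};

const defaultTranscriptOptions: TranscriptOptions = {
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "character",
  diarize: false,
  useMultiChannel: false,
};

// Hypothetical helper: later spreads win, so overrides replace defaults.
// Caveat: an override explicitly set to undefined also replaces the default.
function withDefaults(
  overrides: Partial<TranscriptOptions> = {},
): TranscriptOptions {
  return { ...defaultTranscriptOptions, ...overrides };
}

// Example: keep every default except the two diarization fields.
const podcast = withDefaults({ diarize: true, numSpeakers: 2 });
```

Spreading a fresh object on every call also avoids mutating the shared defaults.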

Core Options

Model Selection

modelId
'scribe_v1' | 'scribe_v2'
default:"scribe_v2"
required
The Scribe model version to use for transcription.
  • scribe_v1: First generation model, stable and reliable
  • scribe_v2: Latest model with improved accuracy and features (recommended)
Usage in UI:
<Select value={options.modelId} onValueChange={handleModelChange}>
  <SelectTrigger id="model">
    <SelectValue />
  </SelectTrigger>
  <SelectContent>
    <SelectItem value="scribe_v1">Scribe V1</SelectItem>
    <SelectItem value="scribe_v2">Scribe V2</SelectItem>
  </SelectContent>
</Select>
API Call:
await browserClient.speechToText.convert({
  modelId: options.modelId || "scribe_v2",
  // ... other options
});

Language Code

languageCode
string
default:"undefined"
Optional ISO language code to improve transcription accuracy for specific languages.
Examples: "en", "es", "fr", "de", "ja", "zh"
When not specified, the model will attempt to auto-detect the language.
Usage in UI:
<Input
  id="language"
  placeholder="e.g., en, es, fr"
  value={options.languageCode || ""}
  onChange={handleLanguageChange}
/>
Implementation:
function handleLanguageChange(event: ChangeEvent<HTMLInputElement>) {
  const value = event.target.value || undefined;
  onOptionsChange({ ...options, languageCode: value });
}
API Call:
await browserClient.speechToText.convert({
  languageCode: options.languageCode || undefined,
  // ... other options
});
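Because the language field is free text, it may help to tidy the value just before the request is sent. The following is a minimal sketch, assuming that only two- or three-letter alphabetic codes are worth forwarding; `normalizeLanguageCode` and that validation rule are hypothetical and are not taken from the playground source.

```typescript
// Hypothetical helper: returns a cleaned code, or undefined to fall back
// to auto-detection.
function normalizeLanguageCode(raw: string): string | undefined {
  const code = raw.trim().toLowerCase();
  if (code === "") return undefined;
  // Assumption: accept two- or three-letter alphabetic codes only.
  return /^[a-z]{2,3}$/.test(code) ? code : undefined;
}

const cleaned = normalizeLanguageCode("  EN ");
```

A helper like this is better applied when building the request than inside the change handler, since rejecting partial input on every keystroke would make the field hard to type into.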

Tag Audio Events

tagAudioEvents
boolean
default:"false"
When enabled, the transcript will include tags for non-speech audio events such as laughter, applause, music, or background noise.
Usage in UI:
<Checkbox
  id="tagAudio"
  checked={options.tagAudioEvents}
  onCheckedChange={handleTagAudioChange}
/>
<Label htmlFor="tagAudio">Tag Audio Events</Label>
API Call:
await browserClient.speechToText.convert({
  tagAudioEvents: options.tagAudioEvents || false,
  // ... other options
});

Timestamp Options

Timestamps Granularity

timestampsGranularity
'none' | 'word' | 'character'
default:"character"
required
Controls the level of detail for timestamp information in the transcription.
  • none: No timestamps included
  • word: Timestamps for each word
  • character: Timestamps for each character (most detailed)
Usage in UI:
<Select
  value={options.timestampsGranularity}
  onValueChange={handleTimestampsChange}
>
  <SelectTrigger id="timestamps">
    <SelectValue />
  </SelectTrigger>
  <SelectContent>
    <SelectItem value="none">None</SelectItem>
    <SelectItem value="word">Word</SelectItem>
    <SelectItem value="character">Character</SelectItem>
  </SelectContent>
</Select>
API Call:
await browserClient.speechToText.convert({
  timestampsGranularity: options.timestampsGranularity || "character",
  // ... other options
});
Character-level timestamps enable precise synchronization with audio playback and detailed alignment visualization in the transcript viewer.
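To illustrate how timestamps can drive playback synchronization, the sketch below looks up the entry that is active at a given playback time. The `TimedItem` shape (text plus start and end in seconds) and the `findActiveItem` function are assumptions made for this example; the actual response fields may differ.

```typescript
// Assumed shape of a timed entry (word or character); names are illustrative.
type TimedItem = { text: string; start: number; end: number };

// Binary search over entries sorted by start time with no overlaps.
// Returns the index of the entry whose [start, end) contains `time`, or -1.
function findActiveItem(items: TimedItem[], time: number): number {
  let lo = 0;
  let hi = items.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const item = items[mid];
    if (time < item.start) {
      hi = mid - 1;
    } else if (time >= item.end) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1; // time falls in a gap or outside the transcript
}

const sampleItems: TimedItem[] = [
  { text: "Hello", start: 0.0, end: 0.4 },
  { text: "world", start: 0.5, end: 0.9 },
];
```

In a player, a function like this could be called from the audio element's time update handler to decide which entry to highlight.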

Speaker Detection (Diarization)

Diarize

diarize
boolean
default:"false"
Enable speaker diarization to identify and separate different speakers in the audio.
When enabled, the transcript will include speaker labels (e.g., Speaker 1, Speaker 2) to distinguish between different voices.
Usage in UI:
<Checkbox
  id="diarize"
  checked={options.diarize}
  onCheckedChange={handleDiarizeChange}
/>
<Label htmlFor="diarize">Diarize (Speaker Detection)</Label>
API Call:
await browserClient.speechToText.convert({
  diarize: options.diarize || false,
  // ... other options
});
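Once speaker labels are present, consecutive words from the same speaker can be merged into turns for display. This is a rough sketch under stated assumptions: the `SpokenWord` shape with a `speakerId` field and the `groupBySpeaker` function are hypothetical and may not match the actual response format.

```typescript
// Assumed shapes; field names are illustrative only.
type SpokenWord = { text: string; speakerId: string };
type SpeakerTurn = { speakerId: string; text: string };

// Merge consecutive words that share a speaker into a single turn.
function groupBySpeaker(words: SpokenWord[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const word of words) {
    const last = turns[turns.length - 1];
    if (last && last.speakerId === word.speakerId) {
      last.text += " " + word.text;
    } else {
      turns.push({ speakerId: word.speakerId, text: word.text });
    }
  }
  return turns;
}

const turns = groupBySpeaker([
  { text: "Hi", speakerId: "speaker_0" },
  { text: "there", speakerId: "speaker_0" },
  { text: "Hello", speakerId: "speaker_1" },
  { text: "Bye", speakerId: "speaker_0" },
]);
```

A speaker who talks again after someone else starts a new turn rather than being merged into the earlier one, which preserves the order of the conversation.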

Number of Speakers

numSpeakers
number
default:"undefined"
Specify the expected number of speakers in the audio (1-32).
When not specified, the model will attempt to auto-detect the number of speakers.
Providing an accurate count can improve diarization accuracy.
Usage in UI:
<Input
  id="speakers"
  type="number"
  min="1"
  max="32"
  placeholder="Auto-detect"
  value={options.numSpeakers || ""}
  onChange={handleNumSpeakersChange}
/>
Implementation:
function handleNumSpeakersChange(event: ChangeEvent<HTMLInputElement>) {
  const value = event.target.value;
  const numSpeakers = value ? parseInt(value, 10) : undefined;
  onOptionsChange({ ...options, numSpeakers });
}
API Call:
await browserClient.speechToText.convert({
  numSpeakers: options.numSpeakers || undefined,
  // ... other options
});
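The `min` and `max` attributes on the input guide the browser UI, but typed values can still fall outside 1-32. A defensive parse might look like the sketch below; `parseNumSpeakers` is a hypothetical helper, and clamping (rather than rejecting) out-of-range values is a design choice made for this example.

```typescript
const MIN_SPEAKERS = 1;
const MAX_SPEAKERS = 32;

// Hypothetical helper: empty or non-numeric input means "auto-detect";
// numeric input is clamped to the documented 1-32 range.
function parseNumSpeakers(raw: string): number | undefined {
  if (raw.trim() === "") return undefined;
  const parsed = Number.parseInt(raw, 10);
  if (Number.isNaN(parsed)) return undefined;
  return Math.min(MAX_SPEAKERS, Math.max(MIN_SPEAKERS, parsed));
}

const clampedHigh = parseNumSpeakers("99");
```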

Diarization Threshold

diarizationThreshold
number
default:"undefined"
Fine-tune the sensitivity of speaker detection (0.0-1.0).
  • Lower values (closer to 0): More sensitive, may create more speaker segments
  • Higher values (closer to 1): Less sensitive, may merge speakers together
Only applies when diarize is true and numSpeakers is not specified.
Usage in UI:
{options.diarize && !options.numSpeakers && (
  <div className="space-y-2">
    <Label htmlFor="diarization-threshold">
      Diarization Threshold (0.0-1.0)
    </Label>
    <Input
      id="diarization-threshold"
      type="number"
      step="0.01"
      min="0"
      max="1"
      placeholder="Auto"
      value={options.diarizationThreshold || ""}
      onChange={handleDiarizationThresholdChange}
    />
  </div>
)}
Implementation:
function handleDiarizationThresholdChange(event: ChangeEvent<HTMLInputElement>) {
  const value = event.target.value;
  const diarizationThreshold = value ? parseFloat(value) : undefined;
  onOptionsChange({ ...options, diarizationThreshold });
}
API Call:
await browserClient.speechToText.convert({
  diarizationThreshold: options.diarizationThreshold || undefined,
  // ... other options
});
The diarization threshold field only appears in the UI when diarization is enabled and the number of speakers is not explicitly set.
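The same rule (threshold only when diarization is on and no speaker count is set) can also be enforced when building the request. The sketch below is one possible approach; `diarizationParams` and the `DiarizationOptions` type are hypothetical names introduced for illustration. It uses explicit `undefined` checks because the `||` operator treats a threshold of 0 as missing.

```typescript
type DiarizationOptions = {
  diarize: boolean;
  numSpeakers?: number;
  diarizationThreshold?: number;
};

// Hypothetical helper: returns only the diarization fields that apply.
function diarizationParams(options: DiarizationOptions): DiarizationOptions {
  if (!options.diarize) {
    return { diarize: false };
  }
  if (options.numSpeakers !== undefined) {
    // An explicit speaker count takes precedence over the threshold.
    return { diarize: true, numSpeakers: options.numSpeakers };
  }
  // Explicit undefined check (rather than ||) so a threshold of 0 is kept.
  if (options.diarizationThreshold !== undefined) {
    return { diarize: true, diarizationThreshold: options.diarizationThreshold };
  }
  return { diarize: true };
}

const withCount = diarizationParams({
  diarize: true,
  numSpeakers: 2,
  diarizationThreshold: 0.5,
});
```

The result can then be spread into the larger options object passed to the transcription call.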

Multi-Channel Audio

Use Multi-Channel

useMultiChannel
boolean
default:"false"
Enable multi-channel processing for audio files with multiple channels (e.g., stereo recordings where each speaker is on a separate channel).
When enabled, each audio channel is processed separately, which can improve accuracy for multi-channel recordings.
Usage in UI:
<Checkbox
  id="multichannel"
  checked={options.useMultiChannel}
  onCheckedChange={handleMultiChannelChange}
/>
<Label htmlFor="multichannel">Multi-channel Audio</Label>
API Call:
await browserClient.speechToText.convert({
  useMultiChannel: options.useMultiChannel || false,
  // ... other options
});
Use multi-channel processing when you have recordings where each speaker is isolated to a specific audio channel, such as professional podcast recordings or call center recordings.

Common Configurations

Recommended settings for podcast transcription:
{
  modelId: "scribe_v2",
  languageCode: "en",
  tagAudioEvents: true,
  numSpeakers: 2, // or the actual number of hosts
  timestampsGranularity: "word",
  diarize: true,
  useMultiChannel: true // if each host is on a separate channel
}
Recommended settings for interview transcription:
{
  modelId: "scribe_v2",
  languageCode: "en",
  tagAudioEvents: false,
  numSpeakers: 2,
  timestampsGranularity: "character",
  diarize: true,
  useMultiChannel: false
}
Recommended settings for meeting transcription:
{
  modelId: "scribe_v2",
  languageCode: "en",
  tagAudioEvents: true,
  // numSpeakers: undefined (auto-detect)
  timestampsGranularity: "word",
  diarize: true,
  diarizationThreshold: 0.5,
  useMultiChannel: false
}
Minimal settings for quick transcription without timestamps or speaker detection:
{
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "none",
  diarize: false,
  useMultiChannel: false
}

Next Steps

Advanced Settings

Configure keyterms, entity detection, temperature, and seed

Using the Transcript

Learn how to view and interact with your transcriptions
