Overview
The transcription feature uses the ElevenLabs Scribe API to convert audio and video files into accurate text transcripts. It supports multiple audio formats, language detection, and various configuration options for precision control.
The application accepts the following audio and video formats:
type AudioType =
  | "audio/mpeg"
  | "audio/wav"
  | "audio/ogg"
  | "audio/mp3"
  | "audio/m4a"
  | "audio/aac"
  | "audio/webm";
The system automatically detects the audio type from the file extension or MIME type.
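The detection logic itself is not shown here, so the following is only a minimal sketch of how such a lookup could work. The helper name `detectAudioType`, the extension map, and the choice to check the MIME type before the extension are all assumptions, not the application's actual code.

```typescript
type AudioType =
  | "audio/mpeg"
  | "audio/wav"
  | "audio/ogg"
  | "audio/mp3"
  | "audio/m4a"
  | "audio/aac"
  | "audio/webm";

const SUPPORTED_TYPES: readonly AudioType[] = [
  "audio/mpeg",
  "audio/wav",
  "audio/ogg",
  "audio/mp3",
  "audio/m4a",
  "audio/aac",
  "audio/webm",
];

// Assumed extension-to-type mapping; adjust it to match the real application.
const EXTENSION_TO_TYPE = new Map<string, AudioType>([
  ["mp3", "audio/mpeg"],
  ["wav", "audio/wav"],
  ["ogg", "audio/ogg"],
  ["m4a", "audio/m4a"],
  ["aac", "audio/aac"],
  ["webm", "audio/webm"],
]);

// Hypothetical helper: accepts anything with a name and a MIME type,
// which includes the browser's File object.
function detectAudioType(file: { name: string; type: string }): AudioType | undefined {
  // Prefer the MIME type when it is one of the supported values.
  const byMime = SUPPORTED_TYPES.find((candidate) => candidate === file.type);
  if (byMime) return byMime;

  // Otherwise fall back to the file extension, ignoring case.
  const dot = file.name.lastIndexOf(".");
  if (dot === -1) return undefined;
  const extension = file.name.slice(dot + 1).toLowerCase();
  return EXTENSION_TO_TYPE.get(extension);
}
```

Returning `undefined` for unknown files lets the caller reject the upload before any API request is made.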
Configuration Options
All transcription options are defined in the TranscriptOptions type:
export type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  numSpeakers?: number;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  diarizationThreshold?: number;
  temperature?: number;
  seed?: number;
  useMultiChannel: boolean;
  keyterms?: string[];
  entityDetection?: string;
};
Key Options
Model Selection
Choose between two transcription models:
- scribe_v1: First generation model
- scribe_v2: Latest model with improved accuracy (default)
Timestamps
Control the granularity of timestamp data:
- none: No timestamps
- word: Word-level timestamps
- character: Character-level timestamps (default)
timestampsGranularity: "character"
Language
Optionally specify a language code to improve accuracy:
languageCode: "en" // English
languageCode: "es" // Spanish
languageCode: "fr" // French
Making API Calls
The transcription is performed using the ElevenLabs JavaScript SDK. Here’s the actual implementation from the playground:
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";

async function handleTranscribe(file: File, apiKey: string, options: TranscriptOptions) {
  const browserClient = new ElevenLabsClient({ apiKey });
  const transcriptResponse = await browserClient.speechToText.convert({
    file,
    modelId: options.modelId || "scribe_v2",
    languageCode: options.languageCode || undefined,
    tagAudioEvents: options.tagAudioEvents || false,
    numSpeakers: options.numSpeakers || undefined,
    timestampsGranularity: options.timestampsGranularity || "character",
    diarize: options.diarize || false,
    diarizationThreshold: options.diarizationThreshold || undefined,
    temperature: options.temperature || undefined,
    seed: options.seed || undefined,
    useMultiChannel: options.useMultiChannel || false,
    keyterms: options.keyterms || undefined,
    entityDetection: options.entityDetection || undefined,
  });
  return transcriptResponse;
}
The API call is made from speech-to-text-playground.tsx:48-62. Each option is passed to the ElevenLabs SDK, with unset or falsy values replaced by the fallbacks shown in the call.
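One detail of the call above is worth noting: the `||` operator treats `0` as falsy, so an explicit `temperature: 0` or `seed: 0` is sent as `undefined`. The sketch below contrasts `||` with the nullish coalescing operator `??`, which only falls through on `null` and `undefined`. The types and function names here are local stand-ins for illustration, not the SDK's request type or the playground's code.

```typescript
// Minimal local stand-ins for the two numeric options.
type NumericOptions = { temperature?: number; seed?: number };
type ResolvedOptions = { temperature: number | undefined; seed: number | undefined };

// With `||`, a legitimate 0 is treated as "missing" and replaced.
function withLogicalOr(options: NumericOptions): ResolvedOptions {
  return {
    temperature: options.temperature || undefined,
    seed: options.seed || undefined,
  };
}

// With `??`, only null and undefined fall through to the right-hand side.
function withNullish(options: NumericOptions): ResolvedOptions {
  return {
    temperature: options.temperature ?? undefined,
    seed: options.seed ?? undefined,
  };
}

const input: NumericOptions = { temperature: 0, seed: 0 };
const viaOr = withLogicalOr(input); // both fields become undefined
const viaNullish = withNullish(input); // both fields stay 0
```

Whether this matters depends on whether a value of `0` should ever reach the API; if it should, `??` preserves it.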
Advanced Options
Temperature & Seed
Control the randomness and reproducibility of transcriptions:
temperature: 0.5 // Range: 0.0-2.0 (lower = more deterministic)
seed: 42 // Fixed seed for reproducible results
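Form input for these two fields usually arrives as free text, so some validation before the request is useful. The helpers below are a hypothetical sketch: the 0.0 to 2.0 range comes from the comment above, while the function names and the rule that a seed must be an integer are assumptions.

```typescript
const MIN_TEMPERATURE = 0.0;
const MAX_TEMPERATURE = 2.0;

// Hypothetical helper: clamps a temperature into the documented range.
function clampTemperature(value: number): number {
  if (Number.isNaN(value)) throw new RangeError("temperature must be a number");
  return Math.min(MAX_TEMPERATURE, Math.max(MIN_TEMPERATURE, value));
}

// Hypothetical helper: turns raw text input into a seed, or undefined when
// the field is empty or does not contain an integer (assumed requirement).
function parseSeedInput(raw: string): number | undefined {
  const trimmed = raw.trim();
  if (trimmed === "") return undefined;
  const parsed = Number(trimmed);
  return Number.isInteger(parsed) ? parsed : undefined;
}
```

Returning `undefined` for an empty seed field keeps the option unset instead of sending an arbitrary value.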
Keyterms
Provide domain-specific terms to improve recognition accuracy:
keyterms: ["ElevenLabs", "API", "transcription"]
Keyterms are parsed from comma-separated input:
export function parseKeytermsInput(value: string): string[] | undefined {
  const trimmed = value.trim();
  if (!trimmed) return undefined;
  return trimmed
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}
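Note that `parseKeytermsInput` returns an empty array, not `undefined`, when the input contains only separators such as `","`. If that distinction matters, a stricter variant might look like the sketch below. `parseKeytermsStrict` is a hypothetical name, and the case-insensitive deduplication is an added assumption, not behavior of the playground.

```typescript
// Hypothetical variant of parseKeytermsInput: same splitting rules, but it
// also drops case-insensitive duplicates and returns undefined when no term
// survives filtering (for example, when the input is only commas).
function parseKeytermsStrict(value: string): string[] | undefined {
  const seen = new Set<string>();
  const terms: string[] = [];
  for (const item of value.split(",")) {
    const term = item.trim();
    const key = term.toLowerCase();
    // Skip empty items and terms already seen in a different casing.
    if (term.length === 0 || seen.has(key)) continue;
    seen.add(key);
    terms.push(term);
  }
  return terms.length > 0 ? terms : undefined;
}

const parsedTerms = parseKeytermsStrict("ElevenLabs, API , api,,transcription");
// parsedTerms is ["ElevenLabs", "API", "transcription"]
```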
Entity Detection
Detect and optionally redact sensitive information:
entityDetection: "pii" // Personally Identifiable Information
entityDetection: "phi" // Protected Health Information
entityDetection: "all" // All entity types
Audio Event Tagging
Set tagAudioEvents to true to tag non-speech audio events (laughter, applause, etc.) in the transcript.
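Once events are tagged, they can be separated from spoken words on the client. The sketch below assumes each entry in the response's word list carries a `type` field with the values `"word"`, `"spacing"`, or `"audio_event"`; the `TranscriptWord` type and the sample data are local illustrations, so check the SDK's response type before relying on them.

```typescript
// Assumed shape of one word entry; the real SDK type may carry more fields.
type TranscriptWord = {
  text: string;
  type: "word" | "spacing" | "audio_event";
  start?: number;
  end?: number;
};

// Hypothetical helper: keeps only the tagged non-speech events.
function extractAudioEvents(words: TranscriptWord[]): TranscriptWord[] {
  return words.filter((word) => word.type === "audio_event");
}

// Illustrative sample data, not real API output.
const sampleWords: TranscriptWord[] = [
  { text: "Thanks", type: "word", start: 0.0, end: 0.4 },
  { text: " ", type: "spacing", start: 0.4, end: 0.5 },
  { text: "(applause)", type: "audio_event", start: 0.5, end: 2.1 },
];

const audioEvents = extractAudioEvents(sampleWords);
```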
Response Handling
The API returns a SpeechToTextChunkResponseModel containing:
- Full transcript text
- Word-level data with timestamps
- Speaker information (if diarization enabled)
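- Character-level alignment data
After a successful call, the playground creates an object URL for the audio file, converts the response to alignment data, and stores the result:
const audioUrl = URL.createObjectURL(file);
const alignment = convertToAlignment(transcriptResponse);
setResult({
  transcript: transcriptResponse,
  audioUrl,
  alignment,
});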
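When diarization is enabled, the per-word speaker information can be folded into speaker turns for display. This is a minimal sketch under assumptions: the `DiarizedWord` shape (including the `speakerId` field name), the `"unknown"` fallback label, and the `groupBySpeaker` helper are illustrative and not taken from the playground.

```typescript
// Assumed shape of a word entry with speaker information attached.
type DiarizedWord = {
  text: string;
  type: "word" | "spacing" | "audio_event";
  speakerId?: string;
};

type SpeakerTurn = { speakerId: string; text: string };

// Hypothetical helper: merges consecutive entries from the same speaker
// into one turn, concatenating their text in order.
function groupBySpeaker(words: DiarizedWord[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const word of words) {
    const speakerId = word.speakerId ?? "unknown";
    const last = turns[turns.length - 1];
    if (last && last.speakerId === speakerId) {
      last.text += word.text;
    } else {
      turns.push({ speakerId, text: word.text });
    }
  }
  return turns;
}

// Illustrative sample data, not real API output.
const turns = groupBySpeaker([
  { text: "Hi", type: "word", speakerId: "speaker_0" },
  { text: " ", type: "spacing", speakerId: "speaker_0" },
  { text: "there", type: "word", speakerId: "speaker_0" },
  { text: "Hello", type: "word", speakerId: "speaker_1" },
]);
```

The sketch relies on spacing entries carrying their own text, so plain concatenation reproduces the original spacing within a turn.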
Error Handling
The application includes specialized error parsing for ElevenLabs API errors:
export function getElevenLabsErrorMessage(error: unknown): string | undefined {
  if (!error || typeof error !== "object") return undefined;
  const body = Reflect.get(error, "body");
  const parsedBody = typeof body === "string" ? safeParseJson(body) : body;
  if (!parsedBody || typeof parsedBody !== "object") return undefined;
  const detail = Reflect.get(parsedBody, "detail");
  if (typeof detail === "string") return detail;
  if (detail && typeof detail === "object") {
    const detailMessage = Reflect.get(detail, "message");
    if (typeof detailMessage === "string") return detailMessage;
    return JSON.stringify(detail);
  }
  return undefined;
}
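The function above calls `safeParseJson`, whose implementation is not shown in this document. A plausible version, together with a fallback chain for the message shown to the user, could look like the sketch below. Both function bodies and the name `toUserMessage` are assumptions for illustration.

```typescript
// Hypothetical implementation of the safeParseJson helper referenced above:
// returns the parsed value, or undefined when the string is not valid JSON.
function safeParseJson(value: string): unknown {
  try {
    return JSON.parse(value);
  } catch {
    return undefined;
  }
}

// Hypothetical helper: prefers the API's own message, then the thrown
// error's message, then a generic fallback.
function toUserMessage(apiMessage: string | undefined, error: unknown): string {
  if (apiMessage) return apiMessage;
  if (error instanceof Error) return error.message;
  return "Transcription failed. Please try again.";
}
```

In a `catch` block, the result of `getElevenLabsErrorMessage(error)` would be passed as the first argument so that API-provided details take priority.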
Always handle API errors gracefully and provide meaningful feedback to users about authentication issues, file format problems, or quota limits.
Default Configuration
The playground uses these defaults:
const defaultTranscriptOptions: TranscriptOptions = {
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "character",
  diarize: false,
  useMultiChannel: false,
};
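User-selected values can be layered over these defaults with object spread. The sketch below uses a trimmed local copy of `TranscriptOptions` so that it compiles on its own; `withDefaults` is a hypothetical helper, not part of the playground.

```typescript
// Trimmed local copy of TranscriptOptions for a self-contained example.
type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  useMultiChannel: boolean;
};

const defaultTranscriptOptions: TranscriptOptions = {
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "character",
  diarize: false,
  useMultiChannel: false,
};

// Hypothetical helper: later keys win, so overrides replace defaults.
// The defaults object itself is never mutated.
function withDefaults(overrides: Partial<TranscriptOptions>): TranscriptOptions {
  return { ...defaultTranscriptOptions, ...overrides };
}

const diarizedOptions = withDefaults({ diarize: true, languageCode: "en" });
```

Keep in mind that spread copies keys whose value is explicitly `undefined`, so such a key in `overrides` would replace the default with `undefined`.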