The SpeechToTextPlayground component is the main orchestrator for the speech-to-text functionality. It manages the state for API authentication, file handling, transcription options, and results display.
Component Overview
This component integrates the TranscriptionForm and TranscriptionResult components to provide a complete transcription workflow. It handles:
- API key management
- File selection and validation
- Transcription options configuration
- ElevenLabs API integration
- Result processing and display
- Error handling
- Speaker identification and naming
State Management
The component uses React hooks to manage the following state:
- apiKey: User’s ElevenLabs API key for authentication
- file: Selected audio/video file for transcription
- isTranscribing: Loading state during the transcription API call
- result: Transcription result containing the transcript, audio URL, and alignment data
- error: Error message from a failed transcription attempt
- speakerNames: Record mapping speaker IDs to custom display names
- options: Configuration options for the transcription request
Usage Example
import { SpeechToTextPlayground } from "@/features/speech-to-text-playground/speech-to-text-playground";

export default function PlaygroundPage() {
  return (
    <div className="min-h-screen">
      <SpeechToTextPlayground />
    </div>
  );
}
TypeScript Interfaces
TranscriptOptions
Configuration options for the transcription API request:
type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  numSpeakers?: number;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  diarizationThreshold?: number;
  temperature?: number;
  seed?: number;
  useMultiChannel: boolean;
  keyterms?: string[];
  entityDetection?: string;
};
TranscriptResult
Result object containing transcription data:
type TranscriptResult = {
  transcript: SpeechToTextChunkResponseModel;
  audioUrl: string;
  alignment: CharacterAlignmentResponseModel;
};
SpeakerNames
Mapping of speaker IDs to custom names:
type SpeakerNames = Record<string, string>;
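A SpeakerNames record only holds the names a user has entered, so display code needs some fallback for speakers that have no entry yet. A minimal sketch of one way to do that, assuming a hypothetical getSpeakerLabel helper (not part of the documented source) and illustrative IDs such as speaker_0:

```typescript
type SpeakerNames = Record<string, string>;

// Hypothetical helper: resolve the label to display for a speaker.
function getSpeakerLabel(speakerNames: SpeakerNames, speakerId: string): string {
  const customName = speakerNames[speakerId]?.trim();
  // Fall back to the raw speaker ID when no non-empty custom name is stored.
  return customName ? customName : speakerId;
}

// Illustrative data; the ID format is an assumption.
const speakerNames: SpeakerNames = { speaker_0: "Alice" };
const knownLabel = getSpeakerLabel(speakerNames, "speaker_0");
const unknownLabel = getSpeakerLabel(speakerNames, "speaker_1");
```

Falling back to the ID keeps every segment labeled even before the user renames anyone.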
Default Configuration
The component initializes with these default transcription options:
const defaultTranscriptOptions: TranscriptOptions = {
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "character",
  diarize: false,
  useMultiChannel: false,
};
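To illustrate how these defaults can combine with user choices, here is a small sketch that layers partial overrides on top of them. The withDefaults helper is hypothetical and only for illustration; the component itself keeps the options in state and updates them through the form.

```typescript
type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  numSpeakers?: number;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  diarizationThreshold?: number;
  temperature?: number;
  seed?: number;
  useMultiChannel: boolean;
  keyterms?: string[];
  entityDetection?: string;
};

const defaultTranscriptOptions: TranscriptOptions = {
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "character",
  diarize: false,
  useMultiChannel: false,
};

// Hypothetical helper: copy the defaults, then apply the caller's overrides.
function withDefaults(overrides: Partial<TranscriptOptions>): TranscriptOptions {
  return { ...defaultTranscriptOptions, ...overrides };
}

// Example: a two-speaker interview with diarization enabled.
const interviewOptions = withDefaults({ diarize: true, numSpeakers: 2 });
```

Because the spread creates a new object, defaultTranscriptOptions itself is left unchanged.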
Component Lifecycle
1. File Selection
When a user selects a file:
- The file state is updated
- Any existing results are cleared
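- The audio type passed to TranscriptionResult is derived from the file's extension or MIME type (see getAudioTypeForFile in the imports); this happens outside the handler itself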
function handleFileSelected(selectedFile: File | null) {
  setFile(selectedFile);
  setResult(null);
}
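The implementation of getAudioTypeForFile is not shown on this page. As a rough sketch of what deriving an audio type from the MIME type or extension can look like, the following uses a hypothetical guessAudioType function, a structural FileLike stand-in for the browser File object, and an assumed extension table; none of these names come from the source.

```typescript
// Structural stand-in for the browser File object (only the fields used here).
type FileLike = { name: string; type: string };

// Assumed extension-to-MIME table; the real helper may support other formats.
const EXTENSION_TO_MIME = new Map<string, string>([
  ["mp3", "audio/mpeg"],
  ["wav", "audio/wav"],
  ["m4a", "audio/mp4"],
  ["ogg", "audio/ogg"],
  ["mp4", "video/mp4"],
  ["webm", "video/webm"],
]);

// Hypothetical helper: prefer the browser-reported MIME type, else use the extension.
function guessAudioType(file: FileLike | null): string | undefined {
  if (!file) return undefined;
  if (file.type) return file.type;
  const extension = file.name.split(".").pop()?.toLowerCase() ?? "";
  return EXTENSION_TO_MIME.get(extension);
}

const fromMime = guessAudioType({ name: "call.bin", type: "audio/wav" });
const fromExtension = guessAudioType({ name: "talk.MP3", type: "" });
const noFile = guessAudioType(null);
```

In a component, a value like this would typically be memoized on the selected file so it is only recomputed when the file changes.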
2. Transcription Process
When the form is submitted:
async function handleTranscribe(event: Parameters<SubmitEventHandler<HTMLFormElement>>[0]) {
  event.preventDefault();
  if (!file || !apiKey) return;

  setIsTranscribing(true);
  setError(null);
  setResult(null);

  try {
    const browserClient = new ElevenLabsClient({ apiKey });
    const transcriptResponse = await browserClient.speechToText.convert({
      file,
      modelId: options.modelId || "scribe_v2",
      languageCode: options.languageCode || undefined,
      tagAudioEvents: options.tagAudioEvents || false,
      numSpeakers: options.numSpeakers || undefined,
      timestampsGranularity: options.timestampsGranularity || "character",
      diarize: options.diarize || false,
      diarizationThreshold: options.diarizationThreshold || undefined,
      temperature: options.temperature || undefined,
      seed: options.seed || undefined,
      useMultiChannel: options.useMultiChannel || false,
      keyterms: options.keyterms || undefined,
      entityDetection: options.entityDetection || undefined,
    });

    const audioUrl = URL.createObjectURL(file);
    const alignment = convertToAlignment(transcriptResponse);

    setResult({
      transcript: transcriptResponse,
      audioUrl,
      alignment,
    });
  } catch (err: unknown) {
    const apiErrorMessage = getElevenLabsErrorMessage(err);
    const fallbackMessage = err instanceof Error ? err.message : "An error occurred";
    setError(apiErrorMessage ?? fallbackMessage);
  } finally {
    setIsTranscribing(false);
  }
}
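One detail of the handler is worth noting: the `||` fallbacks treat every falsy value as missing, so an explicit 0 for a numeric option such as seed or temperature is sent as undefined. The sketch below contrasts `||` with the nullish coalescing operator `??`, which only falls back on null or undefined. The NumericOptions type and both function names are hypothetical, and whether an explicit 0 matters depends on how the API treats omitted values.

```typescript
// Hypothetical subset of the numeric transcription options.
type NumericOptions = { temperature?: number; seed?: number };

// Mirrors the handler's style: any falsy value (including 0) becomes undefined.
function normalizeWithOr(options: NumericOptions): NumericOptions {
  return {
    temperature: options.temperature || undefined,
    seed: options.seed || undefined,
  };
}

// Alternative: only null and undefined trigger the fallback, so 0 survives.
function normalizeWithNullish(options: NumericOptions): NumericOptions {
  return {
    temperature: options.temperature ?? undefined,
    seed: options.seed ?? undefined,
  };
}

const zeroSeed: NumericOptions = { seed: 0, temperature: 0 };
const orResult = normalizeWithOr(zeroSeed);
const nullishResult = normalizeWithNullish(zeroSeed);
```

Here orResult.seed is undefined while nullishResult.seed stays 0.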
3. Speaker Name Management
Users can customize speaker labels:
function handleSpeakerNameChange(speakerId: string, newName: string) {
  setSpeakerNames((prev) => ({
    ...prev,
    [speakerId]: newName,
  }));
}
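The updater passed to setSpeakerNames builds a new object instead of mutating the previous one, which is what lets React detect the change. The same update can be sketched as a standalone pure function; renameSpeaker is a hypothetical name used only for this illustration, and the speaker IDs are example values.

```typescript
type SpeakerNames = Record<string, string>;

// Hypothetical pure version of the state updater: returns a new record.
function renameSpeaker(prev: SpeakerNames, speakerId: string, newName: string): SpeakerNames {
  return { ...prev, [speakerId]: newName };
}

const namesBefore: SpeakerNames = { speaker_0: "Alice" };
const namesAfter = renameSpeaker(namesBefore, "speaker_1", "Bob");
// namesBefore is untouched; namesAfter holds both entries.
```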
Component Structure
The component renders two main sections:
return (
  <div className="container mx-auto p-4 md:p-8 max-w-6xl">
    <TranscriptionForm
      apiKey={apiKey}
      file={file}
      options={options}
      isTranscribing={isTranscribing}
      error={error}
      onApiKeyChange={setApiKey}
      onFileSelected={handleFileSelected}
      onOptionsChange={setOptions}
      onSubmit={handleTranscribe}
    />
    {result && (
      <TranscriptionResult
        result={result}
        audioType={audioType}
        speakerNames={speakerNames}
        onSpeakerNameChange={handleSpeakerNameChange}
      />
    )}
  </div>
);
Dependencies
import { useMemo, useState, type SubmitEventHandler } from "react";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { TranscriptionForm } from "./transcription-form";
import { TranscriptionResult } from "./transcription-result";
import type {
  SpeakerNames,
  TranscriptOptions,
  TranscriptResult,
} from "./speech-to-text-types";
import {
  convertToAlignment,
  getAudioTypeForFile,
  getElevenLabsErrorMessage,
  isSpeechToTextChunkResponseModel,
} from "./transcript-utils";
Source Location
src/features/speech-to-text-playground/speech-to-text-playground.tsx