The SpeechToTextPlayground component is the main orchestrator for the speech-to-text functionality. It manages the state for API authentication, file handling, transcription options, and results display.

Component Overview

This component integrates the TranscriptionForm and TranscriptionResult components to provide a complete transcription workflow. It handles:
  • API key management
  • File selection and validation
  • Transcription options configuration
  • ElevenLabs API integration
  • Result processing and display
  • Error handling
  • Speaker identification and naming

State Management

The component uses React hooks to manage the following state:
  • apiKey (string): User’s ElevenLabs API key for authentication
  • file (File | null): Selected audio/video file for transcription
  • isTranscribing (boolean): Loading state during the transcription API call
  • result (TranscriptResult | null): Transcription result containing transcript, audio URL, and alignment data
  • error (string | null): Error message from failed transcription attempts
  • speakerNames (SpeakerNames): Record mapping speaker IDs to custom display names
  • options (TranscriptOptions): Configuration options for the transcription request

Usage Example

import { SpeechToTextPlayground } from "@/features/speech-to-text-playground/speech-to-text-playground";

export default function PlaygroundPage() {
  return (
    <div className="min-h-screen">
      <SpeechToTextPlayground />
    </div>
  );
}

TypeScript Interfaces

TranscriptOptions

Configuration options for the transcription API request:
type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  numSpeakers?: number;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  diarizationThreshold?: number;
  temperature?: number;
  seed?: number;
  useMultiChannel: boolean;
  keyterms?: string[];
  entityDetection?: string;
};

TranscriptResult

Result object containing transcription data:
type TranscriptResult = {
  transcript: SpeechToTextChunkResponseModel;
  audioUrl: string;
  alignment: CharacterAlignmentResponseModel;
};

SpeakerNames

Mapping of speaker IDs to custom names:
type SpeakerNames = Record<string, string>;

Default Configuration

The component initializes with these default transcription options:
const defaultTranscriptOptions: TranscriptOptions = {
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "character",
  diarize: false,
  useMultiChannel: false,
};
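
Because the optional fields are left out of the defaults, a different starting configuration only needs to override the keys that change. The following is a minimal sketch, not code from the component: withOverrides is a hypothetical helper, and the type and defaults are copied from the definitions above so that the block is self-contained.

```typescript
type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  numSpeakers?: number;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  diarizationThreshold?: number;
  temperature?: number;
  seed?: number;
  useMultiChannel: boolean;
  keyterms?: string[];
  entityDetection?: string;
};

const defaultTranscriptOptions: TranscriptOptions = {
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "character",
  diarize: false,
  useMultiChannel: false,
};

// Hypothetical helper: builds a new options object and leaves the defaults untouched.
function withOverrides(overrides: Partial<TranscriptOptions>): TranscriptOptions {
  return { ...defaultTranscriptOptions, ...overrides };
}

// Example: enable diarization for a two-speaker recording.
const diarized = withOverrides({ diarize: true, numSpeakers: 2 });
```

Spreading into a fresh object keeps defaultTranscriptOptions safe to reuse as the initial state on every mount.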

Component Lifecycle

1. File Selection

When a user selects a file:
  • The file state is updated
  • Any existing results are cleared
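  • The audio type is derived separately from the file's extension or MIME type; the handler below does not set it (the component imports getAudioTypeForFile from transcript-utils, presumably for this purpose)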
function handleFileSelected(selectedFile: File | null) {
  setFile(selectedFile);
  setResult(null);
}
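
The audio-type step is not visible in the handler above. As an illustration only, the sketch below shows one way a MIME type could be chosen from a file's type field, falling back to its extension. guessAudioType, FileLike, and the extension table are hypothetical; the component's actual helper is getAudioTypeForFile in transcript-utils, whose implementation is not shown on this page.

```typescript
// Minimal structural stand-in for the browser File fields used here.
type FileLike = { name: string; type: string };

// Hypothetical extension-to-MIME table; extend as needed.
const EXTENSION_TO_MIME = new Map<string, string>([
  ["mp3", "audio/mpeg"],
  ["wav", "audio/wav"],
  ["ogg", "audio/ogg"],
  ["m4a", "audio/mp4"],
  ["mp4", "video/mp4"],
  ["webm", "video/webm"],
]);

function guessAudioType(file: FileLike): string | undefined {
  // Prefer the MIME type reported by the browser when it looks usable.
  if (file.type.startsWith("audio/") || file.type.startsWith("video/")) {
    return file.type;
  }
  // Otherwise fall back to the file extension, if there is one.
  const dot = file.name.lastIndexOf(".");
  if (dot === -1) return undefined;
  const extension = file.name.slice(dot + 1).toLowerCase();
  return EXTENSION_TO_MIME.get(extension);
}
```

Since the value depends only on the selected file, a component could compute it with useMemo keyed on file rather than storing it as separate state.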

2. Transcription Process

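When the form is submitted, the handler returns early if no file or API key is set, clears the previous error and result, calls the ElevenLabs speech-to-text API, and stores the transcript together with an object URL for playback and the derived alignment data. Any failure is turned into an error message, and the loading flag is reset in either case: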
async function handleTranscribe(event: Parameters<SubmitEventHandler<HTMLFormElement>>[0]) {
  event.preventDefault();
  if (!file || !apiKey) return;

  setIsTranscribing(true);
  setError(null);
  setResult(null);

  try {
    const browserClient = new ElevenLabsClient({ apiKey });
    const transcriptResponse = await browserClient.speechToText.convert({
      file,
      modelId: options.modelId || "scribe_v2",
      languageCode: options.languageCode || undefined,
      tagAudioEvents: options.tagAudioEvents || false,
      numSpeakers: options.numSpeakers || undefined,
      timestampsGranularity: options.timestampsGranularity || "character",
      diarize: options.diarize || false,
      diarizationThreshold: options.diarizationThreshold || undefined,
      temperature: options.temperature || undefined,
      seed: options.seed || undefined,
      useMultiChannel: options.useMultiChannel || false,
      keyterms: options.keyterms || undefined,
      entityDetection: options.entityDetection || undefined,
    });

    const audioUrl = URL.createObjectURL(file);
    const alignment = convertToAlignment(transcriptResponse);

    setResult({
      transcript: transcriptResponse,
      audioUrl,
      alignment,
    });
  } catch (err: unknown) {
    const apiErrorMessage = getElevenLabsErrorMessage(err);
    const fallbackMessage = err instanceof Error ? err.message : "An error occurred";
    setError(apiErrorMessage ?? fallbackMessage);
  } finally {
    setIsTranscribing(false);
  }
}
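
One detail of the handler above is worth noting: the || fallbacks treat every falsy value as missing, so a temperature or seed of 0 would be passed on as undefined. The sketch below contrasts || with ?? on plain values. It illustrates JavaScript semantics and is not a change to the component; SamplingOptions, pickWithOr, and pickWithNullish are hypothetical names.

```typescript
type SamplingOptions = { temperature?: number; seed?: number };
type ResolvedSampling = { temperature: number | undefined; seed: number | undefined };

// Mirrors the handler's pattern: any falsy value (including 0) becomes undefined.
function pickWithOr(options: SamplingOptions): ResolvedSampling {
  return {
    temperature: options.temperature || undefined,
    seed: options.seed || undefined,
  };
}

// Nullish coalescing only replaces null and undefined, so an explicit 0 survives.
// (`?? undefined` is a no-op here and is written out only for symmetry.)
function pickWithNullish(options: SamplingOptions): ResolvedSampling {
  return {
    temperature: options.temperature ?? undefined,
    seed: options.seed ?? undefined,
  };
}

const explicitZero: SamplingOptions = { temperature: 0, seed: 0 };
const viaOr = pickWithOr(explicitZero);
const viaNullish = pickWithNullish(explicitZero);
```

Whether this matters in practice depends on whether the form can produce 0 for these fields, which this page does not state.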

3. Speaker Name Management

Users can customize speaker labels:
function handleSpeakerNameChange(speakerId: string, newName: string) {
  setSpeakerNames((prev) => ({
    ...prev,
    [speakerId]: newName,
  }));
}
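
The functional update above returns a new object on each change, so earlier state is never mutated. Below is a minimal sketch of the same update outside React, paired with a lookup that falls back to the raw speaker ID when no custom name exists. renameSpeaker and displayNameFor are hypothetical helpers, "speaker_0" is an example ID, and the fallback behaviour is an assumption rather than something this page documents.

```typescript
type SpeakerNames = Record<string, string>;

// Same shape as the setSpeakerNames updater: copy the map, then set one key.
function renameSpeaker(prev: SpeakerNames, speakerId: string, newName: string): SpeakerNames {
  return { ...prev, [speakerId]: newName };
}

// Assumed display rule: use the custom name unless it is missing or blank.
function displayNameFor(names: SpeakerNames, speakerId: string): string {
  const raw = Object.prototype.hasOwnProperty.call(names, speakerId)
    ? names[speakerId]
    : undefined;
  const custom = (raw ?? "").trim();
  return custom !== "" ? custom : speakerId;
}

const initial: SpeakerNames = {};
const renamed = renameSpeaker(initial, "speaker_0", "Alice");
```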

Component Structure

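The component renders two main sections: the TranscriptionForm, which is always shown, and the TranscriptionResult, which is rendered only once a result is available: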
return (
  <div className="container mx-auto p-4 md:p-8 max-w-6xl">
    <TranscriptionForm
      apiKey={apiKey}
      file={file}
      options={options}
      isTranscribing={isTranscribing}
      error={error}
      onApiKeyChange={setApiKey}
      onFileSelected={handleFileSelected}
      onOptionsChange={setOptions}
      onSubmit={handleTranscribe}
    />

    {result && (
      <TranscriptionResult
        result={result}
        audioType={audioType}
        speakerNames={speakerNames}
        onSpeakerNameChange={handleSpeakerNameChange}
      />
    )}
  </div>
);

Dependencies

import { useMemo, useState, type SubmitEventHandler } from "react";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { TranscriptionForm } from "./transcription-form";
import { TranscriptionResult } from "./transcription-result";
import type {
  SpeakerNames,
  TranscriptOptions,
  TranscriptResult,
} from "./speech-to-text-types";
import {
  convertToAlignment,
  getAudioTypeForFile,
  getElevenLabsErrorMessage,
  isSpeechToTextChunkResponseModel,
} from "./transcript-utils";

Source Location

src/features/speech-to-text-playground/speech-to-text-playground.tsx
