SpeechToTextPlayground

The SpeechToTextPlayground component is the main orchestrator for the speech-to-text functionality. It manages the state for API authentication, file handling, transcription options, and results display.

Component Overview

This component integrates the TranscriptionForm and TranscriptionResult components to provide a complete transcription workflow. It handles:

API key management
File selection and validation
Transcription options configuration
ElevenLabs API integration
Result processing and display
Error handling
Speaker identification and naming

State Management

The component uses React hooks to manage the following state:

apiKey (string): User’s ElevenLabs API key for authentication
file (File | null): Selected audio/video file for transcription
isTranscribing (boolean): Loading state during transcription API call
result (TranscriptResult | null): Transcription result containing transcript, audio URL, and alignment data
error (string | null): Error message from failed transcription attempts
speakerNames (SpeakerNames): Record mapping speaker IDs to custom display names
options (TranscriptOptions): Configuration options for the transcription request

Usage Example

import { SpeechToTextPlayground } from "@/features/speech-to-text-playground/speech-to-text-playground";

export default function PlaygroundPage() {
  return (
    <div className="min-h-screen">
      <SpeechToTextPlayground />
    </div>
  );
}

TypeScript Interfaces

TranscriptOptions

Configuration options for the transcription API request:

type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  numSpeakers?: number;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  diarizationThreshold?: number;
  temperature?: number;
  seed?: number;
  useMultiChannel: boolean;
  keyterms?: string[];
  entityDetection?: string;
};

TranscriptResult

Result object containing transcription data:

type TranscriptResult = {
  transcript: SpeechToTextChunkResponseModel;
  audioUrl: string;
  alignment: CharacterAlignmentResponseModel;
};
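The alignment field is produced by convertToAlignment, whose implementation is not shown on this page. As a rough illustration of the idea, the sketch below flattens per-word character timings into parallel arrays. The type shapes and field names are simplified stand-ins chosen for this example, and toAlignment is a hypothetical name, not the project's helper.

```typescript
// Simplified stand-ins for the SDK response types (assumed shapes, only the
// fields this sketch reads).
type CharacterTiming = { text: string; start?: number; end?: number };
type Word = CharacterTiming & { characters?: CharacterTiming[] };
type Transcript = { words: Word[] };
type Alignment = {
  characters: string[];
  characterStartTimesSeconds: number[];
  characterEndTimesSeconds: number[];
};

// Hypothetical helper: flattens word-level data into three parallel arrays.
function toAlignment(transcript: Transcript): Alignment {
  const alignment: Alignment = {
    characters: [],
    characterStartTimesSeconds: [],
    characterEndTimesSeconds: [],
  };
  for (const word of transcript.words) {
    // Use per-character timings when present; otherwise the whole word is
    // pushed as a single entry with the word's own timing.
    const units: CharacterTiming[] = word.characters ?? [word];
    for (const unit of units) {
      alignment.characters.push(unit.text);
      // Missing timestamps default to 0 in this sketch.
      alignment.characterStartTimesSeconds.push(unit.start ?? 0);
      alignment.characterEndTimesSeconds.push(unit.end ?? 0);
    }
  }
  return alignment;
}
```

Keeping the three arrays the same length lets a player look up the character at a given playback time by index.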

SpeakerNames

Mapping of speaker IDs to custom names:

type SpeakerNames = Record<string, string>;

Default Configuration

The component initializes with these default transcription options:

const defaultTranscriptOptions: TranscriptOptions = {
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "character",
  diarize: false,
  useMultiChannel: false,
};
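If a caller wanted to start from these defaults and change only a few fields, an object spread would be one way to do it. The sketch below repeats the types from this page so it stands alone; withOverrides is a hypothetical helper and is not part of the component.

```typescript
type TranscriptOptions = {
  modelId: "scribe_v1" | "scribe_v2";
  languageCode?: string;
  tagAudioEvents: boolean;
  numSpeakers?: number;
  timestampsGranularity: "none" | "word" | "character";
  diarize: boolean;
  diarizationThreshold?: number;
  temperature?: number;
  seed?: number;
  useMultiChannel: boolean;
  keyterms?: string[];
  entityDetection?: string;
};

const defaultTranscriptOptions: TranscriptOptions = {
  modelId: "scribe_v2",
  tagAudioEvents: false,
  timestampsGranularity: "character",
  diarize: false,
  useMultiChannel: false,
};

// Hypothetical helper: returns a new object, so the defaults are never mutated.
// Later spreads win, so any key present in `overrides` replaces the default.
function withOverrides(overrides: Partial<TranscriptOptions>): TranscriptOptions {
  return { ...defaultTranscriptOptions, ...overrides };
}

// Example: enable diarization for a two-speaker recording.
const diarized = withOverrides({ diarize: true, numSpeakers: 2 });
```

Note that a key explicitly set to undefined in the overrides also replaces the default, so callers should omit keys they do not want to change.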

Component Lifecycle

1. File Selection

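When a user selects a file:

The file state is updated
Any existing results are cleared

The audio type passed to TranscriptionResult is determined from the file extension/MIME type. That step is not part of the handler below; the getAudioTypeForFile and useMemo imports suggest it is derived from the file state rather than stored separately.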

function handleFileSelected(selectedFile: File | null) {
  setFile(selectedFile);
  setResult(null);
}
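The handler above does not compute the audio type itself. As an illustration of how a type could be derived from a file's MIME type or extension, here is a minimal sketch. guessAudioType, FileLike, and the extension table are assumptions made for this example; the project's getAudioTypeForFile may behave differently.

```typescript
// Minimal stand-in for the two browser File fields this sketch reads.
type FileLike = { name: string; type: string };

// Hypothetical extension-to-MIME table; a real helper may cover other formats.
const EXTENSION_TO_MIME: Record<string, string> = {
  mp3: "audio/mpeg",
  wav: "audio/wav",
  m4a: "audio/mp4",
  ogg: "audio/ogg",
  mp4: "video/mp4",
  webm: "video/webm",
};

function guessAudioType(file: FileLike | null): string | undefined {
  if (!file) return undefined;
  // Prefer the MIME type reported by the browser when it is non-empty.
  if (file.type) return file.type;
  // Otherwise fall back to the (case-insensitive) file extension.
  const dot = file.name.lastIndexOf(".");
  if (dot === -1) return undefined;
  const extension = file.name.slice(dot + 1).toLowerCase();
  return EXTENSION_TO_MIME[extension];
}
```

Inside a component, a function like this would typically be wrapped in useMemo with file as its only dependency, so the value is recomputed only when the selection changes.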

2. Transcription Process

When the form is submitted:

async function handleTranscribe(event: Parameters<SubmitEventHandler<HTMLFormElement>>[0]) {
  event.preventDefault();
  if (!file || !apiKey) return;

  setIsTranscribing(true);
  setError(null);
  setResult(null);

  try {
    const browserClient = new ElevenLabsClient({ apiKey });
    const transcriptResponse = await browserClient.speechToText.convert({
      file,
      modelId: options.modelId || "scribe_v2",
      languageCode: options.languageCode || undefined,
      tagAudioEvents: options.tagAudioEvents || false,
      numSpeakers: options.numSpeakers || undefined,
      timestampsGranularity: options.timestampsGranularity || "character",
      diarize: options.diarize || false,
      diarizationThreshold: options.diarizationThreshold || undefined,
      temperature: options.temperature || undefined,
      seed: options.seed || undefined,
      useMultiChannel: options.useMultiChannel || false,
      keyterms: options.keyterms || undefined,
      entityDetection: options.entityDetection || undefined,
    });

    const audioUrl = URL.createObjectURL(file);
    const alignment = convertToAlignment(transcriptResponse);

    setResult({
      transcript: transcriptResponse,
      audioUrl,
      alignment,
    });
  } catch (err: unknown) {
    const apiErrorMessage = getElevenLabsErrorMessage(err);
    const fallbackMessage = err instanceof Error ? err.message : "An error occurred";
    setError(apiErrorMessage ?? fallbackMessage);
  } finally {
    setIsTranscribing(false);
  }
}
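One detail of the request mapping above is worth knowing: the || operator replaces every falsy value, so a numeric option set to 0 is sent as undefined. If 0 is a meaningful value for temperature or seed, the nullish coalescing operator ?? avoids that. The sketch below contrasts the two on a reduced option type; SamplingOptions and both function names are made up for this example.

```typescript
type SamplingOptions = { temperature?: number; seed?: number };
type SamplingFields = { temperature: number | undefined; seed: number | undefined };

// Mirrors the `||` pattern from the handler: every falsy value, including 0,
// is replaced by undefined.
function withLogicalOr(options: SamplingOptions): SamplingFields {
  return {
    temperature: options.temperature || undefined,
    seed: options.seed || undefined,
  };
}

// `??` falls back only on null or undefined, so an explicit 0 is preserved.
function withNullish(options: SamplingOptions): SamplingFields {
  return {
    temperature: options.temperature ?? undefined,
    seed: options.seed ?? undefined,
  };
}

const requested: SamplingOptions = { temperature: 0, seed: 0 };
const dropped = withLogicalOr(requested); // both fields become undefined
const kept = withNullish(requested); // both fields stay 0
```

For the boolean and string fields the two operators differ only in edge cases such as an empty string, so the distinction matters mainly for the numeric options.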

3. Speaker Name Management

Users can customize speaker labels:

function handleSpeakerNameChange(speakerId: string, newName: string) {
  setSpeakerNames((prev) => ({
    ...prev,
    [speakerId]: newName,
  }));
}
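The updater above builds a new record rather than mutating the previous one, which is what lets React detect the change. The sketch below shows the same update as a pure function, plus one possible way a view could resolve a label. displayName and its fallback to the raw speaker ID are assumptions for illustration, not behavior documented for TranscriptionResult.

```typescript
type SpeakerNames = Record<string, string>;

// Pure version of the state update: copies the previous record and sets one key.
function renameSpeaker(prev: SpeakerNames, speakerId: string, newName: string): SpeakerNames {
  return { ...prev, [speakerId]: newName };
}

// Hypothetical lookup: use the custom name when it is set and non-blank,
// otherwise fall back to the raw speaker ID.
function displayName(names: SpeakerNames, speakerId: string): string {
  const custom = names[speakerId]?.trim();
  return custom ? custom : speakerId;
}

const before: SpeakerNames = { speaker_0: "Alice" };
const after = renameSpeaker(before, "speaker_1", "Bob");
```

Because renameSpeaker returns a fresh object, before still contains only speaker_0 after the call.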

Component Structure

The component always renders the form; the result view is rendered only once a transcription result exists:

return (
  <div className="container mx-auto p-4 md:p-8 max-w-6xl">
    <TranscriptionForm
      apiKey={apiKey}
      file={file}
      options={options}
      isTranscribing={isTranscribing}
      error={error}
      onApiKeyChange={setApiKey}
      onFileSelected={handleFileSelected}
      onOptionsChange={setOptions}
      onSubmit={handleTranscribe}
    />

    {result && (
      <TranscriptionResult
        result={result}
        audioType={audioType}
        speakerNames={speakerNames}
        onSpeakerNameChange={handleSpeakerNameChange}
      />
    )}
  </div>
);

Dependencies

import { useMemo, useState, type SubmitEventHandler } from "react";
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { TranscriptionForm } from "./transcription-form";
import { TranscriptionResult } from "./transcription-result";
import type {
  SpeakerNames,
  TranscriptOptions,
  TranscriptResult,
} from "./speech-to-text-types";
import {
  convertToAlignment,
  getAudioTypeForFile,
  getElevenLabsErrorMessage,
  isSpeechToTextChunkResponseModel,
} from "./transcript-utils";

Source Location

src/features/speech-to-text-playground/speech-to-text-playground.tsx
