Components Overview - ElevenLabs Speech-to-Text API UI

Architecture Overview

The ElevenLabs Speech-to-Text API UI is built with a modular React architecture using TypeScript. The application follows a feature-based structure with reusable UI components powered by shadcn/ui.

Component Hierarchy

The application has a clear component hierarchy starting from the root:

App.tsx
└── SpeechToTextPlayground
    ├── TranscriptionForm
    │   ├── Card (UI)
    │   ├── Input (UI)
    │   ├── Select (UI)
    │   ├── Checkbox (UI)
    │   ├── Textarea (UI)
    │   └── Button (UI)
    └── TranscriptionResult
        ├── Card (UI)
        ├── Select (UI)
        ├── Input (UI)
        └── TranscriptViewerContainer
            ├── TranscriptViewerAudio
            ├── TranscriptViewerWords
            ├── TranscriptViewerScrubBar
            └── TranscriptViewerPlayPauseButton

Directory Structure

The source code is organized into two main feature directories, plus a shared directory of UI components:

Speech-to-Text Playground Feature

Location: src/features/speech-to-text-playground/

speech-to-text-playground.tsx - Main container component that orchestrates the transcription workflow
transcription-form.tsx - Form component for API key, file upload, and configuration options
transcription-result.tsx - Result display component with interactive transcript viewer
speech-to-text-types.ts - TypeScript type definitions for the feature
transcript-utils.ts - Utility functions for transcript processing and formatting
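
To make the formatting role of transcript-utils.ts more concrete, here is a minimal sketch of how words might be grouped into speaker turns and rendered as plain text or markdown. The `TranscriptWord` shape, the `CopyFormat` type, and the `formatTranscript` name are assumptions for illustration, not the project's actual definitions.

```typescript
// Hypothetical word shape; the real types live in speech-to-text-types.ts.
interface TranscriptWord {
  text: string;
  speakerId: string;
}

// Hypothetical set of copy formats.
type CopyFormat = "plain" | "markdown";

// Groups consecutive words by speaker and renders one line per speaker turn.
// `speakerNames` maps a speaker id to a custom display name; ids without a
// custom name are shown as-is.
function formatTranscript(
  words: readonly TranscriptWord[],
  speakerNames: Record<string, string>,
  format: CopyFormat,
): string {
  const turns: { speakerId: string; text: string }[] = [];
  for (const word of words) {
    const last = turns[turns.length - 1];
    if (last && last.speakerId === word.speakerId) {
      last.text += " " + word.text;
    } else {
      turns.push({ speakerId: word.speakerId, text: word.text });
    }
  }
  return turns
    .map(({ speakerId, text }) => {
      const name = speakerNames[speakerId] ?? speakerId;
      return format === "markdown" ? `**${name}:** ${text}` : `${name}: ${text}`;
    })
    .join("\n");
}
```

Keeping the formatter a pure function of the words and the name map means a renamed speaker can be reflected in every copy format without re-running the transcription.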

Transcript View Feature

Location: src/features/transcript-view/

transcript-viewer.tsx - Context-based transcript viewer component system
use-transcript-viewer.ts - Custom hook for transcript playback state management
segment-composer.ts - Logic for composing transcript segments
word-index.ts - Word indexing utilities for alignment
transcript-types.ts - TypeScript type definitions for transcript data
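
As an illustration of what a word indexing utility could look like, the sketch below finds the word that is active at a given playback time using binary search. The `TimedWord` shape and the `findActiveWordIndex` name are hypothetical, and the sketch assumes words are sorted by start time and do not overlap.

```typescript
// Hypothetical timed word; times are in seconds.
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

// Returns the index of the word whose [start, end) interval contains `time`,
// or -1 when `time` falls in a gap or outside the transcript.
// Assumes `words` is sorted by start time with no overlapping intervals.
function findActiveWordIndex(words: readonly TimedWord[], time: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const word = words[mid]!;
    if (time < word.start) {
      hi = mid - 1;
    } else if (time >= word.end) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}
```

A lookup like this runs in logarithmic time, which matters because it would be called on every audio time update during playback.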

UI Components

Location: src/components/ui/

Reusable UI components based on shadcn/ui:

button.tsx - Button component with variants
card.tsx - Card container with header, content, and description
checkbox.tsx - Checkbox input component
input.tsx - Text input component
label.tsx - Form label component
select.tsx - Dropdown select component
textarea.tsx - Multi-line text input
progress.tsx - Progress bar component
skeleton.tsx - Loading skeleton component
scrub-bar.tsx - Custom audio scrubbing component
transcript-viewer.tsx - Duplicate transcript viewer (also in features/transcript-view)

Major Components

SpeechToTextPlayground

Main container managing state and orchestrating the transcription workflow

TranscriptionForm

Form for API authentication, file upload, and transcription configuration

TranscriptionResult

Displays transcription results with interactive playback controls

TranscriptViewer

Context-based viewer system for synchronized transcript playback

Scrub Bar

Audio scrubbing component for timeline navigation

Form Controls

Reusable form UI components based on shadcn/ui

Component Composition

The application uses a composition pattern where:

App.tsx serves as the entry point and renders the main playground
SpeechToTextPlayground acts as the state container, managing:
- API key authentication
- File selection and validation
- Transcription options and configuration
- Transcription API calls via ElevenLabs SDK
- Result state and error handling
TranscriptionForm handles all user inputs:
- API key (password input)
- Audio/video file upload with validation
- Model selection (Scribe V1/V2)
- Language code and timestamps granularity
- Speaker diarization settings
- Advanced options (temperature, seed, keyterms, etc.)
TranscriptionResult displays the output:
- Language detection with confidence score
- Speaker name customization
- Multiple copy formats (plain text, markdown variants)
- Interactive transcript viewer with audio synchronization
TranscriptViewer provides playback features:
- Word-level highlighting synchronized with audio
- Scrubbing controls for navigation
- Play/pause functionality
- Segment composition for audio tags and gaps
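
The segment composition mentioned in the last item can be sketched as follows. This is a simplified illustration that only inserts gap segments between words; audio tags are left out, and the `Segment` type, the `composeSegments` name, and the `minGap` threshold are assumptions rather than the contents of segment-composer.ts.

```typescript
// Hypothetical timed word; times are in seconds.
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

// A segment is either a spoken word or a silent gap between two words.
type Segment =
  | { kind: "word"; text: string; start: number; end: number }
  | { kind: "gap"; start: number; end: number };

// Walks the words in order and inserts a gap segment wherever the pause
// between one word's end and the next word's start is at least `minGap`.
function composeSegments(words: readonly TimedWord[], minGap = 0.3): Segment[] {
  const segments: Segment[] = [];
  let previousEnd: number | null = null;
  for (const word of words) {
    if (previousEnd !== null && word.start - previousEnd >= minGap) {
      segments.push({ kind: "gap", start: previousEnd, end: word.start });
    }
    segments.push({ kind: "word", text: word.text, start: word.start, end: word.end });
    previousEnd = word.end;
  }
  return segments;
}
```

Modeling segments as a discriminated union lets a renderer switch on `kind` and have the compiler check that every segment type is handled.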

Data Flow

The application follows a unidirectional data flow:

User inputs are captured in TranscriptionForm
Form submission triggers the API call in SpeechToTextPlayground
API response is processed using transcript-utils.ts helpers
Converted data is passed to TranscriptionResult
TranscriptViewerContainer provides context to child viewer components
User interactions (play, pause, scrub) update the viewer state via hooks
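
The last step of this flow, where user interactions update the viewer state, can be modeled as a pure reducer. The sketch below is one possible shape, not the implementation in use-transcript-viewer.ts; the `ViewerState` and `ViewerAction` types and the `viewerReducer` name are hypothetical.

```typescript
// Hypothetical viewer state; times are in seconds.
interface ViewerState {
  isPlaying: boolean;
  currentTime: number;
  duration: number;
}

// Each user interaction is described by a plain action object.
type ViewerAction =
  | { type: "play" }
  | { type: "pause" }
  | { type: "scrub"; time: number };

// Returns a new state for each action and never mutates the old one,
// so data keeps flowing in a single direction: action -> state -> view.
function viewerReducer(state: ViewerState, action: ViewerAction): ViewerState {
  switch (action.type) {
    case "play":
      return { ...state, isPlaying: true };
    case "pause":
      return { ...state, isPlaying: false };
    case "scrub":
      // Clamp the requested time to the valid range [0, duration].
      return {
        ...state,
        currentTime: Math.min(Math.max(action.time, 0), state.duration),
      };
  }
}
```

A reducer of this shape could be passed to React's `useReducer` inside a custom hook, with the resulting state and dispatch function shared with child components through context.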

Key Features

Type Safety: Full TypeScript coverage with strict type definitions
State Management: React hooks for local state, context for shared viewer state
Error Handling: ElevenLabs API error parsing and user-friendly messages
Audio Sync: Character-level alignment for precise word highlighting
Speaker Diarization: Support for multi-speaker detection and labeling
Responsive Design: Mobile-friendly layout with Tailwind CSS
Accessibility: Semantic HTML and ARIA-compliant components
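
To illustrate the error handling item, here is a minimal sketch of turning an unknown thrown value into a user-friendly message. It assumes the thrown error may carry a numeric `statusCode` and a string `message`; that shape, the `toUserMessage` name, and the message wording are assumptions and may not match what the application or the ElevenLabs SDK actually produces.

```typescript
// Maps an unknown thrown value to a message suitable for display.
// The `statusCode` and `message` fields are assumed, not guaranteed.
function toUserMessage(error: unknown): string {
  if (typeof error === "object" && error !== null) {
    const { statusCode, message } = error as {
      statusCode?: unknown;
      message?: unknown;
    };
    if (statusCode === 401) {
      return "Invalid API key. Check the key and try again.";
    }
    if (statusCode === 429) {
      return "Rate limit reached. Wait a moment and retry.";
    }
    if (typeof message === "string" && message.length > 0) {
      return message;
    }
  }
  // Fallback for strings, null, and objects without a usable message.
  return "Transcription failed. Please try again.";
}
```

Accepting `unknown` rather than a specific error class keeps the function safe to call from any `catch` block, since TypeScript does not constrain what a `throw` can produce.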

Next Steps

Explore detailed documentation for each component:

Learn how SpeechToTextPlayground orchestrates the workflow
Understand TranscriptionForm configuration options
Discover TranscriptViewer implementation details
Review UI Components for design system usage
