The Transcript Viewer provides a complete system for displaying synchronized transcripts with audio playback. It uses a context provider pattern to share state between components.
Overview
The transcript viewer system consists of:
- TranscriptViewerContainer - Root component that manages state and provides context
- TranscriptViewerWords - Renders the transcript with word-level highlighting
- TranscriptViewerAudio - Audio element with synchronized playback
- TranscriptViewerPlayPauseButton - Play/pause control button
- TranscriptViewerScrubBar - Timeline scrubber for navigation
The child components use the useTranscriptViewerContext hook internally to access shared state.
Basic Usage
TranscriptViewerContainer
The root component that initializes the transcript viewer state and provides context to child components.
Props
URL or path to the audio file
MIME type of the audio file. Supported types:
"audio/mpeg", "audio/wav", "audio/ogg", "audio/mp3", "audio/m4a", "audio/aac", "audio/webm"
Character-level alignment data from ElevenLabs Speech-to-Text API. Contains:
- characters - Array of individual characters
- characterStartTimesSeconds - Start time for each character
- characterEndTimesSeconds - End time for each character
Custom function to compose transcript segments from alignment data. If not provided, uses the default composeSegments function.
Whether to hide audio tags (e.g., [music], [applause]) from the transcript display
Callback invoked when audio playback starts
Callback invoked when audio playback pauses
Callback invoked when audio time updates. Receives the current time in seconds.
Callback invoked when audio playback ends
Callback invoked when audio duration is loaded or changes. Receives the duration in seconds.
Additional CSS classes for the container div
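To make the alignment and segment-composition props more concrete, here is a minimal sketch of how character-level alignment data could be grouped into word and gap segments. It is not the library's composeSegments implementation. The names CharacterAlignment, SketchSegment, and composeSegmentsSketch are hypothetical, and the sketch assumes that only whitespace forms a gap and that an audio tag is a single bracketed token such as [music].

```typescript
// Shape of the alignment data as described in the props above.
interface CharacterAlignment {
  characters: string[];
  characterStartTimesSeconds: number[];
  characterEndTimesSeconds: number[];
}

// Hypothetical segment shape; the library's real types may differ.
interface SketchSegment {
  kind: "word" | "gap";
  text: string;
  startTime: number;
  endTime: number;
}

// Groups consecutive non-whitespace characters into words and
// consecutive whitespace characters into gaps.
function composeSegmentsSketch(
  alignment: CharacterAlignment,
  hideAudioTags = false
): SketchSegment[] {
  const segments: SketchSegment[] = [];

  alignment.characters.forEach((ch, i) => {
    const start = alignment.characterStartTimesSeconds[i] ?? 0;
    const end = alignment.characterEndTimesSeconds[i] ?? start;
    const kind: SketchSegment["kind"] = /\s/.test(ch) ? "gap" : "word";
    const last = segments[segments.length - 1];

    if (last !== undefined && last.kind === kind) {
      // Extend the segment currently being built.
      last.text += ch;
      last.endTime = end;
    } else {
      segments.push({ kind, text: ch, startTime: start, endTime: end });
    }
  });

  if (!hideAudioTags) return segments;

  // Assumption: a tag is one token wrapped in square brackets.
  return segments.filter(
    (s) => !(s.kind === "word" && /^\[[^\]]*\]$/.test(s.text))
  );
}
```

A custom composer passed to the container would presumably follow the same idea: take the alignment object and return an ordered list of timed segments.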
Example with Callbacks
TranscriptViewerWords
Renders the transcript text with automatic word-level highlighting synchronized to audio playback.
Props
Custom render function for individual words. Receives:
Custom render function for gaps (whitespace/punctuation). Receives:
Additional CSS classes for word spans
Additional CSS classes for gap spans
Additional CSS classes for the container div
Default Styling
By default, words are styled based on their status:
- Spoken - text-muted-foreground (already played)
- Current - text-primary font-semibold (currently playing)
- Unspoken - text-foreground (not yet played)
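The status-to-class mapping above can be expressed as a small lookup. This is only an illustrative sketch: the class strings come from the list above, while StyleStatus, DEFAULT_WORD_CLASSES, and wordClassName are hypothetical names, not part of the component's API.

```typescript
type StyleStatus = "spoken" | "current" | "unspoken";

// Default classes per status, taken from the list above.
const DEFAULT_WORD_CLASSES: Record<StyleStatus, string> = {
  spoken: "text-muted-foreground",
  current: "text-primary font-semibold",
  unspoken: "text-foreground",
};

// Combines the default class for a status with optional extra classes.
function wordClassName(wordStatus: StyleStatus, extra?: string): string {
  return [DEFAULT_WORD_CLASSES[wordStatus], extra].filter(Boolean).join(" ");
}
```

For example, wordClassName("spoken", "underline") yields "text-muted-foreground underline".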
Custom Word Rendering
Word Status Logic
The component automatically determines word status based on:
- spokenSegments - All segments before the current word
- currentWord - The word currently being spoken
- unspokenSegments - All segments after the current word
When playback reaches the end of the audio (currentTime >= duration - 0.01), all words are marked as “spoken”.
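The status rules in this section can be sketched as a pure function of the word timings, the current time, and the duration. This is an approximation for illustration, not the component's actual code. TimedWord, SketchStatus, findCurrentWordIndex, and wordStatuses are hypothetical names, and the sketch assumes each word carries its own start and end time in seconds.

```typescript
interface TimedWord {
  text: string;
  startTime: number;
  endTime: number;
}

type SketchStatus = "spoken" | "current" | "unspoken";

// Index of the word being spoken at currentTime, or -1 if none.
function findCurrentWordIndex(words: TimedWord[], currentTime: number): number {
  return words.findIndex(
    (w) => currentTime >= w.startTime && currentTime < w.endTime
  );
}

function wordStatuses(
  words: TimedWord[],
  currentTime: number,
  duration: number
): SketchStatus[] {
  // Near the end of the audio, every word counts as spoken.
  if (duration > 0 && currentTime >= duration - 0.01) {
    return words.map((): SketchStatus => "spoken");
  }

  const currentIndex = findCurrentWordIndex(words, currentTime);
  return words.map((w, i): SketchStatus => {
    if (i === currentIndex) return "current";
    return w.endTime <= currentTime ? "spoken" : "unspoken";
  });
}
```

The 0.01 second tolerance guards against the final word staying highlighted when the reported time stops just short of the reported duration.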
TranscriptViewerAudio
Renders the HTML5 audio element connected to the transcript viewer state.
Props
Accepts all standard HTML <audio> element props except children and src, which are managed internally.
Example
The audio element is automatically connected to the viewer state. You don’t need to manually manage refs or event listeners.
TranscriptViewerPlayPauseButton
A button that toggles audio playback state.
Props
Button content. Can be:
- A static ReactNode
- A render function that receives { isPlaying: boolean }
If omitted, the default icons come from lucide-react. Accepts all props of the Button component, except that onClick is augmented.
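The two accepted forms of button content can be handled with a single type check. The sketch below shows the pattern without depending on React: NodeLike is a simplified stand-in for ReactNode, and PlayPauseChildren and resolveChildren are hypothetical names used only for illustration.

```typescript
// Simplified stand-in for React's ReactNode so the sketch runs without React.
type NodeLike = string | number | null;

// Either static content or a render function receiving the playback state.
type PlayPauseChildren =
  | NodeLike
  | ((state: { isPlaying: boolean }) => NodeLike);

// Returns the content to display for the given playback state.
function resolveChildren(
  children: PlayPauseChildren,
  isPlaying: boolean
): NodeLike {
  return typeof children === "function" ? children({ isPlaying }) : children;
}
```

With a render function such as ({ isPlaying }) => (isPlaying ? "Pause" : "Play"), the label follows the playback state, while a static value is returned unchanged.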
Examples
TranscriptViewerScrubBar
An interactive timeline scrubber for navigating through the audio.
Props
Whether to display current time and duration labels below the scrub bar
CSS classes for the time labels container
CSS classes for the scrub bar track
CSS classes for the progress indicator
CSS classes for the draggable thumb
CSS classes for the scrub bar container
Example
Behavior
- Click - Seek to clicked position
- Drag - Scrub through audio timeline
- During scrubbing - Animation frame updates are paused for smooth interaction
- After scrubbing - Animation frame updates resume if audio is playing
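Click and drag seeking both come down to mapping a pointer position on the track to a time. The following sketch shows one way to do that, along with a simple formatter for the time labels. Both positionToTime and formatTimeLabel are hypothetical helpers, and the m:ss label format is an assumption rather than the component's documented output.

```typescript
// Maps a pointer x-coordinate on the track to a time in seconds,
// clamped to the range [0, duration].
function positionToTime(
  pointerX: number,
  trackLeft: number,
  trackWidth: number,
  duration: number
): number {
  if (trackWidth <= 0 || duration <= 0) return 0;
  const ratio = Math.min(1, Math.max(0, (pointerX - trackLeft) / trackWidth));
  return ratio * duration;
}

// Formats seconds as m:ss for a time label (assumed format).
function formatTimeLabel(seconds: number): string {
  const total = Math.max(0, Math.floor(seconds));
  const minutes = Math.floor(total / 60);
  const secs = total % 60;
  return `${minutes}:${secs < 10 ? "0" : ""}${secs}`;
}
```

In a browser, trackLeft and trackWidth would typically come from the track element's getBoundingClientRect(), and the resulting time would be passed to the seek function from the context.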
Context Provider Pattern
The transcript viewer uses React Context to share state between components.
Context Value
The context provides:
All transcript segments (words and gaps)
Only the word segments (excludes gaps)
Segments that have been played
Segments that haven’t been played yet
The word currently being spoken
Index of the current segment (-1 if none)
Index of the current word (-1 if none)
Whether audio is currently playing
Whether user is actively scrubbing the timeline
Total audio duration in seconds
Current playback time in seconds
Reference to the audio element
Props to spread onto the audio element
Start audio playback
Pause audio playback
Seek to a specific time in seconds
Seek to the start of a specific word (by index or word object)
Called when user starts scrubbing
Called when user stops scrubbing
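Seeking to a word "by index or word object" implies that the seek function normalizes its argument before jumping. A minimal sketch of that normalization follows. SeekableWord and resolveSeekTime are hypothetical names, and the real context function may handle invalid targets differently.

```typescript
interface SeekableWord {
  text: string;
  startTime: number;
  endTime: number;
}

// Resolves a seek target (word index or word object) to a start time in
// seconds. Returns undefined when an index is out of range.
function resolveSeekTime(
  words: SeekableWord[],
  target: number | SeekableWord
): number | undefined {
  const word = typeof target === "number" ? words[target] : target;
  return word?.startTime;
}
```

A caller could then pass the resolved time to the time-based seek function, skipping the seek when the result is undefined.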
Complete Example
See Also
- useTranscriptViewer Hook - Lower-level hook for custom implementations
- Word Alignment - Understanding the alignment system