Overview
The Transcript Viewer provides an interactive display of transcribed text with real-time word highlighting synchronized to audio playback. It uses character-level alignment data from the ElevenLabs API to achieve precise synchronization.
Core Components
The viewer is built with composable React components:
import {
  TranscriptViewerContainer,
  TranscriptViewerWords,
  TranscriptViewerAudio,
  TranscriptViewerPlayPauseButton,
  TranscriptViewerScrubBar,
} from "@/components/ui/transcript-viewer";
Basic Usage
<TranscriptViewerContainer
  audioSrc={audioUrl}
  audioType="audio/mpeg"
  alignment={alignment}
>
  <TranscriptViewerPlayPauseButton />
  <TranscriptViewerScrubBar />
  <TranscriptViewerWords />
  <TranscriptViewerAudio />
</TranscriptViewerContainer>
Character Alignment Data
The viewer requires character-level alignment data from the transcription:
type CharacterAlignmentResponseModel = {
  characters: string[];
  characterStartTimesSeconds: number[];
  characterEndTimesSeconds: number[];
};
This data is generated from the API response:
export function convertToAlignment(
  transcript: SpeechToTextChunkResponseModel
): CharacterAlignmentResponseModel {
  const characters: string[] = [];
  const characterStartTimesSeconds: number[] = [];
  const characterEndTimesSeconds: number[] = [];

  for (const word of transcript.words) {
    if (word.characters && word.characters.length > 0) {
      // Use character-level data from API
      for (const char of word.characters) {
        characters.push(char.text);
        characterStartTimesSeconds.push(char.start || 0);
        characterEndTimesSeconds.push(char.end || 0);
      }
    } else {
      // Fallback: interpolate from word timing
      appendWordCharactersFromText(
        word,
        characters,
        characterStartTimesSeconds,
        characterEndTimesSeconds
      );
    }
  }

  return {
    characters,
    characterStartTimesSeconds,
    characterEndTimesSeconds,
  };
}
Request character-level timestamps by setting timestampsGranularity: "character" in your transcription options for the most accurate synchronization.
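The fallback branch above calls appendWordCharactersFromText, whose body is not shown here. A minimal sketch of one way such a helper could work, assuming it spreads the word's duration evenly across its characters (the WordLike shape and the even-split strategy are assumptions for illustration, not the project's confirmed implementation):

```typescript
// Hypothetical word shape: only the fields this sketch needs.
type WordLike = { text: string; start?: number; end?: number };

// Assumed strategy: divide the word's duration evenly among its characters.
function appendWordCharactersFromText(
  word: WordLike,
  characters: string[],
  characterStartTimesSeconds: number[],
  characterEndTimesSeconds: number[]
): void {
  // Splits by UTF-16 code unit; a production helper may want grapheme-aware splitting.
  const chars = word.text.split("");
  if (chars.length === 0) return;

  const start = word.start ?? 0;
  // Never let the end precede the start, even if timing data is missing or odd.
  const end = Math.max(word.end ?? start, start);
  const step = (end - start) / chars.length;

  chars.forEach((char, i) => {
    characters.push(char);
    characterStartTimesSeconds.push(start + i * step);
    // Pin the last character to the word's end to avoid floating-point drift.
    const isLast = i === chars.length - 1;
    characterEndTimesSeconds.push(isLast ? end : start + (i + 1) * step);
  });
}
```

Interpolated timings are only an approximation: every character in a word gets the same share of time, so highlighting driven by this fallback is accurate at word boundaries but not within a word.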
Word Tracking
The viewer tracks the current word in real time during playback.
Segment Types
type TranscriptWord = {
  kind: "word";
  segmentIndex: number;
  wordIndex: number;
  text: string;
  startTime: number;
  endTime: number;
};

type GapSegment = {
  kind: "gap";
  segmentIndex: number;
  text: string;
};

type TranscriptSegment = TranscriptWord | GapSegment;
Current Word Detection
The useTranscriptViewer hook tracks the current word based on playback time:
const handleTimeUpdate = useCallback(
  function handleTimeUpdate(currentTime: number) {
    if (!words.length) return;

    const currentWord =
      currentWordIndex >= 0 && currentWordIndex < words.length
        ? words[currentWordIndex]
        : undefined;

    if (!currentWord) {
      const found = findWordIndex(words, currentTime);
      if (found !== -1) setCurrentWordIndex(found);
      return;
    }

    // Move forward if we've passed the current word
    if (
      currentTime >= currentWord.endTime &&
      currentWordIndex + 1 < words.length
    ) {
      const next = getNextWordIndexByStartTime(words, currentTime, currentWordIndex);
      setCurrentWordIndex(next);
      return;
    }

    // Move backward if we've seeked backwards
    if (currentTime < currentWord.startTime) {
      const found = findWordIndex(words, currentTime);
      if (found !== -1) setCurrentWordIndex(found);
      return;
    }

    // Re-find if we're out of sync
    const found = findWordIndex(words, currentTime);
    if (found !== -1 && found !== currentWordIndex) {
      setCurrentWordIndex(found);
    }
  },
  [currentWordIndex, words]
);
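The handler above relies on findWordIndex, which is not shown on this page. As a rough sketch, a lookup like this can be written as a binary search over the word list, assuming the words are sorted by startTime and do not overlap. The TimedWord type and the half-open interval convention are assumptions made for this example:

```typescript
// Only the timing fields matter for the lookup.
type TimedWord = { startTime: number; endTime: number };

// Returns the index of the word whose [startTime, endTime) interval contains
// `time`, or -1 when `time` falls in a gap or outside the transcript.
// Assumes `words` is sorted by startTime with no overlapping intervals.
function findWordIndex(words: TimedWord[], time: number): number {
  let lo = 0;
  let hi = words.length - 1;

  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const word = words[mid];

    if (time < word.startTime) {
      hi = mid - 1; // target is earlier than this word
    } else if (time >= word.endTime) {
      lo = mid + 1; // target is at or past this word's end
    } else {
      return mid; // startTime <= time < endTime
    }
  }

  return -1;
}
```

Returning -1 for gaps matches how the handler treats the result: it only updates the current word when the lookup finds one, so the previous highlight stays in place during silence.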
Word Status & Highlighting
Each word is rendered in one of three states:
type TranscriptViewerWordStatus = "spoken" | "unspoken" | "current";
Default Styling
function TranscriptViewerWord({ word, status, className, children }: TranscriptViewerWordProps) {
  return (
    <span
      className={cn(
        "transition-colors",
        status === "spoken" && "text-muted-foreground",
        status === "current" && "text-primary font-semibold",
        status === "unspoken" && "text-foreground",
        className
      )}
    >
      {children ?? word.text}
    </span>
  );
}
- Spoken words appear muted (already read)
- Current word is highlighted with primary color and bold
- Unspoken words use default text color
Custom Word Rendering
You can customize how words are displayed:
<TranscriptViewerWords
  renderWord={({ word, status }) => (
    <span className={status === "current" ? "bg-yellow-200" : ""}>
      {word.text}
    </span>
  )}
/>
Segment Composition
The viewer composes segments from alignment data, handling gaps and audio tags:
const { segments, words } = useMemo(() => {
  if (segmentComposer) {
    return segmentComposer(alignment);
  }
  return composeSegments(alignment, { hideAudioTags });
}, [segmentComposer, alignment, hideAudioTags]);
By default, audio event tags (like [LAUGHTER]) are hidden:
<TranscriptViewerContainer
  alignment={alignment}
  hideAudioTags={true} // Default
  audioSrc={audioUrl}
  audioType="audio/mpeg"
>
  {/* viewer components */}
</TranscriptViewerContainer>
Custom Segment Composer
Provide a custom function to control segment composition:
type SegmentComposer = (
  alignment: CharacterAlignmentResponseModel
) => {
  segments: TranscriptSegment[];
  words: TranscriptWord[];
};
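To make the SegmentComposer contract concrete, here is a small sketch of a composer that splits the alignment on whitespace: runs of non-whitespace characters become words, and runs of whitespace become gaps. The name composeByWhitespace is hypothetical, and this sketch deliberately ignores audio tags, so it is not a drop-in replacement for the built-in composeSegments:

```typescript
type CharacterAlignmentResponseModel = {
  characters: string[];
  characterStartTimesSeconds: number[];
  characterEndTimesSeconds: number[];
};
type TranscriptWord = {
  kind: "word";
  segmentIndex: number;
  wordIndex: number;
  text: string;
  startTime: number;
  endTime: number;
};
type GapSegment = { kind: "gap"; segmentIndex: number; text: string };
type TranscriptSegment = TranscriptWord | GapSegment;

const isWhitespace = (char: string): boolean => /^\s+$/.test(char);

// Hypothetical composer: groups consecutive characters of the same class
// (whitespace vs. non-whitespace) into gap or word segments.
function composeByWhitespace(alignment: CharacterAlignmentResponseModel): {
  segments: TranscriptSegment[];
  words: TranscriptWord[];
} {
  const { characters } = alignment;
  const starts = alignment.characterStartTimesSeconds;
  const ends = alignment.characterEndTimesSeconds;
  const segments: TranscriptSegment[] = [];
  const words: TranscriptWord[] = [];

  let i = 0;
  while (i < characters.length) {
    const gap = isWhitespace(characters[i]);
    let j = i;
    let text = "";
    // Extend the run while the character class stays the same.
    while (j < characters.length && isWhitespace(characters[j]) === gap) {
      text += characters[j];
      j++;
    }

    const segmentIndex = segments.length;
    if (gap) {
      segments.push({ kind: "gap", segmentIndex, text });
    } else {
      const startTime = starts[i] ?? 0;
      const word: TranscriptWord = {
        kind: "word",
        segmentIndex,
        wordIndex: words.length,
        text,
        startTime, // start of the run's first character
        endTime: ends[j - 1] ?? startTime, // end of the run's last character
      };
      segments.push(word);
      words.push(word);
    }
    i = j;
  }

  return { segments, words };
}
```

A function with this shape can be passed as the segmentComposer that the useMemo call above checks for. Note that segmentIndex counts every segment while wordIndex counts only words, so the two diverge as soon as the first gap appears.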
Segment State Management
The viewer tracks spoken and unspoken segments:
const spokenSegments = useMemo(
  function computeSpokenSegments() {
    if (!segments.length || currentSegmentIndex <= 0) return [];
    return segments.slice(0, currentSegmentIndex);
  },
  [segments, currentSegmentIndex]
);

const unspokenSegments = useMemo(
  function computeUnspokenSegments() {
    if (!segments.length) return [];
    if (currentSegmentIndex === -1) return segments;
    if (currentSegmentIndex + 1 >= segments.length) return [];
    return segments.slice(currentSegmentIndex + 1);
  },
  [segments, currentSegmentIndex]
);
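The two memoized computations above can be read as a single partition around the current segment. The sketch below restates that logic as a pure function, outside React, so the edge cases are easier to see. The name partitionSegments is hypothetical and used only for illustration:

```typescript
// Splits segments into those before and after the current one.
// The current segment itself belongs to neither list.
function partitionSegments<T>(
  segments: T[],
  currentSegmentIndex: number
): { spoken: T[]; unspoken: T[] } {
  // The `<= 0` guard matters: with index -1, slice(0, -1) would return
  // everything except the last segment instead of an empty list.
  const spoken: T[] =
    segments.length === 0 || currentSegmentIndex <= 0
      ? []
      : segments.slice(0, currentSegmentIndex);

  let unspoken: T[];
  if (segments.length === 0) {
    unspoken = [];
  } else if (currentSegmentIndex === -1) {
    unspoken = segments; // nothing is current yet, so everything is unspoken
  } else if (currentSegmentIndex + 1 >= segments.length) {
    unspoken = []; // the current segment is the last one
  } else {
    unspoken = segments.slice(currentSegmentIndex + 1);
  }

  return { spoken, unspoken };
}
```

For example, with five segments and a current index of 2, the first two segments are spoken, the last two are unspoken, and the segment at index 2 is rendered separately as the current word.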
Rendering Segments
The words component renders all segments with their status:
function TranscriptViewerWords({ renderWord, renderGap, wordClassNames, gapClassNames }: TranscriptViewerWordsProps) {
  const { spokenSegments, unspokenSegments, currentWord, segments, duration, currentTime } =
    useTranscriptViewerContext();

  const nearEnd = useMemo(() => {
    if (!duration) return false;
    return currentTime >= duration - 0.01;
  }, [currentTime, duration]);

  const segmentsWithStatus = useMemo(() => {
    if (nearEnd) {
      return segments.map((segment) => ({
        segment,
        status: "spoken" as const,
      }));
    }

    const entries: Array<{
      segment: TranscriptSegment;
      status: TranscriptViewerWordStatus;
    }> = [];

    for (const segment of spokenSegments) {
      entries.push({ segment, status: "spoken" });
    }
    if (currentWord) {
      entries.push({ segment: currentWord, status: "current" });
    }
    for (const segment of unspokenSegments) {
      entries.push({ segment, status: "unspoken" });
    }
    return entries;
  }, [spokenSegments, unspokenSegments, currentWord, nearEnd, segments]);

  return (
    <div className="text-base leading-relaxed">
      {segmentsWithStatus.map(({ segment, status }) => {
        if (segment.kind === "gap") {
          const content = renderGap ? renderGap({ segment, status }) : segment.text;
          return (
            <span key={segment.segmentIndex} className={gapClassNames}>
              {content}
            </span>
          );
        }

        if (renderWord) {
          return (
            <span key={segment.segmentIndex} className={wordClassNames}>
              {renderWord({ word: segment, status })}
            </span>
          );
        }

        return (
          <TranscriptViewerWord
            key={segment.segmentIndex}
            word={segment}
            status={status}
            className={wordClassNames}
          />
        );
      })}
    </div>
  );
}
The viewer uses requestAnimationFrame for smooth updates:
const startRaf = useCallback(
  function startRaf() {
    if (rafRef.current != null) return;

    function tick() {
      const node = audioRef.current;
      if (!node) {
        rafRef.current = null;
        return;
      }
      const time = node.currentTime;
      setCurrentTime(time);
      handleTimeUpdateRef.current(time);
      syncDurationFromMetadataIfMissing(node);
      rafRef.current = requestAnimationFrame(tick);
    }

    rafRef.current = requestAnimationFrame(tick);
  },
  [audioRef, syncDurationFromMetadataIfMissing]
);
The RAF loop is only active during playback. It's automatically stopped when audio is paused to conserve resources.
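The start/stop lifecycle described above can be sketched without React. The example below is an illustrative stand-in, not the viewer's actual code: createRafLoop and FrameScheduler are hypothetical names, and the scheduler is injected so the loop can be exercised outside a browser. In a browser, the scheduler would presumably wrap requestAnimationFrame and cancelAnimationFrame.

```typescript
// Injected scheduler, so the loop does not depend on browser globals.
// Browser wiring (assumed):
//   { request: (cb) => requestAnimationFrame(cb), cancel: (h) => cancelAnimationFrame(h) }
type FrameScheduler = {
  request: (callback: () => void) => number;
  cancel: (handle: number) => void;
};

function createRafLoop(onFrame: () => void, scheduler: FrameScheduler) {
  let running = false;
  let handle: number | null = null;

  function tick(): void {
    if (!running) return;
    onFrame();
    // onFrame may have called stop(); only reschedule if still running.
    if (running) handle = scheduler.request(tick);
  }

  return {
    // Idempotent: calling start() twice schedules only one loop.
    start(): void {
      if (running) return;
      running = true;
      handle = scheduler.request(tick);
    },
    // Cancels the pending frame so no work happens while paused.
    stop(): void {
      running = false;
      if (handle != null) {
        scheduler.cancel(handle);
        handle = null;
      }
    },
    isRunning(): boolean {
      return running;
    },
  };
}
```

Under this design, start() would be called from the audio element's play handler and stop() from its pause and ended handlers, which keeps the per-frame state updates limited to active playback.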
Event Callbacks
The container accepts playback event callbacks:
<TranscriptViewerContainer
  alignment={alignment}
  audioSrc={audioUrl}
  audioType="audio/mpeg"
  onPlay={() => console.log("Started playing")}
  onPause={() => console.log("Paused")}
  onTimeUpdate={(time) => console.log("Time:", time)}
  onEnded={() => console.log("Finished")}
  onDurationChange={(duration) => console.log("Duration:", duration)}
>
  {/* viewer components */}
</TranscriptViewerContainer>