Transcript Viewer

Overview

The Transcript Viewer provides an interactive display of transcribed text with real-time word highlighting synchronized to audio playback. It uses character-level alignment data from the ElevenLabs API to achieve precise synchronization.

Core Components

The viewer is built with composable React components:

import {
  TranscriptViewerContainer,
  TranscriptViewerWords,
  TranscriptViewerAudio,
  TranscriptViewerPlayPauseButton,
  TranscriptViewerScrubBar,
} from "@/components/ui/transcript-viewer";

Basic Usage

<TranscriptViewerContainer
  audioSrc={audioUrl}
  audioType="audio/mpeg"
  alignment={alignment}
>
  <TranscriptViewerPlayPauseButton />
  <TranscriptViewerScrubBar />
  <TranscriptViewerWords />
  <TranscriptViewerAudio />
</TranscriptViewerContainer>

Character Alignment Data

The viewer requires character-level alignment data from the transcription:

type CharacterAlignmentResponseModel = {
  characters: string[];
  characterStartTimesSeconds: number[];
  characterEndTimesSeconds: number[];
};

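Build this structure from the speech-to-text response. The converter below prefers per-character timings when the API returns them and otherwise falls back to interpolating from word timings: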

export function convertToAlignment(
  transcript: SpeechToTextChunkResponseModel
): CharacterAlignmentResponseModel {
  const characters: string[] = [];
  const characterStartTimesSeconds: number[] = [];
  const characterEndTimesSeconds: number[] = [];

  for (const word of transcript.words) {
    if (word.characters && word.characters.length > 0) {
      // Use character-level data from API
      for (const char of word.characters) {
        characters.push(char.text);
        characterStartTimesSeconds.push(char.start || 0);
        characterEndTimesSeconds.push(char.end || 0);
      }
    } else {
      // Fallback: interpolate from word timing
      appendWordCharactersFromText(
        word,
        characters,
        characterStartTimesSeconds,
        characterEndTimesSeconds
      );
    }
  }

  return {
    characters,
    characterStartTimesSeconds,
    characterEndTimesSeconds,
  };
}

Request character-level timestamps by setting timestampsGranularity: "character" in your transcription options for the most accurate synchronization.
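The fallback branch of convertToAlignment calls appendWordCharactersFromText, whose body is not shown on this page. One plausible approach, sketched below, is to spread the word's duration evenly across its characters. The WordTiming type and the even-spacing strategy are assumptions made for illustration, not the library's confirmed implementation.

```typescript
// Hypothetical shape: only the fields this sketch reads from a transcribed word.
type WordTiming = { text: string; start?: number; end?: number };

// Assumed strategy: divide the word's duration evenly between its characters.
function appendWordCharactersFromText(
  word: WordTiming,
  characters: string[],
  characterStartTimesSeconds: number[],
  characterEndTimesSeconds: number[]
): void {
  const chars = Array.from(word.text); // iterate by code point, not UTF-16 unit
  if (chars.length === 0) return;

  const start = word.start ?? 0;
  const end = Math.max(word.end ?? start, start); // guard against end < start
  const step = (end - start) / chars.length;

  chars.forEach((char, i) => {
    characters.push(char);
    characterStartTimesSeconds.push(start + i * step);
    characterEndTimesSeconds.push(start + (i + 1) * step);
  });
}

// Usage: a three-letter word spoken between 1.0 s and 1.6 s.
const demoChars: string[] = [];
const demoStarts: number[] = [];
const demoEnds: number[] = [];
appendWordCharactersFromText(
  { text: "hey", start: 1, end: 1.6 },
  demoChars,
  demoStarts,
  demoEnds
);
```

Even spacing is only an approximation, since real speech does not give each character equal time, which is why character-level timestamps from the API are preferable when available.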

Word Tracking

The viewer tracks the current word in real time during playback. Words and the gaps between them are modeled as segments.

Segment Types

type TranscriptWord = {
  kind: "word";
  segmentIndex: number;
  wordIndex: number;
  text: string;
  startTime: number;
  endTime: number;
};

type GapSegment = {
  kind: "gap";
  segmentIndex: number;
  text: string;
};

type TranscriptSegment = TranscriptWord | GapSegment;

Current Word Detection

The useTranscriptViewer hook tracks the current word based on playback time:

const handleTimeUpdate = useCallback(
  function handleTimeUpdate(currentTime: number) {
    if (!words.length) return;

    const currentWord =
      currentWordIndex >= 0 && currentWordIndex < words.length
        ? words[currentWordIndex]
        : undefined;

    if (!currentWord) {
      const found = findWordIndex(words, currentTime);
      if (found !== -1) setCurrentWordIndex(found);
      return;
    }

    // Move forward if we've passed the current word
    if (
      currentTime >= currentWord.endTime &&
      currentWordIndex + 1 < words.length
    ) {
      const next = getNextWordIndexByStartTime(words, currentTime, currentWordIndex);
      setCurrentWordIndex(next);
      return;
    }

    // Move backward if we've seeked backwards
    if (currentTime < currentWord.startTime) {
      const found = findWordIndex(words, currentTime);
      if (found !== -1) setCurrentWordIndex(found);
      return;
    }

    // Re-find if we're out of sync
    const found = findWordIndex(words, currentTime);
    if (found !== -1 && found !== currentWordIndex) {
      setCurrentWordIndex(found);
    }
  },
  [currentWordIndex, words]
);
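The handler above relies on findWordIndex, which is not shown here. As a rough sketch, assuming the words are sorted by startTime and do not overlap, the lookup could be a binary search that returns -1 when the time falls in a gap between words. The TimedWord type is a reduced stand-in for TranscriptWord, and the search strategy is an assumption rather than the library's confirmed code.

```typescript
// Reduced stand-in for TranscriptWord: only the timing fields are needed.
type TimedWord = { startTime: number; endTime: number };

// Binary search for the word whose [startTime, endTime) range contains currentTime.
// Assumes words are sorted by startTime and do not overlap.
function findWordIndex(words: TimedWord[], currentTime: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const word = words[mid];
    if (!word) return -1; // defensive: cannot happen while lo <= hi holds
    if (currentTime < word.startTime) {
      hi = mid - 1;
    } else if (currentTime >= word.endTime) {
      lo = mid + 1;
    } else {
      return mid;
    }
  }
  return -1; // currentTime is before, after, or between all words
}

const demoWords: TimedWord[] = [
  { startTime: 0.0, endTime: 0.5 },
  { startTime: 0.6, endTime: 1.0 },
  { startTime: 1.0, endTime: 1.4 },
];
const insideSecondWord = findWordIndex(demoWords, 0.7);
const insideGap = findWordIndex(demoWords, 0.55);
```

Treating the end time as exclusive means a timestamp shared by two adjacent words resolves to the later one, which avoids highlighting a word that has just finished.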

Word Status & Highlighting

Each word is rendered in one of three states:

type TranscriptViewerWordStatus = "spoken" | "unspoken" | "current";

Default Styling

function TranscriptViewerWord({ word, status, className, children }: TranscriptViewerWordProps) {
  return (
    <span
      className={cn(
        "transition-colors",
        status === "spoken" && "text-muted-foreground",
        status === "current" && "text-primary font-semibold",
        status === "unspoken" && "text-foreground",
        className
      )}
    >
      {children ?? word.text}
    </span>
  );
}

Spoken words appear muted (already read)
Current word is highlighted with primary color and bold
Unspoken words use default text color

Custom Word Rendering

You can customize how words are displayed:

<TranscriptViewerWords
  renderWord={({ word, status }) => (
    <span className={status === "current" ? "bg-yellow-200" : ""}>
      {word.text}
    </span>
  )}
/>

Segment Composition

The viewer composes segments from alignment data, handling gaps and audio tags:

const { segments, words } = useMemo(() => {
  if (segmentComposer) {
    return segmentComposer(alignment);
  }
  return composeSegments(alignment, { hideAudioTags });
}, [segmentComposer, alignment, hideAudioTags]);

Hiding Audio Tags

By default, audio event tags (like [LAUGHTER]) are hidden:

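<TranscriptViewerContainer
  alignment={alignment}
  hideAudioTags={true} // Default
  audioSrc={audioUrl}
  audioType="audio/mpeg"
>
  {/* Children as in Basic Usage */}
  <TranscriptViewerWords />
  <TranscriptViewerAudio />
</TranscriptViewerContainer>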

Custom Segment Composer

Provide a custom function to control segment composition:

type SegmentComposer = (
  alignment: CharacterAlignmentResponseModel
) => {
  segments: TranscriptSegment[];
  words: TranscriptWord[];
};
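To make the SegmentComposer contract concrete, here is a minimal sketch of a composer that groups characters into words and treats runs of whitespace as gaps. The name whitespaceSegmentComposer and the whitespace-only gap rule are illustrative assumptions. The built-in composeSegments may behave differently, for example in how it handles audio tags.

```typescript
type CharacterAlignmentResponseModel = {
  characters: string[];
  characterStartTimesSeconds: number[];
  characterEndTimesSeconds: number[];
};
type TranscriptWord = {
  kind: "word"; segmentIndex: number; wordIndex: number;
  text: string; startTime: number; endTime: number;
};
type GapSegment = { kind: "gap"; segmentIndex: number; text: string };
type TranscriptSegment = TranscriptWord | GapSegment;

// Hypothetical composer: whitespace runs become gaps, everything else becomes words.
function whitespaceSegmentComposer(alignment: CharacterAlignmentResponseModel) {
  const segments: TranscriptSegment[] = [];
  const words: TranscriptWord[] = [];
  let buffer = "";
  let bufferIsGap = false;
  let bufferStart = 0; // index of the first character held in buffer

  // Emit the buffered run as a gap or a word, then reset the buffer.
  const flush = (endExclusive: number): void => {
    if (!buffer) return;
    if (bufferIsGap) {
      segments.push({ kind: "gap", segmentIndex: segments.length, text: buffer });
    } else {
      const word: TranscriptWord = {
        kind: "word",
        segmentIndex: segments.length,
        wordIndex: words.length,
        text: buffer,
        startTime: alignment.characterStartTimesSeconds[bufferStart] ?? 0,
        endTime: alignment.characterEndTimesSeconds[endExclusive - 1] ?? 0,
      };
      segments.push(word);
      words.push(word);
    }
    buffer = "";
  };

  alignment.characters.forEach((char, i) => {
    const isGap = /\s/.test(char);
    if (buffer && isGap !== bufferIsGap) flush(i); // run type changed
    if (!buffer) {
      bufferStart = i;
      bufferIsGap = isGap;
    }
    buffer += char;
  });
  flush(alignment.characters.length);

  return { segments, words };
}

const demoComposed = whitespaceSegmentComposer({
  characters: ["h", "i", " ", "y", "o"],
  characterStartTimesSeconds: [0, 0.1, 0.2, 0.3, 0.4],
  characterEndTimesSeconds: [0.1, 0.2, 0.3, 0.4, 0.5],
});
```

A function with this shape can be passed as the segmentComposer prop. Each word takes its start time from its first character and its end time from its last, so segmentIndex and wordIndex stay consistent with the order of the returned arrays.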

Segment State Management

The viewer tracks spoken and unspoken segments:

const spokenSegments = useMemo(
  function computeSpokenSegments() {
    if (!segments.length || currentSegmentIndex <= 0) return [];
    return segments.slice(0, currentSegmentIndex);
  },
  [segments, currentSegmentIndex]
);

const unspokenSegments = useMemo(
  function computeUnspokenSegments() {
    if (!segments.length) return [];
    if (currentSegmentIndex === -1) return segments;
    if (currentSegmentIndex + 1 >= segments.length) return [];
    return segments.slice(currentSegmentIndex + 1);
  },
  [segments, currentSegmentIndex]
);
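The two memoized slices above can be read as a single partition around the current segment. The sketch below restates that logic as a plain function, without React, to show what each list contains for a given index. The helper name partitionSegments is hypothetical.

```typescript
// Hypothetical helper mirroring the spoken/unspoken slicing shown above.
function partitionSegments<T>(
  segments: T[],
  currentSegmentIndex: number
): { spoken: T[]; unspoken: T[] } {
  // Everything strictly before the current segment has been spoken.
  const spoken: T[] =
    currentSegmentIndex <= 0 ? [] : segments.slice(0, currentSegmentIndex);
  // -1 means playback has not reached any word yet, so nothing is spoken.
  const unspoken: T[] =
    currentSegmentIndex === -1
      ? segments
      : segments.slice(currentSegmentIndex + 1);
  return { spoken, unspoken };
}

const demoSegments = ["Hello", " ", "world", " ", "again"];
const midPlayback = partitionSegments(demoSegments, 2); // current segment: "world"
const beforePlayback = partitionSegments(demoSegments, -1);
```

With the current index at 2, the segments before "world" are spoken and the ones after it are unspoken. The current segment itself belongs to neither list, which is why the renderer inserts it separately with the "current" status.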

Rendering Segments

The words component renders all segments with their status:

function TranscriptViewerWords({ renderWord, renderGap, wordClassNames, gapClassNames }: TranscriptViewerWordsProps) {
  const { spokenSegments, unspokenSegments, currentWord, segments, duration, currentTime } =
    useTranscriptViewerContext();

  const nearEnd = useMemo(() => {
    if (!duration) return false;
    return currentTime >= duration - 0.01;
  }, [currentTime, duration]);

  const segmentsWithStatus = useMemo(() => {
    if (nearEnd) {
      return segments.map((segment) => ({
        segment,
        status: "spoken" as const,
      }));
    }

    const entries: Array<{
      segment: TranscriptSegment;
      status: TranscriptViewerWordStatus;
    }> = [];

    for (const segment of spokenSegments) {
      entries.push({ segment, status: "spoken" });
    }

    if (currentWord) {
      entries.push({ segment: currentWord, status: "current" });
    }

    for (const segment of unspokenSegments) {
      entries.push({ segment, status: "unspoken" });
    }

    return entries;
  }, [spokenSegments, unspokenSegments, currentWord, nearEnd, segments]);

  return (
    <div className="text-base leading-relaxed">
      {segmentsWithStatus.map(({ segment, status }) => {
        if (segment.kind === "gap") {
          const content = renderGap ? renderGap({ segment, status }) : segment.text;
          return (
            <span key={segment.segmentIndex} className={gapClassNames}>
              {content}
            </span>
          );
        }

        if (renderWord) {
          return (
            <span key={segment.segmentIndex} className={wordClassNames}>
              {renderWord({ word: segment, status })}
            </span>
          );
        }

        return (
          <TranscriptViewerWord
            key={segment.segmentIndex}
            word={segment}
            status={status}
            className={wordClassNames}
          />
        );
      })}
    </div>
  );
}

Performance Optimization

The viewer uses requestAnimationFrame for smooth updates:

const startRaf = useCallback(
  function startRaf() {
    if (rafRef.current != null) return;
    function tick() {
      const node = audioRef.current;
      if (!node) {
        rafRef.current = null;
        return;
      }
      const time = node.currentTime;
      setCurrentTime(time);
      handleTimeUpdateRef.current(time);
      syncDurationFromMetadataIfMissing(node);
      rafRef.current = requestAnimationFrame(tick);
    }
    rafRef.current = requestAnimationFrame(tick);
  },
  [audioRef, syncDurationFromMetadataIfMissing]
);

The requestAnimationFrame loop runs only during playback. It stops automatically when audio is paused, which conserves resources.
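The start/stop lifecycle of the frame loop can be sketched independently of React. The version below takes the frame scheduler as a parameter so it can run outside a browser. In a browser you would pass functions that wrap requestAnimationFrame and cancelAnimationFrame. The names createPlaybackLoop and FrameScheduler are hypothetical, and this illustrates the pattern rather than the viewer's actual code.

```typescript
// Hypothetical abstraction over requestAnimationFrame / cancelAnimationFrame.
type FrameScheduler = {
  request: (callback: () => void) => number;
  cancel: (handle: number) => void;
};

function createPlaybackLoop(
  getTime: () => number,
  onTick: (time: number) => void,
  scheduler: FrameScheduler
) {
  let handle: number | null = null;

  function tick(): void {
    onTick(getTime());
    handle = scheduler.request(tick); // schedule the next frame
  }

  return {
    start(): void {
      if (handle !== null) return; // already running, avoid a second loop
      handle = scheduler.request(tick);
    },
    stop(): void {
      if (handle === null) return;
      scheduler.cancel(handle);
      handle = null;
    },
    isRunning: (): boolean => handle !== null,
  };
}

// Demo with a fake scheduler that runs frames on demand.
const pendingFrames = new Map<number, () => void>();
let nextFrameHandle = 1;
const fakeScheduler: FrameScheduler = {
  request: (callback) => {
    const frameHandle = nextFrameHandle++;
    pendingFrames.set(frameHandle, callback);
    return frameHandle;
  },
  cancel: (frameHandle) => {
    pendingFrames.delete(frameHandle);
  },
};
function runOneFrame(): void {
  const queued = Array.from(pendingFrames.values());
  pendingFrames.clear();
  for (const callback of queued) callback();
}

const tickTimes: number[] = [];
let fakeClock = 0;
const playbackLoop = createPlaybackLoop(
  () => fakeClock,
  (time) => { tickTimes.push(time); },
  fakeScheduler
);

playbackLoop.start();
playbackLoop.start(); // no-op: the loop is already running
fakeClock = 0.5;
runOneFrame();
fakeClock = 1.0;
runOneFrame();
playbackLoop.stop(); // e.g. from the audio element's pause handler
runOneFrame(); // nothing is scheduled, so no further ticks
```

Calling stop from the pause and ended handlers and start from the play handler keeps the loop idle whenever audio is not advancing.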

Event Callbacks

The container accepts playback event callbacks:

<TranscriptViewerContainer
  alignment={alignment}
  audioSrc={audioUrl}
  audioType="audio/mpeg"
  onPlay={() => console.log("Started playing")}
  onPause={() => console.log("Paused")}
  onTimeUpdate={(time) => console.log("Time:", time)}
  onEnded={() => console.log("Finished")}
  onDurationChange={(duration) => console.log("Duration:", duration)}
>
  {/* Children as in Basic Usage */}
  <TranscriptViewerWords />
  <TranscriptViewerAudio />
</TranscriptViewerContainer>

Next Steps

Implement Audio Playback controls for scrubbing and seeking
Add Speaker Diarization to highlight different speakers
Configure Transcription options for optimal results
