Text-to-Speech

Overview

Uxie’s text-to-speech feature turns your PDFs into audio, allowing you to listen while following along with synchronized highlighting. Choose from multiple voice engines and adjust playback speed to match your preferences.

Key Features

Word Highlighting

Follow along with word-by-word highlighting as the document is read

Multiple Voices

Choose from browser voices, Kokoro AI, or Supertonic AI voices

Speed Control

Adjust reading speed from 0.5x to 2x

Follow Along Mode

Auto-scroll the page to keep the current word in view

Getting Started

Starting Text-to-Speech

Open TTS Controls

Click the audio icon in the bottom toolbar

Press Play

Click the Play button to start reading from the current page

Follow Along

Watch as words are highlighted in real-time

Navigate

Use controls to pause, skip, or adjust speed

TTS requires the browser’s SpeechSynthesis API. All modern browsers support this feature.

TTS Controls

Playback Buttons

Play / Pause

Play (▶): Start or resume reading
Pause (⏸): Temporarily stop without losing position
State persists - resume exactly where you paused

Skip Forward

Skip (⏭): Jump to the next sentence
Useful for navigating quickly through familiar content
Maintains reading flow and highlighting

Stop

Stop (🚫): End reading session
Resets to beginning of current page
Clears all highlighting

Reading Speed

Click the speed button to cycle through speeds:

0.5x - Slow, careful listening
0.75x - Relaxed pace
1x - Normal reading speed (default)
1.25x - Slightly faster
1.5x - Fast reading
1.75x - Very fast
2x - Maximum speed

Start at 1x and increase the speed gradually as you get comfortable. Many listeners settle somewhere between 1.25x and 1.5x.
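
The cycling behavior can be pictured with a small sketch. This is not Uxie's actual code: the names SPEEDS, DEFAULT_SPEED, and nextSpeed are hypothetical, and wrapping from 2x back to 0.5x is an assumption about how the button behaves.

```typescript
// Speeds from the list above, in cycle order.
const SPEEDS: readonly number[] = [0.5, 0.75, 1, 1.25, 1.5, 1.75, 2];
const DEFAULT_SPEED = 1;

// Returns the speed that follows `current`.
// Assumption: after 2x the cycle wraps back to 0.5x.
function nextSpeed(current: number): number {
  const index = SPEEDS.indexOf(current);
  // Unknown values fall back to the default speed.
  if (index === -1) return DEFAULT_SPEED;
  return SPEEDS[(index + 1) % SPEEDS.length] ?? DEFAULT_SPEED;
}
```

For example, nextSpeed(1) gives 1.25 and nextSpeed(2) gives 0.5.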

Follow Along Mode

Toggle the Eye icon to enable or disable Follow Along:

Enabled (highlighted): Page auto-scrolls to keep the current word visible
Disabled: Page stays fixed - you manually scroll

Perfect for:

Listening while doing other tasks
Following along in bed or on the couch
Studying without constant scrolling

Implemented at /src/components/pdf-reader/toolbar/tts-controls.tsx.
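
As an illustration of the auto-scroll decision, here is a minimal sketch. It assumes the current word's top and bottom positions are known in page coordinates; the Box type, the followAlongScrollTop function, and the 40-pixel margin are all hypothetical and not taken from tts-controls.tsx.

```typescript
// Vertical extent of the highlighted word, in page coordinates (hypothetical shape).
interface Box {
  top: number;
  bottom: number;
}

// Returns the scroll offset that centers the word in the viewport,
// or null when the word is already comfortably visible.
function followAlongScrollTop(
  word: Box,
  viewportTop: number,
  viewportHeight: number,
  margin = 40, // assumed padding so the word never sits right at the edge
): number | null {
  const viewportBottom = viewportTop + viewportHeight;
  const visible =
    word.top >= viewportTop + margin && word.bottom <= viewportBottom - margin;
  if (visible) return null;

  const wordCenter = (word.top + word.bottom) / 2;
  // Never scroll above the top of the page.
  return Math.max(0, wordCenter - viewportHeight / 2);
}
```

Returning null for visible words avoids re-scrolling on every word, which keeps the page steady while you read along.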

Voice Options

Voice Engines

Uxie supports three TTS engines:

Browser Voices

Native system voices

Built into your operating system
No additional setup required
Fast and reliable
Quality varies by OS
Free

Available voices depend on your system:

Windows: Microsoft David, Microsoft Zira, etc.
macOS: Alex, Samantha, Victoria, etc.
Linux: eSpeak voices

Kokoro AI

AI-powered natural voices

High-quality neural TTS
More natural-sounding than browser voices
Requires WebGPU or WASM support
Multiple voice personas available
May require initial model download

Supported voices defined at /src/lib/tts/providers/kokoro-provider.ts

Supertonic AI

Premium AI voices

Studio-quality voice synthesis
Extremely natural prosody
Best voice quality available
May require API access

Supported voices defined at /src/lib/tts/providers/supertonic-provider.ts

Choosing a Voice

Voice selection UI is currently in development. The default voice is determined by your browser and system settings.

Reading Modes

Continuous Reading

To read from start to finish:

Navigate to your starting page
Click Play
TTS reads the entire page, then advances to the next one
Continues until you stop or reach the end
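
The steps above amount to a simple loop. The sketch below is an assumption about the control flow, not the actual implementation: SpeakPage, readContinuously, and the shouldStop callback are hypothetical names.

```typescript
// Hypothetical callback: speaks one page and resolves when its audio has finished.
type SpeakPage = (pageNumber: number) => Promise<void>;

// Reads page after page until the document ends or the user stops.
// Resolves with the last page that was read (startPage - 1 if none was read).
async function readContinuously(
  startPage: number,
  totalPages: number,
  speakPage: SpeakPage,
  shouldStop: () => boolean,
): Promise<number> {
  let page = startPage;
  while (page <= totalPages && !shouldStop()) {
    await speakPage(page); // wait for the whole page before advancing
    page += 1;
  }
  return page - 1;
}
```

Checking shouldStop between pages keeps the sketch short; a real player would also need to interrupt audio in the middle of a page.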

Selected Text Reading

Read just a specific passage:

Select Text

Highlight the text you want to hear

Click Read Icon

Click the audio icon in the selection popover

Listen

TTS reads only the selected text

Implemented at /src/components/pdf-reader/highlight-popover.tsx:72.

Resume Reading

Resuming from your last position is still in development. For now, TTS restarts from the beginning of the current page after you reload.

Word Highlighting

How It Works

TTS extracts text from the PDF in blocks
Text is split into sentences
As each word is spoken, it’s highlighted on the page
Highlighting follows the audio in real-time
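
To make the pipeline concrete, here is a rough sketch of the sentence and word splitting steps. It assumes plain punctuation-based splitting; the names splitSentences, wordSpans, and WordSpan are hypothetical, and the real extraction code may work quite differently.

```typescript
// A word plus its character offsets inside the sentence (hypothetical shape).
interface WordSpan {
  word: string;
  start: number;
  end: number;
}

// Naive sentence split: a run of text followed by its closing ., !, or ?.
// Abbreviations and decimals such as "3.14" will be split too early.
function splitSentences(block: string): string[] {
  const parts = block.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Finds each whitespace-separated word and records where it starts and ends,
// so a highlighter can map "word N is being spoken" to a text range.
function wordSpans(sentence: string): WordSpan[] {
  const spans: WordSpan[] = [];
  const pattern = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(sentence)) !== null) {
    spans.push({ word: m[0], start: m.index, end: m.index + m[0].length });
  }
  return spans;
}
```

With the browser engine, the offsets could be matched against the charIndex reported by the SpeechSynthesisUtterance boundary event, although not every voice fires that event.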

Highlight Appearance

Active word: Highlighted in bright color
Smooth transitions: Highlighting moves fluidly between words
Sentence-aware: Pauses briefly at sentence boundaries

Highlighting Modes

Two highlighting modes are available:

TEXT Mode

Standard word-by-word highlighting. Used when reading selected text or specific passages.

SENTENCE Mode

Sentence-by-sentence highlighting. Used for continuous document reading.

Mode constants defined at /src/components/pdf-reader/constants.ts:1.

Technical Details

Implementation

The TTS system consists of:

Base Provider (/src/lib/tts/base-audio-provider.ts):

Abstract class for all TTS engines
Handles audio playback
Manages state (playing, paused, stopped)

Browser Provider (/src/lib/tts/providers/browser-provider.ts):

Uses Web Speech API
SpeechSynthesis interface
System voice access

Kokoro Provider (/src/lib/tts/providers/kokoro-provider.ts):

Neural TTS model
WebGPU acceleration
WASM fallback

Supertonic Provider (/src/lib/tts/providers/supertonic-provider.ts):

Cloud-based AI TTS
Premium voice quality
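
The relationship between the base class and the engines might look roughly like the sketch below. Everything here is hypothetical (BaseProviderSketch, EchoProvider, PlaybackState, and the synthesize method are invented for illustration); the real base-audio-provider.ts is not shown on this page.

```typescript
type PlaybackState = "playing" | "paused" | "stopped";

// Shared behavior lives in the base class; each engine only supplies synthesis.
abstract class BaseProviderSketch {
  protected state: PlaybackState = "stopped";

  // Engine-specific step. Returns a string here to keep the sketch runnable;
  // a real provider would produce audio.
  protected abstract synthesize(text: string): string;

  play(text: string): string {
    this.state = "playing";
    return this.synthesize(text);
  }

  pause(): void {
    // Pausing only makes sense while something is playing.
    if (this.state === "playing") this.state = "paused";
  }

  stop(): void {
    this.state = "stopped";
  }

  getState(): PlaybackState {
    return this.state;
  }
}

// Stand-in engine used only to exercise the base class.
class EchoProvider extends BaseProviderSketch {
  protected synthesize(text: string): string {
    return `echo:${text}`;
  }
}
```

Keeping state handling in one place means a new engine only has to implement its synthesis step.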

Engine Detection

export function getEngineFromVoice(voiceId: TTSVoiceId): TTSEngineType {
  if (BROWSER_VOICES.some((v) => v.id === voiceId)) {
    return "browser";
  }
  if (KOKORO_VOICES.some((v) => v.id === voiceId)) {
    return "kokoro";
  }
  if (SUPERTONIC_VOICES.some((v) => v.id === voiceId)) {
    return "supertonic";
  }
  return "browser"; // default
}

From /src/lib/tts/index.ts:14.

WebGPU Detection

For Kokoro AI voices:

export async function detectWebGPU(): Promise<boolean> {
  if (typeof navigator === "undefined" || !("gpu" in navigator)) {
    return false;
  }
  try {
    const adapter = await navigator.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}

WebGPU provides hardware acceleration for AI voice synthesis.
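
Since Kokoro is described as using WebGPU with a WASM fallback, the detection result could feed a choice like the one sketched here. The KokoroBackend type and pickKokoroBackend function are hypothetical names, and the exact fallback order is an assumption.

```typescript
type KokoroBackend = "webgpu" | "wasm" | "unavailable";

// Prefers hardware acceleration, falls back to WASM, and reports
// "unavailable" when neither capability is present.
function pickKokoroBackend(hasWebGPU: boolean, hasWasm: boolean): KokoroBackend {
  if (hasWebGPU) return "webgpu";
  if (hasWasm) return "wasm";
  return "unavailable";
}
```

In a browser, the two flags could come from `await detectWebGPU()` and `typeof WebAssembly === "object"`.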

Reading Status States

enum READING_STATUS {
  IDLE,     // Not reading
  READING,  // Currently reading
  PAUSED    // Temporarily paused
}

State management ensures proper control flow and UI updates.
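
One way to picture the control flow is as a transition function over the three statuses. This is a sketch under assumptions: the statuses are modeled as a string union rather than the enum above, and ControlAction and nextStatus are hypothetical names.

```typescript
type ReadingStatus = "IDLE" | "READING" | "PAUSED";
type ControlAction = "play" | "pause" | "stop";

// Maps the current status and a toolbar action to the next status.
function nextStatus(status: ReadingStatus, action: ControlAction): ReadingStatus {
  switch (action) {
    case "play":
      return "READING"; // starts from IDLE or resumes from PAUSED
    case "pause":
      // Pausing has no effect unless reading is in progress.
      return status === "READING" ? "PAUSED" : status;
    case "stop":
      return "IDLE"; // stopping always ends the session
  }
}
```

Keeping transitions in one pure function makes it easy for the UI to derive which buttons to show from the current status.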

Best Practices

Use headphones: Better audio quality and less distraction for those around you.

Enable Follow Along: Let the page auto-scroll so you can focus on listening and understanding.

Adjust speed gradually: Start at 1x, then increase by 0.25x increments until you find your optimal speed.

Combine with highlighting: Listen while the AI reads, then highlight important passages afterward.

Use for editing: Listen to your own notes to catch awkward phrasing or errors.

Accessibility

TTS makes Uxie more accessible:

Visual impairments: Listen to documents without reading
Dyslexia: Hear correct pronunciation and pacing
Learning disabilities: Multi-sensory learning (audio + visual)
ESL learners: Improve pronunciation and listening skills
Multitasking: Absorb content while doing other activities

Limitations

TTS only works with text-based PDFs
Scanned PDFs must be OCR’d first
Some special characters may be mispronounced
Mathematical formulas are read as text, not equations
Tables may not read in logical order

Browser Compatibility

Browser	Browser Voices	Kokoro AI	Supertonic AI
Chrome	✓	✓ (WebGPU)	✓
Edge	✓	✓ (WebGPU)	✓
Firefox	✓	✓ (WASM)	✓
Safari	✓	✗	✓

At the time of writing, Safari does not support WebGPU, which limits Kokoro AI voice availability.

Troubleshooting

No sound / TTS not working

Check system volume and browser permissions
Ensure speaker/headphones are connected
Try refreshing the page
Check browser console for errors
Verify SpeechSynthesis API support: open DevTools and run window.speechSynthesis
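
If you want a slightly more informative check than evaluating window.speechSynthesis by hand, a helper along these lines could be used. The describeSpeechSupport function and the SpeechCapableGlobal interface are hypothetical; only speechSynthesis and getVoices() come from the Web Speech API.

```typescript
// Minimal shape of the part of `window` that this check touches.
interface SpeechCapableGlobal {
  speechSynthesis?: { getVoices(): unknown[] };
}

// Summarizes whether the API exists and whether any voices are loaded.
function describeSpeechSupport(scope: SpeechCapableGlobal): string {
  if (!scope.speechSynthesis) return "SpeechSynthesis API not available";
  const count = scope.speechSynthesis.getVoices().length;
  return count > 0
    ? `${count} voice(s) available`
    : "API present, but no voices loaded yet";
}
```

In the browser you would call describeSpeechSupport(window). Some browsers load voices asynchronously, so an empty list right after page load does not necessarily mean no voices are installed.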

Words not highlighting

PDF may be image-based (use OCR first)
Some PDF formats don’t support text extraction
Try a different PDF viewer or re-export the PDF
Check if text is selectable in the PDF

Voice is robotic / poor quality

Browser voices vary by OS and can sound robotic
Try Kokoro or Supertonic voices for better quality
Update your operating system for newer voice engines
On Windows, install additional language packs

Reading skips words or sentences

PDF text extraction may have issues
Complex layouts (multi-column, tables) can confuse extraction
Try adjusting reading speed (slower can help)
Report the issue if it persists across documents

Follow Along not working

Ensure the button is highlighted (active)
Try toggling it off and on again
Check if page scrolling is locked by another extension
Refresh the page and try again

Future Enhancements

Planned features:

Voice selection UI
Bookmark positions to resume later
Download audio files
Customize highlight colors
Reading statistics (time listened, pages read)
Playlist mode (queue multiple documents)

Related Features

PDF Reading

Navigate and view your documents

Annotations

Highlight while listening

OCR

Make scanned PDFs readable
