OCR (Optical Character Recognition)

Overview

Uxie’s OCR feature transforms scanned PDFs and image-based documents into fully searchable, selectable text. This unlocks Uxie’s other features (highlighting, text-to-speech, AI chat, and more) for documents whose text would otherwise be inaccessible.

What is OCR?

OCR (Optical Character Recognition) is a technology that:

Analyzes images of text
Recognizes characters and words
Converts them to actual digital text
Preserves page layout and text positions
Makes documents searchable and selectable

Scanned Documents: Convert paper scans into digital text

Image PDFs: Extract text from image-based PDFs

Screenshots: Convert screenshot text to selectable format

Photos: Extract text from photos of documents

When to Use OCR

Use OCR when your PDF:

Was scanned from paper
Has images of text instead of actual text
Cannot be selected or searched
Doesn’t work with text-to-speech
Fails to vectorize for AI chat

Quick test: Try selecting text in your PDF. If you can’t select it, you need OCR.
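The quick test above can also be approximated in code. The sketch below assumes you already have the text that some PDF text extractor (not shown) returned for each page. `needsOcr`, its `minCharsPerPage` threshold, and the "more than half the pages" rule are hypothetical choices for illustration, not part of Uxie.

```typescript
// Hypothetical helper: guesses whether a PDF needs OCR from the text that a
// PDF text extractor (not shown) returned for each page.
function needsOcr(pageTexts: string[], minCharsPerPage = 20): boolean {
  // No pages with extractable text at all: treat as scanned.
  if (pageTexts.length === 0) return true;

  // Count pages whose text layer is empty or nearly empty once whitespace is removed.
  const emptyPages = pageTexts.filter(
    (text) => text.replace(/\s+/g, "").length < minCharsPerPage,
  ).length;

  // Assumption: if more than half of the pages lack real text, the document is a scan.
  return emptyPages / pageTexts.length > 0.5;
}

console.log(needsOcr(["", "  ", ""])); // true
console.log(
  needsOcr([
    "Chapter 1. Introduction to optics and lenses.",
    "More body text on page two here.",
  ]),
); // false
```

A mixed document (some scanned pages, some native text) may land on either side of the threshold, so treat the result as a hint rather than a verdict.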

Using OCR

During Upload

Click Upload: Open the upload modal in your workspace

Select File: Choose your scanned PDF (up to 8MB)

Enable OCR: Check the “OCR Pdf” checkbox

Upload: Click upload and wait for processing

Processing: “Applying OCR, it might take a while…” message appears

Complete: Document opens with fully selectable text

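OCR processing time varies with document length and complexity. Expect anywhere from about 10 seconds for a few pages to several minutes for long documents; see Processing Time under Performance for typical ranges.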
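Because OCR can take a while, it helps to reject unsuitable files before processing starts. The following is a minimal sketch of such a pre-check. `checkUpload`, `UploadCandidate`, and `MAX_UPLOAD_BYTES` are hypothetical names, and the sketch assumes "8MB" means 8 × 1024 × 1024 bytes; the exact limit Uxie enforces may be computed differently.

```typescript
// Assumption: "8MB" is interpreted as 8 * 1024 * 1024 bytes.
const MAX_UPLOAD_BYTES = 8 * 1024 * 1024;

// Only the fields the check needs; a browser File object has all three.
interface UploadCandidate {
  name: string;
  size: number; // bytes
  type: string; // MIME type
}

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Hypothetical pre-check run before OCR starts.
function checkUpload(file: UploadCandidate): UploadCheck {
  if (file.type !== "application/pdf") {
    return { ok: false, reason: `${file.name} is not a PDF` };
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    const sizeMb = (file.size / (1024 * 1024)).toFixed(1);
    return { ok: false, reason: `${file.name} is ${sizeMb} MB; the limit is 8 MB` };
  }
  return { ok: true };
}

console.log(checkUpload({ name: "scan.pdf", size: 2_000_000, type: "application/pdf" }));
```

Note that OCR adds a text layer, so the processed PDF can differ in size from the file you selected.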

Upload Modal Options

The upload interface includes:

<Checkbox id="ocr" checked={doOcr} onCheckedChange={setDoOcr} />
<label htmlFor="ocr">OCR Pdf</label>

Found at /src/components/workspace/upload-file-modal.tsx:190.

OCR Technology

Scribe.js Engine

Uxie uses Scribe.js-OCR, a powerful browser-based OCR library:

Runs entirely in your browser
No server processing (privacy-friendly)
Supports multiple languages
Preserves PDF layout
High accuracy on printed text

Processing Modes

Scribe.js offers multiple modes:

Quality Mode

Highest accuracy (default)

Best for final documents
Slower processing
Optimal for important documents
Used by Uxie

Speed Mode

Faster processing

Lower accuracy
Good for quick previews
Not currently available in Uxie

Combined Mode

Hybrid approach

Combines native PDF text extraction with OCR
Extracts existing text where available
Applies OCR only where needed
Most efficient for mixed documents
Used by Uxie

Configuration

await scribe.recognize({
  mode: "quality",        // Accuracy level
  langs: ["eng"],         // Languages to recognize
  modeAdv: "combined",    // Use both native & OCR
  vanillaMode: true,      // Standard recognition
  combineMode: "data",    // How to combine results
});

From /src/components/workspace/upload-file-modal.tsx:85.

Language Support

Currently supported languages:

English (eng) - Default

Additional language support is planned for future releases. Scribe.js supports many languages including Spanish, French, German, Chinese, and more.
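If you want to guard against unsupported languages before calling the engine, one option is to build the options object through a small helper. This is a sketch only: `RecognizeOptions` is a local type that mirrors just the fields shown in the configuration above (it is not imported from Scribe.js), and `buildRecognizeOptions` and `SUPPORTED_LANGS` are hypothetical names.

```typescript
// Local type mirroring only the option fields shown in the configuration above.
interface RecognizeOptions {
  mode: "quality" | "speed";
  langs: string[];
  modeAdv: string;
  vanillaMode: boolean;
  combineMode: string;
}

// Assumption: English is the only language currently enabled.
const SUPPORTED_LANGS: readonly string[] = ["eng"];

// Hypothetical helper: validates the requested languages, then returns the
// same values used in the configuration above.
function buildRecognizeOptions(requestedLangs: string[] = ["eng"]): RecognizeOptions {
  if (requestedLangs.length === 0) {
    throw new Error("At least one OCR language is required");
  }
  const unsupported = requestedLangs.filter((lang) => !SUPPORTED_LANGS.includes(lang));
  if (unsupported.length > 0) {
    throw new Error(`Unsupported OCR language(s): ${unsupported.join(", ")}`);
  }
  return {
    mode: "quality",
    langs: [...requestedLangs],
    modeAdv: "combined",
    vanillaMode: true,
    combineMode: "data",
  };
}

console.log(buildRecognizeOptions().langs); // [ 'eng' ]
```

Failing early keeps a long OCR run from starting with a language it cannot recognize.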

Processing Steps

Behind the Scenes

File Import: PDF file is loaded into Scribe.js

Page Analysis: Each page is analyzed for text regions

Text Recognition: OCR engine recognizes characters in each region

Layout Preservation: Original formatting and positioning maintained

PDF Generation: New PDF created with invisible text layer over images

Upload: Processed PDF uploaded to Uxie

What Gets Preserved

✓ Original images - Visual appearance unchanged
✓ Page layout - Structure maintained
✓ Text positions - Words appear in correct locations
✓ Font sizes - Relative sizing preserved
✗ Original fonts - The text layer is invisible; the fonts you see come from the page images
✗ Exact formatting - Minor spacing differences possible

Quality Factors

Input Document Quality

OCR accuracy depends on:

Image Resolution: Higher DPI = better recognition (300+ DPI recommended)

Text Clarity: Clear, crisp text works best; blurry text may have errors

Contrast: High contrast (dark text on light background) improves accuracy

Font Type: Standard fonts work better than handwriting or decorative fonts
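If you are unsure what resolution a scan has, you can estimate it from the image size: DPI is the number of pixels along an edge divided by the physical length of that edge in inches. The helper below is a small illustration; `effectiveDpi` is a hypothetical name.

```typescript
// Effective scan resolution: pixels along one edge divided by the physical
// length of that same edge in inches.
function effectiveDpi(pixelLength: number, physicalInches: number): number {
  if (pixelLength <= 0 || physicalInches <= 0) {
    throw new RangeError("pixelLength and physicalInches must be positive");
  }
  return pixelLength / physicalInches;
}

// A US Letter page is 8.5 in wide, so a 2550 px wide scan of it is 300 DPI.
console.log(effectiveDpi(2550, 8.5)); // 300
// The same page scanned at 1700 px wide is only 200 DPI.
console.log(effectiveDpi(1700, 8.5)); // 200
```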

Challenging Cases

OCR may struggle with:

Handwritten text
Very small fonts (< 8pt)
Low-resolution scans (< 200 DPI)
Colored or patterned backgrounds
Unusual fonts or typography
Complex mathematical notation
Tables with fine lines
Multi-column layouts
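The thresholds in the two lists above can be turned into a simple pre-scan checklist. This sketch is illustrative only: `ScanProfile` and `ocrWarnings` are hypothetical names, and the values have to be supplied by you, since nothing here inspects the document itself.

```typescript
// Facts about a scan that you supply yourself; nothing here is measured.
interface ScanProfile {
  dpi: number;
  smallestFontPt: number;
  handwritten: boolean;
  multiColumn: boolean;
}

// Hypothetical checklist built from the thresholds listed above.
function ocrWarnings(scan: ScanProfile): string[] {
  const warnings: string[] = [];
  if (scan.dpi < 200) {
    warnings.push("Resolution is below 200 DPI: rescan at 300 DPI or higher");
  } else if (scan.dpi < 300) {
    warnings.push("Resolution is below the recommended 300 DPI");
  }
  if (scan.smallestFontPt < 8) {
    warnings.push("Fonts smaller than 8pt may be misread");
  }
  if (scan.handwritten) {
    warnings.push("Handwritten text is a challenging case for OCR");
  }
  if (scan.multiColumn) {
    warnings.push("Multi-column layouts may reduce accuracy");
  }
  return warnings;
}

console.log(
  ocrWarnings({ dpi: 150, smallestFontPt: 6, handwritten: false, multiColumn: false }),
);
```

An empty result means none of the listed risk factors apply; it does not guarantee a perfect result.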

After OCR Processing

Enabled Features

Once OCR is complete, you can:

✓ Select and copy text - Highlight any text in the document
✓ Search - Find specific words or phrases
✓ Text-to-Speech - Listen to the document read aloud
✓ Highlight - Create annotations and notes
✓ AI Chat - Ask questions about the content
✓ Flashcards - Generate study cards from content

Quality Check

After OCR, verify quality:

Try selecting text in various areas
Check for garbled or missing characters
Test text-to-speech on a paragraph
Search for a known word or phrase

If quality is poor, try:

Re-scanning the original at higher DPI
Improving image contrast
Using a cleaner source document
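For a rough, automated version of the "garbled or missing characters" check, you could copy some text out of the processed document and measure how much of it looks like noise. The heuristic below is an assumption made for illustration, not something Uxie does: it flags tokens that contain no ASCII letters or digits, or that contain the Unicode replacement character, so it only suits English text. `suspiciousTokenRatio` is a hypothetical name.

```typescript
// Heuristic for English text: the share of whitespace-separated tokens that
// contain no ASCII letter or digit, or that contain the replacement character.
function suspiciousTokenRatio(text: string): number {
  const tokens = text.split(/\s+/).filter((token) => token.length > 0);
  if (tokens.length === 0) return 1; // no text at all is the worst case

  const suspicious = tokens.filter(
    (token) => token.includes("\uFFFD") || !/[A-Za-z0-9]/.test(token),
  ).length;

  return suspicious / tokens.length;
}

console.log(suspiciousTokenRatio("The quick brown fox")); // 0
console.log(suspiciousTokenRatio("~~ ## foo bar")); // 0.5
```

A high ratio suggests re-scanning; a low ratio does not prove the text is correct, since plausible-looking misreads pass this check.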

Technical Implementation

Import and Initialization

import scribe from "scribe.js-ocr";

// Configure display mode
scribe.opt.displayMode = "invis";

// Import files
await scribe.importFiles(files);

// Perform recognition  
await scribe.recognize({ /* config */ });

// Export processed PDF
const data = await scribe.exportData("pdf");

From /src/components/workspace/upload-file-modal.tsx:27.

File Handling

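onBeforeUploadBegin: async (files) => {
  if (doOcr) {
    // Mark OCR as in progress
    setIsOcring(true);

    // Run OCR in the browser before anything is uploaded
    await scribe.importFiles(files);
    await scribe.recognize({ /* ... */ });

    // Wrap the exported PDF in a File so it uploads like the original
    // (originalName is defined outside this excerpt)
    const data = await scribe.exportData("pdf");
    const blob = new Blob([data], { type: "application/pdf" });
    const file = new File([blob], originalName, {
      type: "application/pdf"
    });

    setIsOcring(false);
    return [file];
  }
  // OCR not requested: upload the selected files unchanged
  return files;
}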

The OCR process happens before upload, ensuring the processed file is what gets stored.
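One thing to consider with this flow is what happens when recognition throws: the in-progress flag would stay set. The sketch below shows one way to guard against that with `try`/`finally`. It is not Uxie's implementation. `OcrEngine` is a local interface that mirrors only the three calls used above so the example can run without the real library, and `runOcr` and `setBusy` are hypothetical names.

```typescript
// Minimal engine shape mirroring only the three calls used in the snippet above.
// This is a local stand-in, not the Scribe.js type definitions.
interface OcrEngine<F> {
  importFiles(files: F[]): Promise<unknown>;
  recognize(options: Record<string, unknown>): Promise<unknown>;
  exportData(format: "pdf"): Promise<unknown>;
}

// Hypothetical wrapper: runs the three steps and always clears the busy flag.
async function runOcr<F>(
  engine: OcrEngine<F>,
  files: F[],
  setBusy: (busy: boolean) => void,
): Promise<unknown> {
  setBusy(true);
  try {
    await engine.importFiles(files);
    await engine.recognize({ mode: "quality", langs: ["eng"] });
    return await engine.exportData("pdf");
  } finally {
    // Runs on success and on failure, so the UI never stays stuck on "processing".
    setBusy(false);
  }
}

// Usage with a fake engine that stands in for the real library.
const fakeEngine: OcrEngine<string> = {
  importFiles: async () => undefined,
  recognize: async () => undefined,
  exportData: async () => "fake-pdf-bytes",
};

runOcr(fakeEngine, ["scan.pdf"], (busy) => console.log("busy:", busy)).then((data) =>
  console.log("exported:", data),
);
```

Injecting the engine also makes the flow testable without loading an OCR library, at the cost of one extra interface to keep in sync with the real calls.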

Display Mode

scribe.opt.displayMode = "invis";

Sets the OCR’d text layer to invisible, preserving the original visual appearance while enabling text selection.

Performance

Processing Time

Typical OCR times:

1-5 pages: 10-30 seconds
6-20 pages: 30-90 seconds
21-50 pages: 90-180 seconds
More than 50 pages: 3-5 minutes or more

Processing happens in your browser, so times vary based on your computer’s CPU and RAM.
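The table above maps directly onto a small lookup, which could be used to show users a rough wait time before they start. This is a sketch: `estimateOcrTime` and `TimeEstimate` are hypothetical names, and the numbers are simply the typical ranges listed above, not measurements from your machine.

```typescript
interface TimeEstimate {
  minSeconds: number;
  maxSeconds: number;
  openEnded: boolean; // true when the upper bound is only typical, not a cap
}

// Hypothetical lookup of the typical ranges listed above.
function estimateOcrTime(pages: number): TimeEstimate {
  if (!Number.isInteger(pages) || pages < 1) {
    throw new RangeError("pages must be a positive integer");
  }
  if (pages <= 5) return { minSeconds: 10, maxSeconds: 30, openEnded: false };
  if (pages <= 20) return { minSeconds: 30, maxSeconds: 90, openEnded: false };
  if (pages <= 50) return { minSeconds: 90, maxSeconds: 180, openEnded: false };
  // Beyond 50 pages the range is 3-5 minutes and can run longer.
  return { minSeconds: 180, maxSeconds: 300, openEnded: true };
}

console.log(estimateOcrTime(12)); // { minSeconds: 30, maxSeconds: 90, openEnded: false }
```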

Browser Requirements

OCR requires:

Modern browser (Chrome, Edge, Firefox, Safari)
Sufficient RAM (4GB+ recommended)
JavaScript enabled
IndexedDB support
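Some of these requirements can be checked before OCR starts. The sketch below takes the relevant values as plain inputs so it can run anywhere; `BrowserEnv` and `checkOcrSupport` are hypothetical names. In a browser you might pass `window.indexedDB` and, where available, `navigator.deviceMemory` (a Chromium-only value that reports approximate memory in GB), but that wiring is an assumption and is not shown here.

```typescript
// Values gathered from the browser by the caller; both are optional.
interface BrowserEnv {
  indexedDB?: unknown; // e.g. window.indexedDB
  deviceMemoryGb?: number; // e.g. navigator.deviceMemory, where the browser exposes it
}

// Hypothetical check against the requirements listed above.
function checkOcrSupport(env: BrowserEnv): string[] {
  const problems: string[] = [];
  if (env.indexedDB === undefined || env.indexedDB === null) {
    problems.push("IndexedDB is not available");
  }
  // Memory is only a recommendation, and many browsers do not report it,
  // so a missing value is not treated as a problem.
  if (env.deviceMemoryGb !== undefined && env.deviceMemoryGb < 4) {
    problems.push("Less than the recommended 4GB of RAM");
  }
  return problems;
}

console.log(checkOcrSupport({ indexedDB: {}, deviceMemoryGb: 8 })); // []
console.log(checkOcrSupport({ deviceMemoryGb: 2 }));
```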

Memory Usage

OCR is memory-intensive. Close other tabs and applications when processing large documents to avoid browser slowdowns or crashes.

Best Practices

Scan at 300 DPI: This is the sweet spot for quality vs. file size.

Black and white scans: Color isn’t needed for text and increases file size.

Clean source documents: Remove smudges, folds, and marks before scanning.

Check before uploading: Verify your scan is clear and straight.

Small batches: OCR a few pages first to verify quality before doing an entire book.

Troubleshooting

OCR is taking forever

Large documents take time (this is normal)
Check browser memory usage
Close other tabs and applications
Try a smaller document first
Refresh page and try again if stuck

OCR failed / error message

File may be corrupted
File size may exceed limits
Browser may be out of memory
Try refreshing and re-uploading
Try a different browser

Text is garbled or incorrect

Source image quality may be poor
Try re-scanning at higher resolution
Ensure adequate contrast
Check if language is supported

Some text is missing

Text may be in unsupported language
Font may be too small or decorative
Image may have low contrast in that area
Try manually typing missing sections

Can't select text after OCR

OCR may have failed silently
Try the process again
Check browser console for errors
Verify the “OCR Pdf” checkbox was selected

Limitations

Current limitations:

Maximum file size: 8MB
Single language per document (English only currently)
No handwriting recognition
Processing happens locally (no cloud acceleration)
No batch processing UI
Cannot OCR existing uploaded documents (must re-upload)

Privacy & Security

Where is OCR performed?

Entirely in your browser. No document data is sent to external servers for OCR processing. Scribe.js runs locally on your computer.

Is my document stored during OCR?

Only the final processed PDF is uploaded to Uxie’s servers. OCR runs in temporary browser memory, and its intermediate data is discarded once processing completes.

Can others see my OCR'd documents?

Only if you explicitly share them. OCR’d documents have the same privacy settings as any other Uxie document.

Future Enhancements

Planned features:

Multi-language support
OCR existing uploaded documents
Batch OCR processing
Manual text correction UI
OCR quality preview before upload
Cloud OCR option for faster processing
Handwriting recognition

Related Features

PDF Reading

Read OCR’d documents

Text-to-Speech

Listen to OCR’d text

AI Chat

Ask questions about OCR’d content
