Overview
Uxie’s OCR feature transforms scanned PDFs and image-based documents into fully searchable, selectable text. This unlocks all of Uxie’s features (highlighting, text-to-speech, AI chat, and more) for documents that would otherwise be inaccessible.
What is OCR?
OCR (Optical Character Recognition) is technology that:
- Analyzes images of text
- Recognizes characters and words
- Converts them to actual digital text
- Preserves layout and formatting
- Makes documents searchable and editable
Scanned Documents
Convert paper scans into digital text
Image PDFs
Extract text from image-based PDFs
Screenshots
Convert screenshot text to selectable format
Photos
Extract text from photos of documents
When to Use OCR
Use OCR when your PDF:
- Was scanned from paper
- Has images of text instead of actual text
- Cannot be selected or searched
- Doesn’t work with text-to-speech
- Fails to vectorize for AI chat
Using OCR
During Upload
OCR processing time varies by document length and complexity. Expect 30-120 seconds for typical documents.
Upload Modal Options
The upload interface includes an “OCR Pdf” checkbox; see /src/components/workspace/upload-file-modal.tsx:190.
OCR Technology
Scribe.js Engine
Uxie uses Scribe.js-OCR, a browser-based OCR library that:
- Runs entirely in your browser
- No server processing (privacy-friendly)
- Supports multiple languages
- Preserves PDF layout
- High accuracy on printed text
Processing Modes
Scribe.js offers multiple modes:
Quality Mode
Highest accuracy (default)
- Best for final documents
- Slower processing
- Optimal for important documents
- Used by Uxie
Speed Mode
Faster processing
- Lower accuracy
- Good for quick previews
- Not currently available in Uxie
Combined Mode
Hybrid approach
- Combines native PDF text extraction with OCR
- Extracts existing text where available
- Applies OCR only where needed
- Most efficient for mixed documents
- Used by Uxie
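The combined-mode idea (keep existing text where a page already has it, run OCR only where it does not) can be illustrated with a small per-page planner. This is a minimal sketch, not Scribe.js's or Uxie's actual logic: the `PageInfo` shape, the `planCombinedMode` function, and the 20-character cutoff are all assumptions made for illustration.

```typescript
// Hypothetical per-page summary. In practice nativeCharCount would come from
// whichever PDF text extractor is in use; it is an assumption of this sketch.
interface PageInfo {
  pageNumber: number;
  nativeCharCount: number; // characters found in the page's existing text layer
}

type PageAction = "use-native-text" | "run-ocr";

interface PagePlan {
  pageNumber: number;
  action: PageAction;
}

// Assumed cutoff: pages with fewer native characters are treated as image-only.
const MIN_NATIVE_CHARS = 20;

// Decide, page by page, whether existing text can be reused or OCR is needed.
function planCombinedMode(
  pages: PageInfo[],
  minChars: number = MIN_NATIVE_CHARS,
): PagePlan[] {
  return pages.map((page): PagePlan => ({
    pageNumber: page.pageNumber,
    action: page.nativeCharCount >= minChars ? "use-native-text" : "run-ocr",
  }));
}

// Usage: a mixed document with one digital page and two scanned pages.
const examplePlan = planCombinedMode([
  { pageNumber: 1, nativeCharCount: 1840 },
  { pageNumber: 2, nativeCharCount: 0 },
  { pageNumber: 3, nativeCharCount: 5 },
]);
console.log(examplePlan);
```

A character-count threshold is a deliberately crude signal; a page with a short caption over a full-page scan would need a lower cutoff or a check of image coverage.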
Configuration
/src/components/workspace/upload-file-modal.tsx:85.
Language Support
Currently supported languages:
- English (eng) - Default
Additional language support is planned for future releases. Scribe.js supports many languages including Spanish, French, German, Chinese, and more.
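Because only English is currently available, a caller that accepts a language choice would need to fall back to `eng` for anything else. The sketch below shows one way to do that; `SUPPORTED_OCR_LANGS` and `resolveOcrLanguages` are hypothetical names, and the list simply mirrors the one above.

```typescript
// Assumption: this list mirrors the "currently supported" list in this section.
const SUPPORTED_OCR_LANGS: string[] = ["eng"];
const DEFAULT_OCR_LANG = "eng";

// Keep only supported language codes; fall back to the default if none remain.
function resolveOcrLanguages(requested: string[]): string[] {
  const supported = requested
    .map((lang) => lang.trim().toLowerCase())
    .filter((lang) => SUPPORTED_OCR_LANGS.indexOf(lang) !== -1);
  return supported.length > 0 ? supported : [DEFAULT_OCR_LANG];
}

// Usage: an unsupported request silently falls back to English.
console.log(resolveOcrLanguages(["spa"]));
```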
Processing Steps
Behind the Scenes
What Gets Preserved
✓ Original images - Visual appearance unchanged
✓ Page layout - Structure maintained
✓ Text positions - Words appear in correct locations
✓ Font sizes - Relative sizing preserved
✗ Original fonts - The recognized text layer is invisible; fonts are visible only in the page images
✗ Exact formatting - Minor spacing differences possible
Quality Factors
Input Document Quality
OCR accuracy depends on:
Image Resolution
Higher DPI = better recognition (300+ DPI recommended)
Text Clarity
Clear, crisp text works best; blurry text may have errors
Contrast
High contrast (dark text on light background) improves accuracy
Font Type
Standard fonts work better than handwriting or decorative fonts
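To check a scan against the 300 DPI recommendation, divide the image width in pixels by the physical page width in inches. The helper names below are hypothetical, and the sketch assumes the scan covers the full page width.

```typescript
const RECOMMENDED_DPI = 300;

// Effective resolution of a scan: pixels across divided by inches across.
function effectiveDpi(pixelWidth: number, pageWidthInches: number): number {
  if (pixelWidth <= 0 || pageWidthInches <= 0) {
    throw new RangeError("pixelWidth and pageWidthInches must be positive");
  }
  return pixelWidth / pageWidthInches;
}

function meetsRecommendedDpi(pixelWidth: number, pageWidthInches: number): boolean {
  return effectiveDpi(pixelWidth, pageWidthInches) >= RECOMMENDED_DPI;
}

// Usage: US Letter is 8.5 inches wide.
console.log(effectiveDpi(2550, 8.5)); // 2550 / 8.5 = 300
console.log(meetsRecommendedDpi(1275, 8.5)); // 1275 / 8.5 = 150, so false
```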
Challenging Cases
After OCR Processing
Enabled Features
Once OCR is complete, you can:
✓ Select and copy text - Highlight any text in the document
✓ Search - Find specific words or phrases
✓ Text-to-Speech - Listen to the document read aloud
✓ Highlight - Create annotations and notes
✓ AI Chat - Ask questions about the content
✓ Flashcards - Generate study cards from content
Quality Check
After OCR, verify quality:
- Try selecting text in various areas
- Check for garbled or missing characters
- Test text-to-speech on a paragraph
- Search for a known word or phrase
If the results are poor, consider:
- Re-scanning the original at higher DPI
- Improving image contrast
- Using a cleaner source document
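The "search for a known word or phrase" check can also be done programmatically once the recognized text has been copied out of the document. The following is a rough sketch: `missingPhrases` is a hypothetical helper, and it assumes you already have the OCR output as a plain string.

```typescript
// Lower-case and collapse whitespace so line breaks in OCR output do not
// prevent a match.
function normalizeForSearch(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Return the expected phrases that do NOT appear in the OCR text.
function missingPhrases(ocrText: string, expected: string[]): string[] {
  const haystack = normalizeForSearch(ocrText);
  return expected.filter(
    (phrase) => haystack.indexOf(normalizeForSearch(phrase)) === -1,
  );
}

// Usage: phrases you know are printed in the source document.
const notFound = missingPhrases(
  "Optical  Character\nRecognition converts images to text.",
  ["character recognition", "handwriting"],
);
console.log(notFound);
```

An empty result does not prove the OCR is accurate everywhere; it only confirms that the sampled phrases were recognized.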
Technical Implementation
Import and Initialization
/src/components/workspace/upload-file-modal.tsx:27.
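As a rough illustration of the flow (initialize the engine, load the file, recognize in quality mode with English, export a PDF, clean up), the sketch below runs against a small engine interface. The `OcrEngine` interface and `ocrPdf` function are hypothetical: the method names are loosely modeled on the steps a browser OCR library such as Scribe.js exposes, but they are not its actual type definitions and this is not the code from upload-file-modal.tsx. Consult the Scribe.js API documentation for the real calls.

```typescript
// Hypothetical minimal engine interface, an assumption of this sketch.
interface OcrEngine {
  init(): Promise<void>;
  importFiles(files: Uint8Array[]): Promise<void>;
  recognize(options: { mode: "quality" | "speed"; langs: string[] }): Promise<void>;
  exportPdf(): Promise<Uint8Array>;
  terminate(): Promise<void>;
}

// Run OCR on one PDF and return the bytes of the searchable PDF.
async function ocrPdf(engine: OcrEngine, pdfBytes: Uint8Array): Promise<Uint8Array> {
  await engine.init();
  try {
    await engine.importFiles([pdfBytes]);
    // Quality mode and English match the defaults described in this document.
    await engine.recognize({ mode: "quality", langs: ["eng"] });
    return await engine.exportPdf();
  } finally {
    // Release workers and memory even if recognition fails.
    await engine.terminate();
  }
}
```

Passing the engine in as a parameter keeps the flow testable with a stub and makes it easy to swap in an adapter around the real library.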
File Handling
Display Mode
Performance
Processing Time
Typical OCR times:
- 1-5 pages: 10-30 seconds
- 6-20 pages: 30-90 seconds
- 21-50 pages: 90-180 seconds
- Over 50 pages: 3-5 minutes or more
Processing happens in your browser, so times vary based on your computer’s CPU and RAM.
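The ranges above can be turned into a simple lookup, for example to show a progress hint before processing starts. This is an illustrative sketch only: `estimateOcrSeconds` is a hypothetical helper, and the numbers are the typical figures listed above, not measurements.

```typescript
interface TimeEstimate {
  minSeconds: number;
  maxSeconds: number;
  openEnded: boolean; // true for the largest bucket, which may take longer
}

// Map a page count to the typical time range listed in this section.
function estimateOcrSeconds(pageCount: number): TimeEstimate {
  if (!isFinite(pageCount) || pageCount < 1 || Math.floor(pageCount) !== pageCount) {
    throw new RangeError("pageCount must be a positive integer");
  }
  if (pageCount <= 5) return { minSeconds: 10, maxSeconds: 30, openEnded: false };
  if (pageCount <= 20) return { minSeconds: 30, maxSeconds: 90, openEnded: false };
  if (pageCount <= 50) return { minSeconds: 90, maxSeconds: 180, openEnded: false };
  return { minSeconds: 180, maxSeconds: 300, openEnded: true };
}

// Usage: a 12-page scan falls in the 6-20 page bucket.
console.log(estimateOcrSeconds(12));
```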
Browser Requirements
OCR requires:
- Modern browser (Chrome, Edge, Firefox, Safari)
- Sufficient RAM (4GB+ recommended)
- JavaScript enabled
- IndexedDB support
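Some of these requirements can be checked before OCR starts. The sketch below is an assumption-laden illustration, not Uxie's code: `checkOcrEnvironment` and `detectEnvironment` are hypothetical names. Note that `navigator.deviceMemory` is only reported by Chromium-based browsers and is a coarse, capped value, so a missing value is treated as "unknown" rather than as a failure.

```typescript
interface OcrEnvironment {
  hasIndexedDB: boolean;
  deviceMemoryGb?: number; // undefined when the browser does not report it
}

const RECOMMENDED_MEMORY_GB = 4;

// Return a list of human-readable problems; an empty list means no issues found.
function checkOcrEnvironment(env: OcrEnvironment): string[] {
  const problems: string[] = [];
  if (!env.hasIndexedDB) {
    problems.push("IndexedDB is not available in this browser.");
  }
  if (env.deviceMemoryGb !== undefined && env.deviceMemoryGb < RECOMMENDED_MEMORY_GB) {
    problems.push("Less than 4GB of RAM reported; OCR may be slow or fail.");
  }
  return problems;
}

// Read what the current runtime exposes. Outside a browser both values are absent.
function detectEnvironment(): OcrEnvironment {
  const g = globalThis as any;
  const memory = g.navigator ? g.navigator.deviceMemory : undefined;
  return {
    hasIndexedDB: typeof g.indexedDB !== "undefined",
    deviceMemoryGb: typeof memory === "number" ? memory : undefined,
  };
}

console.log(checkOcrEnvironment(detectEnvironment()));
```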
Memory Usage
Best Practices
Troubleshooting
OCR is taking forever
- Large documents take time (this is normal)
- Check browser memory usage
- Close other tabs and applications
- Try a smaller document first
- Refresh page and try again if stuck
OCR failed / error message
- File may be corrupted
- File size may exceed limits
- Browser may be out of memory
- Try refreshing and re-uploading
- Try a different browser
Text is garbled or incorrect
- Source image quality may be poor
- Try re-scanning at higher resolution
- Ensure adequate contrast
- Check if language is supported
Some text is missing
- Text may be in unsupported language
- Font may be too small or decorative
- Image may have low contrast in that area
- Try manually typing missing sections
Can't select text after OCR
- OCR may have failed silently
- Try the process again
- Check browser console for errors
- Verify the “OCR Pdf” checkbox was selected
Limitations
Privacy & Security
Where is OCR performed?
Entirely in your browser. No document data is sent to external servers for OCR processing. Scribe.js runs locally on your computer.
Is my document stored during OCR?
Only the final processed PDF is uploaded to Uxie’s servers. The OCR processing happens in temporary browser memory and is discarded after completion.
Can others see my OCR'd documents?
Only if you explicitly share them. OCR’d documents have the same privacy settings as any other Uxie document.
Future Enhancements
Planned features:
- Multi-language support
- OCR existing uploaded documents
- Batch OCR processing
- Manual text correction UI
- OCR quality preview before upload
- Cloud OCR option for faster processing
- Handwriting recognition
Related Features
PDF Reading
Read OCR’d documents
Text-to-Speech
Listen to OCR’d text
AI Chat
Ask questions about OCR’d content
