Adding PDFs
There are multiple ways to add PDF documents to your workspace:
- Upload
- AI Assistant
PDF files are uploaded directly to Supabase Storage (or to local storage in development mode) before OCR processing begins. Uploading straight to storage avoids Next.js request body size limits for large files.
OCR Text Extraction
ThinkEx automatically performs Optical Character Recognition (OCR) on uploaded PDFs using Azure Document Intelligence.
OCR Process
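OCR runs in the background rather than during the upload request. Azure Document Intelligence analysis is a long-running operation: a client submits the document, then checks back until the result is ready. The polling loop below is a minimal sketch of that pattern. It assumes a hypothetical `checkJob` callback and simplified job states, and it is neither ThinkEx's code nor the Azure SDK's API.

```typescript
// Simplified job states reported by a hypothetical checkJob callback
// (an assumption, not Azure's exact status values).
type JobState<T> =
  | { status: "running" }
  | { status: "succeeded"; result: T }
  | { status: "failed"; error: string };

interface PollOptions {
  intervalMs: number; // delay between status checks
  maxAttempts: number; // give up after this many checks
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls checkJob until the job succeeds, fails, or the attempts run out.
async function pollUntilDone<T>(
  checkJob: () => Promise<JobState<T>>,
  { intervalMs, maxAttempts }: PollOptions,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const state = await checkJob();
    if (state.status === "succeeded") return state.result;
    if (state.status === "failed") throw new Error(`OCR failed: ${state.error}`);
    // Still running: wait before the next check (but not after the last one).
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`OCR did not finish after ${maxAttempts} attempts`);
}
```

A caller would record success as `ocrStatus: "complete"` and any thrown error as `"failed"`.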
What Gets Extracted
The OCR process captures:
- Text content: All readable text from the PDF
- Page structure: Headers, footers, and layout information
- Tables: Structured table data
- Hyperlinks: Links within the document
- Images: Embedded images (metadata only)
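Tables come back as structured data rather than flat text. Azure Document Intelligence's layout results describe each table by a row count, a column count, and a list of cells, each carrying a row index, column index, and text content. The sketch below uses a simplified version of that shape and hypothetical `tableToGrid` / `gridToText` helpers to rebuild a row-major grid that can be rendered or handed to the AI. Cell spans are deliberately ignored.

```typescript
// Simplified cell shape, loosely modeled on Azure Document Intelligence table cells.
interface TableCell {
  rowIndex: number;
  columnIndex: number;
  content: string;
}

interface ExtractedTable {
  rowCount: number;
  columnCount: number;
  cells: TableCell[];
}

// Rebuilds a row-major grid. Spans are ignored: a merged cell's text lands at its
// top-left position and any other positions it covers stay "".
function tableToGrid(table: ExtractedTable): string[][] {
  const grid = Array.from({ length: table.rowCount }, () =>
    new Array<string>(table.columnCount).fill(""),
  );
  for (const cell of table.cells) {
    const inBounds =
      cell.rowIndex >= 0 && cell.rowIndex < table.rowCount &&
      cell.columnIndex >= 0 && cell.columnIndex < table.columnCount;
    if (inBounds) grid[cell.rowIndex][cell.columnIndex] = cell.content;
  }
  return grid;
}

// Renders the grid as pipe-separated lines for plain-text contexts.
function gridToText(grid: string[][]): string {
  return grid.map((row) => row.join(" | ")).join("\n");
}
```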
PDF Viewer Features
The built-in PDF viewer provides a rich reading experience:
Navigation
- Page scrolling: Smooth vertical scrolling through pages
- Page numbers: Current page indicator and total page count
- Zoom controls: Adjust zoom level for comfortable reading
- Search: Find text within the PDF
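At its core, searching extracted text means scanning each page for the query and recording where it appears. The sketch below assumes the text of each page is available as a string array (for example, built from the per-page OCR results). `searchPages` and `SearchHit` are hypothetical names, and the viewer's built-in search may work differently.

```typescript
interface SearchHit {
  page: number; // 1-based page number
  offset: number; // character offset of the match within that page's text
}

// Case-insensitive, non-overlapping substring search across per-page text.
// Offsets assume lowercasing keeps string length unchanged, which holds for ASCII text.
function searchPages(pageTexts: string[], query: string): SearchHit[] {
  const needle = query.toLowerCase();
  if (needle.length === 0) return [];
  const hits: SearchHit[] = [];
  pageTexts.forEach((text, i) => {
    const haystack = text.toLowerCase();
    let from = 0;
    let idx: number;
    while ((idx = haystack.indexOf(needle, from)) !== -1) {
      hits.push({ page: i + 1, offset: idx });
      from = idx + needle.length; // continue after this match
    }
  });
  return hits;
}
```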
Annotations
Highlight and annotate PDFs directly in the viewer:
- Text highlighting: Select and highlight important passages
- Notes: Add comments to specific sections
- Bookmarks: Mark pages for quick reference
Annotation features are provided by the @embedpdf/react library, which offers a modern PDF viewing experience.
Working with PDF Content
AI Integration
The AI can read and work with PDF content thanks to OCR:
Page-Specific Queries
Reference specific pages in your questions.
Search
PDF text content is fully searchable.
PDF Card Properties
PDF cards include these properties:
- fileUrl: Storage URL for the PDF file
- filename: Original filename
- fileSize: File size in bytes
- textContent: Cached extracted text (for quick access)
- ocrStatus: Processing status (processing, complete, or failed)
- ocrPages: Detailed page-by-page extraction results
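As a rough TypeScript model, the properties above could be typed as follows. The field names come from the list. The `OcrPage` shape, the optional fields, and the `cacheTextContent` helper are assumptions for illustration, not ThinkEx's actual types.

```typescript
type OcrStatus = "processing" | "complete" | "failed";

// Hypothetical per-page OCR result; real ocrPages entries likely carry more detail.
interface OcrPage {
  pageNumber: number; // 1-based
  text: string;
}

interface PdfCard {
  fileUrl: string; // storage URL for the PDF file
  filename: string; // original filename
  fileSize: number; // size in bytes
  textContent?: string; // cached extracted text (assumed absent until OCR completes)
  ocrStatus: OcrStatus;
  ocrPages?: OcrPage[]; // page-by-page results (assumed absent until OCR completes)
}

// Returns a new card with pages in order, textContent cached, and status "complete".
function cacheTextContent(card: PdfCard, pages: OcrPage[]): PdfCard {
  const ordered = [...pages].sort((a, b) => a.pageNumber - b.pageNumber);
  return {
    ...card,
    ocrPages: ordered,
    textContent: ordered.map((p) => p.text).join("\n\n"),
    ocrStatus: "complete",
  };
}
```

Caching the joined text means whole-document reads do not have to reassemble every page, while `ocrPages` remains available for page-level lookups.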
OCR Status
PDFs can be in one of three OCR states:
- Processing: OCR is currently running
- Complete: Text extraction finished successfully
- Failed: OCR encountered an error
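One way to keep these states consistent is to allow only forward moves out of `processing`. The sketch below treats `complete` as final. It also allows a `failed` document to be retried, which means going back to `processing`; that retry edge is a hypothetical design choice, not documented ThinkEx behavior.

```typescript
type OcrStatus = "processing" | "complete" | "failed";

// Allowed next states for each status. The failed -> processing retry edge is an assumption.
const transitions: Record<OcrStatus, readonly OcrStatus[]> = {
  processing: ["complete", "failed"],
  complete: [],
  failed: ["processing"],
};

function canTransition(from: OcrStatus, to: OcrStatus): boolean {
  return transitions[from].includes(to);
}

// Returns the new status, or throws if the change is not allowed.
function nextStatus(from: OcrStatus, to: OcrStatus): OcrStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid OCR status change: ${from} -> ${to}`);
  }
  return to;
}
```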
PDF Uploads and Storage
Upload Process
PDFs are handled with a non-blocking upload flow:
- File validation: Check file type and size
- Direct upload: Upload to storage (bypasses Next.js API limits)
- Card creation: Add PDF card to workspace immediately
- Background OCR: Text extraction happens asynchronously
- Content update: Extracted text is added when OCR completes
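The five steps above can be sketched as one async function with injected dependencies. Every name here (`addPdf`, `UploadDeps`, `uploadFile`, `createCard`, `runOcr`, `updateCard`, `maxBytes`) is hypothetical. The point is the ordering: the card is created as soon as the upload finishes, and OCR is started without being awaited, so the user is not blocked.

```typescript
interface NewPdfCard {
  fileUrl: string;
  filename: string;
  fileSize: number;
}

type OcrUpdate = { ocrStatus: "complete"; textContent: string } | { ocrStatus: "failed" };

// Hypothetical dependencies; real storage, database, and OCR calls would sit behind these.
interface UploadDeps {
  uploadFile(file: Blob, filename: string): Promise<string>; // resolves to the storage URL
  createCard(card: NewPdfCard): Promise<string>; // resolves to the new card's id
  runOcr(fileUrl: string): Promise<string>; // resolves to the extracted text
  updateCard(cardId: string, update: OcrUpdate): Promise<void>;
}

async function addPdf(
  file: Blob,
  filename: string,
  deps: UploadDeps,
  maxBytes: number, // hypothetical hard size limit
): Promise<{ cardId: string; ocrDone: Promise<void> }> {
  // 1. File validation: type and size.
  if (file.type !== "application/pdf") throw new Error(`${filename} is not a PDF`);
  if (file.size > maxBytes) throw new Error(`${filename} exceeds ${maxBytes} bytes`);

  // 2. Direct upload: uploadFile is assumed to send the bytes straight to storage.
  const fileUrl = await deps.uploadFile(file, filename);

  // 3. Card creation: the card exists before any OCR work is done.
  const cardId = await deps.createCard({ fileUrl, filename, fileSize: file.size });

  // 4-5. Background OCR and content update: started here but not awaited.
  // Only a runOcr failure marks the card "failed".
  const ocrDone = deps.runOcr(fileUrl).then(
    (textContent) => deps.updateCard(cardId, { ocrStatus: "complete", textContent }),
    () => deps.updateCard(cardId, { ocrStatus: "failed" }),
  );
  return { cardId, ocrDone };
}
```

Returning `ocrDone` keeps the background work observable, which is useful in tests. How the real app schedules that work on the server is not specified here.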
Storage Locations
- Production: Supabase Storage
- Development: Local filesystem or Supabase
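Choosing the backend per environment can be kept behind a small function. The sketch below only decides which backend to use and does not call Supabase itself. It reads `NODE_ENV` (the usual Node.js convention) plus two hypothetical variables, `SUPABASE_URL` and `LOCAL_UPLOAD_DIR`. `selectStorage` and `StorageBackend` are illustrative names.

```typescript
type StorageBackend =
  | { kind: "supabase"; url: string }
  | { kind: "local"; directory: string };

type Env = Record<string, string | undefined>;

// Production always uses Supabase Storage. Development uses Supabase when it is
// configured and otherwise falls back to the local filesystem.
// SUPABASE_URL and LOCAL_UPLOAD_DIR are assumed variable names.
function selectStorage(env: Env): StorageBackend {
  const supabaseUrl = env.SUPABASE_URL;
  if (env.NODE_ENV === "production") {
    if (!supabaseUrl) throw new Error("SUPABASE_URL is required in production");
    return { kind: "supabase", url: supabaseUrl };
  }
  if (supabaseUrl) return { kind: "supabase", url: supabaseUrl };
  return { kind: "local", directory: env.LOCAL_UPLOAD_DIR ?? "./uploads" };
}
```

In a Node.js server this would typically be called as `selectStorage(process.env)`.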
Use Cases
Research
- Import research papers and articles
- Highlight key findings
- Extract quotes and citations
- Generate summaries with AI
Study
- Add textbooks and course materials
- Create flashcards from chapters
- Quiz yourself on PDF content
- Keep notes alongside source PDFs
Work
- Review contracts and documents
- Extract key information with AI
- Annotate important sections
- Keep project documentation accessible
Best Practices
File Organization
- Use descriptive names for PDF cards
- Group related PDFs in folders
- Add notes about key takeaways from each PDF
- Use color coding for different document types
AI Workflow
- Wait for OCR to complete before asking detailed questions
- Reference specific pages for targeted queries
- Use PDFs as source material for notes and flashcards
- Ask the AI to summarize long documents
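Page-targeted questions work because OCR results are kept per page in `ocrPages`. The sketch below illustrates one way to use that. It pulls phrases such as "page 3", "p. 7" or "pages 2-4" out of a question, keeps only those pages' text, and refuses to build context until `ocrStatus` is `complete`. The helpers `referencedRanges` and `buildPageContext` and the `{ pageNumber, text }` page shape are hypothetical, and this is not necessarily how ThinkEx passes PDF content to the AI.

```typescript
interface OcrPage {
  pageNumber: number; // 1-based
  text: string;
}

type OcrStatus = "processing" | "complete" | "failed";

// Extracts inclusive page ranges from phrases like "page 3", "p. 7" or "pages 2-4".
function referencedRanges(question: string): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  const pattern = /\b(?:pages?|p\.)\s*(\d+)(?:\s*-\s*(\d+))?/gi;
  for (const m of question.matchAll(pattern)) {
    const a = Number(m[1]);
    const b = m[2] !== undefined ? Number(m[2]) : a;
    ranges.push([Math.min(a, b), Math.max(a, b)]);
  }
  return ranges;
}

// Returns labeled text for the referenced pages, or for every page if none is mentioned.
// Throws while OCR has not completed, since page text is not available yet.
function buildPageContext(status: OcrStatus, pages: OcrPage[], question: string): string {
  if (status !== "complete") {
    throw new Error(`OCR status is "${status}"; page text is unavailable`);
  }
  const ranges = referencedRanges(question);
  const selected = ranges.length === 0
    ? pages
    : pages.filter((p) => ranges.some(([lo, hi]) => p.pageNumber >= lo && p.pageNumber <= hi));
  return selected.map((p) => `[Page ${p.pageNumber}]\n${p.text}`).join("\n\n");
}
```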
Performance
- Keep PDF file sizes reasonable (under 50 MB recommended)
- Use high-quality scans for better OCR accuracy
- Be patient with large multi-page documents during OCR
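A client-side preflight check can flag problems before a file is uploaded. The sketch below is an assumption, not ThinkEx's validator. It identifies a PDF by the `%PDF-` signature that PDF files begin with, searching the first 1024 bytes because some real-world files carry leading bytes that readers tolerate. It treats the 50 MB figure as a soft warning, because the guidance above is a recommendation rather than a documented hard limit. `preflightPdf` and `hasPdfSignature` are hypothetical names.

```typescript
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d]; // ASCII "%PDF-"
const RECOMMENDED_MAX_BYTES = 50 * 1024 * 1024; // the 50 MB recommendation, read as 50 MiB

interface PreflightResult {
  ok: boolean; // false means the file should not be uploaded
  errors: string[];
  warnings: string[];
}

// True if "%PDF-" starts somewhere in the first 1024 bytes.
function hasPdfSignature(head: Uint8Array): boolean {
  const limit = Math.min(head.length, 1024) - PDF_SIGNATURE.length;
  for (let i = 0; i <= limit; i++) {
    if (PDF_SIGNATURE.every((byte, j) => head[i + j] === byte)) return true;
  }
  return false;
}

// Errors block the upload; warnings (such as a large file) are only advisory.
function preflightPdf(head: Uint8Array, sizeBytes: number): PreflightResult {
  const errors: string[] = [];
  const warnings: string[] = [];
  if (!hasPdfSignature(head)) errors.push("File does not look like a PDF");
  if (sizeBytes > RECOMMENDED_MAX_BYTES) {
    warnings.push("File is larger than 50 MB; OCR may take a long time");
  }
  return { ok: errors.length === 0, errors, warnings };
}
```

In a browser, `head` can be read with `new Uint8Array(await file.slice(0, 1024).arrayBuffer())`, which avoids loading the whole file just to check its type.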
PDF text extraction uses Azure Document Intelligence, which provides high-quality OCR with support for complex layouts, tables, and multiple languages.