ThinkEx includes a powerful PDF viewer that lets you work with PDF documents directly in your workspace. PDFs are treated as first-class cards with OCR text extraction, making the content searchable and available to the AI.

Adding PDFs

There are multiple ways to add PDF documents to your workspace:
  1. Click the + button: Open the add card menu in your workspace
  2. Select PDF: Choose a PDF file from your computer
  3. Wait for upload: The file uploads to storage and OCR processing begins automatically
PDF files are uploaded to Supabase storage (or local storage in development mode) before OCR processing begins. This avoids Next.js body size limits for large files.

OCR Text Extraction

ThinkEx automatically performs Optical Character Recognition (OCR) on uploaded PDFs using Azure Document Intelligence.

OCR Process

  1. Upload: PDF file is uploaded to storage
  2. OCR Processing: Azure Document Intelligence analyzes the document
  3. Text Extraction: Text content is extracted and converted to markdown
  4. Indexing: Extracted text is stored for search and AI access

What Gets Extracted

The OCR process captures:
  • Text content: All readable text from the PDF
  • Page structure: Headers, footers, and layout information
  • Tables: Structured table data
  • Hyperlinks: Links within the document
  • Images: Embedded images (metadata only)
OCR works best with clear, well-formatted PDFs. Scanned documents with low resolution or poor contrast may have reduced accuracy.
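
As a rough illustration of the "converted to markdown" step, the sketch below renders one page of extracted content as markdown. The `ExtractedPage` and `ExtractedTable` shapes and the `pageToMarkdown` function are hypothetical; they are not ThinkEx's or Azure Document Intelligence's actual types, and only show how paragraphs and table rows might map to markdown.

```typescript
// Hypothetical shapes for illustration; not the real OCR response format.
interface ExtractedTable {
  header: string[];
  rows: string[][];
}

interface ExtractedPage {
  pageNumber: number;
  paragraphs: string[];
  tables: ExtractedTable[];
}

// Escape pipe characters so cell text cannot break the table layout.
function escapeCell(text: string): string {
  return text.replace(/\|/g, "\\|").trim();
}

// Render a table as a markdown table with a header row and a divider row.
function tableToMarkdown(table: ExtractedTable): string {
  const header = `| ${table.header.map(escapeCell).join(" | ")} |`;
  const divider = `| ${table.header.map(() => "---").join(" | ")} |`;
  const rows = table.rows.map(
    (row) => `| ${row.map(escapeCell).join(" | ")} |`
  );
  return [header, divider, ...rows].join("\n");
}

// Render one page: non-empty paragraphs first, then tables,
// with a blank line between blocks.
function pageToMarkdown(page: ExtractedPage): string {
  const blocks = [
    ...page.paragraphs.map((p) => p.trim()).filter((p) => p.length > 0),
    ...page.tables.map(tableToMarkdown),
  ];
  return blocks.join("\n\n");
}
```

Placing tables after paragraphs is a simplification; a real converter would more likely follow the reading order reported by the OCR service.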

PDF Viewer Features

The built-in PDF viewer provides a rich reading experience:
  • Page scrolling: Smooth vertical scrolling through pages
  • Page numbers: Current page indicator and total page count
  • Zoom controls: Adjust zoom level for comfortable reading
  • Search: Find text within the PDF

Annotations

Highlight and annotate PDFs directly in the viewer:
  • Text highlighting: Select and highlight important passages
  • Notes: Add comments to specific sections
  • Bookmarks: Mark pages for quick reference
Annotation features are provided by the @embedpdf/react library, which offers a modern PDF viewing experience.

Working with PDF Content

AI Integration

The AI can read and work with PDF content thanks to OCR:
Summarize the main points from the "Research Paper" PDF
Create flashcards from pages 5-10 of the textbook PDF
What does the contract say about payment terms?

Page-Specific Queries

Reference specific pages in your questions:
read Research Paper pdf pages 1-3
The AI will focus on the specified page range.

PDF text content is fully searchable:
search for "quantum computing" in workspace
The AI will search through all PDFs and return relevant matches with page numbers.
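
To make page-range queries and page-numbered search results concrete, here is a minimal sketch over per-page OCR text. The `OcrPage`, `PdfDocument`, and `SearchMatch` shapes and all function names are assumptions made for illustration; the source does not describe how ThinkEx implements page selection or search.

```typescript
// Hypothetical per-page OCR result; the real ocrPages shape may differ.
interface OcrPage {
  pageNumber: number; // 1-based
  text: string;
}

interface PdfDocument {
  filename: string;
  ocrPages: OcrPage[];
}

interface SearchMatch {
  filename: string;
  pageNumber: number;
  snippet: string;
}

// Parse "5-10" or "7" into an inclusive range; returns null if malformed.
function parsePageRange(input: string): { start: number; end: number } | null {
  const match = /^\s*(\d+)\s*(?:-\s*(\d+))?\s*$/.exec(input);
  if (!match) return null;
  const start = Number(match[1]);
  const end = match[2] !== undefined ? Number(match[2]) : start;
  if (start < 1 || end < start) return null;
  return { start, end };
}

// Keep only the pages inside the requested range.
function selectPages(doc: PdfDocument, range: string): OcrPage[] {
  const parsed = parsePageRange(range);
  if (!parsed) return [];
  return doc.ocrPages.filter(
    (p) => p.pageNumber >= parsed.start && p.pageNumber <= parsed.end
  );
}

// Case-insensitive substring search; returns at most one match per page.
// Snippet offsets assume lowercasing does not change the text length.
function searchDocuments(docs: PdfDocument[], query: string): SearchMatch[] {
  const needle = query.trim().toLowerCase();
  if (needle.length === 0) return [];
  const matches: SearchMatch[] = [];
  for (const doc of docs) {
    for (const page of doc.ocrPages) {
      const index = page.text.toLowerCase().indexOf(needle);
      if (index === -1) continue;
      const from = Math.max(0, index - 20);
      const to = Math.min(page.text.length, index + needle.length + 20);
      matches.push({
        filename: doc.filename,
        pageNumber: page.pageNumber,
        snippet: page.text.slice(from, to),
      });
    }
  }
  return matches;
}
```

With this sketch, `selectPages(doc, "1-3")` would return pages 1 through 3 of a document, and `searchDocuments(docs, "quantum computing")` would list each page whose text contains the phrase.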

PDF Card Properties

PDF cards include these properties:
  • fileUrl: Storage URL for the PDF file
  • filename: Original filename
  • fileSize: File size in bytes
  • textContent: Cached extracted text (for quick access)
  • ocrStatus: Processing status (processing, complete, or failed)
  • ocrPages: Detailed page-by-page extraction results
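
The properties above could be modelled with a type along these lines. The property names come from the list; the optionality of `textContent` and `ocrPages`, the `OcrPage` shape, and the `formatFileSize` helper are assumptions added for illustration.

```typescript
type OcrStatus = "processing" | "complete" | "failed";

// Assumed per-page shape; the source only says results are page-by-page.
interface OcrPage {
  pageNumber: number;
  text: string;
}

interface PdfCard {
  fileUrl: string; // storage URL for the PDF file
  filename: string; // original filename
  fileSize: number; // size in bytes
  textContent?: string; // assumed absent until OCR completes
  ocrStatus: OcrStatus;
  ocrPages?: OcrPage[]; // assumed absent until OCR completes
}

// Hypothetical helper: format fileSize (bytes) for display.
function formatFileSize(bytes: number): string {
  if (bytes < 1024) return `${bytes} B`;
  if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`;
  return `${(bytes / (1024 * 1024)).toFixed(1)} MB`;
}

// A card as it might look right after upload, before OCR has finished.
const exampleCard: PdfCard = {
  fileUrl: "https://storage.example.com/pdfs/research-paper.pdf", // placeholder URL
  filename: "research-paper.pdf",
  fileSize: 2_400_000,
  ocrStatus: "processing",
};
```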

OCR Status

PDFs can be in one of three OCR states:
  • Processing: OCR is currently running
  • Complete: Text extraction finished successfully
  • Failed: OCR encountered an error
If OCR fails, the PDF card will still display the document, but text search and AI access will be limited.
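
One way a client might branch on these states is sketched below. The `CardCapabilities` shape, the `capabilitiesFor` function, and the messages are hypothetical. For simplicity the sketch treats "limited" text access as unavailable, which may be stricter than what ThinkEx actually does.

```typescript
type OcrStatus = "processing" | "complete" | "failed";

// Hypothetical summary of what a card supports in a given OCR state.
interface CardCapabilities {
  canView: boolean; // the document itself can always be displayed
  canUseText: boolean; // text search and AI access
  message: string;
}

function capabilitiesFor(status: OcrStatus): CardCapabilities {
  switch (status) {
    case "processing":
      return {
        canView: true,
        canUseText: false,
        message: "OCR is running; text will be available shortly.",
      };
    case "complete":
      return {
        canView: true,
        canUseText: true,
        message: "Text extraction finished.",
      };
    case "failed":
      return {
        canView: true,
        canUseText: false,
        message: "OCR failed; the PDF can still be viewed.",
      };
    default: {
      // Compile-time exhaustiveness check: fails to type-check
      // if a new status is added without a matching case.
      const unreachable: never = status;
      throw new Error(`Unknown OCR status: ${String(unreachable)}`);
    }
  }
}
```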

PDF Uploads and Storage

Upload Process

PDFs are handled with a non-blocking upload flow:
  1. File validation: Check file type and size
  2. Direct upload: Upload to storage (bypasses Next.js API limits)
  3. Card creation: Add PDF card to workspace immediately
  4. Background OCR: Text extraction happens asynchronously
  5. Content update: Extracted text is added when OCR completes
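
The five steps above can be sketched as a single function with injected dependencies. Everything here is an assumption made for illustration: `addPdf`, `validatePdf`, the `UploadDeps` interface, and the use of the recommended 50MB figure as a hard limit are not taken from the ThinkEx codebase. Storage and OCR are passed in as functions so the sketch does not depend on any particular SDK.

```typescript
type OcrStatus = "processing" | "complete" | "failed";

interface PdfFileInfo {
  name: string;
  type: string; // MIME type
  size: number; // bytes
}

interface PdfCard {
  fileUrl: string;
  filename: string;
  fileSize: number;
  textContent?: string;
  ocrStatus: OcrStatus;
}

// Hypothetical dependencies; real code would wrap storage and OCR clients.
interface UploadDeps {
  uploadToStorage(file: PdfFileInfo): Promise<string>; // resolves to fileUrl
  runOcr(fileUrl: string): Promise<string>; // resolves to extracted text
  saveCard(card: PdfCard): void;
}

// Assumption: treat the recommended 50MB ceiling as a hard limit.
const MAX_SIZE_BYTES = 50 * 1024 * 1024;

// Step 1: returns an error message, or null if the file is acceptable.
function validatePdf(file: PdfFileInfo): string | null {
  if (file.type !== "application/pdf") return "Only PDF files are supported";
  if (file.size <= 0) return "File is empty";
  if (file.size > MAX_SIZE_BYTES) return "File is too large";
  return null;
}

async function addPdf(
  file: PdfFileInfo,
  deps: UploadDeps
): Promise<{ card: PdfCard; ocrDone: Promise<void> }> {
  const error = validatePdf(file);
  if (error) throw new Error(error);

  // Step 2: upload straight to storage.
  const fileUrl = await deps.uploadToStorage(file);

  // Step 3: the card exists immediately, before any text is available.
  const card: PdfCard = {
    fileUrl,
    filename: file.name,
    fileSize: file.size,
    ocrStatus: "processing",
  };
  deps.saveCard({ ...card });

  // Steps 4 and 5: OCR is started but not awaited, so the caller is not blocked.
  const ocrDone = deps.runOcr(fileUrl).then(
    (text) => {
      card.textContent = text;
      card.ocrStatus = "complete";
      deps.saveCard({ ...card });
    },
    () => {
      card.ocrStatus = "failed";
      deps.saveCard({ ...card });
    }
  );

  return { card, ocrDone };
}
```

Returning `ocrDone` separately lets a caller show the card right away and optionally await the promise later, for example to refresh the card once text is available.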

Storage Locations

  • Production: Supabase Storage
  • Development: Local filesystem or Supabase
Large PDFs are supported because the upload goes directly to storage rather than through the Next.js API route, which has a 10MB body limit.

Use Cases

  • Import research papers and articles
  • Highlight key findings
  • Extract quotes and citations
  • Generate summaries with AI
  • Add textbooks and course materials
  • Create flashcards from chapters
  • Quiz yourself on PDF content
  • Keep notes alongside source PDFs
  • Review contracts and documents
  • Extract key information with AI
  • Annotate important sections
  • Keep project documentation accessible

Best Practices

File Organization

  • Use descriptive names for PDF cards
  • Group related PDFs in folders
  • Add notes about key takeaways from each PDF
  • Use color coding for different document types

AI Workflow

  • Wait for OCR to complete before asking detailed questions
  • Reference specific pages for targeted queries
  • Use PDFs as source material for notes and flashcards
  • Ask the AI to summarize long documents

Performance

  • Keep PDF file sizes reasonable (under 50MB recommended)
  • Use high-quality scans for better OCR accuracy
  • Be patient with large multi-page documents during OCR
PDF text extraction uses Azure Document Intelligence, which provides high-quality OCR with support for complex layouts, tables, and multiple languages.
