File uploads

Perplexica can analyze documents you upload and answer questions about their content. Upload PDFs, Word documents, or text files, then search across both the web and your files simultaneously.

Supported file types

Perplexica currently supports three file formats:

PDF

.pdf - Portable Document Format

Word

.docx - Microsoft Word documents

Text

.txt - Plain text files

How file uploads work

When you upload a file, Perplexica processes it in several steps:

File validation: Confirms the file type is supported
Text extraction: Extracts text content from the document
Content chunking: Splits the text into manageable chunks (512 tokens with 128 token overlap)
Embedding generation: Creates vector embeddings for semantic search
Storage: Saves the file and its processed content for future searches

File processing starts automatically when you upload. Once the processing indicator completes, the content is embedded and ready to search.
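The validation step can be pictured as a check on the filename extension. This is a minimal sketch rather than Perplexica's actual code: `SUPPORTED_EXTENSIONS` and `isSupportedUpload` are hypothetical names, and a real check may also inspect the MIME type.

```typescript
// Hypothetical helper: the three extensions listed under "Supported file types".
const SUPPORTED_EXTENSIONS = new Set(["pdf", "docx", "txt"]);

// Returns true when the filename ends in a supported extension (case-insensitive).
function isSupportedUpload(filename: string): boolean {
  const dot = filename.lastIndexOf(".");
  // Reject names with no extension, a trailing dot, or nothing before the dot.
  if (dot <= 0 || dot === filename.length - 1) return false;
  const ext = filename.slice(dot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.has(ext);
}
```

Matching case-insensitively means a file named `REPORT.PDF` is accepted just like `report.pdf`.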

Uploading files

To upload a file:

Look for the file upload button in the search interface
Click to select one or more files from your device
Wait for the processing indicator to complete
The file is now available for searching

What happens during upload:

Each file receives a unique identifier
Original filename is preserved for display
Upload timestamp is recorded
Processed content is stored separately from the original

Asking questions about files

Once uploaded, you can ask questions about your documents.

Example queries:

“Summarize the main points in this document”
“What does the report say about revenue growth?”
“Find all mentions of AI in my uploaded papers”
“Compare the conclusions from both PDFs”

Perplexica searches your uploaded files alongside web sources, combining information from both to answer your question.

File content is searched using semantic similarity, not just keyword matching. Ask questions naturally, and Perplexica will find relevant passages.

How file search works

Text extraction

Different file types are processed differently.

PDF files:

Parsed using the pdf-parse library
Text is extracted from all pages
Preserves document structure where possible

Word documents (.docx):

Processed using the officeparser library
Extracts text content from the document
Handles formatted text and structure

Text files (.txt):

Read directly as UTF-8 text
No additional processing needed
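The per-type handling above amounts to dispatching on the file extension. The sketch below illustrates that idea under stated assumptions: `extensionOf`, `extractText`, and the `Extractor` type are hypothetical names, and the PDF and Word extractors are passed in by the caller (for example, thin wrappers around pdf-parse and officeparser) because their exact call signatures depend on the library version in use.

```typescript
// An extractor turns raw file bytes into plain text.
type Extractor = (data: Uint8Array) => Promise<string>;

// Lowercased extension without the dot, or "" when the name has none.
function extensionOf(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
}

// Plain text needs no parsing library: decode the bytes as UTF-8.
const extractTxt: Extractor = async (data) => new TextDecoder("utf-8").decode(data);

// Hypothetical dispatcher: picks an extractor based on the file extension.
async function extractText(
  filename: string,
  data: Uint8Array,
  extractors: { pdf: Extractor; docx: Extractor },
): Promise<string> {
  const ext = extensionOf(filename);
  switch (ext) {
    case "pdf":
      return extractors.pdf(data);
    case "docx":
      return extractors.docx(data);
    case "txt":
      return extractTxt(data);
    default:
      throw new Error(`Unsupported file type: "${ext}"`);
  }
}
```

Injecting the extractors keeps the dispatcher independent of any one parsing library, which also makes it easy to test with stubs.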

Content chunking

Extracted text is split into chunks for efficient searching:

Chunk size: 512 tokens
Overlap: 128 tokens between chunks
Purpose: Maintains context across chunk boundaries

const splittedText = splitText(content, 512, 128)

Overlapping ensures that relevant information near chunk boundaries isn’t missed during search.
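To make the overlap concrete, here is a minimal sketch of a sliding-window splitter. It is not the `splitText` implementation referenced above: `splitTextSketch` is a hypothetical name, and it approximates a token as a whitespace-separated word, whereas a real splitter would count model tokens.

```typescript
// Sketch only: treats each whitespace-separated word as one "token".
function splitTextSketch(text: string, chunkSize = 512, overlap = 128): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  // Each chunk starts this many words after the previous one.
  const stride = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += stride) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

With a chunk size of 4 and an overlap of 1, the last word of each chunk reappears as the first word of the next, so a sentence that straddles a boundary is still seen whole in at least one chunk's neighborhood.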

Embedding and storage

Each chunk is converted to a vector embedding:

const embeddings = await embeddingModel.embedText(splittedText)

The chunks and embeddings are stored together:

{
  "chunks": [
    {
      "content": "text content",
      "embedding": [0.123, -0.456, ...]
    }
  ]
}

This enables semantic search - finding content based on meaning rather than exact word matches.
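As an illustration of the stored shape, the sketch below pairs each chunk with its embedding. `StoredChunk` and `toStoredContent` are hypothetical names chosen for this example; only the `content` and `embedding` fields come from the structure shown above.

```typescript
// One stored entry: the chunk text plus its embedding vector.
interface StoredChunk {
  content: string;
  embedding: number[];
}

// Hypothetical helper: zips chunk texts with their embeddings into the stored shape.
function toStoredContent(chunks: string[], embeddings: number[][]): { chunks: StoredChunk[] } {
  // The embedding model should return exactly one vector per chunk.
  if (chunks.length !== embeddings.length) {
    throw new Error(`Expected ${chunks.length} embeddings, got ${embeddings.length}`);
  }
  return {
    chunks: chunks.map((content, i) => ({ content, embedding: embeddings[i] })),
  };
}
```

The result can be serialized with `JSON.stringify` to produce a document of the form shown above.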

Semantic search

When you ask a question about uploaded files:

Your query is converted to an embedding vector
Perplexica compares it against all file chunk embeddings
The most semantically similar chunks are retrieved
These chunks provide context for the AI to answer your question

Semantic search finds relevant content even if your question uses different words than the document. It understands meaning, not just keywords.
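The retrieval steps above can be sketched as a ranking by vector similarity. This example assumes cosine similarity as the measure, which the description above does not specify; `EmbeddedChunk`, `cosineSimilarity`, and `topKChunks` are hypothetical names.

```typescript
interface EmbeddedChunk {
  content: string;
  embedding: number[];
}

// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as dissimilar
}

// Scores every chunk against the query embedding and keeps the k best matches.
function topKChunks(query: number[], chunks: EmbeddedChunk[], k: number): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

Scoring every chunk is a linear scan, which is one reason very large uploads can slow searches down.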

File management

Uploaded files are managed automatically.

Storage location:

Files are stored in the data/uploads/ directory
Each file gets a unique identifier
Processed content is saved alongside the original

File metadata:

{
  id: string;              // Unique file identifier
  name: string;            // Original filename
  filePath: string;        // Path to stored file
  contentPath: string;     // Path to processed content
  uploadedAt: string;      // ISO timestamp
}

Persistence:

Uploaded files remain available across sessions
File metadata is stored in uploaded_files.json
Files can be referenced in multiple chats
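For illustration, a metadata record with the fields listed above might be assembled as follows. This is a sketch under assumptions: `UploadedFile` and `buildFileRecord` are hypothetical names, and the on-disk naming scheme (`<id><extension>` for the original, `<id>.content.json` for the processed content) is invented for the example rather than taken from Perplexica.

```typescript
interface UploadedFile {
  id: string;          // Unique file identifier
  name: string;        // Original filename
  filePath: string;    // Path to stored file
  contentPath: string; // Path to processed content
  uploadedAt: string;  // ISO timestamp
}

// Hypothetical helper: builds the record for a newly uploaded file.
// The caller supplies the unique id; the path layout is an assumption.
function buildFileRecord(
  id: string,
  originalName: string,
  uploadDir = "data/uploads",
  now: Date = new Date(),
): UploadedFile {
  const dot = originalName.lastIndexOf(".");
  const ext = dot > 0 ? originalName.slice(dot) : ""; // keeps the leading dot
  return {
    id,
    name: originalName,
    filePath: `${uploadDir}/${id}${ext}`,
    contentPath: `${uploadDir}/${id}.content.json`,
    uploadedAt: now.toISOString(),
  };
}
```

Storing files under their identifier rather than their original name avoids collisions when two uploads share a filename, while `name` keeps the original for display.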

Privacy and security

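Your uploaded files stay on your own Perplexica instance:

Files are processed and stored locally on your Perplexica instance
File content is not sent to any external service other than the AI providers you have configured
Embeddings are generated using your configured embedding model
Only you have access to your uploaded files

All processing happens locally or with your configured AI providers. If you use a hosted embedding or chat model, the text sent to it for embedding or answering goes to that provider; with local models, file content never leaves your Perplexica instance.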

Using files with search modes

Files work with all search modes:

Speed mode: Quick answers from file content
Balanced mode: Combines file and web search
Quality mode: Deep analysis of file content with comprehensive web research

The search mode determines how thoroughly the AI analyzes your files:

config: {
  sources: SearchSources[];
  fileIds: string[];      // Your uploaded files
  mode: 'speed' | 'balanced' | 'quality';
}

File size and limits

While there’s no hard limit on file size, consider:

Larger files take longer to process
More content means more chunks and embeddings
Very large files may affect search performance
Embedding generation time increases with content length

For best performance, consider splitting very large documents into smaller, topic-focused files.
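A rough way to gauge the cost of a large file is to estimate how many chunks it produces. The sketch below assumes a sliding-window splitter that advances by chunk size minus overlap (384 tokens with the defaults above); `estimateChunkCount` is a hypothetical helper, and the real count depends on how Perplexica tokenizes and splits text.

```typescript
// Estimates the number of chunks for a document of `totalTokens` tokens,
// assuming each chunk starts (chunkSize - overlap) tokens after the previous one.
function estimateChunkCount(totalTokens: number, chunkSize = 512, overlap = 128): number {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  if (totalTokens <= 0) return 0;
  if (totalTokens <= chunkSize) return 1; // everything fits in a single chunk
  const stride = chunkSize - overlap;
  return 1 + Math.ceil((totalTokens - chunkSize) / stride);
}
```

Under these assumptions a 100,000-token document yields 261 chunks, each of which needs its own embedding, so processing time grows roughly in proportion to document length.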

Troubleshooting

File won’t upload:

Verify the file type is supported (.pdf, .docx, or .txt)
Check that the file isn’t corrupted
Ensure Perplexica has disk space for storage

Search not finding content:

Try rephrasing your question
Make sure the information actually exists in the file
Check that the file processed successfully

Slow processing:

Large files take longer to embed
Check your embedding model performance
Consider your system resources

Upcoming features

Support for additional file formats
Batch upload capabilities
File management interface
Search within specific files
