Supported file types
Perplexica currently supports three file formats:
- PDF: .pdf - Portable Document Format
- Word: .docx - Microsoft Word documents
- Text: .txt - Plain text files
How file uploads work
When you upload a file, Perplexica processes it in several steps:
- File validation: Confirms the file type is supported
- Text extraction: Extracts text content from the document
- Content chunking: Splits the text into manageable chunks (512 tokens with 128 token overlap)
- Embedding generation: Creates vector embeddings for semantic search
- Storage: Saves the file and its processed content for future searches
File processing starts automatically when you upload. Once processing completes, the content is embedded and ready to search.
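The validation step can be illustrated with a minimal Python sketch. It assumes validation is a case-insensitive check of the file extension against the three supported formats. The names SUPPORTED_EXTENSIONS and is_supported are hypothetical and are not taken from Perplexica's code, which may perform additional checks.

```python
from pathlib import Path

# Extensions taken from the supported formats listed in this document.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def is_supported(filename: str) -> bool:
    """Return True if the filename ends with a supported extension.

    Assumption: only the extension is checked, ignoring case.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, is_supported("report.PDF") is true, while is_supported("image.png") is false.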
Uploading files
To upload a file:
- Look for the file upload button in the search interface
- Click to select one or more files from your device
- Wait for the processing indicator to complete
- The file is now available for searching
Upload details:
- Each file receives a unique identifier
- Original filename is preserved for display
- Upload timestamp is recorded
- Processed content is stored separately from the original
Asking questions about files
Once uploaded, you can ask questions about your documents.
Example queries:
- “Summarize the main points in this document”
- “What does the report say about revenue growth?”
- “Find all mentions of AI in my uploaded papers”
- “Compare the conclusions from both PDFs”
How file search works
Text extraction
Different file types are processed differently:
PDF files:
- Parsed using the pdf-parse library
- Text is extracted from all pages
- Preserves document structure where possible
Word documents:
- Processed using the officeparser library
- Extracts text content from the document
- Handles formatted text and structure
Text files:
- Read directly as UTF-8 text
- No additional processing needed
Content chunking
Extracted text is split into chunks for efficient searching:
- Chunk size: 512 tokens
- Overlap: 128 tokens between chunks
- Purpose: Maintains context across chunk boundaries
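To make the overlap concrete, here is a minimal Python sketch of fixed-size chunking with overlap. It assumes the text has already been tokenized into a list. The tokenizer Perplexica uses is not specified here, and chunk_tokens is a hypothetical helper rather than Perplexica's implementation.

```python
from typing import List


def chunk_tokens(
    tokens: List[str], chunk_size: int = 512, overlap: int = 128
) -> List[List[str]]:
    """Split tokens into chunks of up to chunk_size, sharing `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    # Each new chunk starts this many tokens after the previous one.
    stride = chunk_size - overlap

    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk has reached the end of the token list.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults, each chunk starts 384 tokens (512 minus 128) after the previous one, so a 1,000-token document yields three chunks starting at tokens 0, 384, and 768. The last 128 tokens of one chunk are repeated as the first 128 tokens of the next.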
Embedding and storage
Each chunk is converted to a vector embedding and stored for semantic search.
Semantic search
When you ask a question about uploaded files:
- Your query is converted to an embedding vector
- Perplexica compares it against all file chunk embeddings
- The most semantically similar chunks are retrieved
- These chunks provide context for the AI to answer your question
Semantic search finds relevant content even if your question uses different words than the document. It understands meaning, not just keywords.
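The retrieval step can be sketched in Python as a nearest-neighbor lookup over embedding vectors. This sketch assumes cosine similarity as the comparison measure, which this document does not state, and uses short hand-written vectors in place of real embeddings. The functions cosine_similarity and top_k_chunks are hypothetical names for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (chunk index, similarity) pairs for the k most similar chunks."""
    scored = [
        (index, cosine_similarity(query_vec, vec))
        for index, vec in enumerate(chunk_vecs)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The chunks at the returned indices would then be passed to the AI model as context. Comparing against every chunk, as done here, is the simplest approach and matches the description above.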
File management
Uploaded files are managed automatically:
Storage location:
- Files are stored in the data/uploads/ directory
- Each file gets a unique identifier
- Processed content is saved alongside the original
Persistence:
- Uploaded files remain available across sessions
- File metadata is stored in uploaded_files.json
- Files can be referenced in multiple chats
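As a rough illustration of how such metadata could be kept, the Python sketch below appends one record per upload to a JSON file. The record fields (id, name, uploadedAt) and the function register_upload are assumptions made for this example. The actual schema of uploaded_files.json and its exact location are not described here, so the path is passed in as a parameter.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def register_upload(metadata_path: Path, original_name: str) -> dict:
    """Append a metadata record for an uploaded file and return it."""
    # Load existing records, or start with an empty list on first use.
    if metadata_path.exists():
        records = json.loads(metadata_path.read_text(encoding="utf-8"))
    else:
        records = []

    record = {
        "id": uuid.uuid4().hex,  # unique identifier (assumed format)
        "name": original_name,  # original filename, kept for display
        "uploadedAt": datetime.now(timezone.utc).isoformat(),
    }
    records.append(record)

    metadata_path.parent.mkdir(parents=True, exist_ok=True)
    metadata_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return record
```

Because the records live in a file rather than in memory, they survive restarts, which is what allows uploads to remain available across sessions.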
Privacy and security
Your uploaded files remain private:
- Files are processed and stored locally on your Perplexica instance
- No file content is sent to external services other than the AI providers you have configured
- Embeddings are generated using your configured embedding model
- Only you have access to your uploaded files
File content stays on your Perplexica instance unless your configured AI provider is remote. All processing happens locally or with your configured AI providers.
Using files with search modes
Files work with all search modes:
- Speed mode: Quick answers from file content
- Balanced mode: Combines file and web search
- Quality mode: Deep analysis of file content with comprehensive web research
File size and limits
While there’s no hard limit on file size, consider:
- Larger files take longer to process
- More content means more chunks and embeddings
- Very large files may affect search performance
- Embedding generation time increases with content length
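To get a feel for how processing cost grows with file size, the number of chunks (and therefore embeddings) can be estimated from the token count. The Python sketch below assumes the chunking parameters given in this document (512 tokens with 128 tokens of overlap) and a simple sliding window. The function estimate_chunk_count is a hypothetical helper, and the token count of a real file depends on the tokenizer in use.

```python
import math


def estimate_chunk_count(
    total_tokens: int, chunk_size: int = 512, overlap: int = 128
) -> int:
    """Estimate how many chunks a sliding window produces for a document."""
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    # After the first chunk, each additional chunk covers `stride` new tokens.
    stride = chunk_size - overlap
    return 1 + math.ceil((total_tokens - chunk_size) / stride)
```

Under these assumptions a 1,000-token file produces 3 chunks and a 100,000-token file produces 261, so the number of embeddings grows roughly linearly with content length.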
Troubleshooting
File won’t upload:
- Verify the file type is supported (.pdf, .docx, or .txt)
- Check that the file isn’t corrupted
- Ensure Perplexica has disk space for storage
Answers don’t reflect the file content:
- Try rephrasing your question
- Make sure the information actually exists in the file
- Check that the file processed successfully
Processing is slow:
- Large files take longer to embed
- Check your embedding model performance
- Consider your system resources
Upcoming features
- Support for additional file formats
- Batch upload capabilities
- File management interface
- Search within specific files