## Service map

**Appwrite**
Authentication (email/password + Google OAuth), file storage, and plan management via user labels.

**Qdrant**
Vector database storing 768-dimensional embeddings for every document chunk, with cosine similarity search.

**Google Gemini**
`text-embedding-004` for embeddings, `gemini-2.5-flash` for chat, RAG responses, and image analysis.

## Data flows
### Upload and indexing

When a user uploads a file, Prism runs the following pipeline:

1. **Store the file.** The client calls `uploadDocument` in `lib/appwrite.ts`, which writes the file to Appwrite Storage with permissions scoped to the uploading user (read, write, and delete for `user:{userId}` only).
2. **Download and extract text.** The `/api/documents/index` route retrieves the file from Appwrite Storage using the server-side `node-appwrite` client. `detectFileType` identifies the format, then `extractText` in `lib/document-processor.ts` extracts the content: PDF via `pdf2json`, DOCX via `mammoth`, plain text for Markdown/TXT/code files, and Gemini Vision for images.
3. **Chunk the text.** `chunkText` splits the extracted text into overlapping segments (`maxChunkSize=1000` characters, `overlap=200` characters). The splitter prefers sentence and paragraph boundaries over hard character cuts. Markdown files use an additional header-aware path (`splitByHeaders=true`).
4. **Generate embeddings.** `batchGenerateEmbeddings` in `lib/gemini.ts` calls Gemini's `text-embedding-004` model on all chunks in parallel, processing up to 100 texts per batch. Each chunk produces a `float[768]` vector.
5. **Index into Qdrant.** `batchIndexChunks` in `lib/qdrant.ts` validates that every embedding is exactly 768 dimensions, assigns each chunk a UUID point ID, and upserts the batch into the `prism_documents` collection. The payload stored alongside each vector includes `documentId`, `chunkIndex`, `chunkText`, `documentName`, `documentType`, `userId`, `uploadDate`, and `category`.
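The chunk-to-vector steps above can be sketched as one small pipeline. Everything here is illustrative: the `Chunk` and `Point` shapes and the injected `embed`/`upsert` functions are assumptions standing in for `batchGenerateEmbeddings` and `batchIndexChunks`, whose real signatures are not shown in this document.

```typescript
import { randomUUID } from "node:crypto";

interface Chunk { documentId: string; chunkIndex: number; chunkText: string; }
interface Point { id: string; vector: number[]; payload: Record<string, unknown>; }

// Embedder and indexer are injected so the pipeline can be exercised offline.
type Embedder = (texts: string[]) => Promise<number[][]>; // text-embedding-004, 768-dim
type Indexer = (points: Point[]) => Promise<void>;        // upsert into prism_documents

export async function indexDocument(
  userId: string,
  documentId: string,
  documentName: string,
  chunks: Chunk[],
  embed: Embedder,
  upsert: Indexer,
): Promise<number> {
  // Embed all chunk texts in one call (the real code batches up to 100 texts).
  const vectors = await embed(chunks.map((c) => c.chunkText));
  const points = chunks.map((chunk, i) => {
    const vector = vectors[i];
    // Mirror the dimension check described for batchIndexChunks.
    if (vector.length !== 768) {
      throw new Error(`expected 768 dims, got ${vector.length}`);
    }
    return {
      id: randomUUID(), // each chunk gets a UUID point ID
      vector,
      payload: { ...chunk, documentName, userId, uploadDate: new Date().toISOString() },
    };
  });
  await upsert(points);
  return points.length;
}
```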
### Semantic search

1. **Embed the query.** The `/api/search/semantic` route receives the user's query string and calls `generateEmbedding` to produce a 768-dimensional vector via Gemini's `text-embedding-004`.
2. **Search Qdrant.** `searchSimilarChunks` queries the `prism_documents` collection filtered by `userId` (and optionally `documentType`), using cosine similarity with the query vector. The default score threshold for the search endpoint is 0.5.
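The scoped query in step 2 can be sketched as below. The `buildSearchFilter` helper is hypothetical, and the commented call shows only roughly what a `@qdrant/js-client-rest` search with these filters looks like, assuming that is the client `searchSimilarChunks` wraps.

```typescript
// Qdrant filter clauses: every condition in `must` has to match.
type Match = { key: string; match: { value: string } };

// Hypothetical helper mirroring the userId (and optional documentType) scoping.
export function buildSearchFilter(userId: string, documentType?: string): { must: Match[] } {
  const must: Match[] = [{ key: "userId", match: { value: userId } }];
  if (documentType) must.push({ key: "documentType", match: { value: documentType } });
  return { must };
}

// With @qdrant/js-client-rest the search would look roughly like:
// const hits = await client.search("prism_documents", {
//   vector: queryEmbedding,        // 768-dim vector from text-embedding-004
//   limit: 10,
//   score_threshold: 0.5,          // default for the search endpoint
//   filter: buildSearchFilter(userId, documentType),
// });
```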
### RAG chat

1. **Embed the latest message.** The `/api/chat` route embeds the last user message with Gemini `text-embedding-004`.
2. **Retrieve context from Qdrant.** `searchSimilarChunks` fetches up to 5 chunks for the user's account with `scoreThreshold=0.4`. The lower threshold compared to standalone search ensures relevant context is included even when the phrasing is indirect.
3. **Inject context into the prompt.** Each retrieved chunk is prepended to the user message in the format `[Source N: {documentName}]\n{chunkText}`. The full conversation history is preserved for multi-turn interactions.
4. **Stream the Gemini response.** `generateChatResponse` streams the response token by token via `generateContentStream`. The route encodes each token as a Server-Sent Event (`data: {"chunk": "..."}\n\n`). Source metadata is emitted first as `data: {"sources": [...]}\n\n` so the client can render citations before the answer arrives.
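The SSE framing in step 4 can be sketched as follows. `encodeSse` is a hypothetical helper matching the wire format quoted above; the commented route body assumes the Web Streams API available in Next.js route handlers and is not the actual `/api/chat` implementation.

```typescript
// Frame one payload as a Server-Sent Event: `data: <json>\n\n`.
export function encodeSse(payload: object): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// A route handler could stream sources first, then one event per token:
// const encoder = new TextEncoder();
// return new Response(new ReadableStream({
//   async start(controller) {
//     controller.enqueue(encoder.encode(encodeSse({ sources })));   // citations first
//     for await (const token of geminiStream) {
//       controller.enqueue(encoder.encode(encodeSse({ chunk: token })));
//     }
//     controller.close();
//   },
// }), { headers: { "Content-Type": "text/event-stream" } });
```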
## Collection structure

All vector data lives in a single Qdrant collection named `prism_documents`.
| Parameter | Value |
|---|---|
| Vector size | 768 |
| Distance metric | Cosine |
| Indexing threshold | 10,000 points |
The payload fields below carry keyword indexes so searches and deletes can filter on them:

| Field | Type | Purpose |
|---|---|---|
| `userId` | keyword | Scopes every search and delete to the document owner |
| `documentId` | keyword | Enables chunk-level deletes when a file is removed |
| `documentType` | keyword | Supports document-type filtering in search |
| `category` | keyword | Supports category-based filtering |
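Bootstrapping a collection with the parameters and indexes above might look like the sketch below, assuming the `@qdrant/js-client-rest` API shape; `ensureCollection` and the `QdrantLike` interface are illustrative, not code from `lib/qdrant.ts`.

```typescript
// Minimal structural interface so the bootstrap logic is testable offline.
interface QdrantLike {
  createCollection(name: string, opts: object): Promise<unknown>;
  createPayloadIndex(name: string, opts: { field_name: string; field_schema: string }): Promise<unknown>;
}

const COLLECTION = "prism_documents";
const KEYWORD_FIELDS = ["userId", "documentId", "documentType", "category"];

export async function ensureCollection(client: QdrantLike): Promise<void> {
  // 768-dim cosine vectors; HNSW index built once 10,000 points accumulate.
  await client.createCollection(COLLECTION, {
    vectors: { size: 768, distance: "Cosine" },
    optimizers_config: { indexing_threshold: 10000 },
  });
  // Keyword payload indexes backing the filters in the table above.
  for (const field of KEYWORD_FIELDS) {
    await client.createPayloadIndex(COLLECTION, { field_name: field, field_schema: "keyword" });
  }
}
```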
## Document chunking

All text content passes through `chunkText` in `lib/document-processor.ts` before embedding.
- Max chunk size: 1,000 characters
- Overlap: 200 characters (shared between adjacent chunks to preserve context across boundaries)
- Boundary detection: the splitter looks for the last sentence boundary (`.`) or paragraph break (`\n`) within the target window before making a cut, so chunks rarely end mid-sentence
- Markdown: `chunkMarkdown` adds an extra header-aware pass that keeps section headings attached to their content
The overlap means consecutive chunks share 200 characters. This improves retrieval quality for queries that span a chunk boundary — both the preceding and following chunk contain enough context for the embedding model to produce a meaningful vector.
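A minimal sketch of this chunking strategy, assuming one simplification not stated above: a sentence or paragraph break is accepted only past the halfway point of the window, so a break near the start of a chunk cannot stall progress. The real `chunkText` may differ in these details.

```typescript
export function chunkText(text: string, maxChunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxChunkSize, text.length);
    if (end < text.length) {
      // Prefer the last paragraph break or sentence boundary in the window
      // over a hard character cut, but only past the halfway point.
      const window = text.slice(start, end);
      const breakAt = Math.max(window.lastIndexOf("\n"), window.lastIndexOf(". "));
      if (breakAt > maxChunkSize / 2) end = start + breakAt + 1; // keep the delimiter
    }
    chunks.push(text.slice(start, end));
    if (end >= text.length) break;
    start = Math.max(end - overlap, start + 1); // back up so neighbors share `overlap` chars
  }
  return chunks;
}
```

With the defaults, adjacent chunks share 200 characters, matching the overlap behavior described above.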