Chat modes
Prism supports two chat modes:

RAG mode
Prism retrieves relevant chunks from your library before answering. Responses are grounded in your documents. Sources are shown alongside every reply. This is the default.
Standard mode
Prism answers using Gemini’s general knowledge only, without retrieving document context. Useful for general questions unrelated to your library.
Starting a chat
- From a document
How RAG retrieval works
Each time you send a message in RAG mode, Prism runs the following sequence:

Embed your question
Your question is converted to a 768-dimensional vector using the Gemini text-embedding-004 model.

Search your library
Qdrant performs a cosine similarity search across all your indexed document chunks. Only chunks with a similarity score of 0.4 or higher are included as context.
Build the prompt
The top matching chunks (up to 5) are injected into the prompt, each labeled with its source document name. Gemini is instructed to answer based only on the provided context.
Stream the response
Gemini generates a response and streams it to you in real time via Server-Sent Events. The answer appears word by word as it is generated.
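The retrieval-and-prompt steps above can be sketched as a small helper. This is a minimal illustration, not Prism's implementation: the `Chunk` record and `build_rag_prompt` function are hypothetical names, assuming the vector search has already returned scored chunks.

```python
# A sketch of Prism's RAG prompt construction, assuming hypothetical
# chunk records returned by the vector search. Field names are assumptions.
from dataclasses import dataclass

SCORE_THRESHOLD = 0.4   # minimum cosine similarity for inclusion
TOP_K = 5               # at most five chunks are injected into the prompt

@dataclass
class Chunk:
    document: str  # source document name
    text: str      # chunk contents
    score: float   # cosine similarity to the question embedding

def build_rag_prompt(question: str, hits: list[Chunk]) -> str:
    """Keep the top matching chunks and label each with its source document."""
    kept = [h for h in hits if h.score >= SCORE_THRESHOLD]
    kept = sorted(kept, key=lambda h: h.score, reverse=True)[:TOP_K]
    context = "\n\n".join(f"[Source: {h.document}]\n{h.text}" for h in kept)
    return (
        "Answer based only on the provided context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

A chunk scoring below the 0.4 threshold is simply dropped, which is what produces the "insufficient context" behavior described under Source citations.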
Source citations
When Prism retrieves context, a Sources panel appears above the response showing:

| Field | Description |
|---|---|
| Source index | Numbered reference (Source 1, Source 2, …) |
| Document name | The filename of the source document |
| Document type | File format (PDF, TS, PY, etc.) |
| Similarity score | How closely the chunk matched your question |
If none of your documents contain context relevant to the question (no chunks meet the 0.4 threshold), Prism will tell you clearly that the context is insufficient and answer from Gemini’s general knowledge or decline to speculate.
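The four fields in the table map directly onto the retrieved chunks. As a rough sketch (the hit structure and `source_rows` helper are assumptions for illustration, not Prism's API):

```python
# Derive Sources-panel rows from retrieved hits. The dict keys used
# here ("name", "score") are hypothetical.
from pathlib import Path

def source_rows(hits: list[dict]) -> list[dict]:
    """One row per retrieved chunk: index, document name, type, score."""
    return [
        {
            "source": f"Source {i}",
            "document": h["name"],
            "type": Path(h["name"]).suffix.lstrip(".").upper() or "?",
            "score": round(h["score"], 2),
        }
        for i, h in enumerate(hits, start=1)
    ]
```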
Chat history
Every conversation is automatically saved. Prism generates a title for each session based on the first few messages using Gemini — you do not need to name sessions manually. Saved chats appear in the chat sidebar and can be reopened at any time.

Streaming responses
Responses are streamed using Server-Sent Events (SSE). Text appears progressively as Gemini generates it rather than waiting for the full response. The stream ends with a [DONE] signal. If an error occurs mid-stream, Prism surfaces the error inline.
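A client consuming such a stream accumulates text until the terminator arrives. A minimal sketch, assuming each event is a plain `data: ...` line carrying a text fragment (the exact payload shape is an assumption):

```python
# Accumulate streamed text from SSE "data:" lines until [DONE].
def collect_stream(lines: list[str]) -> str:
    """Join streamed fragments; stop at the [DONE] signal."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        parts.append(payload)
    return "".join(parts)
```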
Score threshold
The minimum similarity score for a document chunk to be included as RAG context is 0.4. This is intentionally lower than the default semantic search threshold (0.5) to allow broader context retrieval in conversation, trading some precision for recall.

Tips for better results
Ask specific questions
“What is the termination clause in the service agreement?” retrieves far more targeted context than “what does the contract say?”. Specific questions produce more focused embedding vectors.
Reference document names
If you know which document contains the answer, mention it: “According to the Q3 report, what was the revenue growth?”. This helps Gemini weigh the cited document more heavily in its answer.
Follow up in the same session
Prism maintains conversation history within a session. Follow-up questions like “can you expand on that?” work because previous context is included.
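Including prior turns is what makes a vague follow-up resolvable. As a sketch (the message structure and `with_history` helper are hypothetical, and the turn cap is an assumed detail):

```python
# Build the next request by prepending recent conversation turns,
# so "can you expand on that?" has something to refer back to.
def with_history(history: list[dict], question: str, max_turns: int = 10) -> list[dict]:
    """Append the new question after the most recent prior turns."""
    recent = history[-max_turns:]
    return recent + [{"role": "user", "content": question}]
```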
Switch to standard mode for general questions
If you want to ask something unrelated to your documents — such as a general coding question — toggle off RAG mode to avoid injecting irrelevant document context.