RAG (Retrieval-Augmented Generation) chat combines the reasoning ability of Google Gemini with the specific content of your document library. Every answer is grounded in what you have actually uploaded, and Prism shows you exactly which documents it drew from.

Chat modes

Prism supports two chat modes:

RAG mode

Prism retrieves relevant chunks from your library before answering. Responses are grounded in your documents. Sources are shown alongside every reply. This is the default.

Standard mode

Prism answers using Gemini’s general knowledge only, without retrieving document context. Useful for general questions unrelated to your library.

You can toggle between modes using the RAG switch in the chat interface.

Starting a chat

Click AI Chat in the left sidebar. This opens a general chat session. Prism retrieves context from all documents in your library that are relevant to each question.

How RAG retrieval works

Each time you send a message in RAG mode, Prism runs the following sequence:
1. Embed your question

Your question is converted to a 768-dimensional vector using the Gemini text-embedding-004 model.
2. Search your library

Qdrant performs a cosine similarity search across all your indexed document chunks. Only chunks with a similarity score of 0.4 or higher are included as context.
3. Build the prompt

The top matching chunks (up to 5) are injected into the prompt, each labeled with its source document name. Gemini is instructed to answer based only on the provided context.
4. Stream the response

Gemini generates a response and streams it to you in real time via Server-Sent Events. The answer appears word by word as it is generated.
5. Show sources

Before the answer text, Prism lists the source documents used — with their name, type, and similarity score — so you can verify where information came from.
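The five steps above can be sketched in plain Python. This is a minimal illustration, not Prism's actual implementation: the embedding call and the Qdrant search are replaced with stand-ins (cosine similarity over pre-computed vectors), and all function names are hypothetical.

```python
import math

SCORE_THRESHOLD = 0.4   # minimum similarity for a chunk to be used as context
MAX_CHUNKS = 5          # at most this many top matches are injected into the prompt

def cosine_similarity(a, b):
    """Cosine similarity between two vectors (stand-in for the Qdrant search)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve_context(question_vec, chunks):
    """Score every chunk, keep those above the threshold, highest first.

    chunks: list of dicts with 'vector', 'text', and 'source' keys.
    """
    scored = [
        {**c, "score": cosine_similarity(question_vec, c["vector"])}
        for c in chunks
    ]
    relevant = [c for c in scored if c["score"] >= SCORE_THRESHOLD]
    relevant.sort(key=lambda c: c["score"], reverse=True)
    return relevant[:MAX_CHUNKS]

def build_prompt(question, context_chunks):
    """Label each chunk with its source document and instruct the model
    to answer from the provided context only."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in context_chunks)
    return (
        "Answer based only on the provided context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In the real pipeline, `question_vec` would come from the Gemini text-embedding-004 model and the scoring loop would be a single Qdrant query.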

Source citations

When Prism retrieves context, a Sources panel appears above the response showing:
Source index: Numbered reference (Source 1, Source 2, …)
Document name: The filename of the source document
Document type: File format (PDF, TS, PY, etc.)
Similarity score: How closely the chunk matched your question
Sources are ordered by relevance (highest score first).
If none of your documents contain context relevant to the question (no chunks meet the 0.4 threshold), Prism states clearly that the available context is insufficient, then either answers from Gemini’s general knowledge or declines to speculate.
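The panel entries can be rendered from retrieved chunks with a small formatter. This is an illustrative sketch (the function name and field layout are assumptions, mirroring the fields listed above), not Prism's UI code.

```python
def format_sources(chunks):
    """Render Sources panel entries from retrieved chunks,
    ordered by similarity score, highest first.

    chunks: list of dicts with 'name', 'type', and 'score' keys.
    """
    ordered = sorted(chunks, key=lambda c: c["score"], reverse=True)
    return [
        f"Source {i}: {c['name']} ({c['type']}), score {c['score']:.2f}"
        for i, c in enumerate(ordered, start=1)
    ]
```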

Chat history

Every conversation is automatically saved. Prism generates a title for each session based on the first few messages using Gemini — you do not need to name sessions manually. Saved chats appear in the chat sidebar and can be reopened at any time.

Streaming responses

Responses are streamed using Server-Sent Events (SSE). Text appears progressively as Gemini generates it rather than waiting for the full response. The stream ends with a [DONE] signal. If an error occurs mid-stream, Prism surfaces the error inline.
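A minimal client-side sketch of consuming such a stream, assuming the standard SSE `data:` line framing and the `[DONE]` terminator described above (the exact payload shape Prism sends is not shown here):

```python
def read_sse_stream(lines):
    """Yield text tokens from an SSE stream until the [DONE] signal."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines, comments, other fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end of stream: stop yielding
        yield payload
```

In practice `lines` would be the decoded lines of an HTTP response body held open by the server.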

Score threshold

The minimum similarity score for a document chunk to be included as RAG context is 0.4. This is intentionally lower than the default semantic search threshold (0.5) to allow broader context retrieval in conversation, trading some precision for recall.
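The effect of the two thresholds can be shown with a simple filter. The values 0.4 and 0.5 come from the text above; the constant and function names are illustrative.

```python
CHAT_THRESHOLD = 0.4    # RAG chat context threshold
SEARCH_THRESHOLD = 0.5  # default semantic search threshold

def filter_chunks(scores, threshold):
    """Keep only similarity scores at or above the threshold."""
    return [s for s in scores if s >= threshold]
```

A chunk scoring 0.45 would be included as chat context but excluded from default semantic search results.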

Tips for better results

“What is the termination clause in the service agreement?” retrieves far more targeted context than “what does the contract say?”. Specific questions produce more focused embedding vectors.
If you know which document contains the answer, mention it: “According to the Q3 report, what was the revenue growth?”. This helps Gemini weigh the cited document more heavily in its answer.
Prism maintains conversation history within a session. Follow-up questions like “can you expand on that?” work because previous context is included.
If you want to ask something unrelated to your documents — such as a general coding question — toggle off RAG mode to avoid injecting irrelevant document context.
