RAG (Retrieval-Augmented Generation) chat combines the reasoning ability of Google Gemini with the specific content of your document library. Every answer is grounded in what you have actually uploaded, and Prism shows you exactly which documents it drew from.

Chat modes

Prism supports two chat modes:

RAG mode

Prism retrieves relevant chunks from your library before answering. Responses are grounded in your documents. Sources are shown alongside every reply. This is the default.

Standard mode

Prism answers using Gemini’s general knowledge only, without retrieving document context. Useful for general questions unrelated to your library.

You can toggle between modes using the RAG switch in the chat interface.

Starting a chat

Click AI Chat in the left sidebar. This opens a general chat session. Prism retrieves context from all documents in your library that are relevant to each question.

How RAG retrieval works

Each time you send a message in RAG mode, Prism runs the following sequence:
1. Embed your question

Your question is converted to a 768-dimensional vector using the Gemini text-embedding-004 model.
2. Search your library

Qdrant performs a cosine similarity search across all your indexed document chunks. Only chunks with a similarity score of 0.4 or higher are included as context.
3. Build the prompt

The top matching chunks (up to 5) are injected into the prompt, each labeled with its source document name. Gemini is instructed to answer based only on the provided context.
4. Stream the response

Gemini generates a response and streams it to you in real time via Server-Sent Events. The answer appears word by word as it is generated.
5. Show sources

Before the answer text, Prism lists the source documents used — with their name, type, and similarity score — so you can verify where information came from.
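The five steps above can be sketched in plain Python. This is a minimal illustration, not Prism's actual implementation: the embedding call and the Qdrant search are replaced with stand-ins (cosine similarity over pre-computed vectors), and all function names are hypothetical.

```python
import math

SCORE_THRESHOLD = 0.4   # minimum similarity for a chunk to be used as context
MAX_CHUNKS = 5          # at most this many top matches are injected into the prompt

def cosine_similarity(a, b):
    """Cosine similarity between two vectors (stand-in for the Qdrant search)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve_context(question_vec, chunks):
    """Score every chunk, keep those above the threshold, highest first.

    chunks: list of dicts with 'vector', 'text', and 'source' keys.
    """
    scored = [
        {**c, "score": cosine_similarity(question_vec, c["vector"])}
        for c in chunks
    ]
    relevant = [c for c in scored if c["score"] >= SCORE_THRESHOLD]
    relevant.sort(key=lambda c: c["score"], reverse=True)
    return relevant[:MAX_CHUNKS]

def build_prompt(question, context_chunks):
    """Label each chunk with its source document and instruct the model
    to answer from the provided context only."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in context_chunks)
    return (
        "Answer based only on the provided context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In the real pipeline, `question_vec` would come from the Gemini text-embedding-004 model and the scoring loop would be a single Qdrant query.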

Source citations

When Prism retrieves context, a Sources panel appears above the response showing:
Source index: Numbered reference (Source 1, Source 2, …)
Document name: The filename of the source document
Document type: File format (PDF, TS, PY, etc.)
Similarity score: How closely the chunk matched your question
Sources are ordered by relevance (highest score first).
If none of your documents contain context relevant to the question (no chunks meet the 0.4 threshold), Prism states clearly that the available context is insufficient, then either answers from Gemini’s general knowledge or declines to speculate.
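The panel entries can be rendered from retrieved chunks with a small formatter. This is an illustrative sketch (the function name and field layout are assumptions, mirroring the fields listed above), not Prism's UI code.

```python
def format_sources(chunks):
    """Render Sources panel entries from retrieved chunks,
    ordered by similarity score, highest first.

    chunks: list of dicts with 'name', 'type', and 'score' keys.
    """
    ordered = sorted(chunks, key=lambda c: c["score"], reverse=True)
    return [
        f"Source {i}: {c['name']} ({c['type']}), score {c['score']:.2f}"
        for i, c in enumerate(ordered, start=1)
    ]
```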

Chat history

Every conversation is automatically saved. Prism generates a title for each session based on the first few messages using Gemini — you do not need to name sessions manually. Saved chats appear in the chat sidebar and can be reopened at any time.

Streaming responses

Responses are streamed using Server-Sent Events (SSE). Text appears progressively as Gemini generates it rather than waiting for the full response. The stream ends with a [DONE] signal. If an error occurs mid-stream, Prism surfaces the error inline.
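A minimal client-side sketch of consuming such a stream, assuming the standard SSE `data:` line framing and the `[DONE]` terminator described above (the exact payload shape Prism sends is not shown here):

```python
def read_sse_stream(lines):
    """Yield text tokens from an SSE stream until the [DONE] signal."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines, comments, other fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end of stream: stop yielding
        yield payload
```

In practice `lines` would be the decoded lines of an HTTP response body held open by the server.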

Score threshold

The minimum similarity score for a document chunk to be included as RAG context is 0.4. This is intentionally lower than the default semantic search threshold (0.5) to allow broader context retrieval in conversation, trading some precision for recall.
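The effect of the two thresholds can be shown with a simple filter. The values 0.4 and 0.5 come from the text above; the constant and function names are illustrative.

```python
CHAT_THRESHOLD = 0.4    # RAG chat context threshold
SEARCH_THRESHOLD = 0.5  # default semantic search threshold

def filter_chunks(scores, threshold):
    """Keep only similarity scores at or above the threshold."""
    return [s for s in scores if s >= threshold]
```

A chunk scoring 0.45 would be included as chat context but excluded from default semantic search results.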

Tips for better results

“What is the termination clause in the service agreement?” retrieves far more targeted context than “what does the contract say?”. Specific questions produce more focused embedding vectors.
If you know which document contains the answer, mention it: “According to the Q3 report, what was the revenue growth?”. This helps Gemini weigh the cited document more heavily in its answer.
Prism maintains conversation history within a session. Follow-up questions like “can you expand on that?” work because previous context is included.
If you want to ask something unrelated to your documents — such as a general coding question — toggle off RAG mode to avoid injecting irrelevant document context.
