All Gemini API access goes through `lib/gemini.ts`, using the `@google/generative-ai` SDK.
## Models

Prism uses two Gemini models. Each is instantiated separately because they have different roles and different generation configurations.

### `gemini-2.5-flash`

Used for chat responses (streaming), RAG-grounded answers, image descriptions, and chat title generation.

### `text-embedding-004`

Used for embedding document chunks and search queries; produces 768-dimensional vectors.
## Chat model configuration

The chat model is initialized once at module level with fixed generation parameters:

| Parameter | Value | Effect |
|---|---|---|
| `temperature` | 0.7 | Balanced creativity; not deterministic, not erratic |
| `topP` | 0.95 | Nucleus sampling; considers the top 95% of probability mass |
| `topK` | 40 | Limits each token selection to the 40 most probable tokens |
| `maxOutputTokens` | 2048 | Hard ceiling on response length |
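These values map directly onto the SDK's `generationConfig` object. A sketch of what gets passed to `getGenerativeModel` (the surrounding client initialization is assumed, not shown):

```typescript
// Generation parameters for the chat model, as they would be passed to
// genAI.getGenerativeModel({ model: "gemini-2.5-flash", generationConfig }).
// Sketch: values taken from the table above.
const generationConfig = {
  temperature: 0.7,      // balanced creativity
  topP: 0.95,            // nucleus sampling over the top 95% of probability mass
  topK: 40,              // at most 40 candidate tokens per selection
  maxOutputTokens: 2048, // hard ceiling on response length
};
```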
## Streaming chat responses

### `generateChatResponse`

The `ChatMessage` type is `{ role: 'user' | 'model'; parts: string }`. The function handles two cases:
- **Single-turn**: if there is only one message (or only one user message after filtering), it calls `model.generateContentStream` directly with the message text.
- **Multi-turn**: for conversations with history, it:
  - passes all but the last message as `history` to `model.startChat`
  - sends the latest message via `chat.sendMessageStream`
If an `onChunk` callback is provided, it is called with each text fragment as it arrives. The complete response text is also returned as the resolved promise value.
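The function also filters the message list before building history; a minimal sketch of that rule (hypothetical helper name; the real logic lives inside `generateChatResponse`):

```typescript
type ChatMessage = { role: "user" | "model"; parts: string };

// Sketch of the history-sanitizing rule: keep every user message, and keep a
// model message only when it directly follows a user message.
function sanitizeHistory(messages: ChatMessage[]): ChatMessage[] {
  return messages.filter(
    (m, i) => m.role === "user" || (i > 0 && messages[i - 1].role === "user")
  );
}
```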
The function filters the message list before building history: it keeps all user messages, and model messages only when they directly follow a user message. This prevents malformed turn sequences from reaching the API.

### How the chat route uses streaming
The `/api/chat` route wraps `generateChatResponse` in a `ReadableStream` and returns it with `Content-Type: text/event-stream`. Events use the SSE wire format.
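A sketch of how such a route can frame chunks as SSE events. The `data: <json>` framing below is an assumption of a conventional shape, not the project's documented event format:

```typescript
// Sketch: wrap streamed text chunks in SSE frames inside a ReadableStream.
// The `data: <json>\n\n` framing here is an assumed convention.
function sseFrame(chunk: string): Uint8Array {
  return new TextEncoder().encode(`data: ${JSON.stringify({ text: chunk })}\n\n`);
}

function sseStream(chunks: AsyncIterable<string>): ReadableStream<Uint8Array> {
  return new ReadableStream({
    async start(controller) {
      for await (const chunk of chunks) controller.enqueue(sseFrame(chunk));
      controller.close();
    },
  });
}
// A route handler would then return something like:
// new Response(sseStream(gen), { headers: { "Content-Type": "text/event-stream" } });
```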
## RAG responses

### `generateRAGResponse`
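The document gives no further detail here, but a typical shape for such a function is to interpolate retrieved chunks into the prompt before calling the chat model. A hypothetical sketch (prompt wording and structure are illustrative, not the project's):

```typescript
// Hypothetical sketch: ground the model's answer in retrieved context by
// embedding the chunks directly in the prompt.
function buildRAGPrompt(question: string, contextChunks: string[]): string {
  return [
    "Answer the question using only the context below.",
    "Context:",
    ...contextChunks.map((c, i) => `[${i + 1}] ${c}`),
    `Question: ${question}`,
  ].join("\n");
}
// The real function would then stream this prompt through the chat model,
// e.g. model.generateContentStream(buildRAGPrompt(q, chunks)).
```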
## Embeddings

### `generateEmbedding`

Converts a single text string into a 768-dimensional float vector.
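A sketch of the call shape, assuming the SDK's `embedContent` method on a `text-embedding-004` model instance (the interface below is a stand-in for the SDK type so the sketch is self-contained):

```typescript
// Minimal stand-in for the SDK's embedding model surface.
interface EmbeddingModel {
  embedContent(text: string): Promise<{ embedding: { values: number[] } }>;
}

// Sketch: convert one string into its embedding vector
// (768 floats for text-embedding-004).
async function generateEmbedding(
  model: EmbeddingModel,
  text: string
): Promise<number[]> {
  const result = await model.embedContent(text);
  return result.embedding.values;
}
```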
### `batchGenerateEmbeddings`

Converts an array of text strings into an array of 768-dimensional vectors. Used after `chunkText` splits a document. The function processes texts in batches of 100, running each batch in parallel with `Promise.all`.
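The batching described above can be sketched as follows (hypothetical helper; the stand-in interface mirrors the SDK's `embedContent`):

```typescript
interface EmbeddingModel {
  embedContent(text: string): Promise<{ embedding: { values: number[] } }>;
}

// Sketch: embed texts in batches of 100, issuing each batch in parallel
// with Promise.all while keeping batches sequential.
async function batchGenerateEmbeddings(
  model: EmbeddingModel,
  texts: string[],
  batchSize = 100
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    const results = await Promise.all(batch.map((t) => model.embedContent(t)));
    vectors.push(...results.map((r) => r.embedding.values));
  }
  return vectors;
}
```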
Both embedding functions use `text-embedding-004`, instantiated separately from the chat model. The embedding model has no generation configuration; it always returns exactly 768 dimensions.

## Image analysis
### `generateImageDescription`

Uses `gemini-2.5-flash` (vision) to generate a text description of the image, then embeds that description. The model receives the image as base64-encoded inline data alongside a structured prompt.
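The inline-data request shape looks like this (sketch; the prompt text is illustrative and the actual wording in `lib/gemini.ts` is not reproduced here):

```typescript
// Sketch of the request parts for a vision call: the image travels as
// base64-encoded inline data next to a text prompt.
function buildImageParts(base64Data: string, mimeType: string) {
  return [
    { inlineData: { data: base64Data, mimeType } },
    { text: "Describe this image in detail so it can be indexed for search." },
  ];
}
// Real SDK usage would be:
// await visionModel.generateContent(buildImageParts(data, "image/png"));
```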
### Retry logic

The function retries on HTTP 503 (Service Unavailable) and 429 (Rate Limited) responses using exponential backoff:
| Attempt | Delay before retry |
|---|---|
| 1 → 2 | 2 seconds (2^1 × 1000ms) |
| 2 → 3 | 4 seconds (2^2 × 1000ms) |
| 3 (final) | Throws the error |
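The schedule above corresponds to a delay of `2^attempt × 1000ms`. A sketch of the loop (hypothetical helper; the error-shape check is an assumption about what the SDK throws, and the sleep is injectable only for clarity):

```typescript
// Sketch of exponential backoff on 503/429, final attempt rethrows.
const backoffDelay = (attempt: number) => 2 ** attempt * 1000; // 2s, 4s, ...

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms))
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const retryable = err?.status === 503 || err?.status === 429;
      if (!retryable || attempt >= maxAttempts) throw err; // attempt 3 rethrows
      await sleep(backoffDelay(attempt));
    }
  }
}
```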
"Image file: {name}. Format: PNG. Uploaded on...") so the document is still indexed, just with reduced semantic richness.
## Chat title generation

### `generateChatTitle`

Generates a short title for the conversation using `gemini-2.5-flash`. On failure it returns the fallback title 'New Chat' rather than throwing.
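A sketch of the shape: ask the model for a short title and swallow any error into the fallback. The prompt wording and the injected generate function are assumptions for illustration:

```typescript
// Minimal stand-in for the chat model's generate call.
type GenerateFn = (prompt: string) => Promise<string>;

// Sketch: generate a short chat title, falling back to 'New Chat' on any error.
async function generateChatTitle(
  generate: GenerateFn,
  firstMessage: string
): Promise<string> {
  try {
    const title = await generate(
      `Write a short title (max 6 words) for a chat that starts with: ${firstMessage}`
    );
    return title.trim() || "New Chat";
  } catch {
    return "New Chat"; // never throws; callers always get a usable title
  }
}
```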