Agentic AI Assistant: Streaming Edits and RAG Context

Caret’s AI assistant is not a chatbot bolted onto a word processor; it is built into the editing experience itself. A single keyboard shortcut opens it as a 400px right-hand panel. From there the assistant reads the current document, streams responses token by token, and can propose full-document edits that you accept or reject directly in the editor canvas. It uses the same orange accent color as the blinking caret because, in Caret’s design language, AI assistance is a native part of writing.

Why orange? Caret’s color palette has exactly two accents: blue for primary UI chrome and orange (accent-caret / accent-ai) for the user’s focus point, which was originally just the blinking cursor. AI features deliberately share the orange identity because AI assistance is native to the editing experience rather than a separate product. There is no purple “AI” branding.

Opening the AI Panel

The ChatPanel component is lazy-loaded inside EditorPage using React’s Suspense. It mounts only when first needed, keeping the initial editor bundle lean.

Press Ctrl+K (Windows/Linux) or Cmd+K (macOS) anywhere in the editor to toggle the AI panel. The shortcut is a global keyboard listener registered in EditorPage.

The panel is a 400px fixed right sidebar (w-[400px], z-40) that renders alongside the document canvas. When open, the editor toolbar reflows to stay centered over the writing area.

Interaction Modes

The AI panel exposes two interaction modes, selectable from the mode picker in the panel footer:

Ask

Single-turn question and answer. The assistant responds in chat without using tools or proposing document edits. Best for questions about the document, brainstorming, and feedback you’re not ready to apply yet.

Agent

Multi-step agentic mode with full tool access. The agent reads the document, calls deterministic metric tools, proposes edits, and searches workspace context — all in a single response turn.

Agent Types

When Agent mode is selected, you can choose between two specialized agent personalities:

General Agent

The general agent is the default writing assistant. It is optimized for document editing and metric computation. Available tools:

Tool	Purpose
`get_document_content`	Reads the current document’s plain-text content
`get_selection_content`	Reads the active editor selection
`propose_document_replacement`	Queues a full-document replacement for review
`search_workspace_context`	Retrieves semantically related chunks from workspace RAG
`count_words`	Deterministic word count from the document snapshot
`count_characters`	Character count with and without spaces
`count_paragraphs`	Counts non-empty paragraph blocks
`count_sentences`	Counts sentence spans using punctuation boundaries
`estimate_reading_time`	Estimates reading time from the current word count

The general agent defaults to an edit intent: when a request is ambiguous, it treats it as a document modification rather than a chat response. Explicit metric requests trigger the relevant metric tool automatically.
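A minimal sketch of how deterministic metric tools like these could be computed from a plain-text snapshot. The function names mirror the tool names in the table, but the splitting rules and the 200 words-per-minute reading speed are assumptions for illustration, not Caret’s actual implementation.

```python
import math
import re


def count_words(text: str) -> int:
    # Whitespace-delimited tokens; an assumption about what counts as a word.
    return len(text.split())


def count_characters(text: str) -> dict[str, int]:
    # Report both totals: every character, and every non-whitespace character.
    without_spaces = sum(1 for ch in text if not ch.isspace())
    return {"with_spaces": len(text), "without_spaces": without_spaces}


def count_paragraphs(text: str) -> int:
    # Paragraphs are blocks separated by blank lines; empty blocks are skipped.
    return sum(1 for block in re.split(r"\n\s*\n", text) if block.strip())


def count_sentences(text: str) -> int:
    # A sentence is a non-empty span ending in ., !, or ? (or the end of the text).
    return sum(1 for span in re.split(r"[.!?]+", text) if span.strip())


def estimate_reading_time(text: str, words_per_minute: int = 200) -> int:
    # Whole minutes, rounded up; the 200 wpm default is an assumed reading speed.
    words = count_words(text)
    return math.ceil(words / words_per_minute) if words else 0


sample = "Caret streams edits. You review them!\n\nAccept or reject?"
print(count_words(sample), count_paragraphs(sample), count_sentences(sample))
```

Because each function depends only on the text it is given, the same snapshot always produces the same numbers, which is the property that makes a metric tool deterministic rather than model-estimated.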

Analyst Agent

The analyst agent specializes in document analysis, summarization, and structural improvement. Available tools:

Tool	Purpose
`get_document_content`	Reads the full document text before any analysis
`propose_document_replacement`	Proposes structural reorganizations
`search_workspace_context`	Finds related content from other documents in the workspace

Analyst capabilities:

Generates 2–3 sentence executive summaries with key topics and conclusions
Analyzes section hierarchy, logical flow, and thematic coherence
Identifies underdeveloped topics and missing sections
Proposes structural reorganizations via propose_document_replacement

SSE Streaming

AI responses stream to the frontend in real time using Server-Sent Events (SSE).

POST /api/v1/ai/conversations/{conversation_id}/stream
Content-Type: application/json

{
  "message": "Rewrite the introduction to be more concise.",
  "document_context": { ... },
  "agent_type": "general"
}

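For a text response, each SSE event is a data: line carrying a JSON object. Text arrives in delta events, and a done event marks the end of the stream: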

data: {"type": "delta", "content": "Here is"}
data: {"type": "delta", "content": " a revised"}
data: {"type": "done", "content": ""}

The backend (ai_router.py) streams directly from the PydanticAI agent run. The Cache-Control: no-cache response header prevents caching of the stream, and X-Accel-Buffering: no tells nginx-style reverse proxies not to buffer the response, so chunks reach the browser as they are produced.
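The event shape shown above can be produced and consumed with a few lines of code. The following is a sketch only: encode_event, parse_events, and collect_deltas are hypothetical helper names, not functions from ai_router.py, and the consumer assumes every data: line holds one complete JSON object.

```python
import json
from typing import Iterable, Iterator


def encode_event(event_type: str, content: str = "") -> str:
    # One SSE frame: a "data:" line followed by a blank line.
    payload = json.dumps({"type": event_type, "content": content})
    return f"data: {payload}\n\n"


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    # Yield the JSON payload of every "data:" line, skipping everything else.
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


def collect_deltas(lines: Iterable[str]) -> str:
    # Concatenate delta content until the "done" event arrives.
    parts = []
    for event in parse_events(lines):
        if event["type"] == "done":
            break
        if event["type"] == "delta":
            parts.append(event["content"])
    return "".join(parts)


stream = (
    encode_event("delta", "Here is")
    + encode_event("delta", " a revised")
    + encode_event("done")
)
text = collect_deltas(stream.splitlines())
print(text)
```

A real client would read lines from the HTTP response as they arrive instead of from a prebuilt string, and would also dispatch on other event types rather than ignoring them.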

AI streaming is routed through the API Gateway at /api/v1/ai/.... Do not attempt to connect directly to the AI service on port 8000 from the frontend in production — the gateway handles authentication, rate limiting, and CORS.

Multi-Provider LLM Support

The AI service supports multiple LLM providers. The active model is resolved per request from a curated catalog.

GET /api/v1/ai/models

Returns the list of available models and the server’s default model ID. No authentication is required for this endpoint. Providers are configured via environment variables:

Variable	Purpose
`OPENROUTER_API_KEY`	OpenRouter — multi-model gateway (primary)
`OPENAI_API_KEY`	Direct OpenAI models
Anthropic keys	Direct Anthropic models

Each model entry in the catalog carries id, name, provider, gateway, is_free, context_window, and description fields.
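As an illustration, a catalog entry with these fields could be modeled as below. The field types, the fallback rule in resolve_model, and the two example entries are assumptions made for this sketch; only the field names come from the catalog description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEntry:
    id: str
    name: str
    provider: str
    gateway: str
    is_free: bool
    context_window: int  # assumed to be a token count
    description: str


def resolve_model(
    catalog: list[ModelEntry],
    default_id: str,
    requested_id: Optional[str] = None,
) -> ModelEntry:
    # Use the requested model when the catalog lists it, else the server default.
    by_id = {entry.id: entry for entry in catalog}
    if requested_id in by_id:
        return by_id[requested_id]
    return by_id[default_id]


# Hypothetical entries; these are not real model IDs from the Caret catalog.
catalog = [
    ModelEntry("example/small", "Example Small", "example", "openrouter",
               True, 8192, "Hypothetical free model"),
    ModelEntry("example/large", "Example Large", "example", "openrouter",
               False, 128000, "Hypothetical paid model"),
]
print(resolve_model(catalog, "example/small", "example/large").name)
```

Falling back to the default when a requested ID is unknown keeps a request from failing outright if a client holds a stale model list; whether the real service does this or rejects the request is not stated here.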

RAG — Workspace Context Retrieval

Caret maintains a semantic index of the documents in your workspace, refreshed after each successful autosave, so the AI can retrieve relevant context before responding.

Indexing on save

After every successful autosave, EditorPage calls indexDocumentEmbeddings(document_id, text) via the AI API helper. The AI service chunks the document text and stores embeddings in the document_embeddings table using pgvector.
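The chunking strategy is not specified here. One common approach, shown as a sketch under that caveat, is to split the text into overlapping word windows so that a sentence falling on a boundary still appears whole in at least one chunk. The chunk_text name and the default sizes are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    # Split into overlapping windows of words; the sizes are illustrative assumptions.
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


words = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(words, chunk_size=4, overlap=1)
print(chunks)
```

Each chunk would then be embedded and stored as its own row, so a search can return the specific passage that matches instead of the whole document.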

HNSW cosine search

When an agent calls search_workspace_context, the AI service runs a pgvector HNSW cosine-similarity search against the workspace’s stored embeddings to retrieve the most relevant chunks.
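To show what cosine-similarity ranking means, here is a brute-force sketch in plain Python. It is exact and scans every vector, whereas an HNSW index is an approximate structure that avoids the full scan; in pgvector the equivalent comparison is expressed in SQL with the cosine distance operator `<=>`. The two-dimensional vectors and chunk names are made up for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 3) -> list[str]:
    # Rank chunk ids by similarity to the query, most similar first.
    ranked = sorted(
        chunks,
        key=lambda chunk_id: cosine_similarity(query, chunks[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings; real ones have hundreds or thousands of dimensions.
embeddings = {
    "intro": [1.0, 0.0],
    "pricing": [0.0, 1.0],
    "summary": [0.8, 0.6],
}
results = top_k([1.0, 0.1], embeddings, k=2)
print(results)
```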

Context injection

Retrieved chunks are injected into the agent’s context window before the LLM generates its response, grounding answers in your actual documents.

Graceful degradation

RAG is designed to degrade gracefully. If embeddings are missing (e.g., on a brand-new document), the agent proceeds with the live document context provided directly in the request.

Embedding API endpoints:

POST /api/v1/ai/embeddings/index    — Index document text after a save
GET  /api/v1/ai/embeddings/search   — Semantic search (used internally by agents)

AI Suggestions Lifecycle

When the general or analyst agent calls propose_document_replacement, the proposed change enters a structured review lifecycle:

Proposed

The agent appends the proposal to proposed_changes. The service layer streams a document_change SSE event to the frontend.

Displayed

EditorPage receives the proposal and renders the DocumentChangeReviewOverlay — a floating card over the editor canvas showing a git-style line diff with added (+) and removed (-) line counts.
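The added and removed line counts in such a diff can be derived from a standard line comparison. The sketch below uses Python’s difflib to illustrate the idea; the overlay itself runs in the browser, so this is not its actual code, and diff_line_counts is a hypothetical name.

```python
import difflib


def diff_line_counts(original: str, proposed: str) -> tuple[int, int]:
    # Return (added, removed) line counts between two versions of the text.
    added = removed = 0
    for line in difflib.ndiff(original.splitlines(), proposed.splitlines()):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            removed += 1
        # Lines starting with "  " are unchanged and "? " lines are hints; skip both.
    return added, removed


before = "Title\nOld intro.\nBody."
after = "Title\nNew intro.\nBody.\nConclusion."
counts = diff_line_counts(before, after)
print(counts)
```

A changed line counts once as a removal and once as an addition, which matches how git-style diffs report edits.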

Accepted (applied)

The user clicks Accept. The proposed text is converted to Tiptap JSON via convert_ai_content_to_tiptap_json and applied through editor.commands.setContent() or, when collaboration is active, written directly into the shared Y.Doc. The suggestion status is updated to applied via updateSuggestionStatus.

Rejected (dismissed)

The user clicks Reject. The overlay closes, the document is unchanged, and the suggestion status is updated to dismissed.
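The lifecycle above behaves like a small state machine. The sketch below encodes it under two assumptions that the text does not confirm: the initial status is stored as "proposed", and both applied and dismissed are terminal. Only the applied and dismissed status names come from the steps above.

```python
from enum import Enum


class SuggestionStatus(str, Enum):
    PROPOSED = "proposed"    # assumed name for the initial state
    APPLIED = "applied"
    DISMISSED = "dismissed"


# Assumption: a suggestion that has been reviewed cannot be reviewed again.
ALLOWED_TRANSITIONS = {
    SuggestionStatus.PROPOSED: {SuggestionStatus.APPLIED, SuggestionStatus.DISMISSED},
    SuggestionStatus.APPLIED: set(),
    SuggestionStatus.DISMISSED: set(),
}


def transition(current: SuggestionStatus, target: SuggestionStatus) -> SuggestionStatus:
    # Return the new status, or raise if the lifecycle does not allow the move.
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


status = transition(SuggestionStatus.PROPOSED, SuggestionStatus.APPLIED)
print(status.value)
```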

AI never writes directly to document content. All document changes proposed by the AI are applied in the editor, through Tiptap or the shared Y.Doc, so that Y.js CRDT sync remains consistent for all collaborators. To track suggestion status, the AI service writes only to its own ai_suggestions table, never to the document content tables.

Conversation Persistence

AI conversations are fully persisted so you can return to previous exchanges.

Table	Contents
`ai_conversations`	One row per conversation, scoped to a document and user
`ai_messages`	All user and assistant messages, ordered by creation time

Conversation API endpoints:

POST   /api/v1/ai/conversations                         — Create a new conversation
GET    /api/v1/ai/conversations?document_id={uuid}      — List conversations for a document
GET    /api/v1/ai/conversations/{id}/messages            — List messages in a conversation
DELETE /api/v1/ai/conversations/{id}                    — Delete a conversation

The panel’s history picker lets you search and switch between past conversations for the current document. Starting a new conversation via the refresh button deletes the current conversation and resets the message list.

Tool Call Transparency

In Agent mode, every tool call the agent makes is surfaced in the chat panel as an inline trace:

A spinner (LoaderCircle) appears while the tool is running
A checkmark (Check) appears when the tool completes
The trace shows the tool’s category label, status, and any numeric result (e.g., 842 words)
Multi-tool traces are collapsible so the chat doesn’t get cluttered

Tool trace copy is available in English (en) and Spanish (es).

Supported tool trace displays

Tool	Pending label	Completed label
`get_document_content`	Reading document…	Read document
`get_selection_content`	Reading selection…	Read selection
`count_words`	Counting words…	Counted words
`count_characters`	Counting characters…	Counted characters
`count_paragraphs`	Counting paragraphs…	Counted paragraphs
`count_sentences`	Counting sentences…	Counted sentences
`estimate_reading_time`	Estimating reading time…	Estimated reading time
`propose_document_replacement`	Preparing edit…	Prepared edit

Model Reasoning Display

For models that emit <think>...</think> reasoning blocks (such as DeepSeek-R1), the panel renders a collapsible ThinkBlock above the main response. In Agent mode the thought block defaults to open; in Ask mode it defaults to collapsed. This keeps the reasoning transparent without cluttering the conversation.
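Separating reasoning from the answer comes down to extracting the tagged spans. This is a minimal sketch that assumes the full response text is available and every think tag is closed; a streaming renderer would also have to handle a block that is still open. The function names and the "agent" and "ask" mode strings are hypothetical.

```python
import re

THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    # Return (reasoning, answer): the joined <think> contents, then everything else.
    reasoning = "\n".join(part.strip() for part in THINK_PATTERN.findall(raw))
    answer = THINK_PATTERN.sub("", raw).strip()
    return reasoning, answer


def think_block_open_by_default(mode: str) -> bool:
    # Per the text above: open in Agent mode, collapsed in Ask mode.
    return mode == "agent"


raw = "<think>User wants brevity.</think>\nHere is a shorter intro."
reasoning, answer = split_reasoning(raw)
print(reasoning)
print(answer)
```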
