Caret’s AI assistant is not a chatbot bolted onto a word processor — it is woven into the editing experience itself. Triggered by a single keyboard shortcut and rendered as a 400px right-hand panel, the assistant reads your document in real time, streams responses token by token, and can propose full document edits that you review and accept or reject directly in the editor canvas. The AI uses the same orange accent color as the blinking caret because, in Caret’s design language, AI capability is simply the new normal for writing.
Why orange? Caret’s color palette has exactly two accents: blue for primary UI chrome and orange (accent-caret / accent-ai) for the user’s focus point — originally just the blinking cursor. AI features deliberately share this orange identity because AI assistance in Caret is native to the editing experience, not a separate product bolted on. There is no purple “AI” branding.

Opening the AI Panel

The ChatPanel component is lazy-loaded inside EditorPage using React’s Suspense. It mounts only when first needed, keeping the initial editor bundle lean.
Press Ctrl+K (Windows/Linux) or Cmd+K (macOS) anywhere in the editor to toggle the AI panel. The shortcut is a global keyboard listener registered in EditorPage.
The panel is a 400px fixed right sidebar (w-[400px], z-40) that renders alongside the document canvas. When open, the editor toolbar reflows to stay centered over the writing area.

Interaction Modes

The AI panel exposes two interaction modes, selectable from the mode picker in the panel footer:

Ask

Single-turn question and answer. The assistant responds in chat without using tools or proposing document edits. Best for questions about the document, brainstorming, and feedback you’re not ready to apply yet.

Agent

Multi-step agentic mode with full tool access. The agent reads the document, calls deterministic metric tools, proposes edits, and searches workspace context — all in a single response turn.

Agent Types

When Agent mode is selected, you can choose between two specialized agent personalities:

General Agent

The general agent is the default writing assistant. It is optimized for document editing and metric computation. Available tools:
| Tool | Purpose |
| --- | --- |
| get_document_content | Reads the current document’s plain-text content |
| get_selection_content | Reads the active editor selection |
| propose_document_replacement | Queues a full-document replacement for review |
| search_workspace_context | Retrieves semantically related chunks from workspace RAG |
| count_words | Deterministic word count from the document snapshot |
| count_characters | Character count with and without spaces |
| count_paragraphs | Counts non-empty paragraph blocks |
| count_sentences | Counts sentence spans using punctuation boundaries |
| estimate_reading_time | Estimates reading time from the current word count |
The general agent defaults to an edit intent: when a request is ambiguous, it treats it as a document modification rather than a chat response. Explicit metric requests trigger the relevant metric tool automatically.
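To illustrate what "deterministic" means for the metric tools, here is a minimal Python sketch of the five counters. It is not Caret's implementation: the function bodies, the whitespace-based tokenization, the sentence-boundary regex, and the 200 words-per-minute reading speed are all assumptions chosen for the example.

```python
import math
import re

# Assumed reading speed; the source does not state the value Caret uses.
WORDS_PER_MINUTE = 200


def count_words(text: str) -> int:
    """Count whitespace-delimited tokens."""
    return len(text.split())


def count_characters(text: str) -> dict[str, int]:
    """Character count with and without whitespace (spaces, tabs, newlines)."""
    without_spaces = sum(1 for ch in text if not ch.isspace())
    return {"with_spaces": len(text), "without_spaces": without_spaces}


def count_paragraphs(text: str) -> int:
    """Count non-empty blocks separated by one or more blank lines."""
    return sum(1 for block in re.split(r"\n\s*\n", text) if block.strip())


def count_sentences(text: str) -> int:
    """Count spans delimited by '.', '!' or '?'; a trailing fragment also counts."""
    spans = re.split(r"[.!?]+", text)
    return sum(1 for span in spans if span.strip())


def estimate_reading_time(text: str) -> int:
    """Reading time in whole minutes, rounded up; 0 for empty text."""
    words = count_words(text)
    return math.ceil(words / WORDS_PER_MINUTE) if words else 0
```

Because each function is a pure computation over the document snapshot, the same text always yields the same numbers, which is the property that makes these tools preferable to asking the LLM to count.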

Analyst Agent

The analyst agent specializes in document analysis, summarization, and structural improvement. Available tools:
ToolPurpose
get_document_contentReads the full document text before any analysis
propose_document_replacementProposes structural reorganizations
search_workspace_contextFinds related content from other documents in the workspace
Analyst capabilities:
  • Generates 2–3 sentence executive summaries with key topics and conclusions
  • Analyzes section hierarchy, logical flow, and thematic coherence
  • Identifies underdeveloped topics and missing sections
  • Proposes structural reorganizations via propose_document_replacement

SSE Streaming

AI responses stream to the frontend in real time using Server-Sent Events (SSE).
POST /api/v1/ai/conversations/{conversation_id}/stream
Content-Type: application/json

{
  "message": "Rewrite the introduction to be more concise.",
  "document_context": { ... },
  "agent_type": "general"
}
Each SSE event has the shape:
data: {"type": "delta", "content": "Here is"}
data: {"type": "delta", "content": " a revised"}
data: {"type": "done", "content": ""}
The backend (ai_router.py) streams directly from the PydanticAI agent run. The Cache-Control: no-cache and X-Accel-Buffering: no response headers ensure chunks reach the browser without proxy buffering.
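The event shape above can be consumed with a few lines of code. The following Python sketch assembles a response from `data:` lines; the function name `collect_stream` is hypothetical, and the sketch handles only the `delta` and `done` event types shown here, ignoring any others.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate `delta` contents until a `done` event arrives."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        event = json.loads(line[len("data:"):])
        if event["type"] == "delta":
            parts.append(event["content"])
        elif event["type"] == "done":
            break
    return "".join(parts)


sample = [
    'data: {"type": "delta", "content": "Here is"}',
    "",
    'data: {"type": "delta", "content": " a revised"}',
    "",
    'data: {"type": "done", "content": ""}',
]
print(collect_stream(sample))  # prints "Here is a revised"
```

A real client would read these lines incrementally from the HTTP response and update the UI on every `delta` rather than waiting for the joined string.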
AI streaming is routed through the API Gateway at /api/v1/ai/.... Do not attempt to connect directly to the AI service on port 8000 from the frontend in production — the gateway handles authentication, rate limiting, and CORS.

Multi-Provider LLM Support

The AI service supports multiple LLM providers. The active model is resolved per request from a curated catalog.
GET /api/v1/ai/models
Returns the list of available models and the server’s default model ID. No authentication is required for this endpoint.

Providers are configured via environment variables:

| Variable | Purpose |
| --- | --- |
| OPENROUTER_API_KEY | OpenRouter multi-model gateway (primary) |
| OPENAI_API_KEY | Direct OpenAI models |
| Anthropic keys | Direct Anthropic models |
Each model entry in the catalog carries id, name, provider, gateway, is_free, context_window, and description fields.
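As a rough picture of a catalog entry and per-request resolution, consider the Python sketch below. The field names come from the list above; the field types, the `resolve_model` helper, the fallback-to-default behavior, and the sample entries are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEntry:
    # Field names follow the catalog description; the types are assumptions.
    id: str
    name: str
    provider: str
    gateway: str
    is_free: bool
    context_window: int
    description: str


def resolve_model(
    catalog: list[ModelEntry], requested_id: Optional[str], default_id: str
) -> ModelEntry:
    """Return the requested model if it is in the catalog, else the server default."""
    by_id = {entry.id: entry for entry in catalog}
    if requested_id in by_id:
        return by_id[requested_id]
    return by_id[default_id]


# Hypothetical entries, not real catalog contents.
catalog = [
    ModelEntry("example/model-a", "Model A", "example", "openrouter", True, 8192, "Hypothetical free model"),
    ModelEntry("example/model-b", "Model B", "example", "openrouter", False, 32768, "Hypothetical paid model"),
]
print(resolve_model(catalog, "example/model-b", "example/model-a").name)  # prints "Model B"
print(resolve_model(catalog, None, "example/model-a").name)  # prints "Model A"
```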

RAG — Workspace Context Retrieval

Caret keeps an up-to-date semantic index of every document in your workspace so the AI can retrieve relevant context before responding.
1. Indexing on save

After every successful autosave, EditorPage calls indexDocumentEmbeddings(document_id, text) via the AI API helper. The AI service chunks the document text and stores embeddings in the document_embeddings table using pgvector.

2. HNSW cosine search

When an agent calls search_workspace_context, the AI service runs a pgvector HNSW cosine-similarity search against the workspace’s stored embeddings to retrieve the most relevant chunks.

3. Context injection

Retrieved chunks are injected into the agent’s context window before the LLM generates its response, grounding answers in your actual documents.

4. Graceful degradation

RAG is designed to degrade gracefully. If embeddings are missing (e.g., on a brand-new document), the agent proceeds with the live document context provided directly in the request.
Embedding API endpoints:
POST /api/v1/ai/embeddings/index    — Index document text after a save
GET  /api/v1/ai/embeddings/search   — Semantic search (used internally by agents)
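
The retrieval steps above can be sketched in plain Python. This is not the service's code: it swaps the pgvector HNSW index for an exact brute-force cosine search, uses a fixed-size word chunker, and takes precomputed toy vectors instead of calling an embedding model. The chunk size and the `top_k` parameter are assumptions.

```python
import math


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_workspace_context(
    query_vec: list[float],
    index: list[tuple[str, list[float]]],
    top_k: int = 3,
) -> list[str]:
    """Exact top-k cosine search over (chunk, vector) pairs; [] when nothing is indexed."""
    if not index:
        return []  # graceful degradation: the caller falls back to the live document context
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Toy two-dimensional "embeddings" for illustration only.
index = [("alpha", [1.0, 0.0]), ("beta", [0.0, 1.0]), ("gamma", [0.9, 0.1])]
print(search_workspace_context([1.0, 0.0], index, top_k=2))  # prints ['alpha', 'gamma']
```

An HNSW index returns approximately the same ranking without scanning every row, which is why it suits larger workspaces; the brute-force version is used here only because it is easy to verify by hand.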

AI Suggestions Lifecycle

When the general or analyst agent calls propose_document_replacement, the proposed change enters a structured review lifecycle:
1. Proposed

The agent appends the proposal to proposed_changes. The service layer streams a document_change SSE event to the frontend.

2. Displayed

EditorPage receives the proposal and renders the DocumentChangeReviewOverlay, a floating card over the editor canvas showing a git-style line diff with added (+) and removed (-) line counts.

3. Accepted (applied)

The user clicks Accept. The proposed text is converted to Tiptap JSON via convert_ai_content_to_tiptap_json and applied through editor.commands.setContent() or, when collaboration is active, written directly into the shared Y.Doc. The suggestion status is updated to applied via updateSuggestionStatus.

4. Rejected (dismissed)

The user clicks Reject. The overlay closes, the document is unchanged, and the suggestion status is updated to dismissed.
The AI never writes directly to document content. All document changes proposed by the AI flow through Tiptap transactions so that Y.js CRDT sync remains consistent for all collaborators. For suggestions, the AI service writes only to its own ai_suggestions table to track suggestion status, never to the document content tables.
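
The review lifecycle can be modeled as a small state machine plus a line diff. The Python sketch below is an illustration under assumptions, not Caret's code: the `Suggestion` class and `diff_counts` helper are hypothetical names, and the diff uses the standard library's `difflib` rather than whatever the overlay uses. The status values (proposed, applied, dismissed) are taken from the steps above.

```python
import difflib
from dataclasses import dataclass


def diff_counts(old: str, new: str) -> tuple[int, int]:
    """Return (added, removed) line counts between two versions of a document."""
    added = removed = 0
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            removed += 1
    return added, removed


@dataclass
class Suggestion:
    original: str
    proposed: str
    status: str = "proposed"

    def _resolve(self, new_status: str) -> None:
        # A suggestion can be resolved only once.
        if self.status != "proposed":
            raise ValueError(f"suggestion already {self.status}")
        self.status = new_status

    def accept(self) -> str:
        """Mark as applied and return the text the editor should now show."""
        self._resolve("applied")
        return self.proposed

    def reject(self) -> str:
        """Mark as dismissed; the document stays unchanged."""
        self._resolve("dismissed")
        return self.original


print(diff_counts("a\nb\nc", "a\nB\nc\nd"))  # prints (2, 1)
```

Note that `accept` only returns the new text; in line with the rule above, applying it is left to the editor so the change goes through the same path as any other edit.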

Conversation Persistence

AI conversations are fully persisted so you can return to previous exchanges.
| Table | Contents |
| --- | --- |
| ai_conversations | One row per conversation, scoped to a document and user |
| ai_messages | All user and assistant messages, ordered by creation time |
Conversation API endpoints:
POST   /api/v1/ai/conversations                         — Create a new conversation
GET    /api/v1/ai/conversations?document_id={uuid}      — List conversations for a document
GET    /api/v1/ai/conversations/{id}/messages            — List messages in a conversation
DELETE /api/v1/ai/conversations/{id}                    — Delete a conversation
The panel’s history picker lets you search and switch between past conversations for the current document. Starting a new conversation via the refresh button deletes the current conversation and resets the message list.

Tool Call Transparency

In Agent mode, every tool call the agent makes is surfaced in the chat panel as an inline trace:
  • A spinner (LoaderCircle) appears while the tool is running
  • A checkmark (Check) appears when the tool completes
  • The trace shows the tool’s category label, status, and any numeric result (e.g., 842 words)
  • Multi-tool traces are collapsible so the chat doesn’t get cluttered
Tool trace copy is localized for English (en) and Spanish (es) locales.
| Tool | Pending label | Completed label |
| --- | --- | --- |
| get_document_content | Reading document… | Read document |
| get_selection_content | Reading selection… | Read selection |
| count_words | Counting words… | Counted words |
| count_characters | Counting characters… | Counted characters |
| count_paragraphs | Counting paragraphs… | Counted paragraphs |
| count_sentences | Counting sentences… | Counted sentences |
| estimate_reading_time | Estimating reading time… | Estimated reading time |
| propose_document_replacement | Preparing edit… | Prepared edit |
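
One way to picture the label lookup is a mapping from tool name to a (pending, completed) pair. The Python sketch below covers the English labels listed above; the `trace_label` helper and the fallback wording for unlisted tools are hypothetical.

```python
# (pending label, completed label) per tool, English locale only.
TOOL_LABELS = {
    "get_document_content": ("Reading document…", "Read document"),
    "get_selection_content": ("Reading selection…", "Read selection"),
    "count_words": ("Counting words…", "Counted words"),
    "count_characters": ("Counting characters…", "Counted characters"),
    "count_paragraphs": ("Counting paragraphs…", "Counted paragraphs"),
    "count_sentences": ("Counting sentences…", "Counted sentences"),
    "estimate_reading_time": ("Estimating reading time…", "Estimated reading time"),
    "propose_document_replacement": ("Preparing edit…", "Prepared edit"),
}


def trace_label(tool: str, done: bool) -> str:
    """Pick the label for a tool call; the generic fallback is an assumption."""
    pending, completed = TOOL_LABELS.get(tool, (f"Running {tool}…", f"Ran {tool}"))
    return completed if done else pending


print(trace_label("count_words", done=False))  # prints "Counting words…"
print(trace_label("count_words", done=True))   # prints "Counted words"
```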

Model Reasoning Display

For models that emit <think>...</think> reasoning blocks (such as DeepSeek-R1), the panel renders a collapsible ThinkBlock above the main response. In Agent mode the thought block defaults to open; in Ask mode it defaults to collapsed. This keeps the reasoning transparent without cluttering the conversation.
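
Separating reasoning from the visible answer comes down to extracting the `<think>` spans. The Python sketch below shows one way to do it with a regular expression; `split_reasoning`, `think_block_open_by_default`, and the lowercase mode strings are hypothetical names, and the sketch assumes complete, well-formed blocks rather than a partially streamed tag.

```python
import re

# Non-greedy match so several <think> blocks in one response stay separate.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) with the <think> blocks removed from the answer."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


def think_block_open_by_default(mode: str) -> bool:
    """Open in Agent mode, collapsed in Ask mode, as described above."""
    return mode == "agent"


reasoning, answer = split_reasoning("<think>plan the edit</think>Here is the result.")
print(reasoning)  # prints "plan the edit"
print(answer)     # prints "Here is the result."
```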
