Caret’s AI assistant becomes significantly more useful when it can reference the actual content of documents in your workspace. The Embeddings API powers this by splitting a document’s plain text into overlapping chunks, generating a vector embedding for each chunk using the configured LLM provider’s embedding model, and storing those vectors in a pgvector column (vector(1536)) in the document_embeddings table. When a user asks the AI a question, the agent queries this table with HNSW cosine-similarity search to find the most relevant chunks and injects them into its system prompt, a pattern known as Retrieval-Augmented Generation (RAG).

The index endpoint is safe to call on every document save: existing embeddings for the document are atomically replaced, so there is no risk of stale or duplicate chunks accumulating over time. All endpoints require a valid Supabase JWT and are proxied through the API Gateway at https://api.caret.page/api/v1/ai/....

How RAG works in Caret

Save document
      │
      ▼
POST /api/v1/ai/embeddings/index
      │  chunks document text
      │  embeds each chunk (vector 1536-d)
      │  upserts into document_embeddings
      ▼
User sends message in AI panel
      │
      ▼
POST /api/v1/ai/conversations/{id}/stream
      │  agent calls search_workspace_context tool
      │  pgvector HNSW cosine-similarity search
      │  top-k chunks injected into system prompt
      ▼
LLM generates response grounded in document content

When document_id is included in the stream request body, the AI agent automatically calls the search_workspace_context internal tool, which executes a cosine-similarity search over the workspace’s document_embeddings rows and prepends the most relevant passages to the system prompt before the LLM generates its reply.

Embeddings are workspace-scoped. The similarity search is bounded to documents that belong to the same workspace as the target document, and only documents the authenticated user has access to are indexed. The AI will never surface content from a document in a different workspace or a document the user cannot read.
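
The injection step can be pictured with a small sketch. This is not Caret's actual prompt format: buildSystemPrompt, the RetrievedChunk type, and the "Relevant workspace context" layout are hypothetical names chosen for illustration. Only the chunk field names come from the search response documented on this page.

```typescript
// Subset of the fields returned for each chunk by the search endpoint on this page.
interface RetrievedChunk {
  document_title: string;
  chunk_index: number;
  chunk_text: string;
  score: number;
}

// Hypothetical helper: prepends retrieved passages to a base system prompt.
// The "Relevant workspace context" layout is an assumption for illustration.
function buildSystemPrompt(basePrompt: string, chunks: RetrievedChunk[]): string {
  if (chunks.length === 0) {
    return basePrompt; // nothing retrieved: leave the prompt unchanged
  }
  const passages = chunks
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.score - a.score) // most similar first
    .map((c, i) => `[${i + 1}] ${c.document_title} (chunk ${c.chunk_index}): ${c.chunk_text}`)
    .join("\n");
  return `Relevant workspace context:\n${passages}\n\n${basePrompt}`;
}
```

Placing the passages ahead of the base prompt mirrors the "prepends" wording above. Whether the real agent orders, numbers, or truncates passages this way is not stated in the source.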

POST /api/v1/ai/embeddings/index

Index or re-index a document’s embeddings. Send the full plain-text content of the document; the service handles chunking, embedding, and storage automatically.
curl -X POST https://api.caret.page/api/v1/ai/embeddings/index \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "content": "Q3 performance exceeded expectations across all product lines..."
  }'

Request body

document_id (string, required)
UUID of the document to index. The authenticated user must have access to this document; a 403 is returned if they do not, and 404 if the document does not exist.

content (string, required)
Plain-text content of the document. Minimum 1 character, maximum 500,000 characters. Strip Tiptap/ProseMirror JSON before sending; only raw text is accepted. The document service exposes a content_text field on every document response that is suitable for direct use here.
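
A client can catch the documented constraints before sending the request and avoid a round trip that would end in a 422. The sketch below is a hypothetical pre-flight check: validateIndexRequest and the constant names are not part of the API, and the UUID pattern is a generic one rather than the service's own validator.

```typescript
// Documented upper bound for `content`.
const MAX_CONTENT_LENGTH = 500000;
// Generic 8-4-4-4-12 hexadecimal UUID shape (assumption: the service accepts this form).
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

interface IndexRequest {
  document_id: string;
  content: string;
}

// Hypothetical helper: returns a list of problems; an empty list means the
// request satisfies the constraints documented above.
function validateIndexRequest(req: IndexRequest): string[] {
  const problems: string[] = [];
  if (!UUID_PATTERN.test(req.document_id)) {
    problems.push("document_id must be a UUID");
  }
  // Note: .length counts UTF-16 code units, which may differ from how the
  // server counts characters for text outside the Basic Multilingual Plane.
  if (req.content.length < 1) {
    problems.push("content must contain at least 1 character");
  }
  if (req.content.length > MAX_CONTENT_LENGTH) {
    problems.push("content must not exceed 500,000 characters");
  }
  return problems;
}
```

The server remains the source of truth. A check like this only saves a request in the obvious cases.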

What happens internally

  1. The service deletes all existing document_embeddings rows for document_id.
  2. The content is split into overlapping fixed-size chunks (with a configurable stride to preserve context across boundaries).
  3. Each chunk is passed to the configured embedding model (OpenAI text-embedding-ada-002 or equivalent) to produce a 1536-dimensional float vector.
  4. Each chunk row is inserted into document_embeddings with the document_id, workspace_id (resolved from the document record), chunk_index, chunk_text, and embedding vector(1536).
  5. The response reports how many chunks were stored.
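
Step 2 above can be sketched as a sliding window over the text. The real chunk size, overlap, and unit (characters or tokens) are not documented here, so chunkText and its parameters are hypothetical and the values in the usage note are purely illustrative.

```typescript
// Hypothetical sketch of overlapping fixed-size chunking, measured in characters.
// Consecutive chunks share `overlap` characters, so the stride is chunkSize - overlap.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (!Number.isInteger(chunkSize) || !Number.isInteger(overlap)) {
    throw new RangeError("chunkSize and overlap must be integers");
  }
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const stride = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) {
      break; // this chunk already reaches the end of the text
    }
  }
  return chunks;
}
```

For example, chunkText("abcdefghij", 4, 2) yields ["abcd", "cdef", "efgh", "ghij"]: each chunk repeats the last two characters of the previous one, which is how context is preserved across chunk boundaries.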

Response — 200 OK

document_id (string, required)
UUID of the indexed document (echoed from the request).

chunks_indexed (integer, required)
Number of embedding chunks stored in document_embeddings. A short document might produce a single chunk; a long one could produce dozens.

Example response
{
  "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "chunks_indexed": 14
}

Error responses

Status                      Condition
404 Not Found               The document_id does not exist or the workspace could not be resolved.
403 Forbidden               The authenticated user does not have access to the document.
422 Unprocessable Entity    Request body failed validation (e.g. content is empty or exceeds 500,000 characters).

POST /api/v1/ai/embeddings/search

Run a semantic similarity search over the embedding chunks stored for a workspace. The service embeds the query string and returns the top-k most similar chunks ranked by cosine similarity. This is the same search the AI agent executes internally via the search_workspace_context tool; it is exposed as a public endpoint for custom integrations.
curl -X POST https://api.caret.page/api/v1/ai/embeddings/search \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Q3 adoption metrics",
    "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "top_k": 5
  }'

Request body

query (string, required)
The search query to embed. Minimum 1 character, maximum 2,000 characters.

document_id (string, required)
UUID of any document in the target workspace. Used to resolve the workspace scope: the search covers all documents in the same workspace, not just this document. The authenticated user must have access to this document.

top_k (integer, optional)
Maximum number of chunks to return. Defaults to 5. Minimum 1, maximum 20.

exclude_current_document (boolean, optional)
When true, chunks from document_id itself are excluded, so only chunks from other workspace documents are returned. Defaults to false.
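
The defaults and bounds above can be encoded in a small request builder. This is a hypothetical client-side helper: buildSearchRequest and SearchOptions are not part of the API. Clamping top_k into the 1 to 20 range is a design choice of this sketch; the service itself presumably rejects out-of-range values with a 422.

```typescript
// Request body for POST /api/v1/ai/embeddings/search, as documented above.
interface SearchRequest {
  query: string;
  document_id: string;
  top_k: number;
  exclude_current_document: boolean;
}

// Hypothetical options bag for the helper below.
interface SearchOptions {
  topK?: number;
  excludeCurrentDocument?: boolean;
}

// Hypothetical helper: applies the documented defaults (top_k = 5,
// exclude_current_document = false) and clamps top_k into [1, 20].
function buildSearchRequest(
  query: string,
  documentId: string,
  options: SearchOptions = {},
): SearchRequest {
  if (query.length < 1 || query.length > 2000) {
    throw new RangeError("query must be between 1 and 2,000 characters");
  }
  const requested = options.topK ?? 5;
  const topK = Number.isFinite(requested)
    ? Math.min(20, Math.max(1, Math.trunc(requested)))
    : 5; // fall back to the default for NaN or Infinity
  return {
    query,
    document_id: documentId,
    top_k: topK,
    exclude_current_document: options.excludeCurrentDocument ?? false,
  };
}
```

The returned object can be passed to JSON.stringify and sent as the request body, in the same way as the curl example above.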

Response — 200 OK

document_id (string, required)
UUID of the document used to scope the workspace search (echoed from the request).

query (string, required)
The search query (echoed from the request).

results (array, required)
Ranked list of matching chunks.

Example response
{
  "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "query": "Q3 adoption metrics",
  "results": [
    {
      "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "workspace_id": "ws111222-3344-5566-7788-99aabbccddee",
      "chunk_index": 2,
      "chunk_text": "Q3 performance exceeded expectations across all product lines...",
      "document_title": "Q3 Report",
      "is_current_document": true,
      "score": 0.92
    }
  ]
}
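
For a custom integration it helps to type the response. The interfaces below are inferred from the example response above, not from a published schema, so treat the field types as assumptions. formatCitations and its 0.8 score threshold are hypothetical and only show one way to consume the ranked list.

```typescript
// Inferred from the example response; not an official schema.
interface SearchResult {
  document_id: string;
  workspace_id: string;
  chunk_index: number;
  chunk_text: string;
  document_title: string;
  is_current_document: boolean;
  score: number;
}

interface SearchResponse {
  document_id: string;
  query: string;
  results: SearchResult[];
}

// Hypothetical helper: keeps results at or above a client-chosen score and
// formats each as a one-line citation. The default threshold is illustrative.
function formatCitations(response: SearchResponse, minScore = 0.8): string[] {
  return response.results
    .filter((r) => r.score >= minScore)
    .map((r) => {
      const origin = r.is_current_document ? "this document" : r.document_title;
      return `${origin} #${r.chunk_index} (${r.score.toFixed(2)}): ${r.chunk_text}`;
    });
}
```

The is_current_document flag makes it possible to distinguish passages from the open document from those found elsewhere in the workspace without comparing IDs by hand.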

Error responses

Status                      Condition
404 Not Found               The document_id does not exist.
403 Forbidden               The authenticated user does not have access to the document.
422 Unprocessable Entity    Request body failed validation.

DELETE /api/v1/ai/embeddings/{document_id}

Hard-delete all stored embedding chunks for a document. After this call, the document will no longer appear in similarity searches until it is re-indexed via POST /api/v1/ai/embeddings/index. Caret calls this endpoint automatically when a document is permanently deleted.
curl -X DELETE \
  "https://api.caret.page/api/v1/ai/embeddings/a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
  -H "Authorization: Bearer <token>"

Path parameters

document_id (string, required)
UUID of the document whose embeddings to delete. The authenticated user must have access to this document.

Response — 204 No Content

Returns an empty body on success.

Error responses

Status           Condition
404 Not Found    The document_id does not exist.
403 Forbidden    The authenticated user does not have access to the document.
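
Because success is a 204 with an empty body, a client should branch on the status code rather than try to parse JSON. The sketch below separates building the request from interpreting the result. buildDeleteEmbeddingsRequest, interpretDeleteStatus, and the outcome labels are hypothetical names; only the URL, method, header, and status codes come from this page.

```typescript
const API_BASE = "https://api.caret.page/api/v1/ai";

// Plain description of the HTTP call; can be handed to fetch as (url, init).
interface DeleteRequest {
  url: string;
  method: "DELETE";
  headers: Record<string, string>;
}

// Hypothetical helper: builds the DELETE request for a document's embeddings.
function buildDeleteEmbeddingsRequest(documentId: string, token: string): DeleteRequest {
  return {
    url: `${API_BASE}/embeddings/${encodeURIComponent(documentId)}`,
    method: "DELETE",
    headers: { Authorization: `Bearer ${token}` },
  };
}

type DeleteOutcome = "deleted" | "not_found" | "forbidden" | "unexpected";

// Hypothetical helper: maps the documented status codes to an outcome label.
// Anything not listed in the table above is reported as "unexpected".
function interpretDeleteStatus(status: number): DeleteOutcome {
  switch (status) {
    case 204:
      return "deleted"; // empty body: do not call response.json()
    case 404:
      return "not_found";
    case 403:
      return "forbidden";
    default:
      return "unexpected";
  }
}
```

A caller would pass the request's url, method, and headers to fetch and feed the resulting response.status into interpretDeleteStatus.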

Frontend integration pattern

The Caret frontend automatically triggers embedding indexing after a successful document save. A typical integration looks like this:
// Minimal shape of a Caret document as used here. Named CaretDocument so it
// does not collide with the DOM's global Document type.
interface CaretDocument {
  id: string;
  content_text?: string | null;
}

// Provided elsewhere in the app: persists the document via the document API.
declare function saveDocumentToApi(doc: CaretDocument, token: string): Promise<void>;

async function indexDocumentEmbeddings(
  documentId: string,
  contentText: string,
  token: string,
): Promise<void> {
  if (contentText.trim().length === 0) {
    // Nothing to index: skip silently
    return;
  }

  const response = await fetch(
    "https://api.caret.page/api/v1/ai/embeddings/index",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        document_id: documentId,
        content: contentText,
      }),
    },
  );

  if (!response.ok) {
    // Embedding indexing is non-critical: log the failure but don't
    // surface it to the user. RAG will degrade gracefully until the
    // next successful save re-indexes the document.
    // The error body may not be JSON, so fall back to null.
    const detail = await response.json().catch(() => null);
    console.warn("Embedding indexing failed:", response.status, detail);
    return;
  }

  const { chunks_indexed } = await response.json();
  console.debug(`Indexed ${chunks_indexed} chunks for document ${documentId}`);
}

// Call after a successful document PATCH
async function onDocumentSaved(doc: CaretDocument, token: string): Promise<void> {
  await saveDocumentToApi(doc, token);
  // Fire-and-forget: don't await in the critical path
  indexDocumentEmbeddings(doc.id, doc.content_text ?? "", token).catch(
    (err) => console.warn("Background embedding indexing error:", err),
  );
}
Fire the embedding index request in the background (fire-and-forget) after saving. If the embedding call fails, the document is still saved correctly and RAG will use the previously indexed chunks until the next successful save triggers a fresh index run.

Workspace scoping in detail

Every row in document_embeddings carries both document_id and workspace_id. When the AI agent calls search_workspace_context during a conversation, the similarity search filters by workspace_id, so it searches all documents in the workspace, not just the one currently open. This lets the AI answer cross-document questions like “What did the Q2 retrospective say about the design team?” as long as all relevant documents have been indexed.

The exclude_current_document flag controls whether the open document contributes to the results: with false (the default) the open document’s own chunks are included alongside the rest of the workspace, and with true context is retrieved exclusively from other workspace documents. Neither setting restricts the search to the current document alone.
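
The scoping rules can be illustrated with an in-memory stand-in for the table. This is a simplified model, not the service's implementation: the real search runs inside Postgres with a pgvector HNSW index, whereas this sketch scores every row exhaustively. EmbeddingRow mirrors the columns named on this page; ScopedQuery, cosineSimilarity, and searchWorkspace are hypothetical.

```typescript
// Mirrors the document_embeddings columns described on this page.
interface EmbeddingRow {
  document_id: string;
  workspace_id: string;
  chunk_index: number;
  chunk_text: string;
  embedding: number[];
}

// Hypothetical query shape for this sketch.
interface ScopedQuery {
  embedding: number[];
  workspaceId: string;
  currentDocumentId: string;
  topK: number;
  excludeCurrentDocument: boolean;
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new RangeError("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Exhaustive stand-in for the workspace-scoped search: filter by workspace,
// optionally drop the open document's chunks, rank by similarity, keep top-k.
function searchWorkspace(rows: EmbeddingRow[], q: ScopedQuery) {
  return rows
    .filter((r) => r.workspace_id === q.workspaceId)
    .filter((r) => !(q.excludeCurrentDocument && r.document_id === q.currentDocumentId))
    .map((r) => ({ row: r, score: cosineSimilarity(q.embedding, r.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, q.topK);
}
```

Rows from other workspaces never reach the ranking step, which is the property the workspace_id filter guarantees. Toggling excludeCurrentDocument only removes the open document's rows; it never narrows the search to them.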
