Embeddings API: Index Document Context for RAG Retrieval

Caret’s AI assistant becomes significantly more useful when it can reference the actual content of documents in your workspace. The Embeddings API powers this by splitting a document’s plain text into overlapping chunks, generating a vector embedding for each chunk using the configured LLM provider’s embedding model, and storing those vectors in a pgvector column (vector(1536)) in the document_embeddings table. When a user asks the AI a question, the agent queries this table with HNSW cosine-similarity search to find the most relevant chunks and injects them into its system prompt — a pattern known as Retrieval-Augmented Generation (RAG). The endpoint is safe to call on every document save: existing embeddings for the document are atomically replaced, so there is no risk of stale or duplicate chunks accumulating over time. All endpoints require a valid Supabase JWT and are proxied through the API Gateway at https://api.caret.page/api/v1/ai/....

How RAG works in Caret

Save document
      │
      ▼
POST /api/v1/ai/embeddings/index
      │  chunks document text
      │  embeds each chunk (vector 1536-d)
      │  upserts into document_embeddings
      ▼
User sends message in AI panel
      │
      ▼
POST /api/v1/ai/conversations/{id}/stream
      │  agent calls search_workspace_context tool
      │  pgvector HNSW cosine-similarity search
      │  top-k chunks injected into system prompt
      ▼
LLM generates response grounded in document content

When document_id is included in the stream request body, the AI agent automatically calls the search_workspace_context internal tool, which executes a cosine-similarity search over the workspace’s document_embeddings rows and prepends the most relevant passages to the system prompt before the LLM generates its reply.

Embeddings are workspace-scoped. The similarity search is bounded to documents that belong to the same workspace as the target document, and only documents the authenticated user has access to are indexed. The AI will never surface content from a document in a different workspace or a document the user cannot read.
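
To make the scoping rule concrete, here is a minimal in-memory sketch of a workspace-bounded cosine-similarity search. It is illustrative only: the real service runs an approximate HNSW search inside pgvector, whereas this sketch does an exact brute-force scan, and the EmbeddingRow type and searchWorkspace function are hypothetical names rather than part of the Caret API.

```typescript
// Hypothetical in-memory stand-in for a row of the document_embeddings table.
interface EmbeddingRow {
  document_id: string;
  workspace_id: string;
  chunk_index: number;
  chunk_text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k search restricted to one workspace (brute force, not HNSW).
function searchWorkspace(
  rows: EmbeddingRow[],
  workspaceId: string,
  queryEmbedding: number[],
  topK: number,
): Array<EmbeddingRow & { score: number }> {
  return rows
    .filter((row) => row.workspace_id === workspaceId) // workspace boundary
    .map((row) => ({ ...row, score: cosineSimilarity(row.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score) // most similar first
    .slice(0, topK);
}
```

Because the workspace filter is applied before ranking, a chunk from another workspace can never appear in the results, however similar its embedding is to the query.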

POST /api/v1/ai/embeddings/index

Index or re-index a document’s embeddings. Send the full plain-text content of the document; the service handles chunking, embedding, and storage automatically.

curl -X POST https://api.caret.page/api/v1/ai/embeddings/index \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "content": "Q3 performance exceeded expectations across all product lines..."
  }'

Request body

document_id

string

required

UUID of the document to index. The authenticated user must have access to this document; a 403 is returned if they do not, and 404 if the document does not exist.

content

string

required

Plain-text content of the document. Minimum 1 character, maximum 500,000 characters. Strip Tiptap/ProseMirror JSON before sending — only raw text is accepted. The document service exposes a content_text field on every document response that is suitable for direct use here.
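
A client can catch most 422 responses before sending the request. The sketch below mirrors the documented limits; validateIndexRequest is a hypothetical helper, the UUID check is a loose shape test, and it assumes the server counts characters the same way JavaScript's string length does (UTF-16 code units), which the API does not confirm.

```typescript
// Documented upper bound on the content field.
const MAX_CONTENT_CHARS = 500000;

interface IndexRequest {
  document_id: string;
  content: string;
}

// Loose UUID shape check (8-4-4-4-12 hex digits); not a full RFC validation.
const UUID_SHAPE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns a list of problems; an empty list means the request looks valid.
function validateIndexRequest(req: IndexRequest): string[] {
  const problems: string[] = [];
  if (!UUID_SHAPE.test(req.document_id)) {
    problems.push("document_id must be a UUID");
  }
  if (req.content.length < 1) {
    problems.push("content must contain at least 1 character");
  }
  // Assumption: length in UTF-16 code units approximates the server's count.
  if (req.content.length > MAX_CONTENT_CHARS) {
    problems.push("content must not exceed 500,000 characters");
  }
  return problems;
}
```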

What happens internally

1. The service deletes all existing document_embeddings rows for document_id.
2. The content is split into overlapping fixed-size chunks (with a configurable stride to preserve context across boundaries).
3. Each chunk is passed to the configured embedding model (OpenAI text-embedding-ada-002 or equivalent) to produce a 1536-dimensional float vector.
4. Each chunk row is inserted into document_embeddings with the document_id, workspace_id (resolved from the document record), chunk_index, chunk_text, and embedding vector(1536).
5. The response reports how many chunks were stored.
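
The chunking step can be sketched as follows. This is an illustration of overlapping fixed-size chunking in general, not the service's actual implementation: the chunkText name, the character-based sizing, and the chunkSize and overlap parameters are all assumptions, since the documentation does not state the real chunk size, stride, or whether chunks are measured in characters or tokens.

```typescript
// Split text into fixed-size windows that overlap by `overlap` characters.
// The stride between window starts is chunkSize - overlap.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be in the range [0, chunkSize)");
  }
  const stride = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text, so the tail
    // is not emitted again as a smaller duplicate chunk.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With a chunk size of 4 and an overlap of 2, the text "abcdefghij" yields "abcd", "cdef", "efgh", and "ghij": each chunk repeats the last two characters of the previous one, which is what preserves context across chunk boundaries.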

Response — `200 OK`

document_id

string

required

UUID of the indexed document (echoed from the request).

chunks_indexed

integer

required

Number of embedding chunks stored in document_embeddings. A short document might produce a single chunk; a long one could produce dozens.

Example response

{
  "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "chunks_indexed": 14
}

Error responses

Status	Condition
`404 Not Found`	The `document_id` does not exist or the workspace could not be resolved.
`403 Forbidden`	The authenticated user does not have access to the document.
`422 Unprocessable Entity`	Request body failed validation (e.g. `content` is empty or exceeds 500,000 characters).
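
A caller will usually want to branch on these statuses rather than treat every failure alike. The following sketch maps the documented status codes to a small set of outcomes; the IndexFailure type and classifyIndexFailure function are hypothetical names, and any status not listed in the table above falls into a catch-all bucket.

```typescript
// Hypothetical classification of the documented error statuses.
type IndexFailure = "not_found" | "forbidden" | "invalid_request" | "unexpected";

function classifyIndexFailure(status: number): IndexFailure {
  switch (status) {
    case 404:
      return "not_found"; // document missing or workspace unresolved
    case 403:
      return "forbidden"; // user lacks access to the document
    case 422:
      return "invalid_request"; // body failed validation
    default:
      return "unexpected"; // anything not listed in the table
  }
}
```

None of the three documented failures is likely to succeed on an unchanged retry, so a client would typically log them and wait for the next save rather than loop.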

POST /api/v1/ai/embeddings/search

Run a semantic similarity search over the embedding chunks stored for a workspace. The service embeds the query string and returns the top-k most similar chunks ranked by cosine similarity. This is the same search the AI agent executes internally via the search_workspace_context tool; it is exposed as a public endpoint for custom integrations.

curl -X POST https://api.caret.page/api/v1/ai/embeddings/search \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Q3 adoption metrics",
    "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "top_k": 5
  }'

Request body

query

string

required

The search query to embed. Minimum 1 character, maximum 2,000 characters.

document_id

string

required

UUID of any document in the target workspace. Used to resolve the workspace scope — the search covers all documents in the same workspace, not just this document. The authenticated user must have access to this document.

top_k

integer

Maximum number of chunks to return. Defaults to 5. Minimum 1, maximum 20.

exclude_current_document

boolean

When true, chunks from document_id itself are excluded — only chunks from other workspace documents are returned. Defaults to false.
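
The defaults and bounds above can be applied on the client before the request is sent. This is a sketch under stated assumptions: buildSearchRequest and its options object are hypothetical, trimming the query is a choice made here rather than documented behaviour, and out-of-range top_k values are clamped instead of rejected.

```typescript
interface SearchRequest {
  query: string;
  document_id: string;
  top_k: number;
  exclude_current_document: boolean;
}

interface SearchOptions {
  topK?: number;
  excludeCurrentDocument?: boolean;
}

// Builds a request body using the documented defaults (top_k 5, exclude false)
// and clamps top_k into the documented range of 1 to 20.
function buildSearchRequest(
  query: string,
  documentId: string,
  options: SearchOptions = {},
): SearchRequest {
  const trimmed = query.trim(); // assumption: whitespace-only queries are useless
  if (trimmed.length < 1 || trimmed.length > 2000) {
    throw new RangeError("query must be between 1 and 2,000 characters");
  }
  const requested = options.topK ?? 5;
  const topK = Number.isFinite(requested)
    ? Math.min(20, Math.max(1, Math.trunc(requested)))
    : 5;
  return {
    query: trimmed,
    document_id: documentId,
    top_k: topK,
    exclude_current_document: options.excludeCurrentDocument ?? false,
  };
}
```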

Response — `200 OK`

document_id

string

required

UUID of the document used to scope the workspace search (echoed from the request).

query

string

required

The search query (echoed from the request).

results

array

required

Ranked list of matching chunks.

ChunkResult fields (each entry in results)

document_id

string

required

UUID of the document the chunk belongs to.

workspace_id

string

required

UUID of the workspace the chunk belongs to.

chunk_index

integer

required

Zero-based index of the chunk within its document.

chunk_text

string

required

The raw text content of the chunk.

document_title

string

Title of the source document, or null if unavailable.

is_current_document

boolean

required

true when this chunk belongs to the document_id supplied in the request.

score

number

required

Cosine similarity score between 0 and 1. Higher means more similar.

Example response

{
  "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "query": "Q3 adoption metrics",
  "results": [
    {
      "document_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "workspace_id": "ws111222-3344-5566-7788-99aabbccddee",
      "chunk_index": 2,
      "chunk_text": "Q3 performance exceeded expectations across all product lines...",
      "document_title": "Q3 Report",
      "is_current_document": true,
      "score": 0.92
    }
  ]
}
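
For a custom integration, the response can be typed and turned into a context block for a prompt. The interfaces below follow the fields documented above; the formatContext helper, its output layout, the "Untitled" fallback, and the minScore threshold are assumptions made for illustration and do not describe how Caret's own agent builds its system prompt.

```typescript
interface ChunkResult {
  document_id: string;
  workspace_id: string;
  chunk_index: number;
  chunk_text: string;
  document_title?: string | null; // may be null if unavailable
  is_current_document: boolean;
  score: number;
}

interface SearchResponse {
  document_id: string;
  query: string;
  results: ChunkResult[];
}

// Hypothetical helper: renders ranked chunks as a numbered plain-text block,
// dropping any chunk whose score falls below minScore.
function formatContext(response: SearchResponse, minScore: number = 0): string {
  return response.results
    .filter((chunk) => chunk.score >= minScore)
    .map((chunk, i) => {
      const title = chunk.document_title ?? "Untitled";
      const marker = chunk.is_current_document ? " (current document)" : "";
      return `[${i + 1}] ${title}${marker}, chunk ${chunk.chunk_index}\n${chunk.chunk_text}`;
    })
    .join("\n\n");
}
```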

Error responses

Status	Condition
`404 Not Found`	The `document_id` does not exist.
`403 Forbidden`	The authenticated user does not have access to the document.
`422 Unprocessable Entity`	Request body failed validation.

DELETE /api/v1/ai/embeddings/{document_id}

Hard-delete all stored embedding chunks for a document. After this call, the document no longer appears in similarity searches until it is re-indexed via POST /api/v1/ai/embeddings/index. The service calls this endpoint automatically when a document is permanently deleted.

curl -X DELETE \
  "https://api.caret.page/api/v1/ai/embeddings/a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
  -H "Authorization: Bearer <token>"

Path parameters

document_id

string

required

UUID of the document whose embeddings to delete. The authenticated user must have access to this document.

Response — `204 No Content`

Returns an empty body on success.

Error responses

Status	Condition
`404 Not Found`	The `document_id` does not exist.
`403 Forbidden`	The authenticated user does not have access to the document.

Frontend integration pattern

The Caret frontend automatically triggers embedding indexing after a successful document save. A typical integration looks like this:

async function indexDocumentEmbeddings(
  documentId: string,
  contentText: string,
  token: string,
): Promise<void> {
  if (!contentText || contentText.trim().length === 0) {
    // Nothing to index — skip silently
    return;
  }

  const response = await fetch(
    "https://api.caret.page/api/v1/ai/embeddings/index",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        document_id: documentId,
        content: contentText,
      }),
    },
  );

  if (!response.ok) {
    // Embedding indexing is non-critical: log the failure but don't
    // surface it to the user. RAG will degrade gracefully until the
    // next successful save re-indexes the document.
    // Read the body as text, since an error response is not guaranteed to be JSON.
    console.warn("Embedding indexing failed:", response.status, await response.text());
    return;
  }

  const { chunks_indexed } = await response.json();
  console.debug(`Indexed ${chunks_indexed} chunks for document ${documentId}`);
}

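// Hypothetical document shape, named CaretDocument to avoid the DOM's global Document type.
interface CaretDocument {
  id: string;
  content_text?: string | null;
}
// Hypothetical save helper; its implementation is not shown here.
declare function saveDocumentToApi(document: CaretDocument, token: string): Promise<void>;

// Call after a successful document PATCH
async function onDocumentSaved(document: CaretDocument, token: string): Promise<void> {
  await saveDocumentToApi(document, token);
  // Fire-and-forget: don't await in the critical path
  indexDocumentEmbeddings(document.id, document.content_text ?? "", token).catch(
    (err) => console.warn("Background embedding indexing error:", err),
  );
}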

Fire the embedding index request in the background (fire-and-forget) after saving. If the embedding call fails, the document is still saved correctly and RAG will use the previously indexed chunks until the next successful save triggers a fresh index run.

Workspace scoping in detail

Every row in document_embeddings carries both document_id and workspace_id. When the AI agent calls search_workspace_context during a conversation, the similarity search filters by workspace_id, so it searches all documents in the workspace, not just the one currently open. This lets the AI answer cross-document questions like “What did the Q2 retrospective say about the design team?” as long as all relevant documents have been indexed. The exclude_current_document flag controls whether the open document contributes to the results: false (the default) includes the open document’s own chunks alongside the rest of the workspace, while true retrieves context exclusively from other workspace documents. Neither setting restricts the search to the current document alone.
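
The effect of exclude_current_document can be illustrated with a small filter over already-scored chunks. This is a sketch of the flag's semantics as described above, not the service's query: ScoredChunk and selectChunks are hypothetical names, and the real filtering happens inside the database search.

```typescript
interface ScoredChunk {
  document_id: string;
  chunk_text: string;
  score: number;
}

// Ranks chunks by score and optionally drops those from the current document.
function selectChunks(
  chunks: ScoredChunk[],
  currentDocumentId: string,
  excludeCurrentDocument: boolean = false,
  topK: number = 5,
): Array<ScoredChunk & { is_current_document: boolean }> {
  return chunks
    .filter((c) => !(excludeCurrentDocument && c.document_id === currentDocumentId))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((c) => ({ ...c, is_current_document: c.document_id === currentDocumentId }));
}
```

With the flag left at false, chunks from the open document compete with the rest of the workspace on score alone; with true, the same ranking is computed over the other documents only.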
