Caret’s AI assistant becomes significantly more useful when it can reference the actual content of documents in your workspace. The Embeddings API powers this by splitting a document’s plain text into overlapping chunks, generating a vector embedding for each chunk using the configured LLM provider’s embedding model, and storing those vectors in a pgvector column (vector(1536)) in the document_embeddings table. When a user asks the AI a question, the agent queries this table with HNSW cosine-similarity search to find the most relevant chunks and injects them into its system prompt, a pattern known as Retrieval-Augmented Generation (RAG).
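To make the ranking step concrete, here is a minimal sketch of cosine-similarity top-k selection in plain Python. It is an exhaustive, exact comparison over an in-memory list, whereas an HNSW index answers the same kind of query approximately and at scale. The function names and the toy 3-dimensional vectors are illustrative assumptions, not part of Caret's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunks, k=5):
    """Rank (chunk_text, embedding) pairs by similarity to query_vec."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors stand in for the real 1536-dimensional ones.
chunks = [
    ("design review notes", [1.0, 0.0, 0.0]),
    ("budget spreadsheet", [0.0, 1.0, 0.0]),
    ("design retrospective", [0.9, 0.1, 0.0]),
]
ranked = top_k_chunks([1.0, 0.0, 0.0], chunks, k=2)
```

Here `ranked` holds the two chunks whose vectors point most nearly in the query's direction, best match first.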
The endpoint is safe to call on every document save: existing embeddings for the document are atomically replaced, so there is no risk of stale or duplicate chunks accumulating over time.
All endpoints require a valid Supabase JWT and are proxied through the API Gateway at https://api.caret.page/api/v1/ai/....
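As a rough illustration of calling the gateway after a document save, the sketch below prepares a request to the index endpoint using only the Python standard library. Several details are assumptions rather than documented facts: that the JWT travels as a Bearer token in the Authorization header, that the body is JSON, and that the text field is named content. The helper name build_index_request is hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.caret.page/api/v1/ai"


def build_index_request(jwt, document_id, content):
    """Prepare (but do not send) a POST to the embeddings index endpoint."""
    # Assumed body shape: the "content" field name is a guess.
    body = json.dumps({"document_id": document_id, "content": content}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/embeddings/index",
        data=body,
        method="POST",
        headers={
            # Assumed auth scheme: Supabase JWT sent as a Bearer token.
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
    )


request = build_index_request(
    "example-jwt", "00000000-0000-0000-0000-000000000000", "Hello"
)
# Sending it would be: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the sketch easy to inspect and lets a caller decide how to handle retries and timeouts.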
How RAG works in Caret
When document_id is included in the stream request body, the AI agent automatically calls the search_workspace_context internal tool, which executes a cosine-similarity search over the workspace’s document_embeddings rows and prepends the most relevant passages to the system prompt before the LLM generates its reply.
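The prepending step might look roughly like the following. The prompt wording, the numbering of passages, and the function name build_system_prompt are all assumptions made for illustration; the actual format the agent uses is not described here.

```python
def build_system_prompt(base_prompt, passages):
    """Prepend retrieved passages to a base system prompt."""
    if not passages:
        return base_prompt  # nothing retrieved: leave the prompt untouched
    context = "\n\n".join(f"[{i + 1}] {text}" for i, text in enumerate(passages))
    return f"Relevant workspace context:\n{context}\n\n{base_prompt}"


prompt = build_system_prompt(
    "You are Caret's writing assistant.",
    ["Q2 retrospective: the design team shipped the new editor."],
)
```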
Embeddings are workspace-scoped. The similarity search is bounded to documents that belong to the same workspace as the target document, and only documents the authenticated user has access to are indexed. The AI will never surface content from a document in a different workspace or a document the user cannot read.
POST /api/v1/ai/embeddings/index
Index or re-index a document’s embeddings. Send the full plain-text content of the document; the service handles chunking, embedding, and storage automatically.
Request body
UUID of the document to index. The authenticated user must have access to this document; a 403 is returned if they do not, and a 404 if the document does not exist.
Plain-text content of the document. Minimum 1 character, maximum 500,000 characters. Strip Tiptap/ProseMirror JSON before sending; only raw text is accepted. The document service exposes a content_text field on every document response that is suitable for direct use here.
What happens internally
- The service deletes all existing document_embeddings rows for document_id.
- The content is split into overlapping fixed-size chunks (with a configurable stride to preserve context across boundaries).
- Each chunk is passed to the configured embedding model (OpenAI text-embedding-ada-002 or equivalent) to produce a 1536-dimensional float vector.
- Each chunk row is inserted into document_embeddings with the document_id, workspace_id (resolved from the document record), chunk_index, chunk_text, and embedding vector(1536).
- The response reports how many chunks were stored.
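The chunking step above can be sketched as follows. This version measures chunks in characters and uses made-up default sizes; the service's real unit (characters or tokens), chunk size, and stride are not stated here, so treat every number as an assumption.

```python
def split_into_chunks(text, chunk_size=1000, stride=800):
    """Split text into fixed-size chunks whose starts are `stride` apart.

    With stride < chunk_size, consecutive chunks share
    chunk_size - stride characters of overlap.
    """
    if chunk_size <= 0 or not 0 < stride <= chunk_size:
        raise ValueError("need chunk_size > 0 and 0 < stride <= chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += stride
    return chunks


# Pair each chunk with its position, as the chunk_index column would.
rows = [
    {"chunk_index": i, "chunk_text": chunk}
    for i, chunk in enumerate(split_into_chunks("abcdefghij", chunk_size=4, stride=3))
]
```

With a chunk size of 4 and a stride of 3, neighbouring chunks share one character, which is the overlap that preserves context across boundaries.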
Response — 200 OK
UUID of the indexed document (echoed from the request).
Number of embedding chunks stored in document_embeddings. A short document might produce a single chunk; a long one could produce dozens.
Error responses
| Status | Condition |
|---|---|
| 404 Not Found | The document_id does not exist or the workspace could not be resolved. |
| 403 Forbidden | The authenticated user does not have access to the document. |
| 422 Unprocessable Entity | Request body failed validation (e.g. content is empty or exceeds 500,000 characters). |
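A client can avoid the most predictable 422 by checking the length limits before sending, and can translate the remaining statuses into readable messages. The sketch below is one way to do that; the helper names are hypothetical and the messages simply restate the table above.

```python
MAX_CONTENT_CHARS = 500_000

# Conditions copied from the error table for the index endpoint.
INDEX_ERRORS = {
    404: "document_id does not exist or the workspace could not be resolved",
    403: "authenticated user does not have access to the document",
    422: "request body failed validation",
}


def content_is_valid(content):
    """Mirror the documented length limits: 1 to 500,000 characters."""
    return 1 <= len(content) <= MAX_CONTENT_CHARS


def explain_index_status(status):
    """Turn an HTTP status from the index endpoint into a readable message."""
    if status == 200:
        return "indexed"
    return INDEX_ERRORS.get(status, f"unexpected status {status}")
```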
POST /api/v1/ai/embeddings/search
Run a semantic similarity search over the embedding chunks stored for a workspace. The service embeds the query string and returns the top-k most similar chunks ranked by cosine similarity. This is the same search the AI agent executes internally via the search_workspace_context tool; it is exposed as a public endpoint for custom integrations.
Request body
The search query to embed. Minimum 1 character, maximum 2,000 characters.
UUID of any document in the target workspace. Used to resolve the workspace scope — the search covers all documents in the same workspace, not just this document. The authenticated user must have access to this document.
Maximum number of chunks to return. Defaults to 5. Minimum 1, maximum 20.
When true, chunks from document_id itself are excluded, so only chunks from other workspace documents are returned. Defaults to false.
Response — 200 OK
UUID of the document used to scope the workspace search (echoed from the request).
The search query (echoed from the request).
Ranked list of matching chunks.
Error responses
| Status | Condition |
|---|---|
| 404 Not Found | The document_id does not exist. |
| 403 Forbidden | The authenticated user does not have access to the document. |
| 422 Unprocessable Entity | Request body failed validation. |
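For a custom integration, a search request body could be assembled and checked against the documented limits like this. The names document_id and exclude_current_document appear in this page; query and top_k are assumed field names, since the request schema's identifiers are not shown here, and build_search_payload is a hypothetical helper.

```python
def build_search_payload(query, document_id, top_k=5, exclude_current_document=False):
    """Build a search request body, enforcing the documented limits locally."""
    if not 1 <= len(query) <= 2000:
        raise ValueError("query must be 1 to 2,000 characters")
    if not 1 <= top_k <= 20:
        raise ValueError("top_k must be between 1 and 20")
    return {
        "query": query,  # assumed field name
        "document_id": document_id,
        "top_k": top_k,  # assumed field name
        "exclude_current_document": exclude_current_document,
    }


payload = build_search_payload(
    "What did the Q2 retrospective say about the design team?",
    "00000000-0000-0000-0000-000000000000",
)
```

Rejecting out-of-range values locally saves a round trip that would otherwise end in a 422.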
DELETE /api/v1/ai/embeddings/
Hard-delete all stored embedding chunks for a document. After this call the document will no longer appear in similarity searches until it is re-indexed via POST /api/v1/ai/embeddings/index. This is called automatically when a document is permanently deleted.
Path parameters
UUID of the document whose embeddings to delete. The authenticated user must have access to this document.
Response — 204 No Content
Returns an empty body on success.
Error responses
| Status | Condition |
|---|---|
| 404 Not Found | The document_id does not exist. |
| 403 Forbidden | The authenticated user does not have access to the document. |
Frontend integration pattern
The Caret frontend automatically triggers embedding indexing after a successful document save.
Workspace scoping in detail
Every row in document_embeddings carries both document_id and workspace_id. When the AI agent calls search_workspace_context during a conversation, the similarity search filters by workspace_id, meaning it searches all documents in the workspace, not just the one currently open. This lets the AI answer cross-document questions like “What did the Q2 retrospective say about the design team?” as long as all relevant documents have been indexed.
The exclude_current_document flag does not narrow the search to the current document; it only controls whether that document contributes results. With false (the default), the open document’s own chunks are included alongside those from the rest of the workspace; with true, context is retrieved exclusively from other workspace documents.
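The scoping rules can be illustrated with a small in-memory filter that picks which rows a search would consider before any similarity ranking happens. The real service presumably applies these conditions in SQL; the dictionaries and the candidate_rows helper below are stand-ins invented for this sketch.

```python
def candidate_rows(rows, workspace_id, current_document_id, exclude_current_document=False):
    """Select the rows a workspace-scoped search would consider."""
    selected = []
    for row in rows:
        if row["workspace_id"] != workspace_id:
            continue  # never cross workspace boundaries
        if exclude_current_document and row["document_id"] == current_document_id:
            continue  # caller asked for context from other documents only
        selected.append(row)
    return selected


rows = [
    {"document_id": "doc-a", "workspace_id": "ws-1", "chunk_text": "open document"},
    {"document_id": "doc-b", "workspace_id": "ws-1", "chunk_text": "sibling document"},
    {"document_id": "doc-c", "workspace_id": "ws-2", "chunk_text": "other workspace"},
]
with_current = candidate_rows(rows, "ws-1", "doc-a")
without_current = candidate_rows(rows, "ws-1", "doc-a", exclude_current_document=True)
```

In both calls the row from ws-2 is dropped, while the flag only decides whether doc-a's own chunk stays in the candidate set.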