Caret stores all persistent application data in a single Supabase Cloud PostgreSQL instance. The database is not hosted on the Hetzner VPS — it is an external managed service provided by Supabase. Two ORM layers operate against the same physical database: Drizzle ORM for the four Node.js services and SQLAlchemy async for the Python AI service. Both use their own migration tools to evolve the schema independently.

Database Stack

| Layer | Technology |
| --- | --- |
| Database engine | Supabase Cloud PostgreSQL |
| Identity provider | Supabase Auth (auth.users) |
| Vector search | pgvector (vector extension) |
| ORM, Node services | Drizzle ORM |
| ORM, AI service | SQLAlchemy async |
| Migrations, Node services | Drizzle migrations in src/db/migrations/ |
| Migrations, AI service | Alembic in src/db/migrations/versions/ |
| Frontend direct access | Supabase JS (auth and user_profiles only) |

All application tables live in the public schema.

Required PostgreSQL Extensions

-- UUID generation used by all primary keys
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Vector similarity search for RAG embeddings
CREATE EXTENSION IF NOT EXISTS vector;

-- Case-insensitive text columns (slugs, emails)
CREATE EXTENSION IF NOT EXISTS citext;

-- Planned for full-text trigram search
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Query performance observability
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

The document-service migrations enable pgcrypto and citext, and the AI service Alembic migration 0002_document_embeddings.py enables vector (pgvector). On a fresh Supabase project, enable all five extensions before running any migrations.
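
A fresh project can be checked for the extensions above before any migration runs. The following is a minimal sketch, not part of the Caret codebase: `REQUIRED_EXTENSIONS` mirrors the list in this section, `missing_extensions` is a hypothetical helper, and the installed names are assumed to come from the standard PostgreSQL catalog query `SELECT extname FROM pg_extension`, executed with whichever driver the service already uses.

```python
# Hypothetical pre-migration check; the names below are illustrative only.
REQUIRED_EXTENSIONS = ("pgcrypto", "vector", "citext", "pg_trgm", "pg_stat_statements")

# Standard catalog query; run it with the service's own database driver.
INSTALLED_EXTENSIONS_SQL = "SELECT extname FROM pg_extension"


def missing_extensions(installed: set[str]) -> list[str]:
    """Return the required extensions that are absent, in declaration order."""
    return [name for name in REQUIRED_EXTENSIONS if name not in installed]


if __name__ == "__main__":
    # Example: a project where only pgcrypto and vector are enabled so far.
    print(missing_extensions({"plpgsql", "pgcrypto", "vector"}))
```

An empty result means every extension in the list is present.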

Migration Sources

| Area | Source of Truth |
| --- | --- |
| Core documents, workspaces, folders, profiles | app/backend/document-service/src/db/schema.ts and src/db/migrations/ |
| Collaboration persistence | app/backend/collab-service/src/db/schema.ts and src/db/migrations/ |
| AI conversations, messages, suggestions, embeddings | app/backend/ai-service/src/models/ai.py and Alembic versions |
| Live Alembic migration history | alembic_version table (managed automatically by Alembic) |

Full Table Inventory

Every table in the public schema is listed below. Do not drop any of these tables from the live Supabase project without first removing the code that references them.

| Table | Status | Runtime Owner |
| --- | --- | --- |
| user_profiles | Active | Frontend reads/upserts via Supabase JS in authStore.ts |
| workspaces | Active | Document service workspace repository |
| workspace_members | Active | Document service membership and RBAC checks |
| folders | Active | Document service folder tree |
| documents | Active | Document service document metadata |
| document_members | Active | Document sharing and per-document permission checks |
| document_versions | Active | Document content snapshots and version history |
| document_embeddings | Active | AI service RAG indexing and semantic vector search |
| ai_conversations | Active | AI service chat session persistence |
| ai_messages | Active | AI service chat turns and tool traces |
| ai_suggestions | Active | AI service suggestion lifecycle |
| document_collab_updates | Active, partially wired | Collab service Y.js incremental update log |
| document_collab_snapshots | Active, partially wired | Collab service Y.js periodic full-state snapshots |
| alembic_version | Active | Alembic migration bookkeeping; not an application domain table |

Schema Conventions

Naming

  • Table names: snake_case, plural (e.g. document_versions, workspace_members).
  • Column names: snake_case (e.g. workspace_id, created_by_user_id).
  • Foreign keys: <singular_table_name>_id (e.g. document_id, folder_id).
  • TypeScript Drizzle exports match the table name (e.g. export const document_versions).
  • Python SQLAlchemy models use PascalCase (e.g. AiConversation, DocumentEmbedding).
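
The naming rules above can be expressed as two small helpers. This is an illustrative sketch only: `model_name` and `foreign_key_column` are hypothetical functions, and the singularization is deliberately naive (it strips one trailing "s"), which is enough for the table names in this document but is not a general English rule.

```python
def _singular(table_name: str) -> str:
    """Naive singularization: drop one trailing 's' if present."""
    return table_name[:-1] if table_name.endswith("s") else table_name


def model_name(table_name: str) -> str:
    """Map a snake_case plural table name to a PascalCase model name."""
    return "".join(part.capitalize() for part in _singular(table_name).split("_"))


def foreign_key_column(table_name: str) -> str:
    """Map a table name to the <singular_table_name>_id column that references it."""
    return f"{_singular(table_name)}_id"
```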

Primary Keys and Timestamps

  • All single-column primary keys are UUID generated with gen_random_uuid() (requires pgcrypto).
  • Pure join and log tables use composite primary keys (e.g. workspace_members on (workspace_id, user_id), document_collab_updates on (document_id, seq)).
  • All timestamps use TIMESTAMPTZ (timezone-aware).
  • Mutable entities carry both created_at and updated_at.
  • Soft-deletable entities carry deleted_at; treat deleted_at IS NULL as active.
  • Partial unique indexes are used when uniqueness should apply only to active (non-deleted) records.

Soft Deletes

-- Treat active records as those with no deletion timestamp
WHERE deleted_at IS NULL

-- Example partial unique index from workspaces
CREATE UNIQUE INDEX uq_workspaces_slug_active
  ON workspaces (slug)
  WHERE deleted_at IS NULL AND slug IS NOT NULL;
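
The effect of the partial unique index is easiest to see by running it. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL, since both accept a WHERE clause on CREATE UNIQUE INDEX; the table is reduced to three columns with simplified types, and `try_insert` is a hypothetical helper. It is meant to illustrate the behavior, not to mirror the real schema.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the workspaces table is heavily simplified.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE workspaces (
        id INTEGER PRIMARY KEY,
        slug TEXT,
        deleted_at TEXT
    );
    CREATE UNIQUE INDEX uq_workspaces_slug_active
      ON workspaces (slug)
      WHERE deleted_at IS NULL AND slug IS NOT NULL;
    """
)


def try_insert(slug: str) -> bool:
    """Insert an active workspace; return False if an active row already uses the slug."""
    try:
        conn.execute("INSERT INTO workspaces (slug) VALUES (?)", (slug,))
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert("acme")      # accepted: no active row uses the slug
duplicate = try_insert("acme")  # rejected: an active row already uses it

# Soft-delete the first row; it no longer satisfies the index predicate.
conn.execute(
    "UPDATE workspaces SET deleted_at = '2024-01-01T00:00:00Z' WHERE slug = 'acme'"
)
reused = try_insert("acme")     # accepted again: the old row is outside the index
```

Because soft-deleted rows fall outside the index predicate, a slug becomes available again as soon as its previous owner is soft-deleted, while two active rows can never share it.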

Table Details

Core Document Tables (document-service)

  • workspaces: Tenant boundary for all content. Contains slug (citext, partial-unique), name, settings (JSONB), soft-delete via deleted_at.
  • workspace_members: RBAC membership with composite PK (workspace_id, user_id). Roles: owner, admin, member, guest. Soft-remove via revoked_at.
  • folders: Adjacency-list folder tree per workspace. Self-referencing FK via parent_folder_id. Soft-delete via deleted_at. Unique constraint on (workspace_id, parent_folder_id, name) for active records.
  • documents: Document metadata. Visibility: private, workspace, link, public. Status: active, archived. Pointer to latest version via latest_version_id. Soft-delete via deleted_at.
  • document_members: Per-document RBAC with composite PK (document_id, user_id). Roles: owner, editor, commenter, viewer. Tracks last_viewed_at.
  • document_versions: Immutable version snapshots. Stores content_json (ProseMirror/Tiptap JSON) and content_text (plain-text extraction). Unique constraint on (document_id, version_number).
  • user_profiles: Application user profile extending auth.users. Stores display_name, avatar_url, and locale. PK is user_id (matches auth.users.id).
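
In an adjacency-list tree such as folders, each row knows only its parent, so a full path is recovered by following parent_folder_id links up to the root. The sketch below shows that walk in memory. It is an illustration under assumptions: the `Folder` dataclass and `folder_path` helper are hypothetical, ids are plain strings rather than UUIDs, and a NULL parent_folder_id is assumed to mark a root folder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Folder:
    id: str
    name: str
    parent_folder_id: Optional[str]  # None is assumed to mark a root folder


def folder_path(folders: dict[str, Folder], folder_id: str) -> str:
    """Walk parent_folder_id links up to the root and join the names with '/'."""
    names: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = folder_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in folder tree")
        seen.add(current)
        folder = folders[current]
        names.append(folder.name)
        current = folder.parent_folder_id
    return "/".join(reversed(names))
```

The cycle guard matters because a self-referencing foreign key alone does not prevent a folder from being made its own ancestor.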

AI Tables (ai-service)

  • ai_conversations: One chat session per (user_id, document_id) pair. Stores optional title.
  • ai_messages: Individual chat turns. Roles: system, user, assistant, tool. Stores raw content text, optional token_count, and tool_calls JSONB array (added in Alembic migration 0004_ai_messages_tool_calls.py).
  • ai_suggestions: AI-generated text proposals. Lifecycle status: proposed → applied / dismissed / superseded. Stores original_text, suggested_text, and optional position_start/position_end character offsets within the Tiptap document.
  • document_embeddings: See the pgvector and RAG section below.
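
The ai_suggestions lifecycle can be read as a small state machine. The sketch below encodes it with the four status values named above. Treating applied, dismissed, and superseded as terminal states is an assumption made for illustration, and `SuggestionStatus`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names, not identifiers from the AI service.

```python
from enum import Enum


class SuggestionStatus(str, Enum):
    PROPOSED = "proposed"
    APPLIED = "applied"
    DISMISSED = "dismissed"
    SUPERSEDED = "superseded"


# Assumption: only a proposed suggestion can change state; the other three are terminal.
ALLOWED_TRANSITIONS: dict[SuggestionStatus, frozenset[SuggestionStatus]] = {
    SuggestionStatus.PROPOSED: frozenset(
        {SuggestionStatus.APPLIED, SuggestionStatus.DISMISSED, SuggestionStatus.SUPERSEDED}
    ),
    SuggestionStatus.APPLIED: frozenset(),
    SuggestionStatus.DISMISSED: frozenset(),
    SuggestionStatus.SUPERSEDED: frozenset(),
}


def transition(current: SuggestionStatus, target: SuggestionStatus) -> SuggestionStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move suggestion from {current.value} to {target.value}")
    return target
```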

pgvector and RAG

Caret uses pgvector to power workspace-scoped semantic search for retrieval-augmented generation (RAG). The document_embeddings table stores one row per chunk of document text, each with a fixed-dimension embedding vector (1536 dimensions).
# Abridged from ai-service/src/models/ai.py (column definitions omitted;
# Base is the service's declarative base)
import uuid
from datetime import datetime

from sqlalchemy.orm import Mapped


class DocumentEmbedding(Base):
    __tablename__ = "document_embeddings"

    id: Mapped[uuid.UUID]            # UUID PK
    document_id: Mapped[uuid.UUID]   # FK to documents.id (DB-enforced, not in ORM)
    workspace_id: Mapped[uuid.UUID]  # Denormalized for workspace-scoped search
    chunk_index: Mapped[int]         # Zero-based chunk position in the document
    chunk_text: Mapped[str]          # Raw text of this chunk
    embedding: Mapped[list[float]]   # Vector(1536), OpenAI text-embedding-3-small
    created_at: Mapped[datetime]     # Creation timestamp (TIMESTAMPTZ)
    updated_at: Mapped[datetime]     # Last-update timestamp (TIMESTAMPTZ)
HNSW index (approximate nearest-neighbor):
-- Cosine distance index for fast ANN search.
-- The index covers embedding only; workspace scoping comes from the query's workspace_id filter.
CREATE INDEX ON document_embeddings
  USING hnsw (embedding vector_cosine_ops);
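
The vector_cosine_ops operator class orders rows by cosine distance, which pgvector exposes as the `<=>` operator and defines as one minus cosine similarity. The sketch below computes that ranking exactly over an in-memory list, which is the result the HNSW index approximates. It is an illustration only: `cosine_distance` and `nearest_chunks` are hypothetical helpers, rows are plain tuples, and the test vectors are two-dimensional rather than 1536-dimensional.

```python
import math
from typing import Sequence

Row = tuple[str, str, Sequence[float]]  # (workspace_id, chunk_text, embedding)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_chunks(
    query: Sequence[float], rows: Sequence[Row], workspace_id: str, limit: int = 3
) -> list[str]:
    """Exact scan: keep one workspace's rows, then order by ascending distance."""
    scored = [
        (cosine_distance(query, embedding), text)
        for ws, text, embedding in rows
        if ws == workspace_id
    ]
    scored.sort(key=lambda pair: pair[0])
    return [text for _, text in scored[:limit]]
```

Because HNSW is an approximate index, a real query can occasionally return a slightly different set than this exact scan would.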
RAG flow:
  1. Index: After a document is saved, the frontend calls the embedding API. EmbeddingService splits the document into overlapping chunks, calls the OpenAI text-embedding-3-small model, and upserts the resulting vector(1536) rows into document_embeddings, replacing any existing chunks for that document.
  2. Search: When a user sends a chat message, AIAgentService calls search_workspace_context, which runs an HNSW cosine similarity search against document_embeddings filtered by workspace_id, returning the top-N most relevant chunks.
  3. Inject: Retrieved chunks are injected as context into the agent’s system prompt before calling the LLM. If no embeddings exist for the workspace, the agent degrades gracefully and responds without retrieved context.
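
The indexing and injection steps of the flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not EmbeddingService or AIAgentService code: `split_into_chunks` and `build_system_prompt` are hypothetical helpers, chunk size and overlap are measured in characters with arbitrary defaults, and the prompt layout is invented for the example.

```python
def split_into_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Slide a window of `size` characters, stepping by `size - overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_system_prompt(base_prompt: str, chunks: list[str]) -> str:
    """Append retrieved chunks as context; with none, return the prompt unchanged."""
    if not chunks:
        return base_prompt  # graceful degradation: answer without retrieved context
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return f"{base_prompt}\n\nRelevant workspace context:\n{context}"
```

Overlap means consecutive chunks share text, so a sentence that straddles a chunk boundary still appears whole in at least one chunk.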

Y.js Collaboration Persistence

The collab-service writes real-time Y.js state to two tables.

document_collab_updates: Append-only log of encoded Y.js update binaries.
// Composite PK: (document_id, seq)
// seq is a monotonic integer per document; enforced by application code
// update column is PostgreSQL bytea (binary Y.js encoded update)
export const document_collab_updates = pgTable("document_collab_updates", {
  document_id: uuid("document_id").notNull(),
  seq:         bigint("seq", { mode: "number" }).notNull(),
  update:      bytea("update").notNull(),
  client_id:   bigint("client_id", { mode: "number" }),  // optional debug info
  user_id:     uuid("user_id"),                          // SET NULL on user delete
  created_at:  timestamp("created_at", { withTimezone: true }).notNull().defaultNow(),
});

document_collab_snapshots: Periodic full-state checkpoints to bound update log growth.
// Unique on (document_id, snapshot_seq)
// ydoc: full Y.Doc binary state; state_vector: Y.js state vector for sync
export const document_collab_snapshots = pgTable("document_collab_snapshots", {
  id:                  uuid("id").primaryKey().defaultRandom().notNull(),
  document_id:         uuid("document_id").notNull(),
  snapshot_seq:        bigint("snapshot_seq", { mode: "number" }).notNull(),
  ydoc:                bytea("ydoc").notNull(),
  state_vector:        bytea("state_vector").notNull(),
  created_by_user_id:  uuid("created_by_user_id"),
  created_at:          timestamp("created_at", { withTimezone: true }).notNull().defaultNow(),
});
Write path (active when DATABASE_URL is set on the collab-service):
  1. ConnectionHandler persists incoming Y.js updates to document_collab_updates.
  2. SnapshotScheduler periodically calls CollabPersistenceService to snapshot active rooms into document_collab_snapshots.
Y.js persistence is partially wired. Updates are written and snapshots are scheduled, but RoomManager still initializes new rooms from a fresh Y.Doc on startup. CollabPersistenceService.loadDocument — which would reconstruct the Y.Doc from the latest snapshot and subsequent incremental updates — is not yet called during room creation. This means document state is not restored after a server restart or when a room is first created following a crash.
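
The restore step described above amounts to choosing the newest snapshot and replaying only the updates recorded after it. The sketch below shows that selection logic in isolation. It is not the actual CollabPersistenceService.loadDocument: `Snapshot`, `Update`, and `plan_restore` are hypothetical names, and it assumes snapshot_seq holds the seq of the last update already folded into the snapshot.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    snapshot_seq: int  # assumed: seq of the last update included in this snapshot
    ydoc: bytes


@dataclass(frozen=True)
class Update:
    seq: int
    update: bytes


def plan_restore(
    snapshots: Sequence[Snapshot], updates: Sequence[Update]
) -> tuple[Optional[bytes], list[bytes]]:
    """Return (latest snapshot binary or None, later update binaries in seq order)."""
    latest = max(snapshots, key=lambda s: s.snapshot_seq, default=None)
    floor = latest.snapshot_seq if latest is not None else None
    pending = sorted(
        (u for u in updates if floor is None or u.seq > floor),
        key=lambda u: u.seq,
    )
    return (latest.ydoc if latest is not None else None, [u.update for u in pending])
```

In the Node collab-service, the returned binaries would then be applied to a fresh Y.Doc in order, for example with the Yjs applyUpdate function, before the room accepts connections.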

Row-Level Security

RLS is enabled on all application tables at the PostgreSQL level. Policies are defined in SQL migration files and are evaluated for every query that runs under a role subject to RLS, regardless of which service issues it; queries made with service-role credentials bypass them.

| Table group | RLS policy source |
| --- | --- |
| Core document tables | document-service/src/db/migrations/001_rls_core_tables.sql |
| Collaboration tables | collab-service/src/db/migrations/001_rls_collab_tables.sql |
| AI tables | ai-service/src/db/migrations/versions/0003_enable_rls_on_public_tables.py |

Access patterns:
  • Backend services use service-role credentials (bypasses RLS) or pass a server-validated JWT for user-scoped queries.
  • Frontend uses the Supabase JS anon client only for user_profiles (self-access RLS policies allow this).
  • All other tables are inaccessible to the frontend anon client; access goes through the REST API.
Test any destructive RLS changes against a dedicated Supabase test project — never against production.
