Database Schema: Supabase PostgreSQL, pgvector, and RLS

Caret stores all persistent application data in a single Supabase Cloud PostgreSQL instance. The database is not hosted on the Hetzner VPS — it is an external managed service provided by Supabase. Two ORM layers operate against the same physical database: Drizzle ORM for the four Node.js services and SQLAlchemy async for the Python AI service. Both use their own migration tools to evolve the schema independently.

Database Stack

Layer	Technology
Database engine	Supabase Cloud PostgreSQL
Identity provider	Supabase Auth (`auth.users`)
Vector search	pgvector (`vector` extension)
ORM — Node services	Drizzle ORM
ORM — AI service	SQLAlchemy async
Migrations — Node services	Drizzle migrations in `src/db/migrations/`
Migrations — AI service	Alembic in `src/db/migrations/versions/`
Frontend direct access	Supabase JS (auth and `user_profiles` only)

All application tables live in the public schema.

Required PostgreSQL Extensions

-- UUID generation used by all primary keys
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Vector similarity search for RAG embeddings
CREATE EXTENSION IF NOT EXISTS vector;

-- Case-insensitive text columns (slugs, emails)
CREATE EXTENSION IF NOT EXISTS citext;

-- Planned for full-text trigram search
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Query performance observability
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

pgcrypto and citext are enabled by the document-service migrations. vector (pgvector) is enabled by the AI service Alembic migration 0002_document_embeddings.py. Enable all extensions before running any migrations against a fresh Supabase project.

Migration Sources

Area	Source of Truth
Core documents, workspaces, folders, profiles	`app/backend/document-service/src/db/schema.ts` and `src/db/migrations/`
Collaboration persistence	`app/backend/collab-service/src/db/schema.ts` and `src/db/migrations/`
AI conversations, messages, suggestions, embeddings	`app/backend/ai-service/src/models/ai.py` and Alembic versions
Live Alembic migration history	`alembic_version` table (managed automatically by Alembic)

Full Table Inventory

Every table in the public schema is listed below. Do not drop any of these tables from the live Supabase project without first removing the code that references them.

Table	Status	Runtime Owner
`user_profiles`	Active	Frontend reads/upserts via Supabase JS in `authStore.ts`
`workspaces`	Active	Document service workspace repository
`workspace_members`	Active	Document service membership and RBAC checks
`folders`	Active	Document service folder tree
`documents`	Active	Document service document metadata
`document_members`	Active	Document sharing and per-document permission checks
`document_versions`	Active	Document content snapshots and version history
`document_embeddings`	Active	AI service RAG indexing and semantic vector search
`ai_conversations`	Active	AI service chat session persistence
`ai_messages`	Active	AI service chat turns and tool traces
`ai_suggestions`	Active	AI service suggestion lifecycle
`document_collab_updates`	Active, partially wired	Collab service Y.js incremental update log
`document_collab_snapshots`	Active, partially wired	Collab service Y.js periodic full-state snapshots
`alembic_version`	Active	Alembic migration bookkeeping; not an application domain table

Schema Conventions

Naming

Table names: snake_case, plural (e.g. document_versions, workspace_members).
Column names: snake_case (e.g. workspace_id, created_by_user_id).
Foreign keys: <singular_table_name>_id (e.g. document_id, folder_id).
TypeScript Drizzle exports match the table name (e.g. export const document_versions).
Python SQLAlchemy models use PascalCase (e.g. AiConversation, DocumentEmbedding).

Primary Keys and Timestamps

All single-column primary keys are UUID generated with gen_random_uuid() (requires pgcrypto).
Pure join and log tables use composite primary keys (e.g. workspace_members on (workspace_id, user_id), document_collab_updates on (document_id, seq)).
All timestamps use TIMESTAMPTZ (timezone-aware).
Mutable entities carry both created_at and updated_at.
Soft-deletable entities carry deleted_at; treat deleted_at IS NULL as active.
Partial unique indexes are used when uniqueness should apply only to active (non-deleted) records.

Soft Deletes

-- Treat active records as those with no deletion timestamp
WHERE deleted_at IS NULL

-- Example partial unique index from workspaces
CREATE UNIQUE INDEX uq_workspaces_slug_active
  ON workspaces (slug)
  WHERE deleted_at IS NULL AND slug IS NOT NULL;

Table Details

Core Document Tables (document-service)

workspaces — Tenant boundary for all content. Contains slug (citext, partial-unique), name, settings (JSONB), soft-delete via deleted_at. workspace_members — RBAC membership with composite PK (workspace_id, user_id). Roles: owner, admin, member, guest. Soft-remove via revoked_at. folders — Adjacency-list folder tree per workspace. Self-referencing FK via parent_folder_id. Soft-delete via deleted_at. Unique constraint on (workspace_id, parent_folder_id, name) for active records. documents — Document metadata. Visibility: private, workspace, link, public. Status: active, archived. Pointer to latest version via latest_version_id. Soft-delete via deleted_at. document_members — Per-document RBAC with composite PK (document_id, user_id). Roles: owner, editor, commenter, viewer. Tracks last_viewed_at. document_versions — Immutable version snapshots. Stores content_json (ProseMirror/Tiptap JSON) and content_text (plain-text extraction). Unique constraint on (document_id, version_number). user_profiles — Application user profile extending auth.users. Stores display_name, avatar_url, and locale. PK is user_id (matches auth.users.id).

AI Tables (ai-service)

ai_conversations — One chat session per (user_id, document_id) pair. Stores optional title. ai_messages — Individual chat turns. Roles: system, user, assistant, tool. Stores raw content text, optional token_count, and tool_calls JSONB array (added in Alembic migration 0004_ai_messages_tool_calls.py). ai_suggestions — AI-generated text proposals. Lifecycle status: proposed → applied / dismissed / superseded. Stores original_text, suggested_text, and optional position_start/position_end character offsets within the Tiptap document. document_embeddings — See the pgvector and RAG section below.

pgvector and RAG

Caret uses pgvector to power workspace-scoped semantic search for retrieval-augmented generation (RAG). The document_embeddings table stores fixed-size vector chunks of document text.

# From ai-service/src/models/ai.py
class DocumentEmbedding(Base):
    __tablename__ = "document_embeddings"

    id: Mapped[uuid.UUID]        # UUID PK
    document_id: Mapped[uuid.UUID]   # FK to documents.id (DB-enforced, not in ORM)
    workspace_id: Mapped[uuid.UUID]  # Denormalized for workspace-scoped search
    chunk_index: Mapped[int]         # Zero-based chunk position in the document
    chunk_text: Mapped[str]          # Raw text of this chunk
    embedding: Mapped[list[float]]   # Vector(1536) — OpenAI text-embedding-3-small
    created_at / updated_at          # Timestamps

HNSW index (approximate nearest-neighbor):

-- Cosine distance index for fast ANN search, scoped to workspace
CREATE INDEX ON document_embeddings
  USING hnsw (embedding vector_cosine_ops);

RAG flow:

Index

After a document is saved, the frontend calls the embedding API. EmbeddingService splits the document into overlapping chunks, calls the OpenAI text-embedding-3-small model, and upserts the resulting vector(1536) rows into document_embeddings, replacing any existing chunks for that document.

When a user sends a chat message, AIAgentService calls search_workspace_context which runs an HNSW cosine similarity search against document_embeddings filtered by workspace_id, returning the top-N most relevant chunks.

Inject

Retrieved chunks are injected as context into the agent’s system prompt before calling the LLM. If no embeddings exist for the workspace, the agent degrades gracefully and responds without retrieved context.

Y.js Collaboration Persistence

The collab-service writes real-time Y.js state to two tables: document_collab_updates — Append-only log of encoded Y.js update binaries.

// Composite PK: (document_id, seq)
// seq is a monotonic integer per document; enforced by application code
// update column is PostgreSQL bytea (binary Y.js encoded update)
export const document_collab_updates = pgTable("document_collab_updates", {
  document_id: uuid("document_id").notNull(),
  seq:         bigint("seq", { mode: "number" }).notNull(),
  update:      bytea("update").notNull(),
  client_id:   bigint("client_id", { mode: "number" }),  // optional debug info
  user_id:     uuid("user_id"),                          // SET NULL on user delete
  created_at:  timestamp("created_at", { withTimezone: true }).notNull().defaultNow(),
});

document_collab_snapshots — Periodic full-state checkpoints to bound update log growth.

// Unique on (document_id, snapshot_seq)
// ydoc: full Y.Doc binary state; state_vector: Y.js state vector for sync
export const document_collab_snapshots = pgTable("document_collab_snapshots", {
  id:                  uuid("id").primaryKey().defaultRandom().notNull(),
  document_id:         uuid("document_id").notNull(),
  snapshot_seq:        bigint("snapshot_seq", { mode: "number" }).notNull(),
  ydoc:                bytea("ydoc").notNull(),
  state_vector:        bytea("state_vector").notNull(),
  created_by_user_id:  uuid("created_by_user_id"),
  created_at:          timestamp("created_at", { withTimezone: true }).notNull().defaultNow(),
});

Write path (active when DATABASE_URL is set on the collab-service):

ConnectionHandler persists incoming Y.js updates to document_collab_updates.
SnapshotScheduler periodically calls CollabPersistenceService to snapshot active rooms into document_collab_snapshots.

Y.js persistence is partially wired. Updates are written and snapshots are scheduled, but RoomManager still initializes new rooms from a fresh Y.Doc on startup. CollabPersistenceService.loadDocument — which would reconstruct the Y.Doc from the latest snapshot and subsequent incremental updates — is not yet called during room creation. This means document state is not restored after a server restart or when a room is first created following a crash.

Row-Level Security

RLS is enforced on all application tables at the PostgreSQL level. Policies are defined in SQL migration files and are evaluated for every query, regardless of which service issues it.

Table group	RLS policy source
Core document tables	`document-service/src/db/migrations/001_rls_core_tables.sql`
Collaboration tables	`collab-service/src/db/migrations/001_rls_collab_tables.sql`
AI tables	`ai-service/src/db/migrations/versions/0003_enable_rls_on_public_tables.py`

Access patterns:

Backend services use service-role credentials (bypasses RLS) or pass a server-validated JWT for user-scoped queries.
Frontend uses the Supabase JS anon client only for user_profiles (self-access RLS policies allow this).
All other tables are inaccessible to the frontend anon client; access goes through the REST API.

Test any destructive RLS changes against a dedicated Supabase test project — never against production.

Get Started

Core Features

Architecture

Development

Deployment

Database Schema: Supabase PostgreSQL, pgvector, and RLS

Database Stack

Required PostgreSQL Extensions

Migration Sources

Full Table Inventory

Schema Conventions

Naming

Primary Keys and Timestamps

Soft Deletes

Table Details

Core Document Tables (document-service)

AI Tables (ai-service)

pgvector and RAG

Y.js Collaboration Persistence

Row-Level Security

Build docs developers (and LLMs) love

Get Started

Core Features

Architecture

Development

Deployment

Documentation Index

​Database Stack

​Required PostgreSQL Extensions

​Migration Sources

​Full Table Inventory

​Schema Conventions

​Naming

​Primary Keys and Timestamps

​Soft Deletes

​Table Details

​Core Document Tables (document-service)

​AI Tables (ai-service)

​pgvector and RAG

​Y.js Collaboration Persistence

​Row-Level Security

Build docs developers (and LLMs) love

Database Stack

Required PostgreSQL Extensions

Migration Sources

Full Table Inventory

Schema Conventions

Naming

Primary Keys and Timestamps

Soft Deletes

Table Details

Core Document Tables (document-service)

AI Tables (ai-service)

pgvector and RAG

Y.js Collaboration Persistence

Row-Level Security