## Overview
Stanzo’s data model consists of four core tables plus the authentication tables provided by Convex Auth. The schema is defined in `convex/schema.ts` using Convex’s type-safe schema builder.
```typescript
import { authTables } from "@convex-dev/auth/server"
import { defineSchema, defineTable } from "convex/server"
import { v } from "convex/values"

export default defineSchema({
  ...authTables, // Adds users, authSessions, authAccounts, etc.
  debates: defineTable({ /* ... */ }),
  transcriptChunks: defineTable({ /* ... */ }),
  claims: defineTable({ /* ... */ }),
  extractionSessions: defineTable({ /* ... */ }),
})
```
`authTables` from `@convex-dev/auth` automatically adds tables for user accounts, sessions, and OAuth providers. See Authentication for details.
## Table: debates
Stores metadata for each debate session.
### Schema

```typescript
debates: defineTable({
  userId: v.id("users"),
  speakerAName: v.string(),
  speakerBName: v.string(),
  status: v.union(v.literal("active"), v.literal("ended")),
  startedAt: v.number(),
  endedAt: v.optional(v.number()),
})
  .index("by_user", ["userId"])
  .index("by_user_and_status", ["userId", "status"])
```
### Fields

| Field | Type | Description |
|---|---|---|
| `userId` | `Id<"users">` | Owner of the debate (foreign key) |
| `speakerAName` | `string` | Name of first speaker |
| `speakerBName` | `string` | Name of second speaker |
| `status` | `"active" \| "ended"` | Current debate state |
| `startedAt` | `number` | Unix timestamp (ms) when debate started |
| `endedAt` | `number?` | Unix timestamp (ms) when debate ended |
### Indexes

- `by_user`: Enables fast lookup of all debates for a user
- `by_user_and_status`: Optimized for queries like “get active debates for user”
### Relationships

- One-to-many with `transcriptChunks`: a debate has many transcript chunks
- One-to-many with `claims`: a debate has many extracted claims
- One-to-one with `extractionSessions`: a debate has one conversation history
### Status Transitions

Debates move from `active` to `ended`. Once a debate is ended, it cannot be reactivated.
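This rule can be sketched as a small guard function; the helper name is illustrative, not from the codebase:

```typescript
type DebateStatus = "active" | "ended";

// The only legal transition is "active" -> "ended"; "ended" is terminal.
function canTransitionDebate(from: DebateStatus, to: DebateStatus): boolean {
  return from === "active" && to === "ended";
}
```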
## Table: transcriptChunks
Stores transcript segments from the debate audio.
### Schema

```typescript
transcriptChunks: defineTable({
  debateId: v.id("debates"),
  speaker: v.union(v.literal(0), v.literal(1)),
  text: v.string(),
  startTime: v.number(),
  endTime: v.number(),
  processedForClaims: v.boolean(),
})
  .index("by_debate", ["debateId"])
  .index("by_debate_and_time", ["debateId", "startTime"])
  .index("by_debate_unprocessed", ["debateId", "processedForClaims"])
```
### Fields

| Field | Type | Description |
|---|---|---|
| `debateId` | `Id<"debates">` | Parent debate (foreign key) |
| `speaker` | `0 \| 1` | Which speaker: 0 = speaker A, 1 = speaker B |
| `text` | `string` | Transcribed text content |
| `startTime` | `number` | Start timestamp (ms) within debate |
| `endTime` | `number` | End timestamp (ms) within debate |
| `processedForClaims` | `boolean` | Whether this chunk has been sent to claim extraction |
### Indexes

- `by_debate`: Get all chunks for a debate
- `by_debate_and_time`: Get chunks in chronological order
- `by_debate_unprocessed`: Efficiently find chunks that need claim extraction

The `by_debate_unprocessed` compound index on `[debateId, processedForClaims]` allows fast queries for unprocessed chunks without scanning the entire table.
### Utterance Boundaries
Each chunk represents a continuous utterance from one speaker. When the speaker changes or there’s a significant pause, a new chunk is created. This helps the AI understand:
- Who said what
- Natural conversation flow
- Context for pronoun resolution
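The boundary rule above can be sketched as a pure function over transcribed segments. The 1-second pause threshold and all names here are illustrative assumptions, not values from the codebase:

```typescript
// Illustrative shape: one transcribed segment with speaker and timing (ms).
type Segment = { speaker: 0 | 1; text: string; startTime: number; endTime: number };

// Start a new chunk when the speaker changes or the pause exceeds the threshold;
// otherwise extend the current chunk (same continuous utterance).
function splitIntoChunks(segments: Segment[], pauseMs = 1000): Segment[] {
  const chunks: Segment[] = [];
  for (const seg of segments) {
    const last = chunks[chunks.length - 1];
    if (last && last.speaker === seg.speaker && seg.startTime - last.endTime <= pauseMs) {
      last.text += " " + seg.text; // same utterance: merge into the current chunk
      last.endTime = seg.endTime;
    } else {
      chunks.push({ ...seg }); // speaker change or long pause: new chunk
    }
  }
  return chunks;
}
```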
### Processing Workflow

1. A chunk is inserted with `processedForClaims: false`
2. Claim extraction queries for unprocessed chunks
3. Chunks are marked `processedForClaims: true` before the LLM call to prevent duplicates
4. Chunks remain in the database for full transcript history
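The mark-before-call step above can be illustrated with an in-memory sketch (the real version runs inside a Convex mutation; these names are illustrative):

```typescript
type ChunkRow = { id: number; text: string; processedForClaims: boolean };

// Claim the unprocessed chunks, marking them BEFORE the LLM call so a
// concurrent extraction run cannot pick up the same chunks and duplicate claims.
function claimUnprocessed(table: ChunkRow[]): ChunkRow[] {
  const claimed = table.filter((c) => !c.processedForClaims);
  for (const c of claimed) c.processedForClaims = true; // mark first...
  return claimed; // ...then hand the batch to the extraction step
}
```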
## Table: claims
Stores extracted factual claims and their verification results.
### Schema

```typescript
claims: defineTable({
  debateId: v.id("debates"),
  speaker: v.union(v.literal(0), v.literal(1)),
  claimText: v.string(),
  originalTranscriptExcerpt: v.string(),
  status: v.union(
    v.literal("pending"),
    v.literal("checking"),
    v.literal("true"),
    v.literal("false"),
    v.literal("mixed"),
    v.literal("unverifiable"),
  ),
  verdict: v.optional(v.string()),
  correction: v.optional(v.string()),
  sources: v.optional(v.array(v.string())),
  extractedAt: v.number(),
  checkedAt: v.optional(v.number()),
})
  .index("by_debate", ["debateId"])
  .index("by_debate_and_status", ["debateId", "status"])
  .index("by_status", ["status"])
```
### Fields

| Field | Type | Description |
|---|---|---|
| `debateId` | `Id<"debates">` | Parent debate (foreign key) |
| `speaker` | `0 \| 1` | Which speaker made the claim |
| `claimText` | `string` | Extracted factual claim |
| `originalTranscriptExcerpt` | `string` | Quote from transcript where claim was found |
| `status` | Status enum | Current verification state (see below) |
| `verdict` | `string?` | Explanation of fact-check result |
| `correction` | `string?` | Correct information (if claim was false/mixed) |
| `sources` | `string[]?` | URLs of sources used for verification |
| `extractedAt` | `number` | Unix timestamp (ms) when claim was extracted |
| `checkedAt` | `number?` | Unix timestamp (ms) when fact-check completed |
### Status Enum

Claims progress through these states:

| Status | Description |
|---|---|
| `pending` | Claim extracted, waiting for fact-check to start |
| `checking` | Fact-check in progress (API call to Perplexity) |
| `true` | Claim verified as accurate |
| `false` | Claim verified as false (correction provided) |
| `mixed` | Claim partially true (correction explains nuance) |
| `unverifiable` | Insufficient evidence to verify |
### Status Transitions

Claims move from `pending` to `checking`, then to one of the terminal verdicts. If fact-checking fails after all retries, the claim is marked `unverifiable` with a fallback verdict. This prevents claims from getting stuck in the `checking` state.
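A sketch of the resulting state machine as a transition table (an illustrative helper, not code from the codebase):

```typescript
type ClaimStatus = "pending" | "checking" | "true" | "false" | "mixed" | "unverifiable";

// pending -> checking -> one terminal verdict; a check that fails after all
// retries lands on "unverifiable" rather than staying "checking" forever.
const allowedTransitions: Record<ClaimStatus, ClaimStatus[]> = {
  pending: ["checking"],
  checking: ["true", "false", "mixed", "unverifiable"],
  true: [],
  false: [],
  mixed: [],
  unverifiable: [],
};

function canTransitionClaim(from: ClaimStatus, to: ClaimStatus): boolean {
  return allowedTransitions[from].includes(to);
}
```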
### Indexes

- `by_debate`: Get all claims for a debate
- `by_debate_and_status`: Filter claims by status (e.g., show only false claims)
- `by_status`: Global queries across all debates (e.g., admin dashboard)
## Table: extractionSessions

Stores the conversation history for the claim extraction AI.
### Schema

```typescript
extractionSessions: defineTable({
  debateId: v.id("debates"),
  messages: v.array(
    v.object({
      role: v.union(v.literal("user"), v.literal("model")),
      content: v.string(),
    }),
  ),
}).index("by_debate", ["debateId"])
```
### Fields

| Field | Type | Description |
|---|---|---|
| `debateId` | `Id<"debates">` | Parent debate (foreign key) |
| `messages` | `Message[]` | Conversation history with AI |
Each message has:

- `role`: `"user"` (transcript chunks) or `"model"` (AI responses)
- `content`: The actual text content
### Purpose
This table allows the claim extraction AI to maintain context across multiple extraction runs:
```typescript
// Load existing history
const session = await ctx.runQuery(
  internal.extractionSessions.getByDebate,
  { debateId: args.debateId },
)
const existingMessages = session?.messages ?? []

// Add new transcript chunks as a user message
const messages = [
  ...existingMessages,
  { role: "user", content: newTranscriptChunks },
]

// After the AI responds, save the updated history
await ctx.runMutation(internal.extractionSessions.upsert, {
  debateId: args.debateId,
  messages: [...messages, { role: "model", content: aiResponse }],
})
```
This enables:
- Pronoun resolution: “He said X” → AI knows who “he” refers to
- No re-extraction: AI remembers previous claims and doesn’t duplicate
- Context awareness: AI understands debate flow and references
There is a 1:1 relationship between debates and extraction sessions. Each debate has exactly one conversation history that grows over time.
## Relationships Diagram

`debates` is the root table: each debate owns many `transcriptChunks`, many `claims`, and exactly one `extractionSessions` document.
## Query Patterns

### Get all data for a debate

```typescript
const debate = await ctx.db.get(debateId)

const chunks = await ctx.db
  .query("transcriptChunks")
  .withIndex("by_debate_and_time", (q) => q.eq("debateId", debateId))
  .collect()

const claims = await ctx.db
  .query("claims")
  .withIndex("by_debate", (q) => q.eq("debateId", debateId))
  .collect()
```
### Get pending claims

```typescript
const pendingClaims = await ctx.db
  .query("claims")
  .withIndex("by_debate_and_status", (q) =>
    q.eq("debateId", debateId).eq("status", "pending")
  )
  .collect()
```
### Get unprocessed chunks

```typescript
const unprocessed = await ctx.db
  .query("transcriptChunks")
  .withIndex("by_debate_unprocessed", (q) =>
    q.eq("debateId", debateId).eq("processedForClaims", false)
  )
  .collect()
```
Always use indexes for queries. Convex optimizes indexed queries to O(log n + k) where k is the result size, compared to O(n) for full table scans.