Architecture

Stanzo’s architecture is built around three core services that process debate audio in real time:
  1. Deepgram for live transcription with speaker diarization
  2. Gemini for context-aware claim extraction
  3. Perplexity for fact-checking with primary sources
All data flows through Convex, which handles database storage, reactive subscriptions, and async job scheduling.

System overview

┌─────────────┐
│   Browser   │
│  Microphone │
└──────┬──────┘
       │ MediaStream (WebRTC)
       ▼
┌─────────────────┐
│    Deepgram     │  Live WebSocket
│ nova-3 + diarize│  (interim + final transcripts)
└────────┬────────┘
         │
         ▼
┌──────────────────────────────────────────┐
│              Convex Backend              │
│                                          │
│  ┌─────────────────────────────────┐    │
│  │   transcriptChunks table        │    │
│  │   (speaker, text, timestamps)   │    │
│  └──────────┬──────────────────────┘    │
│             │                            │
│             ▼                            │
│  ┌─────────────────────────────────┐    │
│  │  Gemini 2.5 Flash Extraction    │    │
│  │  (multi-turn conversation)      │    │
│  └──────────┬──────────────────────┘    │
│             │                            │
│             ▼                            │
│  ┌─────────────────────────────────┐    │
│  │      claims table (pending)     │    │
│  └──────────┬──────────────────────┘    │
│             │                            │
│             ▼                            │
│  ┌─────────────────────────────────┐    │
│  │   Perplexity Sonar Fact-Check   │    │
│  │   (async scheduled action)      │    │
│  └──────────┬──────────────────────┘    │
│             │                            │
│             ▼                            │
│  ┌─────────────────────────────────┐    │
│  │   claims table (verdict + src)  │    │
│  └──────────┬──────────────────────┘    │
│             │                            │
└─────────────┼──────────────────────────┘
              │ Reactive subscription
              ▼
         ┌─────────┐
         │ React UI│
         │(Next.js)│
         └─────────┘

Data flow

1. Live transcription

When a debate starts, the browser opens a WebSocket connection to Deepgram’s live API.

Client-side streaming (src/hooks/useDeepgram.ts:33-80):
const start = async (debateId) => {
  // Get temporary Deepgram token from Convex
  const { token } = await mintToken()

  // Request microphone access
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: {
      echoCancellation: true,
      noiseSuppression: true,
    },
  })

  // Create Deepgram connection
  const client = createClient({ accessToken: token })
  const connection = client.listen.live({
    model: "nova-3",
    language: "en",
    smart_format: true,
    punctuate: true,
    diarize: true,              // Speaker separation
    interim_results: true,      // Show live partial results
    utterance_end_ms: 1500,     // Trigger after 1.5s silence
  })

  // Pipe audio chunks to Deepgram
  const recorder = new MediaRecorder(stream, {
    mimeType: "audio/webm;codecs=opus",
  })
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0 && connection.getReadyState() === 1) {
      connection.send(event.data)
    }
  }
  recorder.start(250)  // Send chunks every 250ms
}
Transcript processing (src/hooks/useDeepgram.ts:82-112): Deepgram sends two types of events:
  • Interim results: Partial transcriptions shown as gray text while someone is speaking
  • Final results: Confirmed transcripts saved to the database
connection.on(LiveTranscriptionEvents.Transcript, async (data) => {
  const alt = data.channel.alternatives[0]
  if (!alt?.transcript?.trim()) return

  const { transcript } = alt
  const speaker = alt.words[0]?.speaker ?? 0  // 0 or 1 from diarization
  const startTime = data.start
  const duration = data.duration

  // Show interim results in UI
  if (!data.is_final) {
    setInterim({ text: transcript, speaker })
    return
  }

  // Save final transcript to Convex
  await insertChunk({
    debateId,
    speaker: speaker === 0 ? 0 : 1,
    text: transcript,
    startTime,
    endTime: startTime + duration,
  })

  setInterim(null)
})

// Trigger claim extraction after 1.5s silence
connection.on(LiveTranscriptionEvents.UtteranceEnd, () => {
  triggerExtraction({ debateId })
})
Database schema (convex/schema.ts:19-29):
transcriptChunks: defineTable({
  debateId: v.id("debates"),
  speaker: v.union(v.literal(0), v.literal(1)),
  text: v.string(),
  startTime: v.number(),
  endTime: v.number(),
  processedForClaims: v.boolean(),  // Prevents duplicate extraction
})
  .index("by_debate", ["debateId"])
  .index("by_debate_unprocessed", ["debateId", "processedForClaims"])
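The by_debate_unprocessed compound index is what lets the extraction pipeline fetch only new chunks without scanning the whole transcript. A minimal sketch of what the getUnprocessed query used later might look like (the function body is an assumption based on the schema above, not the repo's actual code):

```typescript
import { v } from "convex/values"
import { internalQuery } from "./_generated/server"

// Hypothetical implementation: read only chunks not yet sent to Gemini,
// using the compound index so the lookup stays cheap as transcripts grow.
export const getUnprocessed = internalQuery({
  args: { debateId: v.id("debates") },
  handler: async (ctx, { debateId }) => {
    return await ctx.db
      .query("transcriptChunks")
      .withIndex("by_debate_unprocessed", (q) =>
        q.eq("debateId", debateId).eq("processedForClaims", false),
      )
      .collect()
  },
})
```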

2. Multi-turn claim extraction

Every utterance boundary (1.5 seconds of silence) triggers a Gemini extraction session.

Why multi-turn sessions matter (convex/claimExtraction.ts:78-96): Stanzo maintains the full conversation history with Gemini for each debate. This allows the model to:
  • Avoid duplicates: Remember claims already extracted
  • Resolve context: Understand “that number” refers to a statistic from 2 minutes ago
  • Track continuity: Know when a speaker is elaborating vs. making a new claim
function buildSystemPrompt(speakerA: string, speakerB: string): string {
  return `You are a factual claim extractor for a live debate between ${speakerA} (speaker 0) and ${speakerB} (speaker 1).

Each turn, I provide a new transcript segment. You have the full conversation history.

Rules:
- ONLY extract claims from the NEW segment in my latest message
- Do NOT re-extract claims from previous turns
- Extract specific, verifiable factual claims (statistics, dates, named facts, causal claims)
- Extract the factual core when mixed with opinion
- Ignore purely opinion/prediction/subjective statements
- Use context to resolve pronouns and references

Output: JSONL, one object per line:
- speaker: 0 for ${speakerA}, 1 for ${speakerB}
- claimText: concise factual claim
- originalTranscriptExcerpt: quote from the new segment

If no factual claims, output: NO_CLAIMS
No markdown, no explanation, no array brackets.`
}
Extraction flow (convex/claimExtraction.ts:99-168):
export const extract = internalAction({
  handler: async (ctx, { debateId }) => {
    // Get unprocessed transcript chunks
    const chunks = await ctx.runQuery(
      internal.transcriptChunks.getUnprocessed,
      { debateId }
    )
    if (chunks.length === 0) return null

    // Mark as processed BEFORE calling LLM to prevent race conditions
    await ctx.runMutation(internal.transcriptChunks.markProcessed, {
      chunkIds: chunks.map((c) => c._id),
    })

    // Load existing conversation history from extractionSessions table
    const session = await ctx.runQuery(
      internal.extractionSessions.getByDebate,
      { debateId }
    )
    const existingMessages = session?.messages ?? []

    // Build new user message from chunks
    const newUserMessage = chunks
      .map((c) => `[${speakerNames[c.speaker]}]: ${c.text}`)
      .join("\n")

    const messages = [
      ...existingMessages,
      { role: "user", content: newUserMessage },
    ]

    // Stream claims from Gemini; streamClaims resolves to the full response text
    const result = await streamClaims(apiKey, systemPrompt, messages, async (claim) => {
      // Save each claim as it's parsed
      await ctx.runMutation(internal.claims.saveClaim, {
        debateId,
        speaker: claim.speaker,
        claimText: claim.claimText,
        originalTranscriptExcerpt: claim.originalTranscriptExcerpt,
      })
    })

    // Persist updated conversation history
    await ctx.runMutation(internal.extractionSessions.upsert, {
      debateId,
      messages: [...messages, { role: "model", content: result }],
    })
  },
})
JSONL streaming (convex/claimExtraction.ts:30-75): Claims are parsed line-by-line from Gemini’s response, so they appear in the UI incrementally:
for await (const chunk of stream) {
  buffer += chunk.text

  // Process complete lines
  while ((newlineIdx = buffer.indexOf("\n")) !== -1) {
    const line = buffer.slice(0, newlineIdx).trim()
    buffer = buffer.slice(newlineIdx + 1)
    
    const claim = parseClaim(line)  // Parse JSON from line
    if (claim) await onClaim(claim)  // Save to database immediately
  }
}
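The parseClaim call above has to tolerate whatever Gemini emits, including the NO_CLAIMS sentinel and half-streamed or malformed lines. A sketch of a defensive line parser under the JSONL contract from the system prompt (the actual implementation in convex/claimExtraction.ts may differ):

```typescript
interface ExtractedClaim {
  speaker: 0 | 1
  claimText: string
  originalTranscriptExcerpt: string
}

// Hypothetical defensive parser for one JSONL line from Gemini; returns null
// for blank lines, the NO_CLAIMS sentinel, and malformed or partial output.
function parseClaim(line: string): ExtractedClaim | null {
  if (!line || line === "NO_CLAIMS") return null
  try {
    const obj = JSON.parse(line) as Record<string, unknown>
    if (
      (obj.speaker !== 0 && obj.speaker !== 1) ||
      typeof obj.claimText !== "string" ||
      typeof obj.originalTranscriptExcerpt !== "string"
    ) {
      return null
    }
    return {
      speaker: obj.speaker as 0 | 1,
      claimText: obj.claimText,
      originalTranscriptExcerpt: obj.originalTranscriptExcerpt,
    }
  } catch {
    return null // Swallow malformed lines instead of failing the whole stream
  }
}
```

Returning null rather than throwing means one garbled line never aborts the stream; the loop simply moves on to the next line.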
Database schema (convex/schema.ts:31-52):
claims: defineTable({
  debateId: v.id("debates"),
  speaker: v.union(v.literal(0), v.literal(1)),
  claimText: v.string(),
  originalTranscriptExcerpt: v.string(),
  status: v.union(
    v.literal("pending"),
    v.literal("checking"),
    v.literal("true"),
    v.literal("false"),
    v.literal("mixed"),
    v.literal("unverifiable"),
  ),
  verdict: v.optional(v.string()),
  correction: v.optional(v.string()),
  sources: v.optional(v.array(v.string())),
  extractedAt: v.number(),
  checkedAt: v.optional(v.number()),
})
  .index("by_debate", ["debateId"])
  .index("by_status", ["status"])
Extraction sessions (convex/schema.ts:54-62):
extractionSessions: defineTable({
  debateId: v.id("debates"),
  messages: v.array(
    v.object({
      role: v.union(v.literal("user"), v.literal("model")),
      content: v.string(),
    }),
  ),
}).index("by_debate", ["debateId"])

3. Asynchronous fact-checking

Every time a claim is saved with pending status, Convex triggers a scheduled action to fact-check it with Perplexity.

Fact-check flow (convex/factCheck.ts:93-131):
export const check = internalAction({
  handler: async (ctx, { claimId }) => {
    // Update to "checking" status
    await ctx.runMutation(internal.claims.updateStatus, {
      claimId,
      status: "checking",
    })

    // Fetch claim details
    const claim = await ctx.runQuery(internal.claims.getById, { claimId })
    if (!claim) return null

    // Call Perplexity Sonar
    const factCheck = await callPerplexity(apiKey, claim.claimText)

    // Update claim with results
    await ctx.runMutation(internal.claims.updateStatus, {
      claimId,
      status: factCheck.status,      // true/false/mixed/unverifiable
      verdict: factCheck.verdict,    // Explanation text
      correction: factCheck.correction,  // Corrected info if needed
      sources: factCheck.citations,  // Array of URLs
    })
  },
})
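The scheduling itself happens at claim-insertion time. A sketch of how the saveClaim mutation might enqueue the check (the exact mutation body is an assumption; ctx.scheduler.runAfter is Convex's scheduling API):

```typescript
import { internal } from "./_generated/api"
import { internalMutation } from "./_generated/server"

// Hypothetical wiring: insert a pending claim, then immediately schedule
// its fact-check so slow Perplexity calls never block extraction.
export const saveClaim = internalMutation({
  handler: async (ctx, args) => {
    const claimId = await ctx.db.insert("claims", {
      ...args,
      status: "pending",
      extractedAt: Date.now(),
    })
    // runAfter(0, ...) enqueues the action outside this transaction;
    // multiple pending claims are checked in parallel
    await ctx.scheduler.runAfter(0, internal.factCheck.check, { claimId })
    return claimId
  },
})
```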
Perplexity integration (convex/factCheck.ts:39-91): The fact-checker uses the Effect library for retry logic and timeouts:
const callPerplexity = (apiKey: string, claimText: string) =>
  Effect.gen(function* () {
    const client = new Perplexity({ apiKey })

    const response = yield* Effect.tryPromise({
      try: () =>
        client.chat.completions.create({
          model: "sonar",
          messages: [
            {
              role: "system",
              content:
                "You are a fact-checker. Evaluate the following claim and respond with ONLY a JSON object containing: status (one of: true, false, mixed, unverifiable), verdict (brief explanation), correction (if false or mixed, the correct information; otherwise null). Keep verdict and correction to ~30 words each.",
            },
            {
              role: "user",
              content: `Fact-check this claim: "${claimText}"`,
            },
          ],
        }),
      catch: (e) => new PerplexityApiError({ message: String(e) }),
    })

    // Parse the model's JSON reply and extract citations
    const content = response.choices?.[0]?.message?.content
    const { status, verdict, correction } = JSON.parse(content ?? "{}")
    const citations = (response.citations ?? []).map(String)

    return { status, verdict, correction, citations }
  }).pipe(
    Effect.retry({
      schedule: Schedule.exponential(Duration.seconds(1)).pipe(
        Schedule.intersect(Schedule.recurs(3)),  // Max 3 retries
      ),
      while: (e) => e instanceof PerplexityApiError,
    }),
    Effect.timeout(Duration.seconds(30)),  // 30s timeout
  )
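Even with a strict system prompt, LLM replies occasionally deviate from pure JSON, so the content string benefits from defensive parsing before the verdict is written back. A minimal sketch (parsePerplexityReply is a hypothetical helper, not the repo's actual code):

```typescript
type Status = "true" | "false" | "mixed" | "unverifiable"

interface ParsedVerdict {
  status: Status
  verdict: string
  correction: string | null
}

const VALID_STATUSES: Status[] = ["true", "false", "mixed", "unverifiable"]

// Hypothetical helper: coerce the model's reply into the shape the claims
// table expects, falling back to "unverifiable" on anything malformed.
function parsePerplexityReply(content: string): ParsedVerdict {
  try {
    const parsed = JSON.parse(content.trim()) as Record<string, unknown>
    const status = VALID_STATUSES.includes(parsed.status as Status)
      ? (parsed.status as Status)
      : "unverifiable"
    return {
      status,
      verdict: typeof parsed.verdict === "string" ? parsed.verdict : "",
      correction: typeof parsed.correction === "string" ? parsed.correction : null,
    }
  } catch {
    // The model ignored the JSON-only instruction; degrade gracefully
    return {
      status: "unverifiable",
      verdict: "Unparseable fact-check response",
      correction: null,
    }
  }
}
```

Mapping every failure mode to "unverifiable" keeps the claims table's status union intact, so the UI never sees an unknown state.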

4. Reactive UI updates

Convex powers the UI with reactive subscriptions. When a claim’s status changes in the database, the React component re-renders automatically.

Query subscriptions (src/app/debates/new/page.tsx:20-23):
const debateArgs = debateId ? { debateId } : ("skip" as const)
const debate = useQuery(api.debates.get, debateArgs)
const chunks = useQuery(api.transcriptChunks.listByDebate, debateArgs)
const claims = useQuery(api.claims.listByDebate, debateArgs)
When claims updates (e.g., a claim goes from pending to true), React automatically re-renders the ClaimsSidebar component with the new data. No polling required.

Tech stack details

Frontend: Next.js 16 + React 19

  • App Router: File-based routing with layouts
  • Server Components: Pre-render static pages like the landing page
  • Client Components: Interactive debate UI with real-time updates
  • Tailwind CSS 4: Utility-first styling with custom design system
Key dependencies (package.json:13-26):
{
  "dependencies": {
    "@deepgram/sdk": "4.11.3",
    "@convex-dev/auth": "0.0.91",
    "convex": "1.32.0",
    "next": "16.1.6",
    "react": "19.2.4",
    "@phosphor-icons/react": "2.1.10"
  }
}

Backend: Convex

Convex provides:
  • Database: Stores debates, transcripts, claims, and extraction sessions
  • Mutations: Create debates, insert chunks, save claims
  • Queries: Fetch debates, list claims, get transcripts
  • Actions: Call external APIs (Gemini, Perplexity)
  • Scheduling: Trigger fact-checks after claim insertion
  • Subscriptions: Push live updates to React
Authentication (convex/auth.ts):
import GitHub from "@auth/core/providers/github"
import { convexAuth } from "@convex-dev/auth/server"

export const { auth, signIn, signOut, isAuthenticated } = convexAuth({
  providers: [GitHub],
})

AI services

Service      Model      Purpose             Key Features
Deepgram     nova-3     Live transcription  Speaker diarization, interim results, 1.5s utterance detection
Gemini       2.5 Flash  Claim extraction    Multi-turn conversations, streaming JSONL output, 4096 token limit
Perplexity   Sonar      Fact-checking       Web search, citation extraction, structured JSON responses

Error handling: Effect library

Stanzo uses the Effect library for functional error handling instead of try/catch chains. Benefits:
  • Retries: Exponential backoff for transient API failures
  • Timeouts: Prevent hanging requests
  • Type-safe errors: Structured error types like PerplexityApiError
  • Composable: Chain operations with .pipe()
Example (convex/factCheck.ts:84-91):
Effect.retry({
  schedule: Schedule.exponential(Duration.seconds(1)).pipe(
    Schedule.intersect(Schedule.recurs(3)),  // 1s, 2s, 4s, then fail
  ),
  while: (e) => e instanceof PerplexityApiError,
}),
Effect.timeout(Duration.seconds(30)),

Key design decisions

Why multi-turn extraction sessions?

Without conversation history, Gemini would:
  • Re-extract the same claim multiple times
  • Struggle with pronouns like “he said that number is wrong”
  • Miss when speakers circle back to earlier topics
By maintaining full context, Stanzo extracts claims accurately even in fast-paced debates with cross-talk and references.

Why JSONL streaming instead of batch extraction?

Streaming claims line-by-line means:
  • Users see results faster (claims appear as they’re parsed)
  • Gemini doesn’t have to finish the entire response before saving
  • Lower perceived latency in the UI
A batch approach would wait for all claims, then insert them at once—creating a longer delay.

Why async fact-checking?

Decoupling extraction from fact-checking prevents slow Perplexity calls from blocking Gemini. If a claim takes 10 seconds to verify, it shouldn’t delay extraction of the next utterance. Convex’s scheduler runs fact-checks in parallel, so multiple claims are verified simultaneously.

Why Convex instead of traditional backend?

Convex removes the need to:
  • Set up WebSocket infrastructure for real-time updates
  • Build a job queue for async actions
  • Write SQL migrations for schema changes
  • Deploy separate API servers
Everything runs on Convex’s serverless platform, and the React client subscribes to database changes automatically.

Performance characteristics

  • Transcription latency: ~500ms from speech to transcript appearing in the UI (Deepgram’s nova-3 model)
  • Claim extraction latency: 1-3 seconds after an utterance boundary (depends on Gemini response time and conversation length)
  • Fact-check latency: 3-10 seconds per claim (Perplexity searches the web and evaluates sources)
  • UI update latency: ~100ms from database write to React re-render (Convex reactive subscriptions)
Total time from spoken word to verified claim: 5-15 seconds depending on claim complexity and API response times.

Next steps

  • API Reference: explore the Convex backend functions
  • Deploy your own: set up environment variables and deploy Stanzo to Vercel + Convex
