Overview

The Transcript Chunks API handles storage and retrieval of speech-to-text chunks during live debates. Each chunk represents a segment of speech with precise timing information and tracks whether it has been processed for claim extraction.

Function Types:
  • Public Mutations: insert, triggerExtraction
  • Public Query: listByDebate
  • Internal Query: getUnprocessed
  • Internal Mutation: markProcessed

insert

import { api } from "@/convex/_generated/api";

await convex.mutation(api.transcriptChunks.insert, {
  debateId: debateId,
  speaker: 0,
  text: "I believe the economy has grown significantly over the past year.",
  startTime: 1234567890,
  endTime: 1234567895
});
Inserts a new transcript chunk into the database. Used during live transcription to store speech segments as they are recognized.

Parameters

debateId
Id<'debates'>
required
The ID of the debate this transcript belongs to
speaker
0 | 1
required
Which speaker is talking (0 = Speaker A, 1 = Speaker B)
text
string
required
The transcribed text content
startTime
number
required
Unix timestamp (milliseconds) when this speech segment started
endTime
number
required
Unix timestamp (milliseconds) when this speech segment ended

Returns

return
null
Returns null on success

Behavior

  • Automatically sets processedForClaims to false
  • Chunks are ready for claim extraction processing
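The behavior above can be sketched as a server-side mutation. This is a hypothetical reconstruction, not the actual source: the argument validators match the documented parameters, and `processedForClaims` is set to `false` on every insert as described.

```typescript
// convex/transcriptChunks.ts (sketch) — actual implementation may differ
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const insert = mutation({
  args: {
    debateId: v.id("debates"),
    speaker: v.union(v.literal(0), v.literal(1)),
    text: v.string(),
    startTime: v.number(),
    endTime: v.number(),
  },
  handler: async (ctx, args) => {
    // New chunks always start unprocessed so claim extraction can find them.
    await ctx.db.insert("transcriptChunks", {
      ...args,
      processedForClaims: false,
    });
    return null;
  },
});
```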

triggerExtraction

import { api } from "@/convex/_generated/api";

await convex.mutation(api.transcriptChunks.triggerExtraction, {
  debateId: debateId
});
Manually triggers claim extraction for a debate. Schedules the claim extraction action to run immediately.

Parameters

debateId
Id<'debates'>
required
The ID of the debate to process

Returns

return
null
Returns null on success

Behavior

  • Schedules internal.claimExtraction.extract to run immediately (0ms delay)
  • Does not wait for extraction to complete
  • Can be called multiple times safely
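Taken together, these points suggest an implementation built on Convex's scheduler. The sketch below is an assumption about the internals; only the `internal.claimExtraction.extract` reference and the 0 ms delay come from the text above.

```typescript
// convex/transcriptChunks.ts (sketch) — actual implementation may differ
import { mutation } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

export const triggerExtraction = mutation({
  args: { debateId: v.id("debates") },
  handler: async (ctx, { debateId }) => {
    // Schedule the extraction action with a 0 ms delay. The mutation returns
    // immediately; it never waits for the scheduled action to complete.
    await ctx.scheduler.runAfter(0, internal.claimExtraction.extract, {
      debateId,
    });
    return null;
  },
});
```

Because scheduling is fire-and-forget, repeated calls simply enqueue the action again, which is why multiple invocations are safe.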

listByDebate

import { api } from "@/convex/_generated/api";

const chunks = await convex.query(api.transcriptChunks.listByDebate, {
  debateId: debateId
});
Retrieves all transcript chunks for a specific debate, ordered by time.

Parameters

debateId
Id<'debates'>
required
The ID of the debate to retrieve transcripts for

Returns

chunks
TranscriptChunk[]
Array of all transcript chunks for the debate

Chunk Object Structure

_id
Id<'transcriptChunks'>
Unique chunk identifier
_creationTime
number
Convex automatic creation timestamp
debateId
Id<'debates'>
ID of the associated debate
speaker
0 | 1
Which speaker produced this chunk (0 = Speaker A, 1 = Speaker B)
text
string
The transcribed text content
startTime
number
Unix timestamp (milliseconds) when this speech segment started
endTime
number
Unix timestamp (milliseconds) when this speech segment ended
processedForClaims
boolean
Whether this chunk has been processed for claim extraction

Behavior

  • Uses by_debate_and_time index for efficient querying
  • Results ordered by start time (implicit from index)
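A plausible sketch of the query, assuming the `by_debate_and_time` index is defined over `["debateId", "startTime"]` (the index name is documented; the field order is an assumption):

```typescript
// convex/transcriptChunks.ts (sketch) — actual implementation may differ
import { query } from "./_generated/server";
import { v } from "convex/values";

export const listByDebate = query({
  args: { debateId: v.id("debates") },
  handler: async (ctx, { debateId }) => {
    // With the index assumed to cover ["debateId", "startTime"], pinning
    // debateId yields chunks in ascending startTime order automatically.
    return await ctx.db
      .query("transcriptChunks")
      .withIndex("by_debate_and_time", (q) => q.eq("debateId", debateId))
      .collect();
  },
});
```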

getUnprocessed

import { internal } from "@/convex/_generated/api";

const unprocessed = await ctx.runQuery(internal.transcriptChunks.getUnprocessed, {
  debateId: debateId
});
Internal query to retrieve all transcript chunks that haven’t been processed for claim extraction.

Parameters

debateId
Id<'debates'>
required
The ID of the debate

Returns

chunks
TranscriptChunk[]
Array of unprocessed chunks (where processedForClaims === false)

Behavior

  • Uses by_debate_unprocessed compound index for efficient filtering
  • Only returns chunks with processedForClaims: false
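A sketch of how this query might use the compound index, assuming `by_debate_unprocessed` is defined over `["debateId", "processedForClaims"]` (the index name is documented; the field order is an assumption):

```typescript
// convex/transcriptChunks.ts (sketch) — actual implementation may differ
import { internalQuery } from "./_generated/server";
import { v } from "convex/values";

export const getUnprocessed = internalQuery({
  args: { debateId: v.id("debates") },
  handler: async (ctx, { debateId }) => {
    // Both index fields are pinned, so only matching unprocessed chunks
    // are read — no post-query filtering required.
    return await ctx.db
      .query("transcriptChunks")
      .withIndex("by_debate_unprocessed", (q) =>
        q.eq("debateId", debateId).eq("processedForClaims", false)
      )
      .collect();
  },
});
```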

markProcessed

import { internal } from "@/convex/_generated/api";

await ctx.runMutation(internal.transcriptChunks.markProcessed, {
  chunkIds: [chunkId1, chunkId2, chunkId3]
});
Internal mutation to mark multiple transcript chunks as processed for claim extraction.

Parameters

chunkIds
Id<'transcriptChunks'>[]
required
Array of chunk IDs to mark as processed

Returns

return
null
Returns null on success

Behavior

  • Sets processedForClaims: true for each provided chunk ID
  • Processes chunks sequentially in a loop
  • Prevents duplicate claim extraction from same chunks
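The sequential-loop behavior could look like the following sketch (hypothetical; the actual source may differ):

```typescript
// convex/transcriptChunks.ts (sketch) — actual implementation may differ
import { internalMutation } from "./_generated/server";
import { v } from "convex/values";

export const markProcessed = internalMutation({
  args: { chunkIds: v.array(v.id("transcriptChunks")) },
  handler: async (ctx, { chunkIds }) => {
    // Patch each chunk in turn. All writes in a Convex mutation commit
    // atomically, so either every chunk is marked or none are.
    for (const chunkId of chunkIds) {
      await ctx.db.patch(chunkId, { processedForClaims: true });
    }
    return null;
  },
});
```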

Validation Schema

const chunkValidator = v.object({
  _id: v.id("transcriptChunks"),
  _creationTime: v.number(),
  debateId: v.id("debates"),
  speaker: v.union(v.literal(0), v.literal(1)),
  text: v.string(),
  startTime: v.number(),
  endTime: v.number(),
  processedForClaims: v.boolean(),
})
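The validator above corresponds to a table definition like the following. This is an assumed reconstruction of `convex/schema.ts`: the index names come from the Behavior sections above, but their field orders are inferred.

```typescript
// convex/schema.ts (sketch) — index field orders are assumptions
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  transcriptChunks: defineTable({
    debateId: v.id("debates"),
    speaker: v.union(v.literal(0), v.literal(1)),
    text: v.string(),
    startTime: v.number(),
    endTime: v.number(),
    processedForClaims: v.boolean(),
  })
    // Supports listByDebate's time-ordered retrieval.
    .index("by_debate_and_time", ["debateId", "startTime"])
    // Supports getUnprocessed's filtered lookup.
    .index("by_debate_unprocessed", ["debateId", "processedForClaims"]),
});
```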

Processing Flow

Typical Usage Pattern

  1. During Debate: Call insert for each speech segment as it’s transcribed
  2. Periodic or End: Call triggerExtraction to process accumulated chunks
  3. Extraction Action: Uses getUnprocessed to fetch new chunks
  4. After Processing: Calls markProcessed to prevent reprocessing
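Steps 1 and 2 happen client-side and can be sketched as a pair of callbacks. The handler names (`onSegmentRecognized`, `onDebateEnded`) and the `convex` client instance are hypothetical; only the two public mutations are from this API.

```typescript
// Client-side sketch; assumes a configured Convex client named `convex`.
import { api } from "@/convex/_generated/api";
import { Id } from "@/convex/_generated/dataModel";

// Step 1: store each speech segment as the transcriber emits it.
async function onSegmentRecognized(segment: {
  debateId: Id<"debates">;
  speaker: 0 | 1;
  text: string;
  startTime: number;
  endTime: number;
}) {
  await convex.mutation(api.transcriptChunks.insert, segment);
}

// Step 2: kick off claim extraction once the debate (or a round) ends.
async function onDebateEnded(debateId: Id<"debates">) {
  await convex.mutation(api.transcriptChunks.triggerExtraction, { debateId });
}
```

Steps 3 and 4 run server-side inside the extraction action and need no client code.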

Timing Information

The startTime and endTime fields enable:
  • Precise temporal tracking of when claims were made
  • Synchronization with audio/video playback
  • Time-based analysis of debate flow
  • Chronological ordering of transcript segments
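Two of these uses — duration and chronological ordering — reduce to simple arithmetic on the timestamp pair. A minimal sketch in plain TypeScript (the helper names are illustrative, not part of this API):

```typescript
// Helpers over the documented startTime/endTime fields (milliseconds).
type Timed = { startTime: number; endTime: number };

// How long a speech segment lasted, in milliseconds.
function durationMs(chunk: Timed): number {
  return chunk.endTime - chunk.startTime;
}

// Chronological ordering by start time, without mutating the input.
function sortChronologically<T extends Timed>(chunks: T[]): T[] {
  return [...chunks].sort((a, b) => a.startTime - b.startTime);
}

const chunks = [
  { startTime: 1_000, endTime: 4_000, text: "second" },
  { startTime: 0, endTime: 1_000, text: "first" },
];
const ordered = sortChronologically(chunks);
console.log(ordered.map((c) => c.text)); // ["first", "second"]
console.log(durationMs(ordered[1])); // 3000
```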
