Overview

The Transcript Chunks API handles storage and retrieval of speech-to-text chunks during live debates. Each chunk represents a segment of speech with precise timing information and tracks whether it has been processed for claim extraction.

Function Types:
  • Public Mutations: insert, triggerExtraction
  • Public Query: listByDebate
  • Internal Query: getUnprocessed
  • Internal Mutation: markProcessed

insert

import { api } from "@/convex/_generated/api";

await convex.mutation(api.transcriptChunks.insert, {
  debateId: debateId,
  speaker: 0,
  text: "I believe the economy has grown significantly over the past year.",
  startTime: 1234567890,
  endTime: 1234567895
});
Inserts a new transcript chunk into the database. Used during live transcription to store speech segments as they are recognized.

Parameters

debateId
Id<'debates'>
required
The ID of the debate this transcript belongs to
speaker
0 | 1
required
Which speaker is talking (0 = Speaker A, 1 = Speaker B)
text
string
required
The transcribed text content
startTime
number
required
Unix timestamp (milliseconds) when this speech segment started
endTime
number
required
Unix timestamp (milliseconds) when this speech segment ended

Returns

return
null
Returns null on success

Behavior

  • Automatically sets processedForClaims to false
  • Chunks are ready for claim extraction processing
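The behavior above can be sketched as a server-side mutation. This is a hypothetical reconstruction, not the actual source: the argument validators match the documented parameters, and `processedForClaims` is set to `false` on every insert as described.

```typescript
// convex/transcriptChunks.ts (sketch) — actual implementation may differ
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const insert = mutation({
  args: {
    debateId: v.id("debates"),
    speaker: v.union(v.literal(0), v.literal(1)),
    text: v.string(),
    startTime: v.number(),
    endTime: v.number(),
  },
  handler: async (ctx, args) => {
    // New chunks always start unprocessed so claim extraction can find them.
    await ctx.db.insert("transcriptChunks", {
      ...args,
      processedForClaims: false,
    });
    return null;
  },
});
```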

triggerExtraction

import { api } from "@/convex/_generated/api";

await convex.mutation(api.transcriptChunks.triggerExtraction, {
  debateId: debateId
});
Manually triggers claim extraction for a debate. Schedules the claim extraction action to run immediately.

Parameters

debateId
Id<'debates'>
required
The ID of the debate to process

Returns

return
null
Returns null on success

Behavior

  • Schedules internal.claimExtraction.extract to run immediately (0ms delay)
  • Does not wait for extraction to complete
  • Can be called multiple times safely
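Taken together, these points suggest an implementation built on Convex's scheduler. The sketch below is an assumption about the internals; only the `internal.claimExtraction.extract` reference and the 0 ms delay come from the text above.

```typescript
// convex/transcriptChunks.ts (sketch) — actual implementation may differ
import { mutation } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

export const triggerExtraction = mutation({
  args: { debateId: v.id("debates") },
  handler: async (ctx, { debateId }) => {
    // Schedule the extraction action with a 0 ms delay. The mutation returns
    // immediately; it never waits for the scheduled action to complete.
    await ctx.scheduler.runAfter(0, internal.claimExtraction.extract, {
      debateId,
    });
    return null;
  },
});
```

Because scheduling is fire-and-forget, repeated calls simply enqueue the action again, which is why multiple invocations are safe.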

listByDebate

import { api } from "@/convex/_generated/api";

const chunks = await convex.query(api.transcriptChunks.listByDebate, {
  debateId: debateId
});
Retrieves all transcript chunks for a specific debate, ordered by time.

Parameters

debateId
Id<'debates'>
required
The ID of the debate to retrieve transcripts for

Returns

chunks
TranscriptChunk[]
Array of all transcript chunks for the debate

Chunk Object Structure

_id
Id<'transcriptChunks'>
Unique chunk identifier
_creationTime
number
Convex automatic creation timestamp
debateId
Id<'debates'>
ID of the associated debate
speaker
0 | 1
Which speaker produced this chunk (0 = Speaker A, 1 = Speaker B)
text
string
The transcribed text content
startTime
number
Unix timestamp (milliseconds) when this speech segment started
endTime
number
Unix timestamp (milliseconds) when this speech segment ended
processedForClaims
boolean
Whether this chunk has been processed for claim extraction

Behavior

  • Uses by_debate_and_time index for efficient querying
  • Results ordered by start time (implicit from index)
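A plausible sketch of the query, assuming the `by_debate_and_time` index is defined over `["debateId", "startTime"]` (the index name is documented; the field order is an assumption):

```typescript
// convex/transcriptChunks.ts (sketch) — actual implementation may differ
import { query } from "./_generated/server";
import { v } from "convex/values";

export const listByDebate = query({
  args: { debateId: v.id("debates") },
  handler: async (ctx, { debateId }) => {
    // With the index assumed to cover ["debateId", "startTime"], pinning
    // debateId yields chunks in ascending startTime order automatically.
    return await ctx.db
      .query("transcriptChunks")
      .withIndex("by_debate_and_time", (q) => q.eq("debateId", debateId))
      .collect();
  },
});
```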

getUnprocessed

import { internal } from "@/convex/_generated/api";

const unprocessed = await ctx.runQuery(internal.transcriptChunks.getUnprocessed, {
  debateId: debateId
});
Internal query to retrieve all transcript chunks that haven’t been processed for claim extraction.

Parameters

debateId
Id<'debates'>
required
The ID of the debate

Returns

chunks
TranscriptChunk[]
Array of unprocessed chunks (where processedForClaims === false)

Behavior

  • Uses by_debate_unprocessed compound index for efficient filtering
  • Only returns chunks with processedForClaims: false
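A sketch of how this query might use the compound index, assuming `by_debate_unprocessed` is defined over `["debateId", "processedForClaims"]` (the index name is documented; the field order is an assumption):

```typescript
// convex/transcriptChunks.ts (sketch) — actual implementation may differ
import { internalQuery } from "./_generated/server";
import { v } from "convex/values";

export const getUnprocessed = internalQuery({
  args: { debateId: v.id("debates") },
  handler: async (ctx, { debateId }) => {
    // Both index fields are pinned, so only matching unprocessed chunks
    // are read — no post-query filtering required.
    return await ctx.db
      .query("transcriptChunks")
      .withIndex("by_debate_unprocessed", (q) =>
        q.eq("debateId", debateId).eq("processedForClaims", false)
      )
      .collect();
  },
});
```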

markProcessed

import { internal } from "@/convex/_generated/api";

await ctx.runMutation(internal.transcriptChunks.markProcessed, {
  chunkIds: [chunkId1, chunkId2, chunkId3]
});
Internal mutation to mark multiple transcript chunks as processed for claim extraction.

Parameters

chunkIds
Id<'transcriptChunks'>[]
required
Array of chunk IDs to mark as processed

Returns

return
null
Returns null on success

Behavior

  • Sets processedForClaims: true for each provided chunk ID
  • Processes chunks sequentially in a loop
  • Prevents duplicate claim extraction from same chunks
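The sequential-loop behavior could look like the following sketch (hypothetical; the actual source may differ):

```typescript
// convex/transcriptChunks.ts (sketch) — actual implementation may differ
import { internalMutation } from "./_generated/server";
import { v } from "convex/values";

export const markProcessed = internalMutation({
  args: { chunkIds: v.array(v.id("transcriptChunks")) },
  handler: async (ctx, { chunkIds }) => {
    // Patch each chunk in turn. All writes in a Convex mutation commit
    // atomically, so either every chunk is marked or none are.
    for (const chunkId of chunkIds) {
      await ctx.db.patch(chunkId, { processedForClaims: true });
    }
    return null;
  },
});
```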

Validation Schema

const chunkValidator = v.object({
  _id: v.id("transcriptChunks"),
  _creationTime: v.number(),
  debateId: v.id("debates"),
  speaker: v.union(v.literal(0), v.literal(1)),
  text: v.string(),
  startTime: v.number(),
  endTime: v.number(),
  processedForClaims: v.boolean(),
})
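The validator above corresponds to a table definition like the following. This is an assumed reconstruction of `convex/schema.ts`: the index names come from the Behavior sections above, but their field orders are inferred.

```typescript
// convex/schema.ts (sketch) — index field orders are assumptions
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  transcriptChunks: defineTable({
    debateId: v.id("debates"),
    speaker: v.union(v.literal(0), v.literal(1)),
    text: v.string(),
    startTime: v.number(),
    endTime: v.number(),
    processedForClaims: v.boolean(),
  })
    // Supports listByDebate's time-ordered retrieval.
    .index("by_debate_and_time", ["debateId", "startTime"])
    // Supports getUnprocessed's filtered lookup.
    .index("by_debate_unprocessed", ["debateId", "processedForClaims"]),
});
```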

Processing Flow

Typical Usage Pattern

  1. During Debate: Call insert for each speech segment as it’s transcribed
  2. Periodic or End: Call triggerExtraction to process accumulated chunks
  3. Extraction Action: Uses getUnprocessed to fetch new chunks
  4. After Processing: Calls markProcessed to prevent reprocessing
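Steps 1 and 2 happen client-side and can be sketched as a pair of callbacks. The handler names (`onSegmentRecognized`, `onDebateEnded`) and the `convex` client instance are hypothetical; only the two public mutations are from this API.

```typescript
// Client-side sketch; assumes a configured Convex client named `convex`.
import { api } from "@/convex/_generated/api";
import { Id } from "@/convex/_generated/dataModel";

// Step 1: store each speech segment as the transcriber emits it.
async function onSegmentRecognized(segment: {
  debateId: Id<"debates">;
  speaker: 0 | 1;
  text: string;
  startTime: number;
  endTime: number;
}) {
  await convex.mutation(api.transcriptChunks.insert, segment);
}

// Step 2: kick off claim extraction once the debate (or a round) ends.
async function onDebateEnded(debateId: Id<"debates">) {
  await convex.mutation(api.transcriptChunks.triggerExtraction, { debateId });
}
```

Steps 3 and 4 run server-side inside the extraction action and need no client code.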

Timing Information

The startTime and endTime fields enable:
  • Precise temporal tracking of when claims were made
  • Synchronization with audio/video playback
  • Time-based analysis of debate flow
  • Chronological ordering of transcript segments
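Two of these uses — duration and chronological ordering — reduce to simple arithmetic on the timestamp pair. A minimal sketch in plain TypeScript (the helper names are illustrative, not part of this API):

```typescript
// Helpers over the documented startTime/endTime fields (milliseconds).
type Timed = { startTime: number; endTime: number };

// How long a speech segment lasted, in milliseconds.
function durationMs(chunk: Timed): number {
  return chunk.endTime - chunk.startTime;
}

// Chronological ordering by start time, without mutating the input.
function sortChronologically<T extends Timed>(chunks: T[]): T[] {
  return [...chunks].sort((a, b) => a.startTime - b.startTime);
}

const chunks = [
  { startTime: 1_000, endTime: 4_000, text: "second" },
  { startTime: 0, endTime: 1_000, text: "first" },
];
const ordered = sortChronologically(chunks);
console.log(ordered.map((c) => c.text)); // ["first", "second"]
console.log(durationMs(ordered[1])); // 3000
```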
