SQS Queues and Message Schemas in Innova Serverless

Innova Backend Serverless owns and provisions eight SQS queues (plus six DLQs) via CloudFormation resources in serverless.yml. This backend stack is the source of truth for all queue infrastructure — the innova-ai-engine service consumes several of these queues but never creates them. Queue URLs are injected into every Lambda function’s environment at deploy time via Ref and Fn::GetAtt CloudFormation intrinsics, ensuring zero hard-coded ARNs or URLs in code.

Queue Overview

CloudFormation logical ID	Queue name suffix	Type	Visibility Timeout	Retention	DLQ (retention, maxReceiveCount)	Consumer
`AttemptStreamQueue`	`attempt-stream.fifo`	FIFO	60 s	24 h	—	`telemetryWorker` Lambda
`LlmClassifyQueue`	`llm-classify-queue`	Standard	360 s	24 h	`LlmClassifyDLQ` (14 d, max 3)	`llmClassifierWorker` Lambda
`OcrQueue`	`ocr-queue`	Standard	60 s	24 h	—	OCR adapter
`AttemptReprocessQueue`	`attempt-reprocess-queue`	Standard	60 s	24 h	`AttemptReprocessDLQ` (14 d, max 5)	`attemptReprocessWorker` Lambda
`GuideIngestQueue`	`guide-ingest-queue`	Standard	900 s	24 h	`GuideIngestDLQ` (14 d, max 3)	`innova-ai-engine`
`SolutionGenQueue`	`solution-generation-queue`	Standard	900 s	24 h	`SolutionGenDLQ` (14 d, max 3)	`innova-ai-engine`
`SubmissionGradeQueue`	`submission-grade-queue`	Standard	180 s	24 h	`SubmissionGradeDLQ` (14 d, max 3)	`innova-ai-engine`
`ExerciseGenerateQueue`	`exercise-generate-queue`	Standard	360 s	24 h	`ExerciseGenerateDLQ` (14 d, max 3)	`innova-ai-engine`

All queue names follow the pattern innova-backend-serverless-{stage}-{queue-suffix}, e.g. innova-backend-serverless-prod-llm-classify-queue. The {stage} comes from --stage at deploy time (default dev).
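
As a minimal sketch of the naming pattern above, the helper below assembles a physical queue name from a stage and a suffix. The function name `buildQueueName` and the `SERVICE` constant are illustrative and not taken from the codebase.

```typescript
// Hypothetical helper: builds a queue name from the documented pattern
// innova-backend-serverless-{stage}-{queue-suffix}.
const SERVICE = "innova-backend-serverless";

function buildQueueName(queueSuffix: string, stage: string = "dev"): string {
  return `${SERVICE}-${stage}-${queueSuffix}`;
}

// "innova-backend-serverless-prod-llm-classify-queue"
const prodClassifyQueue = buildQueueName("llm-classify-queue", "prod");

// Stage falls back to "dev", mirroring the deploy-time default:
// "innova-backend-serverless-dev-attempt-stream.fifo"
const devStreamQueue = buildQueueName("attempt-stream.fifo");
```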

FIFO vs Standard

AttemptStreamQueue is the only FIFO queue. It uses FIFO semantics for two reasons:

Ordering: Keystroke telemetry events for a given attempt must be processed in the order they were produced. FIFO queues preserve message order within each message group and provide exactly-once processing.
Deduplication: ContentBasedDeduplication: true means SQS hashes the message body and silently drops duplicates within the 5-minute deduplication window, preventing double-writes to MongoDB if the publisher retries.
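
To make the deduplication window concrete, here is a simplified in-memory model of it. This is an illustration only, not how SQS is implemented: the class name `DedupWindow` is hypothetical, and the raw body string stands in for the hash that SQS computes over the body.

```typescript
// Simplified model of the 5-minute FIFO deduplication window.
// Assumption: the raw body string stands in for the body hash SQS computes.
const DEDUP_WINDOW_MS = 5 * 60 * 1000;

class DedupWindow {
  // Body -> timestamp (ms) at which that body was last accepted for delivery.
  private lastAccepted: Record<string, number> = Object.create(null);

  /** Returns true if the message would be delivered, false if dropped as a duplicate. */
  accept(body: string, nowMs: number): boolean {
    const seenAt: number | undefined = this.lastAccepted[body];
    if (seenAt !== undefined && nowMs - seenAt < DEDUP_WINDOW_MS) {
      return false; // same body inside the window: silently dropped
    }
    this.lastAccepted[body] = nowMs;
    return true;
  }
}
```

One consequence worth keeping in mind: with content-based deduplication, two messages with identical bodies sent within the window are treated as duplicates even if the publisher meant them as separate events.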

All other queues are Standard because:

The processing logic for each queue is idempotent (upserts, deduplication by attempt_id or upload_id), so at-least-once delivery is safe.
Standard queues offer substantially higher throughput and lower latency than FIFO queues — important for classification and guide pipelines that can burst significantly.
The LlmClassifyQueue Lambda trigger uses a 60-second maximumBatchingWindow, which works well with a Standard queue because messages accumulate quickly into full batches.
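
The idempotency argument above can be sketched with an upsert keyed by attempt_id. The store below is an in-memory stand-in for the real database, and the names `InMemoryAttemptStore` and `ClassificationResult` are hypothetical.

```typescript
// Hypothetical result shape; the real persistence model is not shown in this document.
interface ClassificationResult {
  attemptId: string;
  errorTagId: string | null;
}

// In-memory stand-in for a database table with a unique key on attemptId.
class InMemoryAttemptStore {
  private rows: Record<string, ClassificationResult> = Object.create(null);

  /** Insert or overwrite by attemptId, so redelivery of the same message is harmless. */
  upsert(result: ClassificationResult): void {
    this.rows[result.attemptId] = result;
  }

  count(): number {
    return Object.keys(this.rows).length;
  }
}

// Simulate at-least-once delivery: the same message is handled twice.
const store = new InMemoryAttemptStore();
const delivered: ClassificationResult = { attemptId: "attempt-1", errorTagId: "tag-7" };
store.upsert(delivered);
store.upsert(delivered); // duplicate delivery, still one row
```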

Message Schemas

Each queue has a typed message contract. All bodies contain only UUIDs and metadata — no PII, no student names or emails (COPPA compliance).

`attempt-stream.fifo` — Telemetry Message

Published by the api Lambda immediately after each attempt is created and classified. The FIFO message group key is attemptId to preserve per-attempt ordering.

// Attempt telemetry envelope published to AttemptStreamQueue
interface AttemptStreamMessage {
  attemptId: string;
  studentId: string;
  exerciseId: string | null;
  errorTagId: string | null;      // null while UNCLASSIFIED
  classifierSource: string;       // "RULE" | "LLM" | "HUMAN"
  isCorrect: boolean;
  traceId: string;
}
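
A minimal sketch of how the message group key might be attached when publishing. The helper `toFifoSendParams` is hypothetical. The field names `QueueUrl`, `MessageBody` and `MessageGroupId` are SQS SendMessage parameters, so the returned object could be handed to an SQS client's send call. No explicit deduplication ID is set, on the assumption that content-based deduplication is enabled on the queue as described earlier.

```typescript
interface AttemptStreamMessage {
  attemptId: string;
  studentId: string;
  exerciseId: string | null;
  errorTagId: string | null;
  classifierSource: string;
  isCorrect: boolean;
  traceId: string;
}

// Subset of the SQS SendMessage parameters relevant to a FIFO queue.
interface FifoSendParams {
  QueueUrl: string;
  MessageBody: string;
  MessageGroupId: string;
}

// Hypothetical helper: groups messages by attemptId so per-attempt order is preserved.
function toFifoSendParams(queueUrl: string, message: AttemptStreamMessage): FifoSendParams {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify(message),
    MessageGroupId: message.attemptId,
  };
}
```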

`llm-classify-queue` — LLM Classification Request

Published for every attempt the rule engine cannot classify synchronously.

// Message published to LlmClassifyQueue
interface LlmClassifyMessage {
  attemptId: string;
  traceId: string;
}

The llmClassifierWorker uses attemptId to fetch the full attempt context from Postgres before calling Claude.
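
Because the worker relies on attemptId for its lookup, it can help to validate the body before doing any work. The parser below is a sketch under that assumption; `parseLlmClassifyMessage` is a hypothetical name and the actual worker may validate differently.

```typescript
interface LlmClassifyMessage {
  attemptId: string;
  traceId: string;
}

// Hypothetical parser: turns a raw SQS record body into a typed message or throws.
function parseLlmClassifyMessage(body: string): LlmClassifyMessage {
  const data: unknown = JSON.parse(body);
  if (typeof data !== "object" || data === null) {
    throw new Error("message body is not a JSON object");
  }
  const { attemptId, traceId } = data as Record<string, unknown>;
  if (typeof attemptId !== "string" || typeof traceId !== "string") {
    throw new Error("attemptId and traceId must both be strings");
  }
  return { attemptId, traceId };
}
```

Throwing on a malformed body leaves the message on the queue, so after the configured number of receives it ends up in LlmClassifyDLQ for inspection.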

`guide-ingest-queue` — Guide Ingest Request

Published by POST /guides once the teacher’s PDF is uploaded to S3. Consumed by innova-ai-engine’s guide ingest worker.

// src/shared/sqs/guide-messages.ts
/** backend → ai-engine: `guide-ingest-queue`. */
export interface GuideIngestMessage {
  guide_id: string;
  source_pdf_key: string;         // S3 key under guides/uploads/
  course_grade_level: number;
  trace_id: string;
}
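
As an illustration of the contract, a producer could build the message through a small helper that enforces the S3 key prefix mentioned in the interface comment. `buildGuideIngestMessage` and `UPLOAD_PREFIX` are hypothetical names, and the prefix check is an assumption about what the producer should guarantee.

```typescript
interface GuideIngestMessage {
  guide_id: string;
  source_pdf_key: string;
  course_grade_level: number;
  trace_id: string;
}

const UPLOAD_PREFIX = "guides/uploads/";

// Hypothetical builder: rejects keys outside the documented upload prefix.
function buildGuideIngestMessage(
  guideId: string,
  sourcePdfKey: string,
  courseGradeLevel: number,
  traceId: string,
): GuideIngestMessage {
  if (sourcePdfKey.slice(0, UPLOAD_PREFIX.length) !== UPLOAD_PREFIX) {
    throw new Error(`source_pdf_key must start with ${UPLOAD_PREFIX}`);
  }
  return {
    guide_id: guideId,
    source_pdf_key: sourcePdfKey,
    course_grade_level: courseGradeLevel,
    trace_id: traceId,
  };
}
```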

`solution-generation-queue` — Solution Generation Request

Enqueued after a guide’s questions are extracted and approved, requesting AI-generated step-by-step solution keys. A null guide_question_id means “regenerate all questions in this guide”.

// src/shared/sqs/guide-messages.ts
export interface SolutionGenMessage {
  guide_id: string;
  guide_question_id: string | null; // null = whole guide
  trace_id: string;
}
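
The null convention can be handled on the consumer side roughly as follows. This is a sketch: `resolveQuestionIds` is a hypothetical helper, and the list of question IDs for the guide is assumed to come from a database lookup that is not shown.

```typescript
interface SolutionGenMessage {
  guide_id: string;
  guide_question_id: string | null;
  trace_id: string;
}

// Hypothetical helper: expands a message into the question IDs to regenerate.
// allQuestionIdsInGuide is assumed to be loaded by the caller for msg.guide_id.
function resolveQuestionIds(msg: SolutionGenMessage, allQuestionIdsInGuide: string[]): string[] {
  return msg.guide_question_id === null
    ? [...allQuestionIdsInGuide] // null means the whole guide
    : [msg.guide_question_id];
}
```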

`submission-grade-queue` — Submission Grading Request

Published when a student uploads a photo of their handwritten answer. The photo_keys array holds 1–3 S3 object keys for multi-page submissions.

// src/shared/sqs/guide-messages.ts
/** backend → ai-engine: `submission-grade-queue`. */
export interface SubmissionGradeMessage {
  guide_submission_id: string;
  guide_question_id: string;
  solution_version: number;       // which GuideSolution version to grade against
  photo_keys: string[];           // 1-3 S3 keys, lifecycle 30 days
  trace_id: string;
}
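
The 1 to 3 key limit on photo_keys is not expressible in the interface itself, so a producer may want a runtime check before enqueueing. The function below is a sketch of such a check; `hasValidPhotoKeys` is a hypothetical name.

```typescript
const MIN_PHOTO_KEYS = 1;
const MAX_PHOTO_KEYS = 3;

// Hypothetical guard: true when the array holds 1 to 3 non-empty S3 keys.
function hasValidPhotoKeys(photoKeys: string[]): boolean {
  if (photoKeys.length < MIN_PHOTO_KEYS || photoKeys.length > MAX_PHOTO_KEYS) {
    return false;
  }
  return photoKeys.every((key) => key.length > 0);
}
```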

`exercise-generate-queue` — On-Demand Exercise Generation

Published by POST /items/generate. The innova-ai-engine creates new Exercise rows targeting specific error codes.

// src/shared/sqs/guide-messages.ts
/** backend → ai-engine: `exercise-generate-queue`. */
export interface ExerciseGenerateMessage {
  subdomain_code: string;
  grade_level: number;
  target_error_codes: string[];   // ErrorTag.code values from the taxonomy
  count: number;
  trace_id: string;
}

`attempt-reprocess-queue` — OCR-to-Attempt Reprocess

Published by innova-ai-engine after grading a photo submission. This is the only queue where the AI engine is the producer and the backend is the consumer.

// src/shared/sqs/guide-messages.ts
export interface AttemptReprocessMessage {
  attempt_id: string | null;      // null for new PHOTO_GUIDE attempts (ADR-120)
  latex_steps: string[];
  provider: string;               // "gemini" | "anthropic"
  confidence: number;
  trace_id: string;
  guide_submission_id?: string;   // present for guide-based submissions
  guide_question_id?: string;
  alignment_summary?: {
    path: string;
    first_error_checkpoint: number | null;
    score_0_1: number;
  };
}

The AttemptReprocessMessage is backward-compatible. Legacy OCR-loop messages carry only attempt_id and omit the guide_* and alignment_summary fields. When guide_submission_id is present, the worker creates a new Attempt with inputMode='PHOTO_GUIDE' instead of updating an existing one.
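
The branching described above can be sketched as a small planning function. The names `planReprocess` and `ReprocessAction` are hypothetical, only the fields needed for the decision are declared, and the real worker may structure this differently.

```typescript
// Only the fields that drive the branch are declared here.
interface ReprocessRouting {
  attempt_id: string | null;
  guide_submission_id?: string;
}

type ReprocessAction =
  | { kind: "create_photo_guide_attempt"; guideSubmissionId: string }
  | { kind: "update_existing_attempt"; attemptId: string };

// Hypothetical planner: guide-based messages create a new attempt,
// legacy OCR-loop messages update the attempt they reference.
function planReprocess(msg: ReprocessRouting): ReprocessAction {
  if (msg.guide_submission_id !== undefined) {
    return { kind: "create_photo_guide_attempt", guideSubmissionId: msg.guide_submission_id };
  }
  if (msg.attempt_id !== null) {
    return { kind: "update_existing_attempt", attemptId: msg.attempt_id };
  }
  throw new Error("message has neither guide_submission_id nor attempt_id");
}
```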

Dead Letter Queues

Once a message has been received maxReceiveCount times without being successfully processed and deleted, SQS automatically moves it to the DLQ. All DLQs have a 14-day retention period to allow investigation and manual replay.

# Example: LlmClassifyQueue with DLQ
LlmClassifyQueue:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: ${self:service}-${self:provider.stage}-llm-classify-queue
    MessageRetentionPeriod: 86400     # 24 hours
    VisibilityTimeout: 360
    RedrivePolicy:
      deadLetterTargetArn: {"Fn::GetAtt": ["LlmClassifyDLQ", "Arn"]}
      maxReceiveCount: 3

LlmClassifyDLQ:
  Type: AWS::SQS::Queue
  Properties:
    QueueName: ${self:service}-${self:provider.stage}-llm-classify-dlq
    MessageRetentionPeriod: 1209600   # 14 days

DLQ summary:

DLQ	Source Queue	`maxReceiveCount`
`LlmClassifyDLQ`	`LlmClassifyQueue`	3
`AttemptReprocessDLQ`	`AttemptReprocessQueue`	5
`GuideIngestDLQ`	`GuideIngestQueue`	3
`SolutionGenDLQ`	`SolutionGenQueue`	3
`SubmissionGradeDLQ`	`SubmissionGradeQueue`	3
`ExerciseGenerateDLQ`	`ExerciseGenerateQueue`	3
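
To illustrate what these maxReceiveCount values mean in practice, the sketch below models the redrive rule: SQS moves a message to the DLQ once its receive count exceeds the source queue's maxReceiveCount. The map is copied from the table above, and `wouldMoveToDlq` is a hypothetical helper, not part of the codebase.

```typescript
// maxReceiveCount per source queue, copied from the DLQ summary table.
const MAX_RECEIVE_COUNT = {
  LlmClassifyQueue: 3,
  AttemptReprocessQueue: 5,
  GuideIngestQueue: 3,
  SolutionGenQueue: 3,
  SubmissionGradeQueue: 3,
  ExerciseGenerateQueue: 3,
} as const;

type QueueWithDlq = keyof typeof MAX_RECEIVE_COUNT;

// Simplified model of redrive: true once the receive count exceeds the limit.
function wouldMoveToDlq(queue: QueueWithDlq, receiveCount: number): boolean {
  return receiveCount > MAX_RECEIVE_COUNT[queue];
}
```

Under this model a message on LlmClassifyQueue gets three processing attempts, while one on AttemptReprocessQueue gets five.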

Monitor DLQ depth in CloudWatch with the ApproximateNumberOfMessagesVisible metric. A non-zero value on any DLQ indicates a persistent processing failure and should trigger an alert. The exception is the billing killswitches (LLM_PAUSED, OCR_PAUSED): when AI cost thresholds are hit, workers stop processing and messages accumulate in the DLQ intentionally, so DLQ depth during a pause is expected rather than a fault.

Local Development

LocalStack emulates SQS and S3 at http://localhost:4566. The docker-compose.yml starts LocalStack automatically alongside MongoDB.

# Start local infrastructure
docker compose up -d

# Run the API with serverless-offline (routes HTTP + SQS events locally)
pnpm start:dev

Queue URLs in local .env point to LocalStack:

SQS_ATTEMPT_STREAM_URL=http://localhost:4566/000000000000/innova-backend-serverless-dev-attempt-stream.fifo
SQS_LLM_CLASSIFY_URL=http://localhost:4566/000000000000/innova-backend-serverless-dev-llm-classify-queue
SQS_OCR_QUEUE_URL=http://localhost:4566/000000000000/innova-backend-serverless-dev-ocr-queue
SQS_ATTEMPT_REPROCESS_URL=http://localhost:4566/000000000000/innova-backend-serverless-dev-attempt-reprocess-queue
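
A small startup check can catch a missing or empty queue URL early. The loader below is a sketch: `loadQueueUrls` is a hypothetical name, and the environment is passed in as a plain record (for example `process.env` in Node) so the function stays easy to test.

```typescript
// Variable names taken from the local .env listing above.
const REQUIRED_QUEUE_VARS = [
  "SQS_ATTEMPT_STREAM_URL",
  "SQS_LLM_CLASSIFY_URL",
  "SQS_OCR_QUEUE_URL",
  "SQS_ATTEMPT_REPROCESS_URL",
] as const;

type QueueVar = (typeof REQUIRED_QUEUE_VARS)[number];

// Hypothetical loader: returns all queue URLs or throws listing every missing variable.
function loadQueueUrls(env: Record<string, string | undefined>): Record<QueueVar, string> {
  const urls = {} as Record<QueueVar, string>;
  const missing: string[] = [];
  for (const key of REQUIRED_QUEUE_VARS) {
    const value = env[key];
    if (value === undefined || value === "") {
      missing.push(key);
    } else {
      urls[key] = value;
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing queue URLs: ${missing.join(", ")}`);
  }
  return urls;
}
```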

To exercise the OCR → attempt reprocess loop locally, drain the reprocess queue using the dedicated consumer script:

pnpm consume:reprocess

This runs scripts/local-reprocess-consumer.ts which polls AttemptReprocessQueue on LocalStack and calls the AttemptReprocessWorker service directly.

The backend creates and owns all queues. If you add a new SQS queue, you must:

Add the AWS::SQS::Queue (and optionally DLQ) resource to serverless.yml under resources.Resources.
Add sqs:SendMessage / sqs:ReceiveMessage / sqs:DeleteMessage / sqs:GetQueueAttributes / sqs:GetQueueUrl permissions for the new queue ARN in provider.iam.role.statements.
Add the queue URL to provider.environment so all functions can reference it.
If innova-ai-engine will consume it, export the ARN via resources.Outputs and import it in the ai-engine stack — the deployment order is always backend first, then ai-engine.
