Innova AI Engine Pipeline Architecture and System Design

Innova AI Engine sits at the asynchronous processing layer of the SuperProfe platform. The innova-backend-serverless API handles all synchronous student and teacher interactions, and when a task requires ML inference — classifying an error, grading a submission, extracting questions from a PDF — the backend enqueues a message to one of its SQS queues. The engine picks up that message, calls the appropriate model provider or runs a mathematical calibration, and writes structured results back to the shared Postgres database. No result is ever returned synchronously to the API; the backend’s own polling and webhook mechanisms surface results to the frontend after the engine has written them.

Pipeline diagram

The following diagram shows all message flows between the backend API, the SQS queues, the ten Lambda functions, the model providers, and the shared storage layer:

Lambda function reference

serverless.yml defines ten functions. All are deployed as container images built from the same Dockerfile.lambda, with the handler module set per-function via the command override:

Function	Trigger	Purpose	Timeout	Memory
`health`	HTTP GET `/health`	Liveness probe	10 s	128 MB
`llmClassifier`	SQS `llm-classify-queue` (batch 20)	Classify `UNCLASSIFIED` attempts via Claude Haiku with prompt caching and forced `tool_use`	300 s	512 MB
`ocrWorker`	SQS `ocr-queue` (batch 5)	Transcribe handwritten math — Gemini primary, Claude vision escalation	60 s	512 MB
`guideIngest`	SQS `guide-ingest-queue` (batch 1)	Extract questions from a worksheet PDF (Gemini precheck → Claude → pypdfium2 figures)	600 s	2048 MB
`solutionGenerator`	SQS `solution-generation-queue` (batch 1)	Build step-by-step solution key for all extracted questions	600 s	1024 MB
`submissionGrader`	SQS `submission-grade-queue` (batch 5)	Transcribe and grade student photo submissions; republish to `attempt-reprocess`	120 s	512 MB
`exerciseGenerator`	SQS (batch 1)	Generate new exercises for a topic on teacher demand via Claude Haiku	300 s	512 MB
`nightlyBkt`	EventBridge `cron(0 7 * * ? *)`	Recalibrate BKT parameters (grid search, step 0.05) across all topics	900 s	1024 MB
`nightlyIrt`	EventBridge `cron(15 7 * * ? *)`	Recalibrate IRT 2PL item parameters (L-BFGS-B MLE) for exercises with ≥50 attempts	900 s	1024 MB
`hourlyAlerts`	EventBridge `cron(0 * * * ? *)`	Detect at-risk students, raise deduplicated `TeacherAlert` records	900 s	1024 MB

adhoc_solver (A10) exists in src/adhoc_solver/ but is not wired as a function in serverless.yml. It is a follow-up feature for ad-hoc scan solving without a guide context and will be added to the function list when it reaches production readiness.

Clean Architecture layers

Every worker package in src/ follows the same four-layer structure. The dependency arrow always points inward — the domain never imports adapters, and adapters never import domain logic:

1. Domain (`domain.py`) — pure logic

Contains all business rules and mathematical algorithms with zero I/O and zero framework imports. Examples:

src/bkt/domain.py — BKT grid search over (p_l0, p_transit, p_slip, p_guess) minimizing negative log-likelihood.
src/irt/domain.py — IRT 2PL MLE fitting with scipy.optimize.minimize (L-BFGS-B), Fisher information I(θ) = a²·P(θ)·(1−P(θ)).
src/llm_classifier/domain.py — batch construction, Claude tool-use schema, result parsing and confidence logic.
src/submission_grader/domain.py — grading rubric evaluation, transcription confidence gating.
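
As an illustration of the kind of pure logic that lives in a domain module, the BKT grid search can be sketched as below. This is a minimal sketch, not the engine's code: the `BktParams` shape, the function names, and the open grid from `step` to `1 - step` are assumptions, and the update rule is the standard BKT forward pass.

```python
import itertools
import math
from typing import NamedTuple, Sequence


class BktParams(NamedTuple):  # hypothetical shape, for illustration only
    p_l0: float
    p_transit: float
    p_slip: float
    p_guess: float


def negative_log_likelihood(params: BktParams, sequences: Sequence[Sequence[bool]]) -> float:
    """Sum of -log P(observation) over every attempt sequence (standard BKT forward pass)."""
    nll = 0.0
    for seq in sequences:
        p_known = params.p_l0
        for correct in seq:
            # Probability of a correct answer given the current mastery estimate.
            p_correct = p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess
            if correct:
                nll -= math.log(p_correct)
                posterior = p_known * (1 - params.p_slip) / p_correct
            else:
                nll -= math.log(1 - p_correct)
                posterior = p_known * params.p_slip / (1 - p_correct)
            # Learning transition applied after each observation.
            p_known = posterior + (1 - posterior) * params.p_transit
    return nll


def grid_values(step: float) -> list[float]:
    """Open grid step, 2*step, ..., 1 - step (assumed; avoids log(0) at the boundaries)."""
    n = round(1 / step)
    return [round(i * step, 10) for i in range(1, n)]


def fit_bkt(sequences: Sequence[Sequence[bool]], step: float = 0.05) -> BktParams:
    """Exhaustive search over all four parameters, keeping the lowest NLL."""
    values = grid_values(step)
    candidates = (BktParams(*combo) for combo in itertools.product(values, repeat=4))
    return min(candidates, key=lambda p: negative_log_likelihood(p, sequences))
```

With a step of 0.05 the grid holds 19 values per parameter, so one fit evaluates 19^4 = 130,321 candidates; nothing in the search touches a database or a network client.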

Because domain functions are pure (no side effects), they are directly unit-testable with pytest and property-testable with hypothesis.
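
A short sketch of what such a test can look like, using the 2PL Fisher information formula quoted above. The function names are hypothetical, and a plain loop stands in for a hypothesis strategy so the example needs only the standard library.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL response probability for ability theta, discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta: float, a: float, b: float) -> float:
    """I(theta) = a^2 * P(theta) * (1 - P(theta))."""
    p = p_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def test_information_peaks_at_difficulty() -> None:
    a, b = 1.7, 0.3
    peak = fisher_information(b, a, b)
    # At theta == b the response probability is 0.5, so the information is a^2 / 4.
    assert math.isclose(peak, a * a / 4)
    # Moving away from b in either direction can only lower the information.
    for offset in (0.1, 0.5, 1.0, 3.0):
        assert fisher_information(b + offset, a, b) < peak
        assert fisher_information(b - offset, a, b) < peak
```

The test needs no fixtures, mocks, or network access, which is the practical benefit of keeping the domain free of I/O.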

2. Ports (`ports.py`) — I/O contracts

Defines Python Protocol classes that describe exactly what each piece of I/O looks like. The domain depends on ports, not on concrete libraries:

# Example structural protocol
from typing import Protocol

class AttemptRepoPort(Protocol):
    async def fetch_attempts_for_topic(self, topic_id: str) -> list[Attempt]: ...
    async def write_bkt_params(self, topic_id: str, params: BktParams) -> None: ...

Adapters satisfy these protocols at runtime (structural typing — no explicit implements). Test doubles also satisfy them, allowing unit tests to pass mock objects that fulfil the protocol shape.
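
The structural-typing point can be made concrete with a test double. In this sketch the `Attempt` fields, the narrower `AttemptReaderPort`, and `accuracy_for_topic` are all hypothetical names chosen for illustration; only the pattern is taken from the text above.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Attempt:  # hypothetical fields, for illustration only
    topic_id: str
    correct: bool


class AttemptReaderPort(Protocol):
    """A reduced port with a single read method."""

    async def fetch_attempts_for_topic(self, topic_id: str) -> list[Attempt]: ...


class InMemoryAttemptRepo:
    """Test double: satisfies the port by shape alone, with no inheritance."""

    def __init__(self, attempts: list[Attempt]) -> None:
        self._attempts = list(attempts)

    async def fetch_attempts_for_topic(self, topic_id: str) -> list[Attempt]:
        return [a for a in self._attempts if a.topic_id == topic_id]


async def accuracy_for_topic(repo: AttemptReaderPort, topic_id: str) -> float:
    """Code under test: depends only on the port, never on a concrete driver."""
    attempts = await repo.fetch_attempts_for_topic(topic_id)
    if not attempts:
        return 0.0
    return sum(a.correct for a in attempts) / len(attempts)
```

A unit test can then run `asyncio.run(accuracy_for_topic(InMemoryAttemptRepo([...]), "t1"))` without a database; a static type checker accepts the double because its method signature matches the protocol.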

3. Adapters (`src/shared/`) — concrete I/O

Concrete implementations of the port protocols:

Adapter	Port it satisfies	External dependency
`asyncpg_repo.py`	`AttemptRepoPort`, `GuideRepoPort`, …	`asyncpg` connection pool
`sqs_adapter.py`	`QueuePublisherPort`	`boto3` SQS client
`s3_adapter.py`	`ObjectStorePort`	`boto3` S3 client
`anthropic_adapter.py`	`LLMClassifierPort`, `GraderPort`, …	`anthropic` SDK
`gemini_adapter.py`	`MathOCRPort`, `PDFPrecheckPort`	`google-genai` SDK

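Adapters are the only modules that import I/O client libraries (database drivers, the AWS SDK, model-provider SDKs); numerical libraries such as scipy stay in the domain because they perform no I/O. This means SDK upgrades, provider swaps, or mocking in tests never touch domain logic.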

4. Pipeline handler (`src/pipeline/<worker>.handler`) — Lambda entrypoint

A thin function that:

Reads the Lambda event and context parameters.
Loads Settings from src/shared/settings.py (validated by pydantic-settings, sourced from environment variables).
Instantiates adapters using those settings.
Calls the domain with the concrete adapters injected via the port protocols.
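Returns the result; for SQS triggers, failed records are reported back through ReportBatchItemFailures so that only those messages are retried, and an unhandled exception fails the whole batch.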

No business logic belongs here. The handler is intentionally so thin that it can be tested by invoking it with a crafted event dict and asserting on Postgres state via the adapter.
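
A minimal sketch of such a handler for an SQS-triggered worker follows. It assumes the event source mapping has ReportBatchItemFailures enabled, in which case Lambda expects a `batchItemFailures` list of message IDs in the response. `make_handler` and `process_record` are hypothetical names; settings loading and adapter construction are left out and would happen before `make_handler` is called.

```python
import json
from typing import Any, Callable


def make_handler(process_record: Callable[[dict[str, Any]], None]):
    """Build a Lambda entrypoint around an injected processing function.

    `process_record` stands in for the domain call with adapters already bound.
    """

    def handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
        failures = []
        for record in event.get("Records", []):
            try:
                # SQS delivers the message payload as a string in "body".
                process_record(json.loads(record["body"]))
            except Exception:
                # Report only this message as failed; successful ones are deleted.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

Because the handler only loops, delegates, and collects failures, a test can drive it with a hand-built event dict and a stub `process_record`.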

The guides pipeline (v9)

The document AI pipeline is the most complex multi-step flow in the engine. It chains three Lambda functions via two SQS hops:

guideIngest → [solution-generation-queue] → solutionGenerator → [Postgres REVIEW state]
                                                                         ↓
                                         (student uploads photos)
                                                                         ↓
submissionGrader → [attempt-reprocess-queue] → innova-backend-serverless API

Step 1 — guideIngest (triggered by guide-ingest-queue):

Downloads the PDF from S3_GUIDES_BUCKET.
Runs a Gemini precheck to assess whether the document is a math worksheet.
Splits the PDF into chunks of pages (default 20 pages per chunk with a 1-page overlap between consecutive chunks, configurable via GUIDE_INGEST_CHUNK_PAGES / GUIDE_INGEST_CHUNK_OVERLAP).
Calls Claude (Sonnet) to extract structured questions from each chunk.
Renders embedded figures using pypdfium2 and attaches them to extracted questions.
Validates overall extraction quality against GUIDE_MIN_EXTRACTION_QUALITY (default 0.5).
Publishes a message to SQS_SOLUTION_GEN_URL and writes question records to Postgres.
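
The chunking step can be sketched as follows. This is one plausible reading of "20 pages with 1-page overlap", in which each chunk starts on the last page of the previous one; the function name and the zero-based half-open ranges are assumptions, not the engine's API.

```python
def chunk_page_ranges(
    total_pages: int, chunk_pages: int = 20, overlap: int = 1
) -> list[tuple[int, int]]:
    """Return zero-based, half-open (start, end) page ranges covering the document."""
    if chunk_pages <= 0:
        raise ValueError("chunk_pages must be positive")
    if not 0 <= overlap < chunk_pages:
        raise ValueError("overlap must be in [0, chunk_pages)")

    ranges: list[tuple[int, int]] = []
    start = 0
    while start < total_pages:
        end = min(start + chunk_pages, total_pages)
        ranges.append((start, end))
        if end == total_pages:
            break
        # Step back by `overlap` pages so a question split across a page
        # boundary appears whole in at least one chunk.
        start = end - overlap
    return ranges
```

For a 45-page worksheet with the defaults this yields `[(0, 20), (19, 39), (38, 45)]`, so pages 19 and 38 are each sent to the extraction model twice.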

Step 2 — solutionGenerator (triggered by solution-generation-queue):

Reads the extracted questions from Postgres.
Classifies each question by math topic (confidence gate: SOLUTION_TOPIC_MIN_CONFIDENCE = 0.85).
Calls Claude (Sonnet) to generate a step-by-step solution key per question (batched if SOLUTION_GEN_USE_BATCHES=true).
Writes the solution key to Postgres and sets the guide status to REVIEW.

Step 3 — submissionGrader (triggered by submission-grade-queue):

Retrieves the student’s uploaded photos from S3_SUBMISSIONS_BUCKET.
Transcribes each photo using Claude Haiku vision with the cached solution key as context.
Validates transcription confidence against GRADING_MIN_TRANSCRIPTION_CONFIDENCE (default 0.5).
Grades the transcribed steps against the solution key.
Writes grading results to Postgres.
Publishes a message to SQS_ATTEMPT_REPROCESS_URL so the backend converts grading results into student attempt records.

Each stage has a corresponding SSM kill-switch parameter (SSM_GUIDES_INGEST_PAUSED_PARAM, SSM_GUIDES_SOLUTION_PAUSED_PARAM, SSM_GUIDES_GRADING_PAUSED_PARAM). Setting any of these to true in AWS SSM Parameter Store causes that worker to route incoming messages to the DLQ with paused_due_to_cost metadata; no code change or redeployment is needed.
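
A kill-switch check of this kind might look like the sketch below. The parameter reader is injected so the logic stays testable; in a deployed worker it could wrap boto3's `ssm.get_parameter(Name=...)["Parameter"]["Value"]`. The function names and the message envelope with a `metadata.reason` key are assumptions made for illustration.

```python
from typing import Callable, Optional

PAUSE_REASON = "paused_due_to_cost"  # metadata value named in the text above


def is_paused(read_parameter: Callable[[str], Optional[str]], parameter_name: str) -> bool:
    """True only when the parameter exists and holds the string "true"."""
    value = read_parameter(parameter_name)
    return value is not None and value.strip().lower() == "true"


def plan_delivery(message: dict, paused: bool) -> dict:
    """Decide where a message goes; the envelope layout here is hypothetical."""
    if not paused:
        return {"target": "worker", "message": message}
    metadata = {**message.get("metadata", {}), "reason": PAUSE_REASON}
    return {"target": "dlq", "message": {**message, "metadata": metadata}}
```

Treating a missing parameter as "not paused" is a design choice in this sketch; a deployment that prefers to fail closed would invert that default.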

Storage topology

Store	Owner	Used by
Supabase Postgres (session pooler `:5432`)	Backend stack	All workers read/write via `asyncpg`
S3 guides bucket (`S3_GUIDES_BUCKET`)	Backend stack	`guideIngest` reads PDFs; writes figure assets
S3 submissions bucket (`S3_SUBMISSIONS_BUCKET`)	Backend stack	`submissionGrader` reads student photos
MongoDB (`MONGODB_URI`)	Backend stack	Telemetry, audit logs via `src/observability`

Deploy order constraint

The SQS queues, S3 buckets, and SSM parameters consumed by this engine are created and exported by the innova-backend-serverless Serverless Framework stack. The AI engine’s serverless.yml references them via ${env:SQS_GUIDE_INGEST_ARN}, ${env:S3_GUIDES_BUCKET}, etc. If the backend stack does not exist yet, those environment variable lookups will fail and the CloudFormation deployment will error. Always deploy in this order:

innova-backend-serverless (creates queues, buckets, SSM params)
innova-ai-engine (consumes the ARNs/URLs exported by the backend)
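
A pre-deploy check along these lines can catch a missing backend export before the deployment starts. This is a hedged sketch: the helper name is hypothetical, and the tuple lists only the two variables named above, so a real check would need the full set from serverless.yml.

```python
import os
from typing import Mapping, Sequence

# Only the variables named in this section; the complete list lives in serverless.yml.
REQUIRED_VARIABLES = ("SQS_GUIDE_INGEST_ARN", "S3_GUIDES_BUCKET")


def missing_variables(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_VARIABLES
) -> list[str]:
    """Return the required names that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        raise SystemExit(f"Deploy innova-backend-serverless first; missing: {', '.join(missing)}")
```

Running the script in CI before the engine's deploy step turns a CloudFormation failure into an immediate, readable error.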

Observability and cost control

Every provider call in the engine records token usage and estimated cost to structlog’s JSON output via src/observability. Log lines include trace_id, worker, model, input_tokens, output_tokens, and cost_usd fields, which makes it straightforward to aggregate per-worker inference spend in any log aggregator. SSM kill-switches allow individual pipeline stages to be paused without a redeployment:

SSM Parameter	Controls
`/innova/llm/paused`	`llmClassifier`
`/innova/ocr/paused`	`ocrWorker`
`/innova/guides/ingest_paused`	`guideIngest`
`/innova/guides/solution_paused`	`solutionGenerator`
`/innova/guides/grading_paused`	`submissionGrader`
`/innova/guides/grading_cheap_mode`	Downgrades `submissionGrader` to a cheaper model under cost pressure
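
To show how the usage log fields described above support per-worker cost aggregation, here is a small sketch using only the standard library. The engine itself logs through structlog; the price table, the model name, and all function names below are hypothetical placeholders.

```python
import json
from collections import defaultdict
from typing import Iterable

# Hypothetical prices in USD per million tokens; real values depend on the provider.
HYPOTHETICAL_PRICES = {"example-model": {"input": 1.00, "output": 5.00}}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call from its token counts."""
    price = HYPOTHETICAL_PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def usage_log_line(trace_id: str, worker: str, model: str,
                   input_tokens: int, output_tokens: int) -> str:
    """Serialize one provider call as a JSON log line with the documented fields."""
    return json.dumps({
        "trace_id": trace_id,
        "worker": worker,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": estimate_cost_usd(model, input_tokens, output_tokens),
    })


def spend_by_worker(lines: Iterable[str]) -> dict[str, float]:
    """Sum cost_usd per worker, as a log aggregator query would."""
    totals: dict[str, float] = defaultdict(float)
    for line in lines:
        record = json.loads(line)
        totals[record["worker"]] += record["cost_usd"]
    return dict(totals)
```

A sustained rise in one worker's total is the kind of signal that would justify flipping the matching kill-switch or the cheap-mode parameter.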
