Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/vruizz22/innova-ai-engine/llms.txt

Use this file to discover all available pages before exploring further.

The llmClassifier worker is the core error-intelligence pipeline of Innova. It consumes UNCLASSIFIED student attempt records from the llm-classify-queue, groups them by academic domain, and routes each group to Claude Haiku (claude-haiku-4-5-20251001) for classification against a proprietary 2,600+ error taxonomy aligned to the Chilean K-12 curriculum. Results are written back to the attempts table in Postgres with a classifier source, confidence score, and resolved error_tag_id. Attempts the model still cannot classify trigger a second-pass suggestion flow that proposes and persists new error_tag entries, growing the catalog organically.

Trigger & configuration

Queue

SQS llm-classify-queue
ARN from env SQS_LLM_CLASSIFY_ARN

Lambda settings

Timeout: 300 s · Memory: 512 MB
Handler: src.pipeline.llm_consumer.handler
SettingValue
batchSize20
maximumBatchingWindow60 s
functionResponseTypeReportBatchItemFailures

SQS message body — Attempt

Each record body is a JSON-serialised Attempt object (validated via Pydantic model_validate_json).
{
  "id": "uuid-of-the-attempt",
  "student_id": "uuid-of-the-student",
  "domain_id": "uuid-of-the-domain-or-null",
  "subdomain_code": "ALG-LINEAR-EQ",
  "topic": "linear_equations",
  "problem_statement": "Solve for x: 2x + 3 = 7",
  "canonical_solution": "x = 2",
  "raw_steps": ["2x = 7 - 3", "2x = 4", "x = 2"],
  "final_answer": "x = 3"
}
id
string
required
UUID of the student attempt — used as the primary key when writing classification results back to Postgres.
student_id
string
UUID of the student. Included for logging context; no PII is forwarded to Claude.
domain_id
string | null
UUID of the academic domain. When present, the worker fetches the domain-specific catalog and uses the v8 by-domain prompt. null falls back to the generic v7 classifier.
subdomain_code
string | null
Subdomain code (e.g. ALG-LINEAR-EQ). Used as the fallback label when topic is null (post-taxonomy migration).
topic
string | null
Teacher-confirmed topic code. Optional — only present when the teacher has pinned the question to a Topic.
problem_statement
string
required
The problem text shown to the student.
canonical_solution
string
required
The correct reference solution used as context for the classifier.
raw_steps
array
required
The student’s work steps as recorded by the backend.
final_answer
string
required
The student’s final submitted answer.

Two-pass classification flow

1

Domain grouping

_group_by_domain partitions the incoming batch by domain_id. A single SQS batch of 20 can contain attempts from multiple domains; each group is classified independently so the domain-specific catalog and prompt are applied correctly (ADR A4.3).
2

First-pass classification (v8 / v7)

For each domain group:
  • If domain_id is present and an ACTIVE catalog exists in Postgres → classify_batch_for_domain is called with the domain-specialised system prompt and a tool whose enum is restricted to that domain’s error codes.
  • If domain_id is absent or no catalog is found → classify_batch is called with the generic v7 prompt covering the full 2,600+ taxonomy.
Both paths use claude-haiku-4-5-20251001 with cache_control: ephemeral on the system block and tool_choice: {"type": "tool", "name": "classify_errors"} to force structured output.
3

Write results to Postgres

Results are written to the attempts table in a single transaction using _UPDATE_SQL:
UPDATE attempts
   SET error_tag_id = (
           SELECT id FROM error_tags
            WHERE code = $1
              AND status = 'ACTIVE'
              AND code NOT IN ('CORRECT', 'UNCLASSIFIED', 'TRANSVERSAL_LIKELY')
       ),
       classifier_source = 'LLM',
       confidence = $2,
       classified_at = NOW(),
       status = CASE
           WHEN $1 IN ('CORRECT', 'UNCLASSIFIED', 'TRANSVERSAL_LIKELY')
                AND $1 <> 'CORRECT' THEN 'PENDING'
           ELSE 'CLASSIFIED'
       END
 WHERE id = $3
The SPECIAL_TYPES sentinel (CORRECT, UNCLASSIFIED, TRANSVERSAL_LIKELY) are never resolved to a catalog FK — even if a seed error_tag with that code exists — because the status filter on ACTIVE alone is not sufficient. UNCLASSIFIED and TRANSVERSAL_LIKELY remain PENDING; everything else (including CORRECT with a NULL tag) becomes CLASSIFIED.
4

Second-pass: suggest new error types

After the first pass, attempts whose error_type is still UNCLASSIFIED are collected. suggest_new_error_types proposes new error_tag entries for them. Unique codes are upserted into error_tags (source = 'LLM_GENERATED', status = 'ACTIVE'), and then _UPDATE_SQL is re-run — the subquery now resolves because the tag was just inserted. The catalog cache is cleared after the upsert so subsequent invocations pick up the new entries.
Second-pass failures are logged and swallowed — the primary classification is already persisted. Auto-suggest is a best-effort catalog growth mechanism.

AttemptClassification output schema

class AttemptClassification(BaseModel):
    attempt_id: str
    error_type: str       # error_tag.code, or one of CORRECT / UNCLASSIFIED / TRANSVERSAL_LIKELY
    evidence: str         # free-text reasoning from the model
    confidence: float     # 0.0 – 1.0
attempt_id
string
Echoed from the input — used to correlate results back to their attempts row.
error_type
string
The classified error code. Matches error_tags.code for real errors, or one of the sentinel values CORRECT, UNCLASSIFIED, or TRANSVERSAL_LIKELY.
evidence
string
Model-provided reasoning for the classification. Stored for auditability.
confidence
float
Model confidence in the classification, in the range [0.0, 1.0].

Trace ID propagation

The trace ID is read from the SQS message attributes of the first record in the batch:
attrs = record.get("messageAttributes")
trace = attrs.get("trace_id")
# -> trace["stringValue"]
The extracted value is bound to the structlog context via bind_trace_id(trace_id) so every log line emitted during the invocation carries the same trace ID.

Partial batch failure handling

The function is configured with functionResponseType: ReportBatchItemFailures. If an unhandled exception is raised during the processing of a domain group, the exception propagates out of _main and Lambda retries the entire batch. Per the current implementation, failures are scoped at the domain-group level: a single failing group causes a full re-delivery, so batches are intentionally kept small (max 20) to limit blast radius.
A domain-group exception re-queues the entire 20-message batch. If a single attempt is consistently causing Claude to error, it will block the full batch until it expires to the DLQ.

Cost killswitch

Before calling Claude, the client reads the SSM parameter named by SSM_LLM_PAUSED_PARAM (default /innova/llm/paused). If the value is the string "true", a PausedError is raised immediately and no model call is made. This allows production to pause LLM inference without a redeploy.
Toggle the killswitch via AWS SSM Parameter Store: set /innova/llm/paused to "true" to halt all LLM classification, or "false" to resume. The change takes effect on the next Lambda invocation.

Build docs developers (and LLMs) love