The ocrWorker converts photos of handwritten student math work into structured LaTeX step sequences. It is triggered by S3 object-creation events forwarded through the ocr-queue, downloads the raw image bytes, strips EXIF metadata, and passes the image to OcrOrchestrator. The orchestrator first attempts extraction with Gemini 2.5 Flash; if the returned confidence falls below the configured threshold (default 0.7), it escalates automatically to Claude vision. The higher-confidence result is returned as an OcrResult containing a list of LaTeX steps, an overall confidence score, the provider that produced the result, an optional topic hint, and an estimated cost in USD. Results are logged and the processed count is returned to Lambda.

Trigger & configuration

Queue

SQS ocr-queue
ARN from env SQS_OCR_QUEUE_ARN

Lambda settings

Timeout: 60 s · Memory: 512 MB
Handler: src.pipeline.ocr_worker.handler
Setting | Value
batchSize | 5
functionResponseType | ReportBatchItemFailures

SQS message body — S3 event record

Each record is an S3-event notification delivered through SQS. The worker reads the bucket name and object key from the standard S3 event envelope:
{
  "Records": [
    {
      "s3": {
        "bucket": {
          "name": "innova-submissions-prod"
        },
        "object": {
          "key": "handwriting/student-uuid/attempt-uuid.jpg"
        }
      }
    }
  ]
}

s3.bucket.name (string, required): Name of the S3 bucket that received the image.
s3.object.key (string, required): S3 object key of the uploaded handwritten image. Typically a JPEG or PNG of the student’s work.
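
As an illustration of reading the envelope above, the following is a minimal sketch of pulling the bucket name and object key out of one SQS message body. The helper name s3_objects_from_body is hypothetical and is not taken from the worker's source. The sketch also assumes the key should be URL-decoded, because S3 event notifications deliver object keys URL-encoded (a space arrives as "+"); whether the worker itself decodes keys is not stated here.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_body(body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from one SQS message body holding an S3 event.

    Hypothetical helper for illustration; not the worker's actual function.
    """
    envelope = json.loads(body)
    pairs = []
    for record in envelope.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode object keys, so decode before use.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


sample_body = json.dumps(
    {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "innova-submissions-prod"},
                    "object": {"key": "handwriting/student-uuid/attempt-uuid.jpg"},
                }
            }
        ]
    }
)
print(s3_objects_from_body(sample_body))
```

Returning a list keeps the helper usable even if a single notification carries more than one S3 record.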

Execution flow

1. Download image from S3

For each record, the worker calls s3_client.get_object(Bucket=bucket_name, Key=key) and reads the response body as raw bytes. The bytes are then passed through strip_exif_and_validate to remove EXIF metadata and validate the image format before any model call.
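
The download step can be sketched as follows. Only the get_object call shape comes from the description above. The internals of strip_exif_and_validate are not documented here, so detect_image_format is a hypothetical stand-in that checks the leading magic bytes for JPEG and PNG and does not remove EXIF data. FakeS3Client is an in-memory substitute for a boto3 S3 client so the example runs without AWS access.

```python
import io

JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def download_image(s3_client, bucket_name: str, key: str) -> bytes:
    """Fetch the object and return its body as raw bytes."""
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    return response["Body"].read()


def detect_image_format(image_bytes: bytes) -> str:
    """Hypothetical format check; the real validation logic may differ."""
    if image_bytes.startswith(JPEG_MAGIC):
        return "jpeg"
    if image_bytes.startswith(PNG_MAGIC):
        return "png"
    raise ValueError("unsupported image format")


class FakeS3Client:
    """In-memory stand-in for a boto3 S3 client, for illustration only."""

    def __init__(self, objects: dict):
        self._objects = objects

    def get_object(self, Bucket: str, Key: str) -> dict:
        # boto3 returns a streaming body; BytesIO offers the same read() method.
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


client = FakeS3Client(
    {("innova-submissions-prod", "handwriting/a.png"): PNG_MAGIC + b"rest-of-file"}
)
raw = download_image(client, "innova-submissions-prod", "handwriting/a.png")
print(detect_image_format(raw))
```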

2. Extract LaTeX steps via OcrOrchestrator

OcrOrchestrator.extract(image_bytes, trace_id) is called. Internally the orchestrator runs the two-model pipeline (see Gemini → Claude escalation below) and returns an OcrResult.

3. Log and return

The worker logs ocr_complete with the S3 key, provider used, overall confidence, and number of LaTeX steps extracted. The processed count is returned in the Lambda response.
{ "processed": 5 }

OcrResult schema

class OcrResult(BaseModel):
    latex_steps: list[str]         # ordered LaTeX expressions for each work step
    overall_confidence: float      # 0.0 – 1.0; composite confidence across all steps
    provider: OcrProvider          # "GEMINI" or "CLAUDE"
    topic_hint: str | None         # optional topic inferred by the model
    cost_estimated_usd: float      # estimated inference cost in USD

latex_steps (list[str]): Ordered list of LaTeX strings representing each step of the student’s handwritten work. May be empty if the image is completely illegible.
overall_confidence (float): Composite confidence score in [0.0, 1.0] reflecting how reliably the model read the handwriting. Drives the Gemini → Claude escalation decision.
provider (string): Which model produced the final result: "GEMINI" or "CLAUDE".
topic_hint (string | null): Optional topic inferred from the content of the image (e.g. "algebra"). May be null when the model cannot determine the topic.
cost_estimated_usd (float): Estimated cost of the inference call(s) in US dollars, used for cost accounting.

Gemini → Claude escalation

OcrOrchestrator implements a two-stage extraction strategy to balance cost and quality:
primary = await GeminiAdapter.extract(image_bytes)

if primary.overall_confidence >= OCR_CONFIDENCE_THRESHOLD:
    return primary          # Gemini was confident — done

fallback = await ClaudeAdapter.extract(image_bytes)

return fallback if fallback.overall_confidence > primary.overall_confidence else primary
OCR_CONFIDENCE_THRESHOLD defaults to 0.7 and is configurable via the OCR_CONFIDENCE_THRESHOLD environment variable. Lowering it reduces Claude escalations and cost; raising it increases Claude usage but improves LaTeX accuracy on ambiguous handwriting.
The escalation path always returns the higher-confidence result. If Claude’s confidence is not strictly higher than Gemini’s (unusual but possible), including a tie, the primary Gemini result is returned.

Cost killswitch

Before calling any model, the worker checks an SSM parameter whose name is configured via SSM_OCR_PAUSED_PARAM (default /innova/ocr/paused). When the parameter's value is the string "true", a PausedError is raised and the message is returned to the queue for later retry.
When the OCR killswitch is active, all records in the batch are returned as batchItemFailures. SQS will redeliver them according to the queue’s retry and visibility-timeout settings.
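
A minimal sketch of the killswitch check, under stated assumptions: ensure_not_paused and all_records_failed are hypothetical helper names, and FakeSsmClient stands in for a boto3 SSM client so the example runs offline. The get_parameter call and its Parameter/Value response shape follow boto3, and the batchItemFailures/itemIdentifier shape is the standard Lambda partial batch response for SQS.

```python
import os

# Parameter name comes from the environment, with the documented default.
PAUSED_PARAM = os.environ.get("SSM_OCR_PAUSED_PARAM", "/innova/ocr/paused")


class PausedError(Exception):
    """Raised when the OCR killswitch parameter is set to "true"."""


def ensure_not_paused(ssm_client, param_name: str = PAUSED_PARAM) -> None:
    response = ssm_client.get_parameter(Name=param_name)
    if response["Parameter"]["Value"] == "true":
        raise PausedError(f"OCR paused via {param_name}")


def all_records_failed(event: dict) -> dict:
    """Report every SQS record in the batch as failed so SQS redelivers it."""
    return {
        "batchItemFailures": [
            {"itemIdentifier": record["messageId"]} for record in event["Records"]
        ]
    }


class FakeSsmClient:
    """In-memory stand-in for a boto3 SSM client, for illustration only."""

    def __init__(self, value: str):
        self.value = value

    def get_parameter(self, Name: str) -> dict:
        return {"Parameter": {"Name": Name, "Value": self.value}}


event = {"Records": [{"messageId": "m-1"}, {"messageId": "m-2"}]}
try:
    ensure_not_paused(FakeSsmClient("true"))
    response = {"batchItemFailures": []}
except PausedError:
    response = all_records_failed(event)
print(response)
```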

Partial batch failure handling

The worker processes records one at a time in a for loop. An unhandled exception on a single record propagates out of _main, and Lambda then marks the entire batch for redelivery, including records that had already succeeded. A single bad image avoids blocking the rest of the batch only when its failure is caught and reported for that record alone through batchItemFailures.
To protect against a permanently unreadable image cycling indefinitely, configure a Dead-Letter Queue (DLQ) on ocr-queue with an appropriate maxReceiveCount.
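
With functionResponseType set to ReportBatchItemFailures, one way to isolate a bad record is to catch its exception and report only that message ID. The sketch below illustrates that pattern and is not a description of the worker's confirmed behavior: process_batch and the process_record callback are hypothetical names, and the records are reduced to the messageId and body fields of an SQS event.

```python
from typing import Callable


def process_batch(event: dict, process_record: Callable[[dict], None]) -> dict:
    """Process each SQS record independently and report only the failures."""
    failures = []
    for record in event["Records"]:
        try:
            process_record(record)
        except Exception:
            # Only this record is redelivered; the others count as successful.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


def fake_process(record: dict) -> None:
    """Hypothetical per-record step that fails on one unreadable image."""
    if record["body"] == "corrupt":
        raise ValueError("unreadable image")


event = {
    "Records": [
        {"messageId": "m-1", "body": "ok"},
        {"messageId": "m-2", "body": "corrupt"},
        {"messageId": "m-3", "body": "ok"},
    ]
}
batch_response = process_batch(event, fake_process)
print(batch_response)
```

A record that fails on every delivery keeps reappearing under this pattern, which is why the DLQ with a maxReceiveCount remains necessary.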
