ocrWorker: Handwritten Math OCR Transcription Worker

The ocrWorker converts photos of handwritten student math work into structured LaTeX step sequences. It is triggered by S3 object-creation events forwarded through the ocr-queue. For each event it downloads the raw image bytes, strips EXIF metadata, and passes the image to OcrOrchestrator. The orchestrator first attempts extraction with Gemini 2.5 Flash; if the returned confidence falls below the configured threshold (default 0.7), it escalates automatically to Claude vision and keeps whichever of the two results has the higher confidence. The result is returned as an OcrResult containing a list of LaTeX steps, an overall confidence score, the provider that produced the result, an optional topic hint, and an estimated cost in USD. The worker logs each result and returns the processed count to Lambda.

Trigger & configuration

Queue

SQS ocr-queue
ARN from env SQS_OCR_QUEUE_ARN

Lambda settings

Timeout: 60 s · Memory: 512 MB
Handler: src.pipeline.ocr_worker.handler

Setting	Value
`batchSize`	`5`
`functionResponseType`	`ReportBatchItemFailures`

SQS message body — S3 event record

Each record is an S3-event notification delivered through SQS. The worker reads the bucket name and object key from the standard S3 event envelope:

{
  "Records": [
    {
      "s3": {
        "bucket": {
          "name": "innova-submissions-prod"
        },
        "object": {
          "key": "handwriting/student-uuid/attempt-uuid.jpg"
        }
      }
    }
  ]
}

s3.bucket.name (string, required): Name of the S3 bucket that received the image.

s3.object.key (string, required): S3 object key of the uploaded handwritten image. Typically a JPEG or PNG of the student’s work.
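
As a minimal sketch of reading these two fields, the function below parses one SQS record whose body is the S3 event envelope shown above. The function name s3_locations is hypothetical and not taken from the worker's source. It also decodes the key, because S3 URL-encodes object keys in event notifications.

```python
import json
from urllib.parse import unquote_plus


def s3_locations(sqs_record: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs found in one SQS record's S3 event body."""
    body = json.loads(sqs_record["body"])
    locations = []
    # S3 test events carry no "Records" list, so default to an empty one.
    for s3_record in body.get("Records", []):
        bucket = s3_record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in notifications (a space becomes "+").
        key = unquote_plus(s3_record["s3"]["object"]["key"])
        locations.append((bucket, key))
    return locations
```

The decoded key is the one to pass to get_object; passing the encoded form fails for keys that contain spaces or other escaped characters.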

Execution flow

Download image from S3

For each record, the worker calls s3_client.get_object(Bucket=bucket_name, Key=key) and reads the response body as raw bytes. The bytes are then passed through strip_exif_and_validate to remove EXIF metadata and validate the image format before any model call.
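
The implementation of strip_exif_and_validate is not shown here. As a rough illustration of what such a step can involve, the sketch below validates the JPEG signature and drops EXIF (APP1) segments using only the standard library. The name strip_jpeg_exif_sketch is hypothetical, PNG input is not handled, and the sketch assumes a well-formed segment layout before the scan data.

```python
def strip_jpeg_exif_sketch(data: bytes) -> bytes:
    """Return the JPEG bytes with EXIF APP1 segments removed.

    Raises ValueError if the input does not look like a JPEG.
    """
    if not data.startswith(b"\xff\xd8"):  # SOI marker
        raise ValueError("not a JPEG")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("corrupt JPEG segment header")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: everything after this is image data, copy as-is.
            out += data[pos:]
            return bytes(out)
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if length < 2:
            raise ValueError("invalid JPEG segment length")
        segment = data[pos:pos + 2 + length]
        # EXIF lives in APP1 segments whose payload starts with "Exif\0\0".
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

Rejecting unrecognized input before any model call keeps malformed uploads from incurring inference cost.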

Extract LaTeX steps via OcrOrchestrator

OcrOrchestrator.extract(image_bytes, trace_id) is called. Internally the orchestrator runs the two-model pipeline (see Gemini → Claude escalation below) and returns an OcrResult.

Log and return

The worker logs ocr_complete with the S3 key, provider used, overall confidence, and number of LaTeX steps extracted. The processed count is returned in the Lambda response.

{ "processed": 5 }
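
The logging step might look like the sketch below. The four logged values come from the description above; the field names, the logger name, and the function log_ocr_complete are assumptions made for illustration.

```python
import json
import logging

logger = logging.getLogger("ocr_worker")  # logger name is an assumption


def log_ocr_complete(key: str, provider: str, confidence: float,
                     latex_steps: list[str]) -> dict:
    """Emit one structured ocr_complete log line and return its payload."""
    payload = {
        "event": "ocr_complete",
        "s3_key": key,                          # hypothetical field name
        "provider": provider,
        "overall_confidence": confidence,
        "latex_step_count": len(latex_steps),   # hypothetical field name
    }
    logger.info(json.dumps(payload))
    return payload
```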

`OcrResult` schema

class OcrResult(BaseModel):
    latex_steps: list[str]         # ordered LaTeX expressions for each work step
    overall_confidence: float      # 0.0 – 1.0; composite confidence across all steps
    provider: OcrProvider          # "GEMINI" or "CLAUDE"
    topic_hint: str | None         # optional topic inferred by the model
    cost_estimated_usd: float      # estimated inference cost in USD

latex_steps (list[str]): Ordered list of LaTeX strings representing each step of the student’s handwritten work. May be empty if the image is completely illegible.

overall_confidence (float): Composite confidence score in [0.0, 1.0] reflecting how reliably the model read the handwriting. Drives the Gemini → Claude escalation decision.

provider (string): Which model produced the final result: "GEMINI" or "CLAUDE".

topic_hint (string | null): Optional topic inferred from the content of the image (e.g. "algebra"). May be null when the model cannot determine the topic.

cost_estimated_usd (float): Estimated cost of the inference call(s) in US dollars, used for cost accounting.

Gemini → Claude escalation

OcrOrchestrator implements a two-stage extraction strategy to balance cost and quality:

primary = await GeminiAdapter.extract(image_bytes)

if primary.overall_confidence >= OCR_CONFIDENCE_THRESHOLD:
    return primary          # Gemini was confident — done

fallback = await ClaudeAdapter.extract(image_bytes)

return fallback if fallback.overall_confidence > primary.overall_confidence else primary

OCR_CONFIDENCE_THRESHOLD defaults to 0.7 and is configurable via the OCR_CONFIDENCE_THRESHOLD environment variable. Lowering it reduces Claude escalations and cost; raising it increases Claude usage but improves LaTeX accuracy on ambiguous handwriting.

The escalation path always picks the higher-confidence result: if Claude’s confidence does not exceed Gemini’s (unusual but possible), the primary Gemini result is returned. Because the comparison is strict, a tie also keeps the Gemini result.
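
The escalation logic can be exercised end to end with stub adapters. The sketch below is not the orchestrator's actual code: Result and StubAdapter are hypothetical stand-ins for OcrResult and the Gemini and Claude adapters, and reading the threshold once at import time is an assumption.

```python
import asyncio
import os
from dataclasses import dataclass

# The 0.7 default comes from the text; reading it at import time is an assumption.
OCR_CONFIDENCE_THRESHOLD = float(os.environ.get("OCR_CONFIDENCE_THRESHOLD", "0.7"))


@dataclass
class Result:  # hypothetical stand-in for OcrResult
    provider: str
    overall_confidence: float


class StubAdapter:  # hypothetical stand-in for GeminiAdapter / ClaudeAdapter
    def __init__(self, provider: str, confidence: float) -> None:
        self.provider, self.confidence, self.calls = provider, confidence, 0

    async def extract(self, image_bytes: bytes) -> Result:
        self.calls += 1
        return Result(self.provider, self.confidence)


async def extract_with_escalation(primary_adapter, fallback_adapter,
                                  image_bytes: bytes,
                                  threshold: float = OCR_CONFIDENCE_THRESHOLD) -> Result:
    primary = await primary_adapter.extract(image_bytes)
    if primary.overall_confidence >= threshold:
        return primary  # confident enough: the fallback model is never called
    fallback = await fallback_adapter.extract(image_bytes)
    # Strict ">" means a tie keeps the primary result.
    if fallback.overall_confidence > primary.overall_confidence:
        return fallback
    return primary


# Example: a low-confidence primary result triggers escalation.
result = asyncio.run(extract_with_escalation(
    StubAdapter("GEMINI", 0.5), StubAdapter("CLAUDE", 0.8), b"", threshold=0.7))
```

In this example the primary confidence of 0.5 is below the 0.7 threshold, so the fallback runs and its 0.8 result is kept.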

Cost killswitch

The worker checks an SSM parameter before calling any model. The parameter name is configured via SSM_OCR_PAUSED_PARAM (default /innova/ocr/paused). When its value is the string "true", a PausedError is raised and the message is returned to the queue for later retry.

When the OCR killswitch is active, all records in the batch are returned as batchItemFailures. SQS will redeliver them according to the queue’s retry and visibility-timeout settings.
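
A minimal sketch of the killswitch check follows, assuming the SSM client is injected and behaves like boto3.client("ssm"). The function name ensure_not_paused and the PausedError definition are stand-ins for illustration; only the parameter name, its default, and the "true" comparison come from the text above.

```python
import os

SSM_OCR_PAUSED_PARAM = os.environ.get("SSM_OCR_PAUSED_PARAM", "/innova/ocr/paused")


class PausedError(RuntimeError):
    """Stand-in definition: raised when the killswitch parameter is "true"."""


def ensure_not_paused(ssm_client, param_name: str = SSM_OCR_PAUSED_PARAM) -> None:
    """Raise PausedError if the SSM parameter holds the string "true"."""
    # ssm_client is expected to behave like boto3.client("ssm").
    response = ssm_client.get_parameter(Name=param_name)
    if response["Parameter"]["Value"] == "true":
        raise PausedError(f"OCR paused via {param_name}")
```

Calling this once per invocation, before the record loop, would avoid one SSM request per image; whether the worker checks per batch or per record is not stated in this document.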

Partial batch failure handling

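The worker processes records one at a time in a for loop, so a single bad image need not block the rest of the batch. That isolation only holds when the failure is caught and reported for that record: with functionResponseType set to ReportBatchItemFailures, Lambda redelivers only the messages listed in batchItemFailures. An unhandled exception that propagates out of _main instead causes Lambda to treat the entire batch as failed and redeliver all of its records.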
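
The per-record reporting that ReportBatchItemFailures expects can be sketched as below. The function process_batch and its process_record callback are hypothetical; the response shape (a batchItemFailures list of itemIdentifier entries holding SQS message IDs) is the one Lambda defines for SQS event sources. How the worker combines this with its processed count is not shown in this document.

```python
from typing import Callable


def process_batch(event: dict, process_record: Callable[[dict], None]) -> dict:
    """Run process_record on each SQS record and report failures individually."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_record(record)
        except Exception:
            # Only the messages whose IDs are listed here are redelivered.
            failures.append({"itemIdentifier": record["messageId"]})
    # An empty list tells Lambda the whole batch succeeded.
    return {"batchItemFailures": failures}
```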

To protect against a permanently unreadable image cycling indefinitely, configure a Dead-Letter Queue (DLQ) on ocr-queue with an appropriate maxReceiveCount.
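
As a hedged example of that configuration, the helper below builds the RedrivePolicy attribute that SQS accepts through SetQueueAttributes. The helper name redrive_attributes and the default maxReceiveCount of 5 are illustrative assumptions, not values from this project.

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict[str, str]:
    """Build the Attributes dict for SQS SetQueueAttributes with a redrive policy."""
    policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    # SQS expects the policy as a JSON-encoded string, not a nested object.
    return {"RedrivePolicy": json.dumps(policy)}
```

With boto3 this could be applied as sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=redrive_attributes(dlq_arn)), where queue_url and dlq_arn are placeholders for the real queue URL and DLQ ARN.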
