ocrWorker converts photos of handwritten student math work into structured LaTeX step sequences. It is triggered by S3 object-creation events forwarded through the ocr-queue, downloads the raw image bytes, strips EXIF metadata, and passes the image to OcrOrchestrator. The orchestrator first attempts extraction with Gemini 2.5 Flash; if the returned confidence falls below the configured threshold (default 0.7), it escalates automatically to Claude vision. The higher-confidence result is returned as an OcrResult containing a list of LaTeX steps, an overall confidence score, the provider that produced the result, an optional topic hint, and an estimated cost in USD. Results are logged and the processed count is returned to Lambda.
Trigger & configuration
Queue: SQS ocr-queue
ARN from env: SQS_OCR_QUEUE_ARN
Lambda settings: Timeout: 60 s · Memory: 512 MB
Handler: src.pipeline.ocr_worker.handler

| Setting | Value |
|---|---|
| batchSize | 5 |
| functionResponseType | ReportBatchItemFailures |
SQS message body — S3 event record
Each record is an S3 event notification delivered through SQS. The worker reads the bucket name and object key from the standard S3 event envelope:
Bucket name: Name of the S3 bucket that received the image.
Object key: S3 object key of the uploaded handwritten image. Typically a JPEG or PNG of the student’s work.
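The parsing step can be sketched roughly as follows, assuming the SQS message body carries the standard S3 event notification JSON (bucket name under Records[].s3.bucket.name, object key under Records[].s3.object.key). The helper name parse_s3_locations is hypothetical and is not taken from the worker's source. S3 URL-encodes object keys in event notifications, so the sketch decodes them before use.

```python
import json
from urllib.parse import unquote_plus


def parse_s3_locations(sqs_record: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from one SQS record that wraps an S3 event.

    Hypothetical helper for illustration; the real worker may differ.
    """
    body = json.loads(sqs_record["body"])
    locations = []
    for s3_record in body.get("Records", []):
        bucket = s3_record["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode the key (spaces arrive as "+").
        key = unquote_plus(s3_record["s3"]["object"]["key"])
        locations.append((bucket, key))
    return locations
```

Decoding the key matters in practice: passing the encoded form to get_object would fail for any upload whose name contains spaces or special characters.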
Execution flow
Download image from S3
For each record, the worker calls s3_client.get_object(Bucket=bucket_name, Key=key) and reads the response body as raw bytes. The bytes are then passed through strip_exif_and_validate to remove EXIF metadata and validate the image format before any model call.
Extract LaTeX steps via OcrOrchestrator
OcrOrchestrator.extract(image_bytes, trace_id) is called. Internally the orchestrator runs the two-model pipeline (see Gemini → Claude escalation below) and returns an OcrResult.
OcrResult schema
LaTeX steps: Ordered list of LaTeX strings representing each step of the student’s handwritten work. May be empty if the image is completely illegible.
Confidence: Composite confidence score in [0.0, 1.0] reflecting how reliably the model read the handwriting. Drives the Gemini → Claude escalation decision.
Provider: Which model produced the final result: "GEMINI" or "CLAUDE".
Topic hint: Optional topic inferred from the content of the image (e.g. "algebra"). May be null when the model cannot determine the topic.
Estimated cost (USD): Estimated cost of the inference call(s) in US dollars, used for cost accounting.
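The per-record flow and result shape described above could look roughly like the sketch below. The OcrResult field names (steps, confidence, provider, topic_hint, cost_usd) are assumptions chosen to mirror the descriptions; the actual names are not given here. The S3 client, the sanitizing function (standing in for strip_exif_and_validate), and the orchestrator are passed in as arguments, so the snippet runs without AWS access.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OcrResult:
    # Field names are assumptions that mirror the schema descriptions.
    steps: list[str]            # ordered LaTeX strings, possibly empty
    confidence: float           # composite score in [0.0, 1.0]
    provider: str               # "GEMINI" or "CLAUDE"
    topic_hint: Optional[str]   # e.g. "algebra", or None
    cost_usd: float             # estimated inference cost


def process_image(s3_client, orchestrator, sanitize: Callable[[bytes], bytes],
                  bucket_name: str, key: str, trace_id: str) -> OcrResult:
    # 1. Download the raw image bytes from S3.
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    raw_bytes = response["Body"].read()
    # 2. Strip EXIF metadata and validate the format before any model call.
    image_bytes = sanitize(raw_bytes)
    # 3. Hand the cleaned bytes to the orchestrator.
    return orchestrator.extract(image_bytes, trace_id)
```

Injecting the collaborators keeps the ordering (download, sanitize, extract) easy to test with in-memory fakes.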
Gemini → Claude escalation
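OcrOrchestrator implements a two-stage extraction strategy to balance cost and quality: it first attempts extraction with Gemini 2.5 Flash and, if the returned confidence falls below OCR_CONFIDENCE_THRESHOLD, escalates to Claude vision and returns the higher-confidence result.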
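A minimal sketch of the two-stage decision, assuming each provider is a callable that returns an object with a confidence attribute. Extraction, load_threshold, and extract_with_escalation are hypothetical names used only for illustration; the orchestrator's real internals are not shown on this page.

```python
import os
from typing import Callable, NamedTuple


class Extraction(NamedTuple):
    # Hypothetical stand-in for the OcrResult fields that matter here.
    steps: list
    confidence: float
    provider: str


def load_threshold(default: float = 0.7) -> float:
    # Read the threshold from the environment, falling back to 0.7.
    return float(os.environ.get("OCR_CONFIDENCE_THRESHOLD", default))


def extract_with_escalation(image_bytes: bytes,
                            gemini: Callable[[bytes], Extraction],
                            claude: Callable[[bytes], Extraction],
                            threshold: float) -> Extraction:
    first = gemini(image_bytes)
    if first.confidence >= threshold:
        # Confident enough: skip the more expensive second call.
        return first
    second = claude(image_bytes)
    # Keep whichever result reports the higher confidence.
    return second if second.confidence > first.confidence else first
```

With a threshold of 0.7, a Gemini result at 0.9 is returned directly, while one at 0.5 triggers the Claude call and the better of the two is kept.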
OCR_CONFIDENCE_THRESHOLD defaults to 0.7 and is configurable via the OCR_CONFIDENCE_THRESHOLD environment variable. Lowering it reduces Claude escalations and cost; raising it increases Claude usage but improves LaTeX accuracy on ambiguous handwriting.
Cost killswitch
The worker checks an SSM parameter before calling any model. The parameter name is configured via SSM_OCR_PAUSED_PARAM (default /innova/ocr/paused). When its value is the string "true", a PausedError is raised and the message is returned to the queue for later retry.
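The killswitch check could be sketched as below, assuming a boto3-style SSM client whose get_parameter(Name=...) call returns the value under response["Parameter"]["Value"]. The function name ensure_not_paused is hypothetical, and how the real check treats a missing parameter is not documented here; this sketch simply lets that error propagate.

```python
import os


class PausedError(Exception):
    """Raised when the OCR killswitch parameter is set to "true"."""


def ensure_not_paused(ssm_client) -> None:
    # Parameter name comes from the environment, with the documented default.
    param_name = os.environ.get("SSM_OCR_PAUSED_PARAM", "/innova/ocr/paused")
    response = ssm_client.get_parameter(Name=param_name)
    # Only the exact string "true" pauses processing.
    if response["Parameter"]["Value"] == "true":
        raise PausedError(f"OCR is paused via {param_name}")
```

Calling this once per record, before any model call, means a paused pipeline incurs no inference cost while messages wait on the queue.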
Partial batch failure handling
The worker processes records in a for loop. An unhandled exception on a single record propagates out of _main, which causes Lambda to mark the entire batch for redelivery. Records are processed individually so that a single bad image does not block all five.
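For comparison, Lambda's partial batch response for SQS lets a handler report only the failed messages by returning their messageId values under batchItemFailures, which requires ReportBatchItemFailures on the event source mapping. The sketch below illustrates that pattern; it is not a description of the worker's current code, and handle_batch and process_record are hypothetical names.

```python
from typing import Callable


def handle_batch(event: dict, process_record: Callable[[dict], None]) -> dict:
    """Process each SQS record and report only the failed ones to Lambda."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_record(record)
        except Exception:
            # Only this message is returned to the queue; records that
            # succeeded are removed from the queue as usual.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

An empty batchItemFailures list tells Lambda the whole batch succeeded, so one unreadable image would be retried on its own rather than alongside the other four records.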