Overview
The Invoice OCR system processes documents through a multi-stage pipeline: Upload → API Selection → OpenRouter Processing → Reconciliation → Display. This architecture supports both images and PDFs with multiple extraction modes.
Flow Diagram
┌──────────────┐
│ User │
│ Upload │
└──────┬───────┘
│
▼
┌─────────────────────────────────────────┐
│ ocr-uploader.tsx (Frontend) │
│ - File validation (image/PDF) │
│ - Base64 encoding │
│ - Mode selection (raw/structured) │
│ - Extractor selection (v4/compact) │
└──────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Route Selection │
│ ┌─────────────────────────────────┐ │
│ │ Raw Mode → /api/ocr │ │
│ │ Structured + v4 → /api/ocr- │ │
│ │ structured-v4 │ │
│ │ Structured + compact → /api/ocr-│ │
│ │ structured │ │
│ └─────────────────────────────────┘ │
└──────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ API Route Handler │
│ - Validate input │
│ - Build OpenRouter payload │
│ - Add system prompt + schema │
│ - Configure plugins (PDF) │
└──────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ OpenRouter API Call │
│ https://openrouter.ai/api/v1/ │
│ chat/completions │
│ │
│ Headers: │
│ - Authorization: Bearer <API_KEY> │
│ - HTTP-Referer: <SITE_URL> │
│ - X-Title: <APP_NAME> │
│ │
│ Body: │
│ - model (gemini-2.5-flash, gpt-4o...) │
│ - messages (system + user + file) │
│ - response_format: {type: json_object} │
│ - plugins (for PDF parsing) │
└──────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Response Processing │
│ - Coerce to valid JSON │
│ - Strip markdown code fences │
│ - Handle union types │
│ - Replace NaN/Infinity with null │
└──────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Reconciliation (if structured) │
│ - reconcileV4() for v4 schema │
│ - reconcile() for compact schema │
│ - Try multiple hypotheses │
│ - Pick best match │
└──────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Return to Frontend │
│ - Structured JSON with reconciliation │
│ - Or raw text (for raw mode) │
└──────┬──────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Display Components │
│ - invoice-viewer-v4.tsx (v4 schema) │
│ - invoice-viewer.tsx (compact schema) │
│ - Confetti on success │
│ - Show reconciliation status │
└─────────────────────────────────────────┘
Stage 1: Upload & Validation
Location: components/ocr-uploader.tsx:106-143
The uploader accepts files via:
- File input: Click to browse
- Drag & drop: Drop files directly onto the upload area
const handleFile = (f: File | null) => {
  // ...
  const pdf = f.type === "application/pdf" || f.name.toLowerCase().endsWith(".pdf");
  setIsPdf(pdf);
  const reader = new FileReader();
  reader.onload = () => setPreview(reader.result as string);
  reader.readAsDataURL(f); // Convert to base64 data URL
};
Supported formats:
- Images: image/* (PNG, JPG, JPEG, WebP)
- Documents: application/pdf
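The format check above can be factored into a small pure helper. The following is a minimal sketch rather than the uploader's actual code: `classifyUpload`, `UploadKind`, and `FileLike` are hypothetical names, and rejecting anything that is neither an image nor a PDF is an assumption.

```typescript
// Hypothetical helper (not from the repository): classify a selected file
// using the same MIME-type and extension checks shown in handleFile.
type UploadKind = "pdf" | "image" | "unsupported";

interface FileLike {
  type: string; // MIME type reported by the browser; may be empty
  name: string;
}

function classifyUpload(f: FileLike): UploadKind {
  const isPdf =
    f.type === "application/pdf" || f.name.toLowerCase().endsWith(".pdf");
  if (isPdf) return "pdf";
  if (f.type.startsWith("image/")) return "image";
  // Assumption: any other type is rejected before the file is read.
  return "unsupported";
}
```

Checking the extension as well as the MIME type covers browsers that report an empty `type` for some PDFs.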
Mode Selection
Location: components/ocr-uploader.tsx:437-461
Users choose:
- Extraction Mode: raw (plain text) or structured (JSON)
- Schema Format (if structured): v4 (India GST) or compact (legacy)
- AI Model: Gemini 2.5 Flash (default), GPT-4o Mini, o3-mini, etc.
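The mapping from these choices to an API route, as drawn in the flow diagram, can be sketched as a single function. `selectRoute`, `Mode`, and `Extractor` are illustrative names and may not match the identifiers used in ocr-uploader.tsx.

```typescript
// Illustrative sketch of the route selection shown in the flow diagram.
type Mode = "raw" | "structured";
type Extractor = "v4" | "compact";

function selectRoute(mode: Mode, extractor: Extractor): string {
  // Raw mode ignores the extractor and always uses the plain-text route.
  if (mode === "raw") return "/api/ocr";
  return extractor === "v4" ? "/api/ocr-structured-v4" : "/api/ocr-structured";
}
```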
Stage 2: API Route Processing
Route: /api/ocr (Raw Text)
Location: app/api/ocr/route.ts:31-131
Purpose: Extract plain text from image/PDF
Request payload:
{
  imageBase64?: string, // Data URL or base64
  pdfUrl?: string,      // Public URL
  pdfBase64?: string,   // Base64 PDF
  filename?: string,    // PDF filename
  model?: string        // Model override
}
Key logic (app/api/ocr/route.ts:61-103):
const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
    "HTTP-Referer": site,
    "X-Title": title,
  },
  body: JSON.stringify({
    model,
    temperature: 0,
    messages: [
      {
        role: "system",
        content: "You are an OCR extractor. Return only the raw, verbatim text...",
      },
      {
        role: "user",
        content: [
          { type: "text", text: "Extract all text..." },
          // Image or PDF file attachment
        ],
      },
    ],
  }),
});
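The "Image or PDF file attachment" placeholder in the excerpt above is elided. The sketch below shows one way such a content part could be built from the request payload; it is not the route's actual code. `buildAttachment` and `toDataUrl` are hypothetical names, the `image_url` part follows the OpenAI-compatible chat format, the `file` part mirrors the shape used by the structured route, and defaulting bare base64 images to `image/png` is an assumption.

```typescript
// Hypothetical sketch: turn the request payload into one message content part.
interface OcrRequest {
  imageBase64?: string;
  pdfUrl?: string;
  pdfBase64?: string;
  filename?: string;
}

type AttachmentPart =
  | { type: "image_url"; image_url: { url: string } }
  | { type: "file"; file: { filename: string; file_data: string } };

// Accept either a full data URL or bare base64 and always return a data URL.
function toDataUrl(value: string, mime: string): string {
  return value.startsWith("data:") ? value : `data:${mime};base64,${value}`;
}

function buildAttachment(body: OcrRequest): AttachmentPart | null {
  const filename = body.filename || "invoice.pdf";
  if (body.pdfUrl) {
    return { type: "file", file: { filename, file_data: body.pdfUrl } };
  }
  if (body.pdfBase64) {
    const fileData = toDataUrl(body.pdfBase64, "application/pdf");
    return { type: "file", file: { filename, file_data: fileData } };
  }
  if (body.imageBase64) {
    // Assumption: bare base64 without a data URL prefix is treated as PNG.
    return { type: "image_url", image_url: { url: toDataUrl(body.imageBase64, "image/png") } };
  }
  return null; // nothing to attach; a caller would likely reject the request
}
```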
Route: /api/ocr-structured-v4 (India GST Schema)
Location: app/api/ocr-structured-v4/route.ts:196-406
Purpose: Extract structured invoice data using v4 schema with reconciliation
Request payload (same as /api/ocr plus):
{
  // ... standard fields
  annotations?: unknown, // Pass-through for cached parsing
  plugins?: unknown      // Override PDF plugins
}
System prompt (app/api/ocr-structured-v4/route.ts:136-194):
- 2,600+ character prompt defining extraction rules
- Includes complete JSON schema (150 lines)
- Specifies decision rules for price mode, discounts, GST split
- Enforces 2-decimal precision and normalization
Response processing (app/api/ocr-structured-v4/route.ts:308-351):
const coerceToJson = (raw: unknown): unknown => {
  // ... (elided: s is derived from raw as a string)
  // Strip markdown code fences
  if (s.startsWith("```")) {
    s = s.replace(/^```[a-zA-Z]*\n/, "").replace(/```\s*$/, "").trim();
  }
  // Extract first JSON object
  const firstBrace = s.indexOf("{");
  const lastBrace = s.lastIndexOf("}");
  if (firstBrace !== -1 && lastBrace > firstBrace) {
    s = s.slice(firstBrace, lastBrace + 1);
  }
  // Clean up union types, trailing commas, NaN/Infinity
  s = s.replace(/"([^"]+)"\s*:\s*"([^"]+)"\s*\|\s*"([^"]+)"/g, '"$1": "$2"');
  s = s.replace(/,\s*([}\]])/g, "$1");
  s = s.replace(/\bNaN\b|\bInfinity\b|\b-?Infinity\b/g, "null");
  return JSON.parse(s);
};
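For experimentation, the excerpt above can be turned into a self-contained function. This is a sketch under stated assumptions, not the route's code: `coerceToJsonSketch` and `FENCE` are hypothetical names, non-string input is assumed to be already parsed, and the NaN/Infinity pattern is deliberately changed. In the pattern shown above, the `\bInfinity\b` alternative matches first, so an input of `-Infinity` would become `-null`, which `JSON.parse` rejects; the variant here consumes the leading minus sign as well.

```typescript
// Hypothetical, self-contained variant of the coercion steps described above.
const FENCE = "`".repeat(3); // a markdown code fence, built without a literal

function coerceToJsonSketch(raw: unknown): unknown {
  // Assumption: anything that is not a string has already been parsed.
  if (typeof raw !== "string") return raw;
  let s = raw.trim();

  // Strip a leading fence (with optional language tag) and a trailing fence.
  if (s.startsWith(FENCE)) {
    s = s
      .replace(new RegExp("^" + FENCE + "[a-zA-Z]*\\n"), "")
      .replace(new RegExp(FENCE + "\\s*$"), "")
      .trim();
  }

  // Keep only the span from the first "{" to the last "}".
  const firstBrace = s.indexOf("{");
  const lastBrace = s.lastIndexOf("}");
  if (firstBrace !== -1 && lastBrace > firstBrace) {
    s = s.slice(firstBrace, lastBrace + 1);
  }

  // Collapse schema-style unions ("a" | "b") to their first alternative.
  s = s.replace(/"([^"]+)"\s*:\s*"([^"]+)"\s*\|\s*"([^"]+)"/g, '"$1": "$2"');
  // Drop trailing commas before a closing brace or bracket.
  s = s.replace(/,\s*([}\]])/g, "$1");
  // Replace NaN, Infinity, and -Infinity with null (sign included).
  // Note: this also rewrites these tokens if they occur inside string values.
  s = s.replace(/-?\b(?:NaN|Infinity)\b/g, "null");

  return JSON.parse(s);
}
```

Given a fenced reply containing `{"a": 1, "b": NaN, "c": [1, 2,],}`, the function returns an object with `a` equal to 1, `b` equal to null, and `c` equal to `[1, 2]`.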
Reconciliation (app/api/ocr-structured-v4/route.ts:391-398):
try {
  const doc = parsed as V4Doc;
  const out = reconcileV4(doc); // Apply reconciliation engine
  return NextResponse.json(out);
} catch {
  return NextResponse.json(parsed); // Return raw if reconciliation fails
}
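The flow diagram describes reconciliation as trying multiple hypotheses and picking the best match. The internals of `reconcileV4` are not shown on this page, so the following is only a toy illustration of that idea, with hypothetical names throughout (`discountHypotheses`, `pickBestHypothesis`, `BestPick`). It compares two readings of an invoice-level discount, applied before tax or after tax, and keeps whichever lands closest to the printed total. The 0.05 tolerance mirrors the threshold the viewer uses for its matched badge.

```typescript
// Toy illustration only; the real reconciliation engine is not shown here.
interface Hypothesis {
  name: string;
  total: number;
}

interface BestPick extends Hypothesis {
  error: number;    // absolute difference from the printed total
  matched: boolean; // true when the error is within tolerance
}

const round2 = (n: number): number => Math.round(n * 100) / 100;

// Build candidate grand totals for a single discount and a single tax rate.
function discountHypotheses(subtotal: number, taxRate: number, discount: number): Hypothesis[] {
  return [
    { name: "before_tax", total: round2((subtotal - discount) * (1 + taxRate)) },
    { name: "after_tax", total: round2(subtotal * (1 + taxRate) - discount) },
  ];
}

// Return the hypothesis with the smallest absolute error, or null if none given.
function pickBestHypothesis(
  printedTotal: number,
  hypotheses: Hypothesis[],
  tolerance = 0.05
): BestPick | null {
  let best: BestPick | null = null;
  for (const h of hypotheses) {
    const error = round2(Math.abs(h.total - printedTotal));
    if (best === null || error < best.error) {
      best = { name: h.name, total: h.total, error, matched: error <= tolerance };
    }
  }
  return best;
}
```

With a subtotal of 1000, an 18% tax rate, and a discount of 100, the two candidates are 1062 (before tax) and 1080 (after tax), so a printed total of 1062 selects the before-tax reading.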
Route: /api/ocr-structured (Compact Schema)
Location: app/api/ocr-structured/route.ts:156-295
Purpose: Legacy schema with voucher, items, party structure
Schema (app/api/ocr-structured/route.ts:21-120):
{
  voucher: {
    invoice_number: string,
    invoice_date: string,
    invoice_discount: string,
    invoice_discount_mode: "before_tax" | "after_tax" | "",
    round_off: string,
    total_invoice_amount: string,
    additional_charges: [{ name, amount, tax_rate, amount_includes_tax }],
    reconciliation: { status: "matched" | "unmatched" }
  },
  items: [{ price, unit, name, hsn_sac_code, quantity, tax_rate, discount_rate }],
  party: { party_gstin_number, party_name, party_address, ... }
}
Stage 3: PDF Handling
Plugin Configuration
Location: app/api/ocr-structured-v4/route.ts:268-277
if (isPdf) {
  const engine = process.env.OPENROUTER_PDF_ENGINE || "pdf-text";
  const plugins: unknown = body.plugins || [
    {
      id: "file-parser",
      pdf: { engine },
    },
  ];
  (payload as Record<string, unknown>).plugins = plugins as unknown;
}
Available engines (configured via .env.local):
- pdf-text (default): Text extraction only
- mistral-ocr: OCR for scanned PDFs
- native: Use model’s native PDF support
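The route excerpt above passes the environment value through unchanged. As a hedged sketch, a helper could restrict it to the three engines listed and fall back to the default otherwise. `resolvePdfPlugins` and `PDF_ENGINES` are hypothetical names, and the fallback on unknown values is an assumption rather than documented behavior.

```typescript
// Hypothetical helper: build the plugins array for a PDF request.
const PDF_ENGINES = ["pdf-text", "mistral-ocr", "native"] as const;
type PdfEngine = (typeof PDF_ENGINES)[number];

function resolvePdfPlugins(engineFromEnv: string | undefined, override?: unknown): unknown {
  // Caller-supplied plugins take precedence, as in the route excerpt.
  if (override) return override;
  const known = (PDF_ENGINES as readonly string[]).indexOf(engineFromEnv ?? "") !== -1;
  // Assumption: unset or unrecognized values fall back to "pdf-text".
  const engine: PdfEngine = known ? (engineFromEnv as PdfEngine) : "pdf-text";
  return [{ id: "file-parser", pdf: { engine } }];
}
```

A route handler could call it as `resolvePdfPlugins(process.env.OPENROUTER_PDF_ENGINE, body.plugins)`.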
File attachment (app/api/ocr-structured-v4/route.ts:238-251):
messages: [
  { role: "system", content: SYSTEM_PROMPT },
  {
    role: "user",
    content: [
      { type: "text", text: "Return ONLY JSON matching the provided schema." },
      {
        type: "file",
        file: {
          filename: body.filename || "invoice.pdf",
          file_data: pdfData, // Data URL or public URL
        },
      },
    ],
  },
]
Annotations (Caching)
Location: app/api/ocr-structured-v4/route.ts:256-265
To avoid re-parsing costs for the same PDF, the route forwards any annotations supplied in the request body as an extra assistant message:
if (body.annotations) {
  const msgs = payload.messages as Array<Record<string, unknown>>;
  msgs.push({
    role: "assistant",
    content: "Previous file parse metadata",
    annotations: body.annotations as unknown,
  });
}
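A caller that wants to reuse annotations needs to capture them from an earlier completion first. The sketch below assumes they are returned on the assistant message at `choices[0].message.annotations`; that location is my reading of OpenRouter's response format and should be checked against its documentation. `extractAnnotations` and `withAnnotations` are hypothetical helper names.

```typescript
// Hypothetical helpers for round-tripping PDF parse annotations.
type ChatMessage = Record<string, unknown>;

// Assumption: annotations live on the first choice's assistant message.
function extractAnnotations(completion: unknown): unknown {
  const c = completion as {
    choices?: Array<{ message?: { annotations?: unknown } }>;
  } | null;
  return c?.choices?.[0]?.message?.annotations;
}

// Return a new message list with the annotations appended, mirroring the
// assistant message that the route pushes when body.annotations is set.
function withAnnotations(messages: ChatMessage[], annotations: unknown): ChatMessage[] {
  if (!annotations) return messages;
  return [
    ...messages,
    { role: "assistant", content: "Previous file parse metadata", annotations },
  ];
}
```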
Stage 4: Frontend Display
Invoice Viewer V4
Location: components/invoice-viewer-v4.tsx:12-225
Key features:
- Reconciliation status badge (green = matched, red = error > 0.05)
- Document header (supplier, invoice number, date)
- Items table with computed columns
- Header discounts and charges breakdown
- Totals summary with printed vs computed comparison
- Alternates trace for debugging
Reconciliation check (components/invoice-viewer-v4.tsx:13-14):
const doc = React.useMemo(() => reconcileV4(data), [data]);
const matched = (doc.reconciliation?.error_absolute ?? 0) <= 0.05
&& (doc.printed?.grand_total ?? 0) > 0;
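The matched check has one subtlety worth spelling out: a document with no printed grand total is never reported as matched, even when the error is zero. The sketch below restates the condition as a pure function so that case is easy to test; `isMatched` and `ReconciledDocLike` are hypothetical names, and the interface lists only the two fields the check reads.

```typescript
// Hypothetical restatement of the viewer's matched condition.
interface ReconciledDocLike {
  reconciliation?: { error_absolute?: number };
  printed?: { grand_total?: number };
}

function isMatched(doc: ReconciledDocLike, tolerance = 0.05): boolean {
  const error = doc.reconciliation?.error_absolute ?? 0;
  const printedTotal = doc.printed?.grand_total ?? 0;
  // Both conditions must hold: a small error and a positive printed total.
  return error <= tolerance && printedTotal > 0;
}
```

For example, an error of 0.03 against a printed total of 1062 counts as matched, while an empty document does not.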
Success Feedback
Location: components/ocr-uploader.tsx:189-194
// Celebration effect on success
setShowConfetti(true);
setJustCompleted(true);
setTimeout(() => setShowConfetti(false), 100);
setTimeout(() => setJustCompleted(false), 3000);
Confetti animation + shimmer effect provide immediate visual feedback.
Error Handling
API Errors
OpenRouter failure (app/api/ocr-structured-v4/route.ts:291-296):
if (!response.ok) {
  const err = await response.text();
  return NextResponse.json(
    { error: `OpenRouter error: ${response.status} ${err}` },
    { status: 500 }
  );
}
JSON parsing failure (app/api/ocr-structured-v4/route.ts:385-388):
return NextResponse.json(
  { error: `Model did not return valid JSON (${message})`, error_excerpt: excerpt },
  { status: 500 }
);
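Both error responses above share an `error` field, and the JSON parsing failure adds `error_excerpt`. A client could normalize them into a single display string along the following lines. This is a sketch with hypothetical names (`describeApiError`, `ApiErrorBody`); the 200-character excerpt limit and the fallback wording are assumptions.

```typescript
// Hypothetical client-side helper for the error shapes shown above.
interface ApiErrorBody {
  error?: unknown;
  error_excerpt?: unknown;
}

function describeApiError(status: number, body: unknown): string {
  const b = (body ?? {}) as ApiErrorBody;
  const base =
    typeof b.error === "string" && b.error.length > 0
      ? b.error
      : `Request failed with status ${status}`; // assumption: generic fallback
  if (typeof b.error_excerpt === "string" && b.error_excerpt.length > 0) {
    // Assumption: cap the excerpt so the alert stays readable.
    return `${base} (excerpt: ${b.error_excerpt.slice(0, 200)})`;
  }
  return base;
}
```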
Frontend Error Display
Location: components/ocr-uploader.tsx:515-523
{error && (
  <div className="flex items-start gap-2 p-4 bg-red-50 ... rounded-lg" role="alert">
    <svg>...</svg>
    <div className="text-sm text-red-800">{error}</div>
  </div>
)}
Loading States
Ticking timer (components/ocr-uploader.tsx:43-57):
React.useEffect(() => {
  let id: number | null = null;
  if (loading) {
    if (startRef.current == null) startRef.current = Date.now();
    // Update every 100ms to show live progress
    id = window.setInterval(() => {
      if (startRef.current != null) setDurationMs(Date.now() - startRef.current);
    }, 100);
  }
  // Clear the interval when loading changes or the component unmounts
  return () => {
    if (id != null) window.clearInterval(id);
  };
}, [loading]);
Memoization
Location: components/invoice-viewer-v4.tsx:13
const doc = React.useMemo(() => reconcileV4(data), [data]);
Reconciliation only runs when data changes, not on every render.
Environment Variables
Required:
- OPENROUTER_API_KEY: Authentication for OpenRouter API
Optional:
- OPENROUTER_MODEL: Default model (fallback: google/gemini-2.0-flash)
- OPENROUTER_SITE_URL: Referer for OpenRouter (default: http://localhost:3000)
- OPENROUTER_APP_NAME: App title (default: Invoice OCR)
- OPENROUTER_PDF_ENGINE: PDF parsing engine (pdf-text | mistral-ocr | native; default: pdf-text)
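The variables and defaults listed above can be gathered in one place. The following is a minimal sketch, assuming a hypothetical `loadOcrConfig` helper that takes the environment as a plain object, which keeps it testable without `process.env`. Throwing when the API key is missing is an assumption about how a caller might want to fail.

```typescript
// Hypothetical config loader using the defaults documented above.
interface OcrConfig {
  apiKey: string;
  model: string;
  siteUrl: string;
  appName: string;
  pdfEngine: string;
}

function loadOcrConfig(env: Record<string, string | undefined>): OcrConfig {
  const apiKey = env.OPENROUTER_API_KEY;
  // Assumption: fail fast when the only required variable is missing.
  if (!apiKey) throw new Error("OPENROUTER_API_KEY is required");
  return {
    apiKey,
    model: env.OPENROUTER_MODEL || "google/gemini-2.0-flash",
    siteUrl: env.OPENROUTER_SITE_URL || "http://localhost:3000",
    appName: env.OPENROUTER_APP_NAME || "Invoice OCR",
    pdfEngine: env.OPENROUTER_PDF_ENGINE || "pdf-text",
  };
}
```

In a Next.js route this could be called as `loadOcrConfig(process.env)`.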