AI Tender Analysis: Extract Requirements from PDFs

TenderCheck AI’s tender analysis feature reads a public tender PDF, reasons over its contents as a Legal & Technical Auditor, and returns a structured list of Requirement objects — each with a type, confidence score, source page number, and a verbatim snippet from the original document. The result is a machine-readable compliance checklist that you can immediately validate a vendor proposal against.

What Gets Extracted

Every requirement extracted from a tender document is represented by the Requirement entity. Each field is populated by the AI before being persisted in the database.

export type RequirementType = "MANDATORY" | "OPTIONAL" | "UNKNOWN";

export interface Requirement {
  id: string;
  text: string;                    // The full, exact requirement statement
  normalizedText?: string;         // Cleaned up text for processing
  type: RequirementType;
  source: {
    pageNumber: number;            // Absolute page number in the original PDF
    paragraph?: number;
    snippet: string;               // Verbatim 1–2 sentence fragment from the document
  };
  keywords: string[];              // 3–4 keywords generated for vector search
  confidence: number;              // 0.0 to 1.0 — how certain the AI is this is a real requirement
}

| Field | Description |
| --- | --- |
| `text` | The complete, exact statement of the requirement as it appears in the tender |
| `type` | During analysis, Gemini classifies each requirement as `TECHNICAL`, `ADMINISTRATIVE`, `LEGAL`, or `FINANCIAL`. The frontend maps these categories to the domain `RequirementType` (`MANDATORY` \| `OPTIONAL` \| `UNKNOWN`) for display |
| `source.pageNumber` | Derived from the `--- PAGE X ---` markers embedded by the PDF parser |
| `source.snippet` | A 1–2 sentence verbatim quote that powers the Citation Preview |
| `keywords` | Short terms the AI generates to enable semantic vector search during proposal validation |
| `confidence` | `1.0` for unambiguous mandates, `0.5` for desirable or conditional clauses |
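To make the field constraints above concrete, here is a minimal sketch of a runtime check for a `Requirement`-shaped value. The helper `isRequirement`, the 1-based page-number rule, and the sample object are illustrative assumptions, not part of the TenderCheck AI codebase.

```typescript
type RequirementType = "MANDATORY" | "OPTIONAL" | "UNKNOWN";

interface Requirement {
  id: string;
  text: string;
  normalizedText?: string;
  type: RequirementType;
  source: { pageNumber: number; paragraph?: number; snippet: string };
  keywords: string[];
  confidence: number;
}

const REQUIREMENT_TYPES: readonly string[] = ["MANDATORY", "OPTIONAL", "UNKNOWN"];

// Hypothetical helper: verifies that an unknown value (for example, parsed
// model output) has the shape of a Requirement before it is stored.
function isRequirement(value: unknown): value is Requirement {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  const source = r.source as Record<string, unknown> | null | undefined;
  return (
    typeof r.id === "string" &&
    typeof r.text === "string" &&
    r.text.trim().length > 0 &&
    typeof r.type === "string" &&
    REQUIREMENT_TYPES.includes(r.type) &&
    typeof source === "object" &&
    source !== null &&
    typeof source.pageNumber === "number" &&
    Number.isInteger(source.pageNumber) &&
    source.pageNumber >= 1 && // assumption: pages are numbered from 1
    typeof source.snippet === "string" &&
    Array.isArray(r.keywords) &&
    r.keywords.every((k) => typeof k === "string") &&
    typeof r.confidence === "number" &&
    r.confidence >= 0 &&
    r.confidence <= 1
  );
}

// Made-up sample data, for illustration only.
const sampleRequirement = {
  id: "req-001",
  text: "El licitador deberá disponer de la certificación ISO 27001.",
  type: "MANDATORY",
  source: { pageNumber: 3, snippet: "El licitador deberá disponer de la certificación ISO 27001." },
  keywords: ["ISO 27001", "certificación", "seguridad"],
  confidence: 1.0,
};
```

A check like this rejects values whose `type` is one of the internal analysis categories (such as `TECHNICAL`), because the persisted field is typed as `RequirementType`.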

AI Extraction Pipeline

The extraction engine uses Google Gemini 2.5 Flash via the Genkit framework. Gemini is prompted to adopt the Legal & Technical Auditor persona with strict instructions:

Focus on imperatives. The prompt instructs the model to look specifically for phrases like "deberá", "será obligatorio", "se requiere", "es indispensable", "must", and "shall" — the classic markers of exclusionary clauses in public procurement documents.
Ignore filler text. Introductory paragraphs, general descriptions, and non-binding commentary are explicitly excluded from extraction.
Output in Spanish. All extracted requirement text and reasoning are returned in Spanish regardless of the source language, consistent with the platform’s target market.
Assign confidence deterministically. A clear mandate ("deberá") receives confidence 1.0; a desirable criterion receives 0.5.
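The confidence rule above can be illustrated with a small keyword heuristic. This is only a sketch of the rule: in TenderCheck AI the model applies it through the prompt, not through code like this. The function names and the list of "desirable" markers are assumptions; only the mandatory markers come from the prompt description.

```typescript
// Mandatory markers quoted in the prompt description.
const MANDATORY_MARKERS = ["deberá", "será obligatorio", "se requiere", "es indispensable", "must", "shall"];

// Assumption: these desirable-clause markers are illustrative and not taken from the source.
const DESIRABLE_MARKERS = ["se valorará", "preferiblemente", "should"];

// Matches a marker only when it is not embedded in a longer word.
// \p{L} (any Unicode letter) is used instead of \b so accented words like "deberá" work.
function containsMarker(lowerText: string, marker: string): boolean {
  const escaped = marker.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`(?<!\\p{L})${escaped}(?!\\p{L})`, "u").test(lowerText);
}

// Returns 1.0 for a clear mandate, 0.5 for a desirable criterion,
// and null for text that looks like non-binding filler.
function scoreClause(clause: string): number | null {
  const lower = clause.toLowerCase();
  if (MANDATORY_MARKERS.some((m) => containsMarker(lower, m))) return 1.0;
  if (DESIRABLE_MARKERS.some((m) => containsMarker(lower, m))) return 0.5;
  return null;
}
```

Because the score depends only on which marker is present, the same clause always receives the same confidence, which is the sense in which the assignment is deterministic.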

The AI also classifies each requirement into one of four internal types used during analysis — TECHNICAL, ADMINISTRATIVE, LEGAL, or FINANCIAL — which the frontend maps to the broader MANDATORY / OPTIONAL / UNKNOWN display types.

All Gemini calls are traced end-to-end using LangSmith via the traceable SDK wrapper. You can inspect prompt inputs, outputs, latency, and token usage in the LangSmith dashboard if you have observability configured.

Large PDF Support

For tender documents longer than 15 pages (LARGE_PDF_THRESHOLD), TenderCheck AI automatically switches from single-pass analysis to a parallelised chunked pipeline.

| Constant | Value | Purpose |
| --- | --- | --- |
| `LARGE_PDF_THRESHOLD` | `15` pages | Page count above which chunked processing is triggered |
| `PAGES_PER_CHUNK` | `10` pages | Pages included in each chunk sent to Gemini |
| `CHUNK_MAX_CHARS` | `500,000` chars | Hard character cap per chunk, sized to stay within Gemini 2.5 Flash’s context window |
| `CHUNK_PARALLEL_PROCESSING` | `3` | Intended limit on concurrently processed chunks; reserved for future tuning and not enforced by the current implementation |

The pipeline works as follows:

PDF (e.g. 45 pages)
│
├─ Chunk 0 → Pages  1–10  ──┐
├─ Chunk 1 → Pages 11–20  ──┤
├─ Chunk 2 → Pages 21–30  ──┼─ All chunks dispatched simultaneously via Promise.all
├─ Chunk 3 → Pages 31–40  ──┤    (Gemini 2.5 Flash handles each concurrently)
└─ Chunk 4 → Pages 41–45  ──┘
│
└─ Requirements aggregated across all chunks
   → Embeddings generated
   → Saved to Turso DB
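
The chunking step in the diagram can be sketched as follows. The constant values come from the table above; the function names, the assumption that each `--- PAGE X ---` marker precedes its page text, and the use of truncation to enforce `CHUNK_MAX_CHARS` are illustrative guesses rather than the actual implementation.

```typescript
const LARGE_PDF_THRESHOLD = 15;
const PAGES_PER_CHUNK = 10;
const CHUNK_MAX_CHARS = 500_000;

interface MarkedPage { pageNumber: number; text: string }
interface PageChunk { index: number; firstPage: number; lastPage: number; text: string }

// Chunked processing applies only to documents longer than the threshold.
function needsChunking(pageCount: number): boolean {
  return pageCount > LARGE_PDF_THRESHOLD;
}

// Splits parser output into pages, keeping each marker attached to its page text.
function splitPages(fullText: string): MarkedPage[] {
  const marker = /--- PAGE (\d+) ---/g;
  const starts: { pageNumber: number; offset: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = marker.exec(fullText)) !== null) {
    starts.push({ pageNumber: Number(m[1]), offset: m.index });
  }
  return starts.map((s, i) => ({
    pageNumber: s.pageNumber,
    text: fullText.slice(s.offset, i + 1 < starts.length ? starts[i + 1].offset : fullText.length),
  }));
}

// Groups pages into chunks of PAGES_PER_CHUNK. Markers stay in the chunk text,
// so the model still sees absolute page numbers.
function buildChunks(pages: MarkedPage[]): PageChunk[] {
  const chunks: PageChunk[] = [];
  for (let i = 0; i < pages.length; i += PAGES_PER_CHUNK) {
    const group = pages.slice(i, i + PAGES_PER_CHUNK);
    chunks.push({
      index: chunks.length,
      firstPage: group[0].pageNumber,
      lastPage: group[group.length - 1].pageNumber,
      // Assumption: the character cap is enforced by truncating the chunk.
      text: group.map((p) => p.text).join("").slice(0, CHUNK_MAX_CHARS),
    });
  }
  return chunks;
}
```

For a 45-page document this produces five chunks covering pages 1–10, 11–20, 21–30, 31–40, and 41–45, matching the diagram.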

analyzeChunks dispatches all chunks to Gemini simultaneously using Promise.all. The CHUNK_PARALLEL_PROCESSING constant (3) exists in the codebase for future windowed-concurrency tuning, but the current implementation sends every chunk in a single parallel wave.

Each chunk keeps its absolute page markers (--- PAGE X ---), so source.pageNumber on every extracted requirement refers to the page in the original full document, not to a position within the chunk. If a single chunk fails (for example, because of a transient AI error), the pipeline logs a warning and continues with the remaining chunks. The analysis completes as long as at least one chunk yields requirements.
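
A minimal sketch of this dispatch pattern is shown below. `Promise.all` rejects as soon as any promise rejects, so tolerating a failed chunk requires catching errors per chunk. The names `analyzeAllChunksSketch`, `ChunkAnalyzer`, and `ExtractedRequirement` are hypothetical, and throwing when no chunk yields requirements is an assumption about how the "at least one chunk" rule is enforced.

```typescript
interface ExtractedRequirement { text: string; pageNumber: number }

// Hypothetical signature for the function that sends one chunk to the model.
type ChunkAnalyzer = (chunkText: string, chunkIndex: number) => Promise<ExtractedRequirement[]>;

async function analyzeAllChunksSketch(
  chunks: string[],
  analyze: ChunkAnalyzer,
  warn: (message: string) => void = () => {},
): Promise<ExtractedRequirement[]> {
  // Every chunk is started immediately; no concurrency window is applied.
  const perChunk = await Promise.all(
    chunks.map(async (chunkText, i) => {
      try {
        return await analyze(chunkText, i);
      } catch (err) {
        // A failed chunk is logged and contributes no requirements.
        warn(`Chunk ${i} failed: ${err instanceof Error ? err.message : String(err)}`);
        return [] as ExtractedRequirement[];
      }
    }),
  );

  // Promise.all preserves input order, so results stay in document order.
  const all = perChunk.reduce<ExtractedRequirement[]>((acc, reqs) => acc.concat(reqs), []);

  // Assumption: an analysis with zero requirements overall is treated as a failure.
  if (all.length === 0) throw new Error("No chunk yielded any requirements");
  return all;
}
```

Catching inside each mapped promise keeps the single-wave behavior while isolating failures. A windowed variant would instead start at most `CHUNK_PARALLEL_PROCESSING` calls at a time.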

Supported File Format

Only PDF files are accepted. The PDF must contain selectable, machine-readable text. Scanned image PDFs without an embedded text layer cannot be processed.

| Constraint | Value |
| --- | --- |
| Accepted format | PDF (`.pdf`) only |
| Maximum file size | 50 MB (`FILE_UPLOAD_LIMIT_MB`) |
| Text requirement | Must have a selectable text layer (not a scanned image) |

The upload component enforces the application/pdf MIME type client-side before the file reaches the backend.
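
A client-side check of this kind might look like the sketch below. `UploadCandidate` mirrors the `name`, `type`, and `size` properties of the browser `File` object so the function can run outside a browser. The function name, the error messages, and the interpretation of 1 MB as 1024 × 1024 bytes are assumptions.

```typescript
const FILE_UPLOAD_LIMIT_MB = 50;

// Structurally compatible with the browser File object.
interface UploadCandidate { name: string; type: string; size: number }

type UploadCheck = { ok: true } | { ok: false; reason: string };

function checkTenderUpload(file: UploadCandidate): UploadCheck {
  if (file.type !== "application/pdf") {
    return { ok: false, reason: "Only PDF files are accepted" };
  }
  // Assumption: the limit is measured in mebibytes (1 MB = 1024 * 1024 bytes).
  if (file.size > FILE_UPLOAD_LIMIT_MB * 1024 * 1024) {
    return { ok: false, reason: `File exceeds ${FILE_UPLOAD_LIMIT_MB} MB` };
  }
  return { ok: true };
}
```

A MIME-type check cannot tell whether the PDF has a selectable text layer, so a scanned document can pass this step and still be rejected once the backend parses it.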

How to Analyze a Tender

Sign in

Sign in with your email and password, or use Google Sign-In (OAuth 2.0 with PKCE). You must be authenticated to submit an analysis.

Open the dashboard and click Analyze Tender

From the main dashboard, click the Analyze Tender upload zone (labeled Pliego). You can click to open a file picker or drag and drop a PDF directly onto the zone.

Select your tender PDF

Choose a PDF up to 50 MB with a machine-readable text layer. Once selected, the upload zone shows a ✅ confirmation with the filename.

Optionally select an industry

Use the industry dropdown to scope the analysis to a specific sector (e.g., Digital Services, Construction). This enables the Industry Scope Validation rule. The field is optional — if omitted, Digital Services is used as the default.

Submit and wait for results

Click Analyze. The backend parses the PDF, runs the Gemini extraction pipeline (chunked if > 15 pages), generates vector embeddings for each requirement, and saves the analysis. Processing time depends on document length.

Review extracted requirements with page citations

Each requirement card shows its full text, type badge, confidence, keywords, and a clickable page badge (e.g., Pág. 3). Clicking the page badge opens the Citation Preview modal, which displays the full page text with the AI-identified fragment highlighted.
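
One way to locate the fragment to highlight is to search the page text for the stored snippet while tolerating whitespace differences, since line breaks in extracted PDF text rarely match the snippet exactly. The following is a sketch under that assumption; `locateSnippet` and `HighlightSpan` are hypothetical names, and the real Citation Preview may work differently.

```typescript
interface HighlightSpan { start: number; end: number }

// Finds the snippet in the page text, treating any run of whitespace in the
// snippet as matching any run of whitespace (including newlines) on the page.
function locateSnippet(pageText: string, snippet: string): HighlightSpan | null {
  const words = snippet.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return null;

  // Escape regex metacharacters so the snippet is matched literally.
  const pattern = words.map((w) => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")).join("\\s+");
  const match = new RegExp(pattern).exec(pageText);
  return match ? { start: match.index, end: match.index + match[0].length } : null;
}
```

The returned offsets index into the original page text, so a UI can wrap `pageText.slice(start, end)` in a highlight element. A `null` result signals that the snippet could not be found on that page.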

Use the sample tender document at docs/Testing_docs/Pliego_Tender_IT_Security.pdf (included in the repository) to explore the full analysis flow without needing a real tender document. It is an IT security services tender designed to exercise all requirement types.

Analysis Status

Every TenderAnalysis record moves through a defined lifecycle tracked by AnalysisStatus.

export type AnalysisStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

| Status | Meaning |
| --- | --- |
| `PENDING` | Analysis record created but processing has not started |
| `PROCESSING` | PDF parsing and AI extraction are in progress |
| `COMPLETED` | All requirements extracted and embeddings saved successfully |
| `FAILED` | A critical error occurred (e.g., unreadable PDF, AI quota exceeded); the record is saved with an empty requirements list |

The current status is displayed as a badge on the analysis header card in the dashboard. A FAILED analysis can be retried by uploading the document again.
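
The lifecycle can be modeled as a small transition table. The allowed transitions below are inferred from the status descriptions and are an assumption, as are the names `ALLOWED_TRANSITIONS` and `canTransition`. In particular, `COMPLETED` and `FAILED` are treated as terminal because a retry is described as a new upload rather than a status change.

```typescript
type AnalysisStatus = "PENDING" | "PROCESSING" | "COMPLETED" | "FAILED";

// Assumed lifecycle: PENDING -> PROCESSING -> COMPLETED or FAILED.
const ALLOWED_TRANSITIONS: Record<AnalysisStatus, readonly AnalysisStatus[]> = {
  PENDING: ["PROCESSING"],
  PROCESSING: ["COMPLETED", "FAILED"],
  COMPLETED: [], // terminal
  FAILED: [],    // terminal; retrying means uploading the document again
};

function canTransition(from: AnalysisStatus, to: AnalysisStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}
```

Typing the table as `Record<AnalysisStatus, ...>` makes the compiler flag any status added to the union without a corresponding entry.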
