This is the primary endpoint of RAG PDF Highlighter. It downloads the PDF from the provided URL, highlights all matching text chunks using a 3-tier search strategy (exact match → sentence match → collapsed-whitespace match), and returns the annotated document as a binary PDF.

Request

Method and path: POST /highlight
Content-Type: application/json
pdf_url (string, required)
Publicly accessible URL of the PDF to download and annotate. The service fetches this URL at request time using an async HTTP client with a 60-second timeout.

documents (object[], required)
List of document objects identifying the text chunks to highlight. At least one entry is required.
The 3-tier matching strategy tries exact search first, then sentence-level search, then a collapsed-whitespace search that handles PDFs with individually spaced characters (e.g. "W H A T" stored as "WHAT" in the text layer). If none of the three strategies find a match for a given chunk, that chunk is silently skipped and no highlight is added.
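
The fallback order described above can be illustrated on plain strings. This is a minimal sketch, not the service's implementation: the function names `collapse` and `find_chunk` are hypothetical, the sentence-splitting regex is an assumption, and the real service presumably searches the PDF text layer through a PDF library rather than a Python string.

```python
import re


def collapse(text):
    """Remove all whitespace so "W H A T" and "WHAT" compare equal."""
    return re.sub(r"\s+", "", text)


def find_chunk(page_text, chunk):
    """Return (tier, matched_strings), or None when every tier fails."""
    # Tier 1: the whole chunk appears verbatim in the page text.
    if chunk in page_text:
        return ("exact", [chunk])

    # Tier 2: split the chunk into sentences (assumed splitting rule)
    # and keep the ones that appear verbatim.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]
    hits = [s for s in sentences if s in page_text]
    if hits:
        return ("sentence", hits)

    # Tier 3: compare both sides with all whitespace removed.
    squeezed = collapse(chunk)
    if squeezed and squeezed in collapse(page_text):
        return ("collapsed", [chunk])

    # No tier matched: the caller skips this chunk without highlighting.
    return None
```

Returning `None` for an unmatched chunk mirrors the documented behavior of skipping it silently instead of raising an error.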

Response

A successful request returns HTTP 200 with the annotated PDF as binary content.
| Header | Value |
| --- | --- |
| Content-Type | application/pdf |
| Content-Disposition | attachment; filename="highlighted.pdf" |
body (binary)
The raw binary content of the highlighted PDF. Save it directly to a .pdf file; do not attempt to parse it as JSON.
Temporary files (the downloaded original and the annotated output) are removed from the server automatically by background tasks after the response is sent.
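
Because a successful response is binary and an error response is JSON, a client may want to confirm it really received a PDF before writing it to disk. The helper below is a hedged sketch with a hypothetical name, `looks_like_pdf`; it relies only on the documented Content-Type value and on the fact that PDF files begin with the bytes `%PDF-`.

```python
def looks_like_pdf(content_type, body):
    """Return True when the header and the leading bytes both indicate a PDF."""
    # Ignore any parameters after ';' and normalise case before comparing.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    return media_type == "application/pdf" and body.startswith(b"%PDF-")
```

A caller could save the body only when this check passes and otherwise treat the body as a JSON error.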

Error responses

All errors return a JSON body with a single detail key:
{"detail": "error message here"}
| Status | Cause | Example detail |
| --- | --- | --- |
| 400 Bad Request | PDF could not be downloaded from the provided URL | "Failed to download PDF: connection refused" |
| 400 Bad Request | Empty document list passed to the highlighter | "No documents provided" |
| 422 Unprocessable Entity | Missing pdf_url or documents in the request body | (FastAPI validation error) |
| 500 Internal Server Error | Unexpected failure during the highlighting process | "Highlighting failed: <reason>" |
A 400 is returned for both download failures and empty document lists. Check the detail message to distinguish the two cases.
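
One way a client might tell the error cases apart is to inspect the status code together with the detail message. The sketch below assumes the example detail strings in the table are stable prefixes, which the source does not guarantee; `classify_error` and its category labels are hypothetical names.

```python
import json


def classify_error(status, body):
    """Map an error response to a (category, detail) pair."""
    try:
        detail = json.loads(body).get("detail")
    except (ValueError, AttributeError):
        # Body was not valid JSON, or not a JSON object.
        detail = None

    if status == 400:
        # Both documented 400 cases share a status; the message separates them.
        if isinstance(detail, str) and detail.startswith("Failed to download PDF"):
            return ("download_failed", detail)
        if detail == "No documents provided":
            return ("empty_documents", detail)
        return ("bad_request", detail)
    if status == 422:
        # For validation errors, detail is whatever FastAPI reports.
        return ("validation_error", detail)
    if status == 500:
        return ("highlighting_failed", detail)
    return ("unknown", detail)
```

Matching on message text is brittle, so the fallback `bad_request` category keeps the client working if the wording changes.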

Example

curl -X POST http://localhost:8000/highlight \
  -H "Content-Type: application/json" \
  -d '{
    "pdf_url": "https://example.com/report.pdf",
    "documents": [
      {
        "page_content": "Text to highlight in the PDF",
        "metadata": {"page": 0}
      }
    ]
  }' \
  --output highlighted.pdf
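
The same request can be sketched in Python using only the standard library. The helper names `build_highlight_request` and `save_highlighted_pdf` are hypothetical, the `(text, page)` input shape is chosen for illustration, and the 120-second client timeout is an assumption picked to exceed the service's 60-second download timeout.

```python
import json
import urllib.request


def build_highlight_request(base_url, pdf_url, chunks):
    """Build (but do not send) the POST /highlight request.

    chunks is an iterable of (text, page) pairs, mirroring the
    page_content and metadata.page fields of the curl example.
    """
    payload = {
        "pdf_url": pdf_url,
        "documents": [
            {"page_content": text, "metadata": {"page": page}}
            for text, page in chunks
        ],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/highlight",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_highlighted_pdf(request, out_path="highlighted.pdf", timeout=120):
    """Send the request and write the binary response body to out_path."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        pdf_bytes = response.read()
    with open(out_path, "wb") as handle:  # binary mode: the body is not text
        handle.write(pdf_bytes)
    return out_path


# Usage, assuming the service is running locally:
# req = build_highlight_request(
#     "http://localhost:8000",
#     "https://example.com/report.pdf",
#     [("Text to highlight in the PDF", 0)],
# )
# save_highlighted_pdf(req)
```

Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so the JSON error body is read from the exception, not from a normal response object.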

Health check: GET /

A simple liveness probe. No request body or authentication is required.
curl http://localhost:8000/
Returns HTTP 200 with:
{"status": "ok the app is running"}
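
A monitoring script could interpret the probe result as follows. This is a small sketch with a hypothetical helper name, `is_alive`; comparing against the exact status string is an assumption that holds only while the service keeps the documented wording.

```python
import json


def is_alive(status, body):
    """Return True when the health check reports the documented payload."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok the app is running"
```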
