RAG PDF Highlighter lets you take the output of a retrieval-augmented generation (RAG) pipeline — a list of text chunks with page numbers — and produce a highlighted PDF showing exactly where each chunk appears in the original document. It works as a standalone FastAPI microservice or as a Python library you import directly into your code.

Quickstart

Send your first highlight request in under five minutes.

Run as a microservice

Deploy the FastAPI service and call the REST API from any language.

Use as a Python library

Import highlight_chunks_in_pdf directly into your RAG pipeline.

API reference

Full endpoint and parameter documentation for the REST API.

How it works

1. Provide a PDF URL and text chunks

Send a POST /highlight request with a publicly accessible PDF URL and a list of LangChain-style Document objects, each with page_content and a metadata.page field.

2. The service locates each chunk

RAG PDF Highlighter tries three progressively looser search strategies (exact match, sentence-level match, and collapsed-whitespace match) to handle character-spaced PDFs and other real-world formatting quirks.

3. Receive the annotated PDF

The service streams back a highlighted PDF (application/pdf) with every matched region annotated. Temporary files are deleted automatically after the response.
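
The request and response steps above can be sketched from the client side in Python. This is a minimal illustration, not the documented request schema: the JSON key names `pdf_url` and `chunks`, the example PDF URL, and the helper names are assumptions, and the service address simply mirrors the Uvicorn command in the Installation section. Whether `metadata.page` is zero-based or one-based is not stated here, so check the API reference before relying on the value used below.

```python
import json
import urllib.request

# Assumed address: matches the Uvicorn command shown under Installation.
SERVICE_URL = "http://localhost:8000/highlight"


def build_payload(pdf_url, chunks):
    """Build a request body from (text, page) pairs.

    The top-level keys "pdf_url" and "chunks" are hypothetical; only
    page_content and metadata.page are named on this page.
    """
    documents = [
        {"page_content": text, "metadata": {"page": page}}
        for text, page in chunks
    ]
    return {"pdf_url": pdf_url, "chunks": documents}


def request_highlighted_pdf(payload, out_path="highlighted.pdf"):
    """POST the payload and save the returned PDF bytes to out_path."""
    request = urllib.request.Request(
        SERVICE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        pdf_bytes = response.read()
    with open(out_path, "wb") as fh:
        fh.write(pdf_bytes)
    return out_path


payload = build_payload(
    "https://example.com/report.pdf",  # placeholder URL
    [("Revenue grew 12% year over year.", 3)],
)
# request_highlighted_pdf(payload)  # uncomment once the service is running
```

Building the payload separately from sending it makes the body easy to inspect or log before any network call is made.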

Key features

3-tier text matching

Exact → sentence → collapsed-whitespace fallback handles even character-spaced PDFs.

Async PDF download

Non-blocking I/O via httpx keeps the service responsive under concurrent load.

Stateless & clean

Temp files are cleaned up after every request — no disk accumulation.

LangChain compatible

Accepts standard langchain_core.documents.Document objects out of the box.
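
The three-tier fallback can be illustrated on plain strings. The sketch below is not the library's implementation: it only classifies which tier would find a chunk in already-extracted page text, whereas the real service locates and annotates regions inside the PDF. The function names and the naive sentence splitter are assumptions made for the example.

```python
import re


def collapse(text):
    """Remove all whitespace so 'H e l l o' and 'Hello' compare equal."""
    return re.sub(r"\s+", "", text)


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def locate_chunk(page_text, chunk):
    """Return (tier, matched_texts), trying progressively looser strategies."""
    # Tier 1: the whole chunk appears verbatim on the page.
    if chunk in page_text:
        return "exact", [chunk]
    # Tier 2: keep whichever individual sentences appear verbatim.
    sentences = [s for s in split_sentences(chunk) if s in page_text]
    if sentences:
        return "sentence", sentences
    # Tier 3: compare with all whitespace removed (character-spaced text).
    collapsed = collapse(chunk)
    if collapsed and collapsed in collapse(page_text):
        return "collapsed", [chunk]
    return "none", []


page = "Intro text. The quick brown fox.\nJumps over the lazy dog. T h e  e n d"
print(locate_chunk(page, "The quick brown fox."))
print(locate_chunk(page, "The quick brown fox. Jumps over the lazy dog."))
print(locate_chunk(page, "The end"))
```

Ordering the tiers from strictest to loosest means a looser strategy only runs when a stricter one fails, which keeps false positives from the whitespace-insensitive comparison to a minimum.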

Installation

pip install rag-pdf-highlighter

To run the full FastAPI microservice, start Uvicorn after installing:

uvicorn rag_pdf_highlighter.main:app --host 0.0.0.0 --port 8000
