RAG PDF Highlighter lets you take the output of a retrieval-augmented generation (RAG) pipeline (a list of text chunks with page numbers) and produce a highlighted PDF showing exactly where each chunk appears in the original document. It works as a standalone FastAPI microservice or as a Python library you import directly into your code.
Quickstart
Send your first highlight request in under five minutes.
Run as a microservice
Deploy the FastAPI service and call the REST API from any language.
Use as a Python library
Import highlight_chunks_in_pdf directly into your RAG pipeline.
API reference
Full endpoint and parameter documentation for the REST API.
How it works
Provide a PDF URL and text chunks
Send a POST /highlight request with a publicly accessible PDF URL and a list of LangChain-style Document objects, each with page_content and a metadata.page field.
The service locates each chunk
RAG PDF Highlighter tries three progressively looser search strategies — exact match, sentence-level match, and collapsed-whitespace match — to handle character-spaced PDFs and other real-world formatting quirks.
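The fallback order described above can be sketched on plain strings. This is an illustration only, not the service's actual implementation: it assumes the page text has already been extracted into a Python string, and the names collapse, split_sentences, and locate_chunk are hypothetical. The naive sentence splitter is also an assumption.

```python
import re
from typing import List, Tuple


def collapse(text: str) -> str:
    """Drop all whitespace so 'H e l l o' and 'Hello' compare equal."""
    return re.sub(r"\s+", "", text)


def split_sentences(chunk: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", chunk.strip())
    return [part for part in parts if part]


def locate_chunk(page_text: str, chunk: str) -> Tuple[str, List[str]]:
    """Return the first strategy that finds the chunk, plus the matched pieces."""
    # Tier 1: the whole chunk appears verbatim on the page.
    if chunk in page_text:
        return "exact", [chunk]

    # Tier 2: fall back to whichever individual sentences appear verbatim.
    hits = [s for s in split_sentences(chunk) if s in page_text]
    if hits:
        return "sentence", hits

    # Tier 3: compare with all whitespace removed (character-spaced PDFs).
    collapsed_chunk = collapse(chunk)
    if collapsed_chunk and collapsed_chunk in collapse(page_text):
        return "collapsed", [chunk]

    return "none", []


page = "R e t r i e v a l  w o r k s"
strategy, pieces = locate_chunk(page, "Retrieval works")
print(strategy, pieces)
```

A real highlighter would also need to map a collapsed-whitespace hit back to coordinates on the page, which this sketch leaves out.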
Key features
3-tier text matching
Exact → sentence → collapsed-whitespace fallback handles even character-spaced PDFs.
Async PDF download
Non-blocking I/O via httpx keeps the service responsive under concurrent load.
Stateless & clean
Temp files are cleaned up after every request — no disk accumulation.
LangChain compatible
Accepts standard langchain_core.documents.Document objects out of the box.
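As a rough sketch of what a POST /highlight call might look like, the snippet below builds a request using only the standard library. Only the endpoint path and the page_content and metadata.page fields come from this page. The top-level JSON keys pdf_url and documents, the base URL http://localhost:8000, the example PDF URL, and the helper name build_highlight_request are assumptions made for illustration, so check the API reference for the real schema. This page also does not say whether metadata.page is 0-based or 1-based.

```python
import json
import urllib.request
from typing import List, Tuple


def build_highlight_request(
    base_url: str,
    pdf_url: str,
    chunks: List[Tuple[str, int]],
) -> urllib.request.Request:
    """Build (but do not send) a POST /highlight request."""
    payload = {
        # Assumed key name for the publicly accessible PDF URL.
        "pdf_url": pdf_url,
        # Assumed key name for the list of LangChain-style documents.
        "documents": [
            {"page_content": text, "metadata": {"page": page}}
            for text, page in chunks
        ],
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/highlight",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_highlight_request(
    "http://localhost:8000",           # assumed local deployment
    "https://example.com/report.pdf",  # placeholder PDF URL
    [("Revenue grew 12% year over year.", 3)],
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it. The format of the response is not described on this page, so handling it is left out.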