RAG PDF Highlighter is a Python package that locates text chunks inside PDF documents and returns an annotated copy with the highlights applied. It ships both as a FastAPI microservice, ready to receive HTTP requests from any RAG pipeline, and as a plain Python library you can call directly without starting a server. All inputs and outputs are compatible with LangChain Document objects, so it slots naturally into existing retrieval workflows.
Key features
3-tier text matching
Finds chunks using exact match, then sentence-level fallback, then collapsed-whitespace matching for character-spaced PDF artifacts.
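To make the fallback order concrete, here is a minimal string-level sketch of the three tiers. It works on plain page text rather than on a PDF. The helper names (match_chunk, split_sentences, collapse_ws) and the regex-based sentence splitter are illustrative assumptions, not the package's internals.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def collapse_ws(text: str) -> str:
    # Remove every whitespace character so "H e l l o" and "Hello" compare equal.
    return re.sub(r"\s+", "", text)


def match_chunk(chunk: str, page_text: str) -> tuple[str, list[str]]:
    """ILLUSTRATIVE SKETCH, not the package's implementation.

    Returns (tier, matched_fragments), where tier is "exact", "sentence",
    "collapsed" or "none".
    """
    if not chunk.strip():
        return "none", []
    # Tier 1: the whole chunk appears verbatim in the page text.
    if chunk in page_text:
        return "exact", [chunk]
    # Tier 2: fall back to the individual sentences that do appear verbatim.
    found = [s for s in split_sentences(chunk) if s in page_text]
    if found:
        return "sentence", found
    # Tier 3: ignore whitespace entirely to survive character-spaced extraction.
    if collapse_ws(chunk) in collapse_ws(page_text):
        return "collapsed", [chunk]
    return "none", []
```

For example, on the page text "Revenue grew 12% in Q3. Costs were flat. H e a d c o u n t  r o s e.", the chunk "Headcount rose." fails the first two tiers and only matches once whitespace is collapsed. A real implementation would run this search against each page's text layer and then convert the matched spans into highlight annotations; the sketch stops at deciding which tier matched.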
Async PDF download
Downloads remote PDFs with httpx using non-blocking I/O, keeping the service responsive under concurrent load.
Stateless
Temporary files are cleaned up after every request. No state accumulates between calls.
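The per-request cleanup pattern can be sketched with the standard library's tempfile.TemporaryDirectory, which deletes the directory and everything in it when the with block exits. The run_stateless helper and its annotate callback are hypothetical stand-ins for the download and highlighting steps, not functions exported by the package.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_stateless(pdf_bytes: bytes, annotate: Callable[[Path, Path], None]) -> bytes:
    """Process one request's PDF inside a throwaway directory (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        src = Path(tmp_dir) / "input.pdf"
        dst = Path(tmp_dir) / "highlighted.pdf"
        src.write_bytes(pdf_bytes)
        # Hypothetical highlighting step: reads src and writes the annotated PDF to dst.
        annotate(src, dst)
        # Read the result into memory before the directory is deleted.
        result = dst.read_bytes()
    # At this point tmp_dir and both files are gone, so nothing persists between calls.
    return result
```

The sketch reads the result into memory before the directory disappears. A service that streams the file straight from disk would instead need to postpone the cleanup until the response has been sent.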
Docker-ready
A docker build followed by a docker run command gets the service running in a container.
Library-friendly
The core highlight_chunks_in_pdf function raises plain Python exceptions. FastAPI is not required to use it.
LangChain Document compatible
Accepts langchain_core.documents.Document objects directly, with page_content and metadata.page fields.
Installation
Install the package from PyPI:
Two ways to use it
As a microservice: Start the Uvicorn server and send POST /highlight requests with a PDF URL and a list of document chunks. The service downloads the PDF, applies highlights, and streams back the annotated file. This mode is suitable for multi-language stacks or teams that want a standalone service boundary.
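A client request might look like the following standard-library sketch. The JSON field names pdf_url and chunks, the serialized chunk shape, and the base URL http://localhost:8000 (Uvicorn's default port) are assumptions; check the service's own request schema, which FastAPI apps typically expose at /docs, before relying on them.

```python
import json
import urllib.request


def build_highlight_request(base_url: str, pdf_url: str, chunks: list[dict]) -> urllib.request.Request:
    # ASSUMPTION: "pdf_url" and "chunks" are hypothetical field names; each chunk is
    # assumed to mirror a LangChain Document as {"page_content": ..., "metadata": {"page": ...}}.
    payload = {"pdf_url": pdf_url, "chunks": chunks}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/highlight",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_highlighted_pdf(request: urllib.request.Request, out_path: str) -> None:
    # Send the request and write the annotated PDF bytes returned by the service to disk.
    with urllib.request.urlopen(request) as resp, open(out_path, "wb") as fh:
        fh.write(resp.read())
```

With a running service, save_highlighted_pdf(build_highlight_request("http://localhost:8000", pdf_url, chunks), "highlighted.pdf") sends the request and stores the result. Whether metadata page numbers are 0- or 1-based is not specified here, so confirm that convention against the service as well.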
As a Python library: Import highlight_chunks_in_pdf directly and pass a local PDF path along with your Document list. No HTTP layer is involved. This mode is useful when your RAG pipeline is already written in Python and you want to avoid the overhead of a network hop.
Next steps
Quickstart
Run the service and send your first highlight request in three steps.
Guides
Learn how to deploy with Docker and integrate the library into your pipeline.