The core highlighting logic in RAG PDF Highlighter is fully independent of FastAPI. You can import highlight_chunks_in_pdf directly into any Python script, notebook, or application and annotate PDFs in-process without running a web server. This is useful for batch pipelines, background workers, or any context where HTTP overhead is unnecessary.
Installation
Basic usage
Pass a local PDF path and a list of Document objects from langchain_core. Each document carries the text to locate (page_content) and the zero-indexed page number where it appears (metadata["page"]).
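As a minimal sketch of the call described above: the helper names to_document_kwargs and highlight_passages are hypothetical, and the import path `from rag_pdf_highlighter import highlight_chunks_in_pdf` is an assumption, since this page does not state which module exports the function. The helper also converts 1-based page numbers (as shown in a PDF viewer) to the zero-indexed value expected in metadata["page"].

```python
from typing import Iterable, Tuple


def to_document_kwargs(text: str, page_number: int) -> dict:
    """Build Document keyword arguments from a 1-based page number.

    Hypothetical helper: PDF viewers count pages from 1, while
    metadata["page"] is zero-indexed, so subtract one here.
    """
    if page_number < 1:
        raise ValueError("page_number is 1-based and must be >= 1")
    return {"page_content": text, "metadata": {"page": page_number - 1}}


def highlight_passages(pdf_path: str, passages: Iterable[Tuple[str, int]]) -> str:
    """Highlight (text, 1-based page) pairs and return the output path."""
    # Imported lazily so this module loads even without the dependencies.
    from langchain_core.documents import Document
    # ASSUMPTION: the top-level import path is a guess; adjust it to
    # wherever highlight_chunks_in_pdf lives in your installation.
    from rag_pdf_highlighter import highlight_chunks_in_pdf

    documents = [Document(**to_document_kwargs(text, page)) for text, page in passages]
    return highlight_chunks_in_pdf(pdf_path, documents)
```

A call might then look like `highlight_passages("report.pdf", [("Revenue grew 12%", 1)])`, where report.pdf is a placeholder file name.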
Return value
highlight_chunks_in_pdf returns a str: the absolute path to a newly created temporary file containing the annotated PDF. The file is written to the system's default temp directory (e.g. /tmp) with a _highlighted.pdf suffix. The original file at pdf_path is never modified.
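Because the returned path points into the temp directory, a caller that wants to keep the annotated PDF will usually copy it somewhere durable. The following sketch uses only the standard library; keep_highlighted_copy is a hypothetical helper name, not part of the package.

```python
import shutil
from pathlib import Path


def keep_highlighted_copy(highlighted_path: str, dest_dir: str) -> Path:
    """Copy the temporary highlighted PDF into dest_dir and return the new path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    target = dest / Path(highlighted_path).name  # keep the *_highlighted.pdf name
    shutil.copy2(highlighted_path, target)  # copy contents and file metadata
    return target
```

The temporary file still exists after the copy, so it remains the caller's job to remove it.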
Working with multiple pages
Supply one Document per chunk, setting metadata["page"] to the correct zero-indexed page for each. Chunks on different pages are processed independently.
Chunks whose page value is out of range for the document are silently skipped. Chunks with an empty page_content after whitespace normalisation are also skipped.
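Since skipped chunks produce no error, it can help to check the input before calling the highlighter and log anything that would be dropped. The sketch below approximates the two skip rules described above; it is not the library's implementation. The name split_skippable is hypothetical, whitespace normalisation is assumed to mean collapsing runs of whitespace, and a missing, non-integer, or negative page is treated as out of range, which may differ from the library's behaviour. page_count is supplied by the caller, for example from whatever PDF library is already in use.

```python
from typing import Any, List, Sequence, Tuple


def split_skippable(chunks: Sequence[Any], page_count: int) -> Tuple[List[Any], List[Any]]:
    """Partition chunks into (kept, skipped) using the documented skip rules.

    Works with any object exposing .page_content and .metadata, such as
    langchain_core Document instances.
    """
    kept: List[Any] = []
    skipped: List[Any] = []
    for chunk in chunks:
        # ASSUMPTION: normalisation collapses whitespace runs and trims the ends.
        text = " ".join(chunk.page_content.split())
        page = chunk.metadata.get("page")
        # ASSUMPTION: only integers in [0, page_count) count as in range.
        in_range = isinstance(page, int) and 0 <= page < page_count
        if text and in_range:
            kept.append(chunk)
        else:
            skipped.append(chunk)
    return kept, skipped
```

Logging the skipped list before highlighting makes it easier to spot off-by-one page numbers coming from a retriever.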
Handling exceptions
highlight_chunks_in_pdf raises typed exceptions from rag_pdf_highlighter.exceptions so you can handle each failure mode precisely:
| Exception | Raised when |
|---|---|
| NoDocumentsError | The documents list is empty |
| PDFNotFoundError | No file exists at pdf_path |
| HighlightError | Base class for all highlighting failures; catch as a fallback |
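One way to use the exceptions in the table is to catch the specific classes first and the base class last. In this sketch, try_highlight is a hypothetical wrapper that takes the highlighting function as an argument, so it does not depend on a particular import path. The fallback class definitions exist only so the snippet runs where the package is not installed; they mirror the hierarchy the table describes and are not the library's code.

```python
import logging
from typing import Any, Callable, Optional, Sequence

try:
    from rag_pdf_highlighter.exceptions import (
        HighlightError,
        NoDocumentsError,
        PDFNotFoundError,
    )
except ImportError:
    # Stand-ins for illustration only (assumed hierarchy, per the table above).
    class HighlightError(Exception):
        pass

    class NoDocumentsError(HighlightError):
        pass

    class PDFNotFoundError(HighlightError):
        pass

logger = logging.getLogger(__name__)


def try_highlight(
    highlight: Callable[[str, Sequence[Any]], str],
    pdf_path: str,
    documents: Sequence[Any],
) -> Optional[str]:
    """Return the highlighted PDF path, or None if highlighting failed."""
    try:
        return highlight(pdf_path, documents)
    except NoDocumentsError:
        # Nothing to highlight: often a benign case in a RAG pipeline.
        logger.info("No chunks supplied for %s", pdf_path)
    except PDFNotFoundError:
        logger.warning("PDF not found: %s", pdf_path)
    except HighlightError:
        # Fallback for any other highlighting failure.
        logger.exception("Highlighting failed for %s", pdf_path)
    return None
```

Exceptions outside this hierarchy are deliberately left to propagate, so unrelated bugs are not swallowed.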
Cleanup
The output file is your responsibility to delete. Call cleanup_file from the same module when you are done with the highlighted PDF.
cleanup_file is a no-op if the file has already been removed, so it is safe to call unconditionally.
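To make sure cleanup happens even when later processing fails, the highlight and cleanup calls can be paired in a context manager. This is a sketch under assumptions: highlighted_pdf is a hypothetical name, the import path `rag_pdf_highlighter` is a guess, and cleanup_file is assumed to take the output path as its only argument. Both functions can be passed in explicitly, which also makes the wrapper easy to test with stand-ins.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional, Sequence


@contextmanager
def highlighted_pdf(
    pdf_path: str,
    documents: Sequence[Any],
    *,
    highlight: Optional[Callable[[str, Sequence[Any]], str]] = None,
    cleanup: Optional[Callable[[str], Any]] = None,
) -> Iterator[str]:
    """Yield the highlighted PDF path and always clean it up afterwards."""
    if highlight is None or cleanup is None:
        # ASSUMPTION: both functions are importable from the package root.
        from rag_pdf_highlighter import cleanup_file, highlight_chunks_in_pdf

        highlight = highlight or highlight_chunks_in_pdf
        cleanup = cleanup or cleanup_file

    out_path = highlight(pdf_path, documents)
    try:
        yield out_path
    finally:
        # Runs on normal exit and on exceptions raised inside the with-block.
        cleanup(out_path)
```

Inside a `with highlighted_pdf(path, docs) as out:` block the file can be read, uploaded, or copied; it is removed as soon as the block exits.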
For a complete reference of all public functions and exceptions, see the Python Library API reference.