highlight_chunks_in_pdf — Python library reference

rag_pdf_highlighter.utils.pdf_helpers exposes three public functions: highlight_chunks_in_pdf, download_pdf, and cleanup_file. They can be called directly from any Python application without running the FastAPI server.

`highlight_chunks_in_pdf`

highlight_chunks_in_pdf(pdf_path: str, documents: list[Document]) -> str

Opens the PDF at pdf_path, applies highlights to every chunk in documents, saves the annotated copy to a new temp file, and returns its path.

pdf_path (string, required): Absolute or relative path to the local PDF file.

documents (list[Document], required): List of langchain_core.documents.Document objects. Each document must have page_content (string) and metadata["page"] (zero-indexed integer).

Returns: Path to a temp file containing the highlighted PDF.

Raises:

NoDocumentsError if documents is empty
PDFNotFoundError if pdf_path does not exist
HighlightError (the base error) for all other errors
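
The input requirements above can also be checked before the call, so that a malformed chunk is reported with its index. The following is a minimal sketch that assumes only the documented fields. `validate_chunks` and the `Chunk` stand-in are hypothetical names used for illustration and are not part of rag_pdf_highlighter; real code would pass langchain_core Document objects instead of `Chunk`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def validate_chunks(documents: list[Any]) -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    if not documents:
        return ["documents is empty"]
    problems = []
    for i, doc in enumerate(documents):
        if not isinstance(getattr(doc, "page_content", None), str):
            problems.append(f"documents[{i}]: page_content must be a string")
        page = (getattr(doc, "metadata", None) or {}).get("page")
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(page, int) or isinstance(page, bool) or page < 0:
            problems.append(
                f"documents[{i}]: metadata['page'] must be a zero-indexed integer"
            )
    return problems
```

A caller could run `validate_chunks(documents)` first and only call highlight_chunks_in_pdf when the returned list is empty.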

from langchain_core.documents import Document
from rag_pdf_highlighter.utils.pdf_helpers import highlight_chunks_in_pdf

documents = [
    Document(page_content="Text to find", metadata={"page": 0}),
]

output_path = highlight_chunks_in_pdf(
    pdf_path="./report.pdf",
    documents=documents,
)
print(f"Highlighted PDF saved to: {output_path}")

The output file is written to a system temp directory. Call cleanup_file(output_path) when done.
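
One way to make sure the temp file is always removed, even when processing raises, is to wrap the call in a context manager. This is a sketch rather than a library feature: `temp_result` is a hypothetical helper, and `_demo_produce` stands in for highlight_chunks_in_pdf so the example runs without a PDF.

```python
import contextlib
import os
import tempfile
from typing import Callable, Iterator


@contextlib.contextmanager
def temp_result(
    produce: Callable[[], str],
    cleanup: Callable[[str], None],
) -> Iterator[str]:
    """Yield the path returned by produce() and always pass it to cleanup()."""
    path = produce()
    try:
        yield path
    finally:
        cleanup(path)


def _demo_produce() -> str:
    # Stand-in for highlight_chunks_in_pdf: creates an empty temp file.
    fd, path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)
    return path


with temp_result(_demo_produce, os.remove) as demo_path:
    existed_inside = os.path.exists(demo_path)  # file is available here
exists_after = os.path.exists(demo_path)  # file is gone after the block
```

With the library, `produce` would be something like `lambda: highlight_chunks_in_pdf(pdf_path="./report.pdf", documents=documents)` and `cleanup` would be cleanup_file.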

`download_pdf`

async def download_pdf(url: str) -> str

Downloads a PDF from the given URL asynchronously using httpx and writes it to a temp file. Returns the path to the temp file.

url (string, required): URL of the PDF to download. The HTTP request times out after 60 seconds.

Returns: Path to the downloaded temp file.

Raises:

PDFDownloadError if the download fails (network error, non-2xx response, etc.)

This is an async function, so it must be called with await inside an async context.

import asyncio
from rag_pdf_highlighter.utils.pdf_helpers import download_pdf, cleanup_file

async def main():
    path = await download_pdf("https://example.com/report.pdf")
    try:
        # process path...
        pass
    finally:
        cleanup_file(path)

asyncio.run(main())
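
Because download_pdf is a coroutine, several downloads can run concurrently. The sketch below shows one possible pattern using asyncio.gather; `download_all` is a hypothetical helper, and `_fake_download` stands in for download_pdf so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable


async def download_all(
    urls: list[str],
    download: Callable[[str], Awaitable[str]],
) -> tuple[dict[str, str], dict[str, BaseException]]:
    """Run download(url) for every URL concurrently; split successes from failures."""
    results = await asyncio.gather(
        *(download(u) for u in urls), return_exceptions=True
    )
    paths: dict[str, str] = {}
    errors: dict[str, BaseException] = {}
    for url, result in zip(urls, results):
        if isinstance(result, BaseException):
            errors[url] = result
        else:
            paths[url] = result
    return paths, errors


async def _fake_download(url: str) -> str:
    # Stand-in for download_pdf: fails for one URL, returns a fake path otherwise.
    if url.endswith("missing.pdf"):
        raise RuntimeError("download failed")
    return "/tmp/" + url.rsplit("/", 1)[-1]


paths, errors = asyncio.run(
    download_all(
        ["https://example.com/a.pdf", "https://example.com/missing.pdf"],
        _fake_download,
    )
)
```

Passing download_pdf as `download` would give the same shape of result; every path in `paths` should still be handed to cleanup_file once it is no longer needed.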

`cleanup_file`

cleanup_file(path: str) -> None

Silently deletes the file at path if it exists. Does nothing if the file is already gone.

path (string, required): Absolute path to the file to delete.

from rag_pdf_highlighter.utils.pdf_helpers import cleanup_file

cleanup_file("/tmp/highlighted_abc123.pdf")
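
The described behavior (delete if present, do nothing if already gone) can be reproduced with the standard library. The sketch below illustrates those semantics only and is not the library's implementation; `remove_if_present` is a hypothetical name.

```python
import contextlib
import os
import tempfile


def remove_if_present(path: str) -> None:
    """Delete path if it exists; ignore the case where it is already gone."""
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


fd, demo_path = tempfile.mkstemp(suffix=".pdf")
os.close(fd)
remove_if_present(demo_path)  # deletes the file
remove_if_present(demo_path)  # second call does nothing and raises no error
still_exists = os.path.exists(demo_path)
```

Because a repeated call is harmless, it is safe to call such a function from a finally block even when an earlier step may already have removed the file.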
