Basic usage

The PDF visual regression tester is invoked from the command line using the pdf_visual_diff.py script. At minimum, you need to provide two PDF files to compare.

Command syntax

python pdf_visual_diff.py <path/to/pdf1.pdf> <path/to/pdf2.pdf> [options]

Arguments

  • pdf1: Path to the first PDF file (typically the reference or expected version)
  • pdf2: Path to the second PDF file (typically the new or actual version)
  • --output: (Optional) Directory where diff images will be saved (default: diff_output)
  • --threshold: (Optional) Similarity threshold for SSIM comparison, from 0.0 to 1.0 (default: 1.0)
A threshold of 1.0 means pages must be identical. Lower values like 0.999 allow for minor rendering variations.

Your first comparison

Let’s run a simple comparison using the example PDFs included in the repository.
1. Navigate to the project directory

Make sure you’re in the project root and your virtual environment is activated:
cd pdf-visual-regression
source venv/bin/activate  # On Windows: venv\Scripts\activate
2. Run the comparison

Compare two example PDF files:
python pdf_visual_diff.py example-pdfs/example-working-gov-letter.pdf example-pdfs/example-broken-gov-letter.pdf
This command compares a working government letter template against a broken version.
3. Review the output

If differences are found, you’ll see console output like:
Visual differences found on pages: 1
Diff images saved to: /path/to/pdf-visual-regression/diff_output/20261202_171728_diff/
The output directory contains:
  • diff_page_1.png: Highlighted image showing differences
  • results.json: Detailed comparison metadata

Understanding the results

When differences are found

The tool generates annotated images with red highlights marking the exact locations of differences:
  • Diff images: Named diff_page_N.png where N is the page number
  • Console output: Lists all pages with differences
  • JSON report: Contains structured data about the comparison

When PDFs are identical

If no differences are detected, you’ll see:
All pages are visually identical.
No diff images are generated, and the output directory will only contain results.json with "status": "success".
The output directory uses timestamps (e.g., 20261202_171728_diff) to preserve comparison history and prevent overwriting previous results.

Custom output directory

Specify a custom directory for saving diff images:
python pdf_visual_diff.py document_v1.pdf document_v2.pdf --output my_results
This creates timestamped subdirectories inside my_results/ (e.g., my_results/20261202_171728_diff/).
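Because each run lands in its own timestamped subdirectory, scripts usually need to locate the most recent one. The following is a minimal sketch, assuming subdirectory names follow the YYYYMMDD_HHMMSS_diff pattern shown in the example above; latest_run_dir is a hypothetical helper, not part of the tool.

```python
from pathlib import Path


def latest_run_dir(output_root):
    """Return the newest '<timestamp>_diff' subdirectory, or None if there is none."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir() and p.name.endswith("_diff")]
    # Assumption: names look like YYYYMMDD_HHMMSS_diff, so sorting them as
    # plain strings also sorts them chronologically.
    return max(runs, key=lambda p: p.name, default=None)
```

For example, latest_run_dir("my_results") would return the path of the newest run under my_results/, whose results.json can then be read.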

Adjusting sensitivity

The --threshold parameter controls comparison sensitivity. The default value of 1.0 requires perfect matches.

Example: Allow minor rendering differences

python pdf_visual_diff.py reference.pdf generated.pdf --threshold 0.999
Lower threshold values are useful when:
  • Comparing PDFs generated on different systems
  • Minor font rendering variations are acceptable
  • Anti-aliasing differences should be ignored
Setting the threshold too low (e.g., below 0.95) may cause the tool to miss significant visual differences. Start with 0.999 and adjust as needed.
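To make the effect of the threshold concrete, here is a small illustrative sketch. It assumes a page is flagged when its similarity score falls below the threshold; that comparison rule, the function name, and the scores are all assumptions for illustration, not taken from the tool's source.

```python
def pages_below_threshold(scores, threshold=1.0):
    """Return 1-based page numbers whose similarity score is below the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # Assumption: a page counts as different when score < threshold.
    return [page for page, score in enumerate(scores, start=1) if score < threshold]


# Hypothetical per-page SSIM scores for a three-page document.
scores = [1.0, 0.9995, 0.97]

strict = pages_below_threshold(scores)          # default threshold of 1.0
relaxed = pages_below_threshold(scores, 0.999)  # tolerate tiny rendering noise
print(strict, relaxed)  # [2, 3] [3]
```

Under these assumptions, page 2 is reported only with the strict default, while page 3 is reported in both cases.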

Working with the JSON output

Every comparison generates a results.json file with detailed metadata:
{
  "timestamp": "20261202_171728",
  "status": "error",
  "description": "Visual differences found on pages: 1",
  "pdf1": "/absolute/path/to/pdf1.pdf",
  "pdf2": "/absolute/path/to/pdf2.pdf",
  "pdf1_pages": 1,
  "pdf2_pages": 1,
  "threshold": 1.0,
  "identical": false,
  "diff_pages": [1],
  "extra_pages": [],
  "extra_pages_in": null
}
Key fields:
  • status: "success" if identical, "error" if differences found
  • identical: Boolean indicating if PDFs are visually the same
  • diff_pages: Array of page numbers with differences
  • extra_pages: Page numbers that exist in only one PDF (empty when the page counts match)
  • extra_pages_in: Which PDF has the extra pages (“PDF1” or “PDF2”); null when there are none, as in the example above
Parse this JSON file in your CI/CD pipeline to automatically fail builds when visual regressions are detected.
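One way to do that parsing is sketched below. It relies only on the fields shown in the example report; summarize_results is a hypothetical helper name, and treating a run as passing only when status is "success" and identical is true is a deliberately conservative assumption.

```python
import json
from pathlib import Path


def summarize_results(results_path):
    """Read a results.json file and return (passed, message)."""
    data = json.loads(Path(results_path).read_text(encoding="utf-8"))
    # Conservative assumption: require both fields to agree before passing.
    passed = data.get("status") == "success" and data.get("identical") is True
    parts = [data.get("description", "")]
    if data.get("extra_pages"):
        parts.append(f"extra pages in {data.get('extra_pages_in')}: {data['extra_pages']}")
    # Join only the non-empty parts into a one-line summary.
    return passed, "; ".join(part for part in parts if part)
```

A pipeline step could call this on the report, print the message, and exit with a non-zero code when passed is False.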

Handling page count mismatches

When PDFs have different page counts, the tool compares up to the shorter document’s length:
python pdf_visual_diff.py short_doc.pdf long_doc.pdf
Output:
Warning: PDFs have different page counts. PDF1: 3 pages, PDF2: 5 pages.
Comparing up to the lower page count.
Extra pages only in PDF2: 4, 5
Diff images saved to: diff_output/20261202_171728_diff/
Extra pages are saved as separate images:
  • extra_page_4_only_in_pdf2.png
  • extra_page_5_only_in_pdf2.png
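The page-count handling described above can be sketched as follows. This is a reconstruction of the documented behavior under the assumption that pages are numbered from 1, not the tool's actual code; split_page_ranges is a hypothetical name.

```python
def split_page_ranges(pdf1_pages, pdf2_pages):
    """Return (pages_to_compare, extra_pages, extra_pages_in) using 1-based page numbers."""
    shared = min(pdf1_pages, pdf2_pages)
    longest = max(pdf1_pages, pdf2_pages)
    compared = list(range(1, shared + 1))        # pages present in both PDFs
    extra = list(range(shared + 1, longest + 1))  # pages present in only one PDF
    if not extra:
        extra_in = None
    elif pdf1_pages > pdf2_pages:
        extra_in = "PDF1"
    else:
        extra_in = "PDF2"
    return compared, extra, extra_in


compared, extra, extra_in = split_page_ranges(3, 5)
print(compared, extra, extra_in)  # [1, 2, 3] [4, 5] PDF2
```

The values mirror the extra_pages and extra_pages_in fields of results.json for the 3-page versus 5-page example.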

Running tests

Verify your installation by running the included test suite:
make test
This command:
  1. Sets up test PDFs using the create_test_pdfs.py script
  2. Runs unit tests from tests/test_diff_script.py
  3. Validates that the comparison logic works correctly
To remove all generated test files and outputs, run:
make clean
This removes:
  • tests/test_output/: Test comparison results
  • tests/test_pdfs/: Generated test PDFs
  • diff_output/: Any diff output from manual tests
  • __pycache__/: Python cache files

Integration with CI/CD

Integrate the tool into your continuous integration pipeline:
#!/bin/bash
# Example CI script

python pdf_visual_diff.py expected_output.pdf generated_output.pdf --output test_results

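# Fail if any results.json under test_results/ reports an error.
# The glob matches every timestamped run, so start from an empty test_results/.
if grep -q '"status": "error"' test_results/*/results.json; then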
  echo "Visual regression detected!"
  exit 1
fi

echo "PDFs are visually identical"
exit 0
The tool prints results to stdout and generates JSON for programmatic access, making it easy to integrate with any CI/CD system.
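For pipelines that prefer Python over shell, the same check might look like the sketch below. It uses only the documented command syntax and the results.json status field; build_command, read_single_result, and run_check are hypothetical helper names. Each run writes to a fresh temporary directory so that results from earlier runs cannot be picked up by mistake.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def build_command(pdf1, pdf2, output_dir, threshold=None):
    """Assemble the documented command line as an argument list."""
    cmd = [sys.executable, "pdf_visual_diff.py", str(pdf1), str(pdf2),
           "--output", str(output_dir)]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    return cmd


def read_single_result(output_dir):
    """Load the only results.json under output_dir; fail if there is not exactly one."""
    found = list(Path(output_dir).glob("*/results.json"))
    if len(found) != 1:
        raise RuntimeError(f"expected one results.json, found {len(found)}")
    return json.loads(found[0].read_text(encoding="utf-8"))


def run_check(pdf1, pdf2, threshold=None):
    """Run one comparison and return True when the report says "success"."""
    with tempfile.TemporaryDirectory() as output_dir:
        # The tool's exit code is not documented here, so it is not checked.
        subprocess.run(build_command(pdf1, pdf2, output_dir, threshold), check=False)
        return read_single_result(output_dir).get("status") == "success"
```

Note that the temporary directory, including any diff images, is deleted when run_check returns. To keep the images as build artifacts, pass a persistent, empty directory to build_command instead.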

Next steps

Now that you’ve run your first comparison, here are some things to try next:
  • Integrate the tool into your testing workflow
  • Set up automated comparisons in your CI/CD pipeline
  • Adjust threshold values for your specific use case
  • Parse JSON output for custom reporting
For questions or issues, refer to the project repository or file an issue on GitHub.
