Getting started
We welcome contributions to improve the PDF visual diff tool! Whether you’re fixing bugs, adding features, or improving documentation, your help is appreciated.
Prerequisites
Before contributing, ensure you have:
- Python 3.7 or higher
- Git for version control
- A text editor or IDE
- Basic understanding of Python and PDF processing
Initial setup
Install dependencies
Install the required Python packages (the Makefile's make install target does this for you).
Dependencies from requirements.txt:
- PyMuPDF - PDF rendering
- scikit-image - SSIM algorithm
- Pillow - Image processing
- numpy - Numerical operations
- reportlab - Test PDF generation
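As a quick sanity check after installing, a short script can confirm that the interpreter meets the Python 3.7 minimum and that each dependency is importable. This is a minimal sketch, not part of the project: the check_environment helper is hypothetical, and it assumes the usual import names for each package (fitz for PyMuPDF, skimage for scikit-image, PIL for Pillow).

```python
"""Hypothetical pre-flight check for a development environment (not part of the repo)."""
import importlib.util
import sys

# Assumed mapping: package name in requirements.txt -> module name used in `import`.
IMPORT_NAMES = {
    "PyMuPDF": "fitz",
    "scikit-image": "skimage",
    "Pillow": "PIL",
    "numpy": "numpy",
    "reportlab": "reportlab",
}

MINIMUM_PYTHON = (3, 7)


def check_environment(import_names=IMPORT_NAMES):
    """Return (python_ok, missing_packages) for the current interpreter."""
    python_ok = sys.version_info[:2] >= MINIMUM_PYTHON
    # find_spec returns None when a top-level module cannot be located.
    missing = [
        package
        for package, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]
    return python_ok, missing


if __name__ == "__main__":
    ok, missing = check_environment()
    print("Python version OK" if ok else "Python 3.7 or higher is required")
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are importable")
```

Running it from the project root after installing prints either a confirmation or the names of any packages that still need installing.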
Codebase structure
The project follows a simple, focused structure.
Core modules
pdf_visual_diff.py
The main entry point, containing all comparison logic. Key functions:
- compare_pdfs(pdf1_path, pdf2_path, output_dir, threshold) - Core comparison function (lines 10-136). Its main stages are:
  - PDF loading and validation (lines 19-30)
  - Page rendering loop (lines 34-69)
  - Extra page handling (lines 72-89)
  - Results generation (lines 97-126)
- main() - CLI argument parsing and script entry (lines 137-148)
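The later stages of compare_pdfs (per-page comparison, extra page handling, results generation) can be illustrated with a simplified, dependency-free sketch of how per-page scores, page counts, and the threshold might combine into a single result. It is not the project's code: the summarize_comparison name, the assumption that threshold is a minimum per-page similarity score (such as an SSIM value), and the choice to treat pages present in only one PDF as differences are all assumptions made for illustration.

```python
"""Hypothetical sketch of how per-page results could be combined into one report."""
from typing import Dict, List


def summarize_comparison(
    page_scores: List[float],
    pages_in_first: int,
    pages_in_second: int,
    threshold: float,
) -> Dict[str, object]:
    """Turn per-page similarity scores into a pass/fail summary.

    page_scores holds one score per page present in both PDFs, for example an
    SSIM value where 1.0 means visually identical (assumption).
    """
    shared_pages = min(pages_in_first, pages_in_second)
    if len(page_scores) != shared_pages:
        raise ValueError("Expected one score per page present in both PDFs")

    # Shared pages: a page fails when its score falls below the threshold.
    failing_pages = [
        page_number
        for page_number, score in enumerate(page_scores, start=1)
        if score < threshold
    ]

    # Extra pages: pages beyond the shorter document have no counterpart,
    # so this sketch reports them as differences (assumed policy).
    longest = max(pages_in_first, pages_in_second)
    extra_pages = list(range(shared_pages + 1, longest + 1))

    return {
        "failing_pages": failing_pages,
        "extra_pages": extra_pages,
        "passed": not failing_pages and not extra_pages,
    }


if __name__ == "__main__":
    # Two shared pages (page 2 differs) plus one extra page in the second PDF.
    report = summarize_comparison(
        [0.999, 0.91], pages_in_first=2, pages_in_second=3, threshold=0.98
    )
    print(report)  # {'failing_pages': [2], 'extra_pages': [3], 'passed': False}
```

Keeping this decision logic separate from rendering would let it be unit-tested without any PDFs, which is one reason to structure a change this way.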
tests/test_diff_script.py
Integration tests that exercise the CLI through subprocess. Test class:
- TestPdfVisualDiff - Main test suite with three test methods:
  - test_identical_pdfs() - Verifies that identical PDFs pass
  - test_different_text_pdfs() - Checks that differences are detected
  - test_different_page_count_pdfs() - Tests page count handling
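Because the suite drives the CLI through subprocess, a new test typically runs the script as a child process and asserts on its exit code and output. The sketch below shows that pattern against a throwaway stand-in script that compares two files byte for byte, because this page does not document the command-line arguments of pdf_visual_diff.py; in a real test you would point the command at pdf_visual_diff.py and the fixture PDFs instead. The stand-in script, the exit-code convention it uses, and the test names are all hypothetical.

```python
"""Sketch of a subprocess-driven CLI test, using a hypothetical stand-in script."""
import subprocess
import sys
import tempfile
import textwrap
import unittest
from pathlib import Path

# Stand-in for the real CLI: exits 0 if both files are byte-identical, 1 otherwise.
STAND_IN_CLI = textwrap.dedent(
    """
    import sys
    from pathlib import Path
    same = Path(sys.argv[1]).read_bytes() == Path(sys.argv[2]).read_bytes()
    print("PASS" if same else "FAIL")
    sys.exit(0 if same else 1)
    """
)


class TestCliPattern(unittest.TestCase):
    def setUp(self):
        # Arrange: a temporary directory holding the script and its inputs.
        self.tmp = tempfile.TemporaryDirectory()
        self.dir = Path(self.tmp.name)
        self.script = self.dir / "stand_in_cli.py"
        self.script.write_text(STAND_IN_CLI)

    def tearDown(self):
        self.tmp.cleanup()

    def run_cli(self, first, second):
        """Run the script in a child process, as a user would from a shell."""
        return subprocess.run(
            [sys.executable, str(self.script), str(first), str(second)],
            capture_output=True,
            text=True,
        )

    def test_identical_files_pass(self):
        """Identical inputs should exit with status 0 and report PASS."""
        first, second = self.dir / "first.txt", self.dir / "second.txt"
        first.write_text("Hello")
        second.write_text("Hello")

        result = self.run_cli(first, second)  # Act

        self.assertEqual(result.returncode, 0)  # Assert
        self.assertIn("PASS", result.stdout)

    def test_different_files_fail(self):
        """Different inputs should exit with a non-zero status and report FAIL."""
        first, second = self.dir / "first.txt", self.dir / "second.txt"
        first.write_text("Hello")
        second.write_text("Goodbye")

        result = self.run_cli(first, second)

        self.assertNotEqual(result.returncode, 0)
        self.assertIn("FAIL", result.stdout)
```

Saved as a test module, this runs with python -m unittest like any other unittest-based suite. Using sys.executable ensures the child process uses the same interpreter and installed packages as the test runner.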
tests/create_test_pdfs.py
Test fixture generator using ReportLab. Functions:
- create_test_pdf(filename, text_content) - Creates a simple one-page PDF
- setup_test_files() - Generates all test fixtures
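A fixture generator along these lines can be written with ReportLab's canvas API. This is a minimal sketch of what such helpers might look like, not the project's code: the page size, text position, fixture file names, and the output_dir parameter on setup_test_files are assumptions.

```python
"""Sketch of a ReportLab-based test fixture generator (assumed layout and file names)."""
from pathlib import Path


def create_test_pdf(filename, text_content):
    """Write a one-page PDF containing a single line of text."""
    # Imported lazily so this module can be imported without ReportLab installed.
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    pdf = canvas.Canvas(str(filename), pagesize=letter)
    # Draw the text 1 inch (72 points) from the left edge, near the top of the page.
    pdf.drawString(72, 720, text_content)
    pdf.save()  # Finishes the current page and writes the file.


def setup_test_files(output_dir):
    """Create two PDFs with the same text and one with different text (hypothetical names)."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    create_test_pdf(output_dir / "base.pdf", "Hello, PDF!")
    create_test_pdf(output_dir / "identical.pdf", "Hello, PDF!")
    create_test_pdf(output_dir / "different.pdf", "Goodbye, PDF!")
```

Generating fixtures in code rather than committing binary PDFs keeps the inputs reviewable and easy to extend when a new test case needs a new document.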
Makefile
Build automation with common development tasks. Targets:
- make install - Install dependencies
- make test - Run the test suite
- make setup - Generate test PDFs
- make clean - Remove generated files
Development workflow
Making changes
Create a feature branch
Always work on a separate branch, and use descriptive branch names such as:
- feature/add-threshold-auto-detect
- fix/memory-leak-large-pdfs
- docs/improve-readme
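If you want to catch naming slips before pushing, the branch-name convention can be expressed as a pattern: a type prefix, a slash, and a lowercase, hyphen-separated description. The sketch below is a hypothetical helper, not something the repository ships, and it assumes feature/, fix/, and docs/ are the only prefixes, since those are the ones the examples use.

```python
"""Hypothetical checker for the feature/, fix/, docs/ branch-naming convention."""
import re

# Assumed convention: prefix, slash, then lowercase words separated by single hyphens.
BRANCH_PATTERN = re.compile(r"(feature|fix|docs)/[a-z0-9]+(-[a-z0-9]+)*")


def is_descriptive_branch_name(name: str) -> bool:
    """Return True if the whole branch name matches the assumed convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("feature/add-threshold-auto-detect", "my-changes", "Feature/Stuff"):
        print(f"{candidate}: {is_descriptive_branch_name(candidate)}")
```

Using fullmatch rather than match ensures trailing characters such as a dangling hyphen cause the check to fail.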
Make your changes
Edit the relevant files. Common areas:
- Core logic: Modify pdf_visual_diff.py
- Tests: Add or update tests/test_diff_script.py
- Test fixtures: Update tests/create_test_pdfs.py
- Dependencies: Update requirements.txt if needed
Commit your changes
Write clear, descriptive commit messages. Good commit messages explain why a change was made, not just what changed.
Code style guidelines
Follow these conventions to maintain consistency:
Python style
- Follow PEP 8 style guide
- Use 4 spaces for indentation (no tabs)
- Maximum line length: 100 characters
- Use descriptive variable names
- Add docstrings to all functions
Testing conventions
- Write tests for all new features
- Use descriptive test method names
- Include docstrings explaining what each test verifies
- Follow the Arrange-Act-Assert pattern
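Putting the style and testing conventions together, a small helper and its tests might look like the following. Both the count_unmatched_pages helper and its test class are hypothetical examples written only to illustrate the conventions (docstrings, descriptive names, 4-space indentation, Arrange-Act-Assert); they are not taken from the codebase.

```python
"""Illustration of the project's style and testing conventions (hypothetical code)."""
import unittest


def count_unmatched_pages(first_page_count, second_page_count):
    """Return how many pages exist in only one of two documents.

    Args:
        first_page_count: Number of pages in the first PDF.
        second_page_count: Number of pages in the second PDF.
    """
    if first_page_count < 0 or second_page_count < 0:
        raise ValueError("Page counts must be non-negative")
    return abs(first_page_count - second_page_count)


class TestCountUnmatchedPages(unittest.TestCase):
    def test_equal_page_counts_have_no_unmatched_pages(self):
        """Documents of the same length should have zero unmatched pages."""
        # Arrange
        first_page_count, second_page_count = 3, 3

        # Act
        unmatched = count_unmatched_pages(first_page_count, second_page_count)

        # Assert
        self.assertEqual(unmatched, 0)

    def test_longer_second_document_reports_extra_pages(self):
        """Extra pages in the second document are counted."""
        # Arrange
        first_page_count, second_page_count = 2, 5

        # Act
        unmatched = count_unmatched_pages(first_page_count, second_page_count)

        # Assert
        self.assertEqual(unmatched, 3)

    def test_negative_page_count_is_rejected(self):
        """Negative page counts are invalid input and raise ValueError."""
        # Arrange, Act, and Assert collapse into one step for an expected exception.
        with self.assertRaises(ValueError):
            count_unmatched_pages(-1, 2)
```

Each test name states the scenario and expected outcome, so a failing test's name alone says what broke.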
Common contribution areas
Feature additions
Potential features to implement:
- Multi-format support: Export diffs as PDF or HTML reports
- Threshold auto-tuning: Automatically determine optimal threshold
- Batch comparison: Compare multiple PDF pairs
- Ignore regions: Mask specific areas from comparison
- Performance optimization: Parallel page processing
- CI/CD integration: GitHub Actions workflow examples
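To make the ignore-regions idea concrete: one common approach is to overwrite the masked rectangles with the same constant value in both rendered page images before comparing them, so differences inside those areas cannot affect the result. The sketch below uses plain nested lists of grayscale values to stay dependency-free, and exact equality stands in for the SSIM score. The apply_ignore_regions and images_match names and the (left, top, right, bottom) region format are assumptions; a real implementation would more likely slice numpy arrays.

```python
"""Hypothetical sketch of masking ignore regions before comparing two page images."""
from typing import List, Sequence, Tuple

Image = List[List[int]]  # rows of grayscale pixel values (0-255)
Region = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def apply_ignore_regions(image: Image, regions: Sequence[Region], fill: int = 0) -> Image:
    """Return a copy of image with every pixel inside the given regions set to fill."""
    masked = [list(row) for row in image]  # copy so the caller's image is untouched
    height = len(masked)
    width = len(masked[0]) if masked else 0
    for left, top, right, bottom in regions:
        # Clamp each rectangle to the image bounds before filling it.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                masked[y][x] = fill
    return masked


def images_match(first: Image, second: Image, regions: Sequence[Region]) -> bool:
    """Compare two images pixel by pixel, ignoring the masked regions."""
    return apply_ignore_regions(first, regions) == apply_ignore_regions(second, regions)


if __name__ == "__main__":
    # A 3-row, 4-column page where only the top-right pixel (e.g. a timestamp) differs.
    page_a = [[255, 255, 255, 10], [255, 0, 255, 255], [255, 255, 255, 255]]
    page_b = [[255, 255, 255, 99], [255, 0, 255, 255], [255, 255, 255, 255]]
    print(images_match(page_a, page_b, regions=[]))  # False
    print(images_match(page_a, page_b, regions=[(3, 0, 4, 1)]))  # True
```

Filling both images with the same value, rather than cropping, keeps the image dimensions unchanged, which matters for similarity metrics that expect equally sized inputs.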
Bug fixes
When fixing bugs:
- Create a test that reproduces the bug
- Verify the test fails before your fix
- Implement the fix
- Verify the test passes
- Check that existing tests still pass
Documentation improvements
- Improve code comments
- Add usage examples to README
- Create troubleshooting guides
- Document edge cases
Reviewing code
When reviewing contributions, check for:
- Correctness: Does it solve the stated problem?
- Tests: Are there tests covering the new code?
- Style: Does it follow project conventions?
- Performance: Are there any obvious bottlenecks?
- Documentation: Are changes documented?
Release process
For maintainers releasing new versions:
Getting help
If you need assistance:
- Issues: Open a GitHub issue for bugs or feature requests
- Discussions: Use GitHub Discussions for questions
- Code review: Tag maintainers in your PR for review
Before opening an issue, search existing issues to avoid duplicates. When you open one, provide:
- Clear description of the problem
- Steps to reproduce
- Expected vs actual behavior
- System information (OS, Python version)
- Sample PDFs if applicable (without sensitive data)
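For the system information, a short script can collect the OS, Python version, and installed versions of the project's dependencies for you to paste into the issue. This is a convenience sketch, not a project tool; it relies on importlib.metadata, which requires Python 3.8 or newer, and reports "unknown" on older interpreters. The distribution names come from the dependency list in requirements.txt.

```python
"""Hypothetical helper that gathers environment details for a bug report."""
import platform

# Distribution names as listed in requirements.txt.
DEPENDENCIES = ["PyMuPDF", "scikit-image", "Pillow", "numpy", "reportlab"]


def package_version(distribution):
    """Return the installed version of a distribution, or a short explanation."""
    try:
        from importlib import metadata  # Available from Python 3.8 onward.
    except ImportError:
        return "unknown (importlib.metadata needs Python 3.8+)"
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return "not installed"


def system_report():
    """Build a plain-text block suitable for pasting into an issue."""
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {platform.python_version()}",
    ]
    lines += [f"{name}: {package_version(name)}" for name in DEPENDENCIES]
    return "\n".join(lines)


if __name__ == "__main__":
    print(system_report())
```

The report deliberately omits file paths and user names so it can be shared without exposing sensitive data.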
Code of conduct
We expect all contributors to:
- Be respectful and constructive
- Welcome newcomers and help them get started
- Focus on what’s best for the project and community
- Accept constructive criticism gracefully
- Show empathy towards other community members