Evidence Sanitizer Development and Contributing Guide

Evidence Sanitizer is a small, intentionally self-contained Python project. The full implementation lives in a single source module, the test suite runs with one command, and the CI pipeline keeps the bar high with type checking, linting, and formatting enforcement on every push and pull request. This page covers everything needed to set up a local development environment, run the checks, understand the project layout, and contribute new sanitization rules.

Project Structure

src/evidence_sanitizer/
  __init__.py      — package metadata and module entry point re-export
  __main__.py      — enables `python -m evidence_sanitizer` invocation
  cli.py           — Typer CLI application and command handlers
  sanitizer.py     — all sanitization logic, rules, and file I/O

tests/
  fixtures/golden/ — paired .input.txt / .expected.txt fixture files
  test_*.py        — individual rule unit tests and golden integration tests

pyproject.toml     — project metadata, dependencies, and tool configuration
.github/workflows/
  ci.yml           — GitHub Actions CI pipeline

Evidence Sanitizer intentionally keeps all sanitization logic in a single module. There is no rules/ package, no plugin system, no configuration files, and no import-time side effects. Constants, rule finders, the replacement engine, and file I/O all live in sanitizer.py. This keeps the codebase auditable: a reviewer can read one file to understand the complete behavior of the tool.

Key source files

sanitizer.py

The heart of the project. Contains all rule ID constants, redaction marker constants, approved name sets, compiled regex patterns, every find_* function, apply_findings(), and sanitize_text(). Also contains file I/O helpers (validate_paths, read_input_file, sanitize_file) and the SanitizationReport and Finding dataclasses.

cli.py

A thin Typer application. Defines the sanitize command, delegates to sanitize_file() in sanitizer.py, renders the rule report to stdout, and maps SafeError exit codes to the correct process exit status. No sanitization logic lives here.

tests/fixtures/golden/

Nine paired fixture files providing end-to-end regression coverage. Each .input.txt / .expected.txt pair is tested by the parameterized test_golden_fixture function in test_golden_fixtures.py, including an idempotence assertion on every fixture.
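The pairing and idempotence check described above can be sketched roughly as follows. This is a minimal illustration, not the project's test code: the stand-in sanitize_text() here simply returns a string, golden_pairs() and check_fixture() are hypothetical helper names, and the fixture content is made up. The real suite uses a parameterized pytest function instead of a plain loop.

```python
import tempfile
from pathlib import Path


def sanitize_text(text: str) -> str:
    """Stand-in sanitizer for illustration; the real one lives in sanitizer.py."""
    return text.replace("hunter2", "[REDACTED]")


def golden_pairs(golden_dir: Path) -> list[tuple[Path, Path]]:
    """Pair each NAME.input.txt with its NAME.expected.txt sibling."""
    pairs = []
    for input_path in sorted(golden_dir.glob("*.input.txt")):
        stem = input_path.name.removesuffix(".input.txt")
        pairs.append((input_path, golden_dir / f"{stem}.expected.txt"))
    return pairs


def check_fixture(input_path: Path, expected_path: Path) -> None:
    """Compare sanitized input with the expected file, then re-sanitize."""
    expected = expected_path.read_text(encoding="utf-8")
    actual = sanitize_text(input_path.read_text(encoding="utf-8"))
    assert actual == expected
    # Idempotence: sanitizing already-sanitized output must change nothing.
    assert sanitize_text(actual) == actual


# Build one throwaway fixture pair so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp)
    (golden / "basic.input.txt").write_text("password=hunter2\n", encoding="utf-8")
    (golden / "basic.expected.txt").write_text(
        "password=[REDACTED]\n", encoding="utf-8"
    )
    pairs = golden_pairs(golden)
    for input_path, expected_path in pairs:
        check_fixture(input_path, expected_path)
    checked = [input_path.name for input_path, _ in pairs]
```

Deriving the expected path from the input path means a missing .expected.txt file fails loudly on read rather than being silently skipped.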

pyproject.toml

Single source of truth for the project name, version, Python requirement, runtime and dev dependencies, build backend, and tool configuration for pytest, ruff, and mypy.

Development Commands

This project uses uv for environment and dependency management. All commands are run through uv run so they use the project’s pinned virtual environment automatically.

Install all dependencies

Clone the repository, then install runtime and dev dependencies into a local virtual environment:

uv sync

This installs typer (runtime), plus mypy, pytest, and ruff (dev). No published package install or separate pip install step is needed.

Run the test suite

uv run pytest

Runs all tests under tests/ with --strict-config --strict-markers enforced (configured in pyproject.toml). To run only the golden fixture tests:

uv run pytest tests/test_golden_fixtures.py

Lint with ruff

uv run ruff check .

Checks for pyflakes errors (F), pycodestyle issues (E), isort import order (I), pyupgrade modernisation hints (UP), and flake8-bugbear issues (B). The target is Python 3.12 with a line length of 88.

Check formatting

uv run ruff format --check .

Verifies that all files match ruff’s formatter output (double quotes, space indentation, LF line endings). Run uv run ruff format . without --check to apply formatting in place.

Type-check with mypy

uv run mypy src tests

Runs mypy in strict mode (strict = true in pyproject.toml) targeting Python 3.12. All public and private functions must have complete type annotations. mypy_path = "src" is set so the evidence_sanitizer package is resolved from source.

Check for whitespace issues

git diff --check

Reports whitespace errors introduced by your changes, such as trailing whitespace or stray carriage returns, along with leftover conflict markers. Run this before opening a pull request.

Dependencies

Dependencies are declared in pyproject.toml. The project requires Python 3.12 or later.

Runtime

Package	Version constraint	Purpose
`typer`	`>=0.15.0,<1.0.0`	CLI framework — argument parsing, help text, exit codes

Development

Package	Version constraint	Purpose
`mypy`	`>=1.14.0,<2.0.0`	Static type checking in strict mode
`pytest`	`>=8.3.0,<9.0.0`	Test runner
`ruff`	`>=0.8.0,<1.0.0`	Linter and formatter

Build

Package	Version constraint	Purpose
`hatchling`	`>=1.26.0,<2.0.0`	Build backend (PEP 517); wheel packages `src/evidence_sanitizer`

Continuous Integration

The GitHub Actions CI pipeline is defined in .github/workflows/ci.yml. It runs on every push and every pull request against all branches.

name: CI

on:
  push:
  pull_request:

permissions:
  contents: read

env:
  UV_PYTHON: "3.12"

jobs:
  checks:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          enable-cache: true

      - name: Set up Python
        run: uv python install 3.12

      - name: Install dependencies
        run: uv sync

      - name: Run pytest
        run: uv run pytest

      - name: Run ruff
        run: uv run ruff check .

      - name: Check formatting
        run: uv run ruff format --check .

      - name: Run mypy
        run: uv run mypy src tests

All four checks — pytest, ruff check, ruff format --check, and mypy — must pass for the CI job to succeed. The pipeline runs on ubuntu-latest with a 10-minute timeout and uses uv’s built-in caching via astral-sh/setup-uv@v5 to keep runs fast. The git diff --check step is not included in CI; it is a local pre-commit discipline.

Adding New Rules

All rule logic lives in sanitizer.py. The module is structured in labelled sections (marked with # --- ... --- comments) that follow a consistent pattern. To add a new rule:

Add constants

Add a RULE_ID_* string constant in the Rule ID constants section and a REDACTION_MARKER_* string constant in the Marker constants and approved-marker sets section. Add the new marker to the relevant approved-marker frozenset.
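As a rough sketch, the constants for a made-up "example token" rule might look like the block below. The rule ID value, the marker text, and the APPROVED_MARKERS set name are all assumptions for illustration; follow the naming and marker format already used in sanitizer.py.

```python
# Hypothetical constants for an illustrative "example token" rule.
RULE_ID_EXAMPLE_TOKEN = "example.token"
REDACTION_MARKER_EXAMPLE_TOKEN = "[REDACTED_EXAMPLE_TOKEN]"

# Stand-in for one of the module's approved-marker frozensets.
APPROVED_MARKERS: frozenset[str] = frozenset(
    {
        "[REDACTED]",
        REDACTION_MARKER_EXAMPLE_TOKEN,
    }
)
```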

Add sensitive names (if applicable)

If the rule matches by name (header names, query parameter names, JSON field names, form field names), add the names to the appropriate set in the Header/query/JSON/form sensitive name sets section.

Write a private finder function

Add a _find_* function (or a public find_* function if it needs to be importable by tests) that scans text and returns a tuple[Finding, ...]. Each Finding records the start and end byte offsets of the value to replace, the replacement string, and the rule_id. Use _overlaps_existing_finding() to skip any span that overlaps a finding already registered by a higher-priority rule. Pass the accumulated existing_findings sequence as the existing argument.
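A finder along these lines might look like the sketch below. It is an illustration under assumptions, not the module's code: the Finding field order, the signature of _overlaps_existing_finding(), the example_token pattern, and the rule and marker names are all hypothetical.

```python
import re
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Assumed shape: span to replace, replacement text, and rule ID."""

    start: int
    end: int
    replacement: str
    rule_id: str


RULE_ID_EXAMPLE_TOKEN = "example.token"  # hypothetical
REDACTION_MARKER_EXAMPLE_TOKEN = "[REDACTED_EXAMPLE_TOKEN]"  # hypothetical
_EXAMPLE_TOKEN_PATTERN = re.compile(r"example_token=(?P<value>[^\s&]+)")


def _overlaps_existing_finding(
    start: int, end: int, existing: Sequence[Finding]
) -> bool:
    """Return True if [start, end) intersects any already-registered span."""
    return any(start < f.end and f.start < end for f in existing)


def _find_example_tokens(
    text: str, existing: Sequence[Finding]
) -> tuple[Finding, ...]:
    """Find example_token values not already claimed by an earlier rule."""
    findings = []
    for match in _EXAMPLE_TOKEN_PATTERN.finditer(text):
        # Redact only the value, not the parameter name.
        start, end = match.span("value")
        if _overlaps_existing_finding(start, end, existing):
            continue
        findings.append(
            Finding(
                start, end, REDACTION_MARKER_EXAMPLE_TOKEN, RULE_ID_EXAMPLE_TOKEN
            )
        )
    return tuple(findings)


text = "GET /?example_token=abc123&x=1"
fresh = _find_example_tokens(text, existing=())
# A higher-priority finding covering the whole line suppresses the new rule.
claimed = (Finding(0, len(text), "[REDACTED]", "higher.priority"),)
skipped = _find_example_tokens(text, existing=claimed)
```

Returning an immutable tuple keeps each finder free of shared state, which makes it straightforward to unit test in isolation.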

Call the finder in sanitize_text()

Register the finder in sanitize_text() at the appropriate priority position. Earlier finders’ results are passed as existing_findings to later finders so that overlap protection works correctly. Findings are accumulated into the final findings tuple and passed to apply_findings().
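The accumulation pattern can be sketched as follows. This is a simplified illustration: collect_findings(), _scan(), the two toy finders, and their rule IDs are hypothetical, and the real sanitize_text() goes on to pass the accumulated tuple to apply_findings().

```python
import re
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    start: int
    end: int
    replacement: str
    rule_id: str


def _overlaps(start: int, end: int, existing: Sequence[Finding]) -> bool:
    return any(start < f.end and f.start < end for f in existing)


def _scan(
    pattern: re.Pattern[str],
    rule_id: str,
    text: str,
    existing_findings: Sequence[Finding],
) -> tuple[Finding, ...]:
    """Shared helper: collect non-overlapping 'value' group spans."""
    found = []
    for match in pattern.finditer(text):
        start, end = match.span("value")
        if not _overlaps(start, end, existing_findings):
            found.append(Finding(start, end, "[REDACTED]", rule_id))
    return tuple(found)


_AUTH = re.compile(r"Authorization: (?P<value>.+)")
_TOKEN = re.compile(r"token=(?P<value>[^\s&]+)")


def _find_authorization(
    text: str, existing_findings: Sequence[Finding]
) -> tuple[Finding, ...]:
    return _scan(_AUTH, "header.authorization", text, existing_findings)


def _find_token_params(
    text: str, existing_findings: Sequence[Finding]
) -> tuple[Finding, ...]:
    return _scan(_TOKEN, "query.token", text, existing_findings)


def collect_findings(text: str) -> tuple[Finding, ...]:
    """Run finders in priority order, feeding earlier results to later ones."""
    findings: tuple[Finding, ...] = ()
    for finder in (_find_authorization, _find_token_params):
        findings += finder(text, findings)
    return findings


text = "Authorization: Bearer token=abc\nGET /?token=xyz"
findings = collect_findings(text)
```

In this example the token=abc inside the Authorization header is already covered by the higher-priority finding, so only token=xyz on the second line is reported by the query rule.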

Write tests and golden fixtures

Add unit tests in a test_*.py file that exercise the new finder directly. Add or update a golden fixture in tests/fixtures/golden/ and register its expected rule counts in EXPECTED_COUNTS in test_golden_fixtures.py. Also add any new synthetic secret values to RAW_SECRET_VALUES so the no-leak assertion covers them.
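The count and no-leak checks might be expressed along these lines. The shapes of EXPECTED_COUNTS and RAW_SECRET_VALUES shown here are assumptions, as are the fixture name, the rule ID, and the find_leaks() and count_rules() helpers; check test_golden_fixtures.py for the actual structures.

```python
from collections import Counter
from collections.abc import Iterable

# Assumed shape: fixture name -> {rule ID -> expected number of findings}.
EXPECTED_COUNTS: dict[str, dict[str, int]] = {
    "example_token": {"example.token": 2},
}

# Assumed shape: every synthetic secret that appears in any fixture input.
RAW_SECRET_VALUES: tuple[str, ...] = ("abc123", "zzz999")


def find_leaks(sanitized: str) -> list[str]:
    """Return any raw secret that survived sanitization."""
    return [secret for secret in RAW_SECRET_VALUES if secret in sanitized]


def count_rules(rule_ids: Iterable[str]) -> dict[str, int]:
    """Tally how many findings each rule produced."""
    return dict(Counter(rule_ids))


# Made-up sanitizer output and report for the hypothetical fixture.
sanitized = "a=[REDACTED_EXAMPLE_TOKEN]&b=[REDACTED_EXAMPLE_TOKEN]"
reported_rule_ids = ["example.token", "example.token"]

assert find_leaks(sanitized) == []
assert count_rules(reported_rule_ids) == EXPECTED_COUNTS["example_token"]
```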

Overlap protection is not automatic: each finder must receive the correct existing_findings sequence to avoid producing duplicate redactions on the same span. When in doubt, pass all findings accumulated so far. The apply_findings() engine processes findings in offset order and will raise an error if two findings overlap at apply time.
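To make the apply-time behavior concrete, here is a minimal sketch of an offset-ordered replacement engine that rejects overlapping findings. It is not the project's implementation: the Finding fields and the choice of ValueError are assumptions.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    start: int
    end: int
    replacement: str
    rule_id: str


def apply_findings(text: str, findings: Sequence[Finding]) -> str:
    """Replace each finding's span, walking the text in offset order."""
    pieces = []
    cursor = 0
    for finding in sorted(findings, key=lambda f: (f.start, f.end)):
        # A span starting before the cursor overlaps the previous finding.
        if finding.start < cursor:
            raise ValueError(f"overlapping findings at offset {finding.start}")
        pieces.append(text[cursor : finding.start])
        pieces.append(finding.replacement)
        cursor = finding.end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "a=SECRET b=OTHER"
clean = apply_findings(
    text,
    [Finding(11, 16, "[Y]", "rule.two"), Finding(2, 8, "[X]", "rule.one")],
)

# Two findings covering the same characters are rejected, not merged.
try:
    apply_findings(text, [Finding(2, 8, "[X]", "r1"), Finding(5, 12, "[Y]", "r2")])
    overlap_rejected = False
except ValueError:
    overlap_rejected = True
```

Failing fast on overlap surfaces a finder that was registered without the correct existing_findings, instead of silently producing a mangled redaction.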

The _find_folded_proxy_authorization_spans() helper returns spans for multi-line folded Proxy-Authorization headers. These spans are passed to query, JSON, and form finders so that secrets nested inside folded proxy headers are covered by the proxy_authorization.* rules rather than triggering lower-priority rules. If your new rule scans line content that could be embedded inside proxy headers, pass folded_proxy_spans to it and use _overlaps_existing_finding() to skip those spans.

License

Evidence Sanitizer is released under the MIT License. See LICENSE in the repository root.
