Evidence Sanitizer Safety Model: Guarantees Explained

Evidence Sanitizer is designed so that every operation either succeeds safely or fails cleanly: it never modifies your original evidence, never overwrites existing files, and never leaks detected credential values into reports or terminal output. The guarantees below are enforced by the implementation and apply to every invocation of evidence-sanitizer sanitize.

Safety Guarantees

1. Source file is never modified in place

The input file is opened for reading only. The tool does not use in-place editing, backup-overwrite behavior, or write handles on the input path. Your original evidence file is always left byte-for-byte unchanged.
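One way to confirm this guarantee for yourself is to hash the input file before and after a run and compare the digests. The sketch below is a user-side check, not part of the tool; the helper name sha256_of is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in binary chunks."""
    digest = hashlib.sha256()
    # "rb" opens the file read-only in binary mode; nothing is written back.
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Record sha256_of(input_path) before running sanitize, run the tool, then compare against a second call. Equal digests mean the file content is unchanged.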

2. Output is written to a separate, explicitly provided path

Sanitized content is written only to the path supplied via --output OUTPUT. There is no implicit default output path and no silent creation of sibling files.

3. Existing output files are not overwritten

The output path must not exist before the tool runs. The implementation uses exclusive file creation (xb mode) so that a destination file that appears between validation and writing is never overwritten. If the output file already exists, the tool exits with a path-safety error rather than clobbering it.
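Exclusive creation can be illustrated with Python's built-in open and the "xb" mode mentioned above. This is a minimal sketch of the technique, not the tool's actual code; write_new_file is a hypothetical name.

```python
from pathlib import Path


def write_new_file(path: Path, data: bytes) -> None:
    """Create path and write data, failing if the path already exists."""
    # "x" requests exclusive creation and "b" binary mode. If the path
    # already exists, open() raises FileExistsError instead of truncating it.
    with open(path, "xb") as handle:
        handle.write(data)
```

Because the existence check and the creation happen in one call, a file that appears after an earlier validation step still causes a failure rather than being overwritten.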

4. Output parent directory must already exist

The tool does not create missing parent directories. If the directory containing the requested output path does not exist, the tool exits with an error. You must create the destination directory yourself before running sanitize.
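A parent-directory check of this kind might look like the sketch below. The exception class PathSafetyError and the function require_existing_parent are hypothetical names chosen for illustration; this page does not document the tool's real error types.

```python
from pathlib import Path


class PathSafetyError(Exception):
    """Hypothetical error type used only in this sketch."""


def require_existing_parent(output: Path) -> None:
    """Fail if the directory that would contain output does not exist."""
    # Deliberately no mkdir here: a missing parent is an error, not a
    # directory to be created on the caller's behalf.
    if not output.parent.is_dir():
        raise PathSafetyError("output parent directory does not exist")
```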

5. `--dry-run` writes no output file and no temporary files

Dry-run mode performs all validation, reading, decoding, and detection steps and reports rule counts — but it creates no output file and no temporary files at any point. It is safe to run on files where you want to preview findings before committing to an output path.

6. Reports include only fixed rule IDs and counts

The CLI summary and SanitizationReport data structure contain only stable rule identifiers (such as authorization.bearer or cookie.value) and integer counts of how many replacements each rule made. Reports never include source line excerpts, replacement previews, parameter names, header names, cookie names, or any custom scheme names.
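To make the shape of such a report concrete, here is a minimal sketch that holds only rule IDs and integer counts. The class ReportSketch, its counts field, and the helpers build_report and format_summary are assumptions for illustration; the real SanitizationReport fields are not shown on this page.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable


@dataclass(frozen=True)
class ReportSketch:
    """Hypothetical report: stable rule IDs mapped to replacement counts."""
    counts: dict[str, int] = field(default_factory=dict)


def build_report(rule_ids: Iterable[str]) -> ReportSketch:
    """Tally one rule ID per replacement; no matched text is ever passed in."""
    return ReportSketch(counts=dict(Counter(rule_ids)))


def format_summary(report: ReportSketch) -> str:
    """Render one 'rule_id: count' line per rule, sorted by rule ID."""
    return "\n".join(
        f"{rule_id}: {count}" for rule_id, count in sorted(report.counts.items())
    )
```

Since the report is built from rule IDs alone, there is no field in which a header name, cookie name, or source excerpt could end up.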

7. Detected raw values are never included in reports or CLI output

The Finding data structure stores only the offsets and the deterministic replacement string — it never stores the matched credential value. No detected value can surface in terminal output, exception messages, tracebacks, or report output.
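A structure with this property could be sketched as follows. FindingSketch, its field names, and apply_findings are hypothetical; only the idea of storing offsets plus a replacement string, and never the matched value, comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FindingSketch:
    """Hypothetical finding: where to replace and what to insert."""
    start: int        # offset of the first character to replace
    end: int          # offset one past the last character to replace
    replacement: str  # fixed marker string, never the matched value


def apply_findings(text: str, findings: list[FindingSketch]) -> str:
    """Apply non-overlapping findings to text and return the result."""
    # Work from the highest offset down so earlier offsets stay valid.
    for finding in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:finding.start] + finding.replacement + text[finding.end:]
    return text
```

Because the dataclass has no field for the matched text, printing or logging a finding cannot reveal the credential.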

8. Redaction markers are deterministic and idempotent

Each rule family maps to exactly one fixed marker string (for example, <REDACTED:authorization.bearer>). Running sanitize twice on the same input always produces byte-identical output, and sanitizing an already-sanitized file leaves its markers unchanged. See the Idempotence and Markers pages for the full per-rule marker table and idempotence policy.
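The idea of a fixed marker that survives a second pass can be shown with a toy rule. The regular expression below is a hypothetical pattern written for this sketch and is not the tool's actual detection rule; only the marker string comes from the text above.

```python
import re

MARKER = "<REDACTED:authorization.bearer>"

# Hypothetical pattern for illustration: an Authorization header line
# carrying a Bearer token made of non-whitespace characters.
BEARER = re.compile(r"(?im)^(authorization:[ \t]*bearer[ \t]+)(\S+)")


def redact_bearer(text: str) -> str:
    """Replace the token after 'Bearer' with the fixed marker."""
    return BEARER.sub(lambda match: match.group(1) + MARKER, text)
```

In this sketch a second pass matches the marker itself and replaces it with the same marker, so redact_bearer(redact_bearer(x)) equals redact_bearer(x).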

9. Processing is entirely local

The tool performs no network calls, sends no telemetry, uses no LLM or AI detection, loads no plugins, reads no configuration files, and maintains no persistent state between runs. Evidence never leaves your machine as part of sanitization.

10. Input must be strict UTF-8 or UTF-8 with BOM

Only strict UTF-8 and UTF-8 with BOM are accepted. Inputs that fail strict UTF-8 decoding are rejected with an error before any sanitization occurs.
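Strict decoding with optional BOM handling can be sketched with the standard library. The function name decode_strict is hypothetical, and the tool's real decoding path may differ in detail.

```python
import codecs


def decode_strict(data: bytes) -> tuple[bool, str]:
    """Return (has_bom, text), raising UnicodeDecodeError on invalid UTF-8."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    # errors="strict" rejects malformed sequences instead of replacing them.
    return has_bom, body.decode("utf-8", errors="strict")
```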

11. Maximum input size is 10 MiB

Files whose on-disk size exceeds 10 MiB are rejected before reading begins. The limit is then checked a second time against the in-memory byte length after reading, so a file that grows between the two checks is still rejected. The constant MAX_INPUT_BYTES = 10 * 1024 * 1024 is enforced in read_input_file.
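A two-stage size check along these lines might look like the sketch below. MAX_INPUT_BYTES matches the constant named above; the function read_limited and the exception InputTooLargeError are hypothetical names and do not reproduce the body of read_input_file.

```python
import os

MAX_INPUT_BYTES = 10 * 1024 * 1024  # 10 MiB


class InputTooLargeError(Exception):
    """Hypothetical error type used only in this sketch."""


def read_limited(path: str, limit: int = MAX_INPUT_BYTES) -> bytes:
    """Read a file as bytes, enforcing the limit before and after reading."""
    # Stage 1: reject on the reported on-disk size before opening the file.
    if os.stat(path).st_size > limit:
        raise InputTooLargeError("input exceeds size limit")
    # Stage 2: read at most limit + 1 bytes, then re-check the real length
    # in case the file grew after the stat call.
    with open(path, "rb") as handle:
        data = handle.read(limit + 1)
    if len(data) > limit:
        raise InputTooLargeError("input exceeds size limit")
    return data
```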

12. NUL bytes are rejected

Inputs containing NUL bytes (\x00) are rejected immediately after reading and before any decoding or sanitization. This acts as a minimal binary-file guard.
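The check itself is simple to express on raw bytes. The sketch below uses a hypothetical function name and a generic ValueError, since the tool's real error type is not documented on this page.

```python
def reject_nul_bytes(data: bytes) -> bytes:
    """Return data unchanged, or raise if it contains a NUL byte."""
    # Runs on raw bytes, before any attempt to decode the input as text.
    if b"\x00" in data:
        raise ValueError("input contains NUL bytes")
    return data
```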

13. UTF-8 BOM, newline style, and final-newline state are preserved

The tool reads and writes raw bytes without newline normalization. LF-only, CRLF, and mixed newline sequences are preserved exactly. If the input carried a UTF-8 BOM, the output carries one too. If the input had no trailing newline, neither does the output.
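The following sketch shows why working on bytes preserves these properties: decoding and encoding UTF-8 in memory does not touch newline characters, and the BOM is re-attached only if it was present. The function roundtrip and its transform parameter are hypothetical and stand in for the sanitization step.

```python
import codecs
from typing import Callable


def roundtrip(data: bytes, transform: Callable[[str], str] = lambda t: t) -> bytes:
    """Decode bytes, apply transform to the text, and re-encode faithfully."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    # bytes.decode / str.encode never translate "\r\n" or "\n"; only
    # text-mode file I/O does, which is why files are handled in binary mode.
    encoded = transform(body.decode("utf-8")).encode("utf-8")
    return codecs.BOM_UTF8 + encoded if has_bom else encoded
```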

What the Tool Does Not Guarantee

Evidence Sanitizer is intentionally best-effort within its documented rules. The guarantees above protect your evidence files and your report integrity — they do not mean every secret in a file will be found or removed.

Not a complete DLP system. Unsupported body formats, encoding variations, and undocumented secret patterns may retain raw values after sanitization. See Limitations for the full list of out-of-scope formats and patterns.
Not guaranteed to remove every secret. Rule coverage is limited to the documented HTTP-style patterns. Custom headers, proprietary token formats, binary encodings, and formats outside the approved rule set may pass through unchanged.
Partial output is possible on abrupt termination. If the process is killed during the write phase, a partial output file may be left at the destination path. On a controlled write failure, the tool attempts to remove the incomplete file, but this cleanup is best-effort.
Metadata is not preserved. The output file is a new file created with normal platform defaults. Original file permissions, ownership, timestamps, ACLs, and extended attributes are not copied to the output.
No atomic output-replacement guarantee. The tool creates the output file exclusively and writes directly to it. If you need atomic replacement semantics, implement them externally (for example, write to a temporary path, verify, then rename).
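The external approach suggested in the last item could look like the sketch below. It assumes you pointed --output at a temporary path in the same directory as the final destination and that the run has finished; publish_verified and the verify callback are hypothetical names.

```python
import os
from pathlib import Path
from typing import Callable


def publish_verified(
    temp_path: Path, final_path: Path, verify: Callable[[Path], bool]
) -> None:
    """Move a reviewed temporary output into place in a single step."""
    if not verify(temp_path):
        raise ValueError("sanitized output failed verification")
    # os.replace is atomic when both paths are on the same filesystem.
    # Unlike the tool's exclusive creation, it overwrites final_path.
    os.replace(temp_path, final_path)
```

Keeping the temporary file next to the destination avoids a cross-filesystem move, which would not be atomic.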

Responsible Use

Evidence Sanitizer is intended for authorized security testing and internal evidence handling only. You are responsible for ensuring that you are permitted to process any evidence file you supply as input, and for manually reviewing sanitized output before sharing it externally. Do not use this tool as a substitute for manual review or your organization’s data-handling requirements.

Always inspect the sanitized output file before distributing it. The tool reports which rule families triggered and how many times, but it cannot report what it missed. Treat the sanitized output as a starting point for review, not a finished artifact.
