Golden Fixture Examples: Real Sanitization Scenarios

The golden fixture suite lives in tests/fixtures/golden/ and is the primary end-to-end test harness for Evidence Sanitizer. Each fixture is a pair of plain-text files — a .input.txt file containing raw, unsanitized evidence, and a matching .expected.txt file containing the exact sanitized output that sanitize_text() must produce. Every fixture uses only synthetic values and reserved domains (example.test, api.example.test, mobile.example.test, callback.example.test) so nothing real is embedded in the repository. Together they document realistic before-and-after behavior and serve as regression guards: any code change that shifts the output of a fixture causes the test to fail immediately.

All secret values in these fixtures are synthetic placeholders (e.g. synthetic-bearer-token, synthetic-session-cookie). They exist solely to make the expected transformation readable and unambiguous — they are not real credentials.

Running the Fixture Tests

uv run pytest tests/test_golden_fixtures.py

Each fixture test asserts three things: the sanitized text matches the .expected.txt file exactly, the SanitizationReport rule counts match hardcoded expected counts, and sanitize_text() is idempotent (re-running it on the output makes no further changes and triggers no rules).
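The three assertions can be sketched as a small helper. This is a minimal illustration, not the project's test code: the real signature and return type of sanitize_text() are not shown on this page, so the sketch assumes a hypothetical `sanitize` callable that returns the sanitized text together with a Counter of rule counts. The name `check_fixture` is also hypothetical.

```python
from collections import Counter
from typing import Callable

# Assumption: the sanitizer returns (sanitized_text, rule_counts).
# The real sanitize_text() / SanitizationReport API may differ.
Sanitizer = Callable[[str], tuple[str, Counter]]


def check_fixture(sanitize: Sanitizer, raw: str, expected: str,
                  expected_counts: dict[str, int]) -> None:
    """Apply the three golden-fixture assertions to one input/expected pair."""
    text, counts = sanitize(raw)
    # 1. Exact text match against the .expected.txt content.
    assert text == expected
    # 2. Rule counts match the hardcoded expectations.
    assert dict(counts) == expected_counts
    # 3. Idempotence: a second pass changes nothing and triggers no rules.
    again, again_counts = sanitize(text)
    assert again == text
    assert sum(again_counts.values()) == 0
```

In a pytest suite, a helper like this would typically be called once per fixture from a parameterized test.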

Fixture 1 — `http_request_mixed`

A realistic raw HTTP GET request carrying credentials in multiple places simultaneously: a Bearer token in Authorization, a sensitive API key in both a query parameter and an X-API-Key header, additional secret query parameters, and a multi-value Cookie header. Rules triggered: authorization.bearer ×1, query.secret ×3, header.secret ×1, cookie.value ×3

Input

GET /api/profile?access_token=synthetic-access-token&sig=synthetic-signature&api_key=synthetic-api-key&theme=dark HTTP/1.1
Host: example.test
Authorization: Bearer synthetic-bearer-token
X-API-Key: synthetic-api-key
Cookie: session=synthetic-session-cookie; _ga=synthetic-telemetry-id; theme=light; unknown=synthetic-unknown-value
Accept: application/json

Output

GET /api/profile?access_token=<REDACTED:query.secret>&sig=<REDACTED:query.secret>&api_key=<REDACTED:query.secret>&theme=dark HTTP/1.1
Host: example.test
Authorization: Bearer <REDACTED:authorization.bearer>
X-API-Key: <REDACTED:header.secret>
Cookie: session=<REDACTED:cookie.value>; _ga=<REDACTED:cookie.value>; theme=light; unknown=<REDACTED:cookie.value>
Accept: application/json

Annotation: The authorization.bearer rule replaces the credential after Bearer. The query.secret rule fires on access_token, sig, and api_key — all three are on the approved sensitive name list — while theme is not sensitive and is left unchanged. The header.secret rule fires on X-API-Key. The cookie.value rule parses the Cookie header semicolon-by-semicolon: session, _ga, and unknown are all classified as sensitive, while theme is an approved non-sensitive name and stays verbatim.
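The per-cookie behavior described above can be sketched as follows. This is an illustrative approximation, not the project's parser: the function name and the allowlist contents are assumptions (only `theme` is confirmed as non-sensitive by this fixture), and every name not on the allowlist is treated as sensitive.

```python
# Assumed allowlist for illustration; the real list is defined by the project.
NON_SENSITIVE_COOKIE_NAMES = {"theme"}
MARKER = "<REDACTED:cookie.value>"


def redact_cookie_values(header_value: str) -> str:
    """Redact each cookie value unless its name is on the allowlist."""
    parts = []
    for pair in header_value.split(";"):
        name, sep, value = pair.strip().partition("=")
        # Names outside the allowlist are treated as sensitive by default.
        if sep and name not in NON_SENSITIVE_COOKIE_NAMES:
            value = MARKER
        parts.append(f"{name}{sep}{value}")
    return "; ".join(parts)
```

Because a value that is already the marker is replaced by the same marker, a second pass over the output leaves it unchanged.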

Fixture 2 — `burp_repeater_like`

A proxy/repeater-style POST request as it might appear when copied out of Burp Suite. Demonstrates Authorization: Basic, a Cookie header that contains a raw malformed entry (no = sign), and a folded Cookie header that is intentionally left unchanged by design. Rules triggered: authorization.basic ×1, query.secret ×1, header.secret ×1, cookie.header ×1

Input

POST /admin?signature=synthetic-signature HTTP/1.1
Host: callback.example.test
Authorization: Basic synthetic-basic-token
X-CSRF-Token: synthetic-csrf-token
Cookie: session=synthetic-session-cookie; malformed-raw
# The folded Cookie form below is intentionally left unchanged by design.
Cookie: folded-test=synthetic-folded-cookie-value
	folded-continuation=value
Content-Type: application/x-www-form-urlencoded
Set-Cookie: session=synthetic-set-cookie-value; Path=/
Accept: */*

Output

POST /admin?signature=<REDACTED:query.secret> HTTP/1.1
Host: callback.example.test
Authorization: Basic <REDACTED:authorization.basic>
X-CSRF-Token: <REDACTED:header.secret>
Cookie: <REDACTED:cookie.header>
# The folded Cookie form below is intentionally left unchanged by design.
Cookie: folded-test=synthetic-folded-cookie-value
	folded-continuation=value
Content-Type: application/x-www-form-urlencoded
Set-Cookie: session=synthetic-set-cookie-value; Path=/
Accept: */*

Annotation: Authorization: Basic triggers the authorization.basic rule. signature in the query string triggers query.secret. X-CSRF-Token is on the sensitive header name list and triggers header.secret. The first Cookie header contains malformed-raw (no = sign), which causes the safe cookie parser to fall back to the whole-header cookie.header rule, replacing the entire value rather than individual cookie values. The second Cookie line has a tab-indented continuation (a folded header) — this is an explicitly out-of-scope pattern documented in Limitations, so it is left untouched. Set-Cookie is also out of scope and is left unchanged.
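The whole-header fallback for malformed cookies can be sketched like this. It is a simplified illustration under stated assumptions: the function name is hypothetical, "malformed" is modeled only as a segment without an = sign, and neither folded-header detection nor the per-value rule for well-formed headers is implemented.

```python
def sanitize_cookie_line(line: str) -> str:
    """Replace the entire Cookie value when any segment lacks '='."""
    prefix = "Cookie: "
    if not line.startswith(prefix):
        return line
    segments = [s.strip() for s in line[len(prefix):].split(";")]
    if any("=" not in s for s in segments):
        # The header cannot be parsed safely, so redact all of it.
        return prefix + "<REDACTED:cookie.header>"
    # Well-formed headers would go to the per-value rule (not shown here).
    return line
```

Redacting the whole value is the conservative choice here: a partially parsed header could otherwise leak a secret sitting in the segment the parser did not understand.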

Fixture 3 — `api_log_mixed`

A line-oriented API gateway log excerpt. Demonstrates that authorization headers and sensitive query parameters are matched anywhere in a line, not just in HTTP-style request blocks, and that a non-Bearer/Basic Authorization scheme (Signature) triggers authorization.other. Rules triggered: authorization.other ×1, query.secret ×2

Input

2024-01-15T09:23:17Z 200 req_abc123 GET https://api.example.test/v1/data?x-amz-signature=synthetic-amz-signature&access_token_expires=999&timestamp=1705310597
2024-01-15T09:23:18Z 401 req_def456 POST https://api.example.test/v1/auth
Authorization: Signature keyId="synthetic-key",algorithm="hmac-sha256",signature="synthetic-signature"
2024-01-15T09:23:19Z 200 req_ghi789 GET https://api.example.test/v1/search?api_key=synthetic-api-key&signature_algorithm=sha256

Output

2024-01-15T09:23:17Z 200 req_abc123 GET https://api.example.test/v1/data?x-amz-signature=<REDACTED:query.secret>&access_token_expires=999&timestamp=1705310597
2024-01-15T09:23:18Z 401 req_def456 POST https://api.example.test/v1/auth
Authorization: Signature <REDACTED:authorization.credentials>
2024-01-15T09:23:19Z 200 req_ghi789 GET https://api.example.test/v1/search?api_key=<REDACTED:query.secret>&signature_algorithm=sha256

Annotation: x-amz-signature is on the sensitive query parameter name list and triggers query.secret. access_token_expires and timestamp are not sensitive names and remain. api_key triggers query.secret. signature_algorithm is not on the approved list (it is a configuration value, not a secret) and remains. The Signature scheme in Authorization is syntactically valid but not Bearer or Basic, so it triggers authorization.other and the marker becomes <REDACTED:authorization.credentials>.
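The whole-name matching that keeps access_token_expires and signature_algorithm intact can be sketched with a regular expression. This is a hedged approximation: the function name, the regex, and the name set are assumptions (the set contains only names this page mentions as triggering query.secret), and the real scanner may treat case, fragments, or encoding differently.

```python
import re

# Assumed subset of the sensitive query parameter names, for illustration.
SENSITIVE_QUERY_NAMES = {
    "access_token", "api_key", "sig", "signature", "token", "x-amz-signature",
}

# A parameter starts after '?' or '&'; the value runs to the next '&', '#', or space.
_PARAM = re.compile(r"([?&])([^=&\s#]+)=([^&\s#]*)")


def redact_query_params(line: str) -> str:
    """Redact values whose full parameter name is on the sensitive list."""
    def repl(match: re.Match) -> str:
        sep, name, _value = match.groups()
        # Whole-name comparison: 'signature_algorithm' is not 'signature'.
        if name in SENSITIVE_QUERY_NAMES:
            return f"{sep}{name}=<REDACTED:query.secret>"
        return match.group(0)

    return _PARAM.sub(repl, line)
```

Comparing the complete name, rather than searching for a sensitive substring, is what prevents configuration-style parameters from being redacted by accident.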

Fixture 4 — `json_api_body_mixed`

A POST request with a JSON body containing multiple sensitive field names alongside non-sensitive ones. Also demonstrates that a broader Authorization: Bearer finding takes precedence — the token in the header is covered by authorization.bearer, while the JSON body values are covered by json.value. Rules triggered: authorization.bearer ×1, json.value ×6

Input

POST /api/session HTTP/1.1
Host: example.com
Content-Type: application/json
Authorization: Bearer synthetic-bearer-token

{"access_token":"synthetic-access-token","refresh_token":"synthetic-refresh-token","id_token":"synthetic-id-token","token_type":"Bearer","client_secret":"synthetic-client-secret","password":"synthetic-password","api_key":"synthetic-api-key","user_id":"user-123","theme":"dark"}

Output

POST /api/session HTTP/1.1
Host: example.com
Content-Type: application/json
Authorization: Bearer <REDACTED:authorization.bearer>

{"access_token":"<REDACTED:json.value>","refresh_token":"<REDACTED:json.value>","id_token":"<REDACTED:json.value>","token_type":"Bearer","client_secret":"<REDACTED:json.value>","password":"<REDACTED:json.value>","api_key":"<REDACTED:json.value>","user_id":"user-123","theme":"dark"}

Annotation: The JSON scanner uses conservative raw string-key/string-value matching rather than a full JSON parser. Fields access_token, refresh_token, id_token, client_secret, password, and api_key are all on the approved JSON sensitive field name list, each triggering json.value. token_type, user_id, and theme are not on the list and remain as-is. Only direct string-to-string pairs are matched; numbers, booleans, null, arrays, and objects are left unchanged even if their keys match a sensitive name.
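Raw string-key/string-value matching of this kind can be approximated with a single regular expression. The sketch below is an assumption-laden illustration, not the project's scanner: the function name and regex are hypothetical, and the key tuple lists only the six names confirmed by this fixture.

```python
import re

# The six key names this fixture confirms; the real approved list may be longer.
SENSITIVE_JSON_KEYS = (
    "access_token", "refresh_token", "id_token",
    "client_secret", "password", "api_key",
)

# Group 1 captures '"key":' plus whitespace; the value must be a quoted string.
_PAIR = re.compile(
    r'("(?:%s)"\s*:\s*)"(?:[^"\\]|\\.)*"'
    % "|".join(map(re.escape, SENSITIVE_JSON_KEYS))
)


def redact_json_values(text: str) -> str:
    """Redact string values of sensitive keys without parsing the JSON."""
    return _PAIR.sub(lambda m: m.group(1) + '"<REDACTED:json.value>"', text)
```

Because the pattern requires a quoted value, a pair such as `"password": 123` or `"api_key": null` does not match and is left unchanged, which mirrors the string-to-string restriction described above.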

Fixture 5 — `form_urlencoded_body_mixed`

Two POST requests with application/x-www-form-urlencoded bodies. Demonstrates approved vs. deferred field handling, an embedded URL in a redirect_uri value that contains a sensitive query parameter, and a second request where the form value itself looks like a URL with secret query parameters. Rules triggered: authorization.bearer ×1, form.value ×10, query.secret ×1

Input

POST /oauth/token HTTP/1.1
Host: api.example.test
Authorization: Bearer synthetic-bearer-token
Content-Type: application/x-www-form-urlencoded

access_token=synthetic-access-token&refresh_token=synthetic-refresh-token&client_secret=synthetic-client-secret&password=synthetic-password&csrf=synthetic-csrf-token&session=&jwt=synthetic-jwt-plus+value&api_key=synthetic-api-key&grant_type=authorization_code&username=synthetic-username&scope=openid&code=synthetic-code&state=synthetic-state&nonce=synthetic-nonce&access%5Ftoken=synthetic-percent-name-token&redirect_uri=https://callback.example.test/cb?access_token=synthetic-nested-access-token

POST /api/login HTTP/1.1
Host: example.test
Content-Type: application/x-www-form-urlencoded

token=https://api.example.test/cb?token=synthetic-overlap-token&sig=synthetic-overlap-sig

Output

POST /oauth/token HTTP/1.1
Host: api.example.test
Authorization: Bearer <REDACTED:authorization.bearer>
Content-Type: application/x-www-form-urlencoded

access_token=<REDACTED:form.value>&refresh_token=<REDACTED:form.value>&client_secret=<REDACTED:form.value>&password=<REDACTED:form.value>&csrf=<REDACTED:form.value>&session=<REDACTED:form.value>&jwt=<REDACTED:form.value>&api_key=<REDACTED:form.value>&grant_type=authorization_code&username=synthetic-username&scope=openid&code=synthetic-code&state=synthetic-state&nonce=synthetic-nonce&access%5Ftoken=synthetic-percent-name-token&redirect_uri=https://callback.example.test/cb?access_token=<REDACTED:query.secret>

POST /api/login HTTP/1.1
Host: example.test
Content-Type: application/x-www-form-urlencoded

token=<REDACTED:form.value>&sig=<REDACTED:form.value>

Annotation: Form scanning is gated by a Content-Type: application/x-www-form-urlencoded header followed by a blank separator line. Only the immediate first physical line after the separator is scanned — multi-line or wrapped form bodies are not supported. The deferred fields grant_type, username, scope, code, state, and nonce are intentionally not redacted. access%5Ftoken has a percent-encoded name; no percent-decoding is performed, so this field name does not match the approved list and its value stays. The access_token inside redirect_uri is a nested URL query parameter — the form scanner leaves it for the query scanner, which correctly fires query.secret on it. In the second request, token and sig are both on the form sensitive name list, each triggering form.value.

Additional Fixtures

The following fixtures also exist in tests/fixtures/golden/ and are exercised by the same parameterized test. Refer to the fixture files directly for their full content.

mobile_api_trace_like — Mobile/debug trace

A mobile API trace header block for mobile.example.test, demonstrating Authorization: Bearer, X-Auth-Token (sensitive header), a Cookie with both sensitive and non-sensitive values, and a single sensitive query parameter.

Rules triggered: authorization.bearer ×1, query.secret ×1, header.secret ×1, cookie.value ×3

Notable detail: device=pixel7 is a cookie name classified as sensitive (device identifiers are treated as sensitive by the cookie classifier), so it is redacted to device=<REDACTED:cookie.value>. X-Request-ID is not on the sensitive header list and is left unchanged.

report_note_mixed — Human-written assessment notes

A prose note block as a pentest report author might write it, mixing free-form sentences with embedded HTTP snippets. Demonstrates that rules fire anywhere in the file regardless of surrounding context: a Bearer token in an Authorization header inside a prose block, X-API-Key in that same snippet, and token, sig, and signature query parameters in a URL on a prose line.

Rules triggered: authorization.bearer ×1, query.secret ×2, header.secret ×1

Notable detail: code and state query parameters in the URL are not on the sensitive name list and remain unchanged. Pre-existing <REDACTED:cookie.value> markers in the input (already redacted before this run) are correctly preserved and do not trigger any additional rules, demonstrating idempotence of existing markers.
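The marker-preservation behavior can be illustrated with a tiny guard. This is only a sketch of the idea, with a hypothetical function name: it assumes a value is skipped when it already equals the marker the rule would write, and says nothing about how the real implementation detects existing markers.

```python
def redact_once(value: str, marker: str) -> tuple[str, bool]:
    """Return (new_value, triggered); an existing identical marker is kept."""
    if value == marker:
        # Already redacted by this rule: preserve it and count nothing.
        return value, False
    return marker, True
```

Returning a separate "triggered" flag is what lets a second pass report zero rule counts even though the markers are still present in the text.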

edge_cases_markers_and_malformed_cookie — Idempotence and fallback

proxy_authorization_mixed — Proxy-Authorization rule family

A request demonstrating the full proxy_authorization rule family: Bearer, Basic, and multiple generic scheme lines. Also shows that Bearer credentials already redacted by a wrong-family marker, nested query/JSON/form tokens inside Proxy-Authorization values, and a normal token query parameter in a Referer header are all handled. Out-of-scope proxy-related headers (Proxy-Authenticate, WWW-Authenticate, X-Proxy-Authorization) are left unchanged.

Rules triggered: proxy_authorization.bearer ×1, proxy_authorization.basic ×2, proxy_authorization.other ×4, query.secret ×1

Notable detail: The proxy_authorization.other rule fires four times: on a Digest scheme line, on a Custom line whose value is a wrong-family marker, on a Custom line whose value is a URL with a nested query token, and on a Custom line whose value is a JSON-like fragment. The proxy_authorization.basic rule fires twice because there are two Basic lines. Nested query/JSON/form tokens inside Proxy-Authorization lines are suppressed from the query/JSON/form scanners by the overlap protection mechanism, so they do not produce duplicate findings.
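One way to picture overlap protection is as span containment: a finding that lies entirely inside a broader finding is dropped. The sketch below is an assumption about the general technique, not the project's mechanism; the `Finding` type and `suppress_nested` function are hypothetical names.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    rule: str


def suppress_nested(findings: list[Finding]) -> list[Finding]:
    """Keep the broadest findings; drop any fully contained in a kept one."""
    kept: list[Finding] = []
    # Earlier start first, then longer span first, so that a containing
    # span is always considered before anything nested inside it.
    for f in sorted(findings, key=lambda f: (f.start, -(f.end - f.start))):
        if any(k.start <= f.start and f.end <= k.end for k in kept):
            continue
        kept.append(f)
    return kept
```

For example, a query.secret span nested inside a proxy_authorization.other span is discarded, while a query.secret span elsewhere on the line (such as in a Referer header) survives.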
