Headroom’s configuration layer is built from composable Python dataclasses. The top-level HeadroomConfig aggregates sub-configs for every subsystem: compression, cache alignment, cache optimization, CCR, and prefix freezing. All fields have sensible defaults; you only need to specify what you want to change.
HeadroomMode
HeadroomMode is a string enum that controls how the pipeline processes requests. It extends str so string literals work wherever the enum is expected.
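The str-based enum pattern can be illustrated with a small stand-in. This is a sketch, not Headroom's actual class: the class name `Mode`, the helper `resolve_mode`, and the member names `AUDIT` and `SIMULATE` are assumptions (only `HeadroomMode.OPTIMIZE` and the three string values appear in this page).

```python
from enum import Enum


class Mode(str, Enum):
    """Stand-in for a str-based mode enum; not Headroom's actual class."""

    AUDIT = "audit"        # member name assumed for illustration
    OPTIMIZE = "optimize"
    SIMULATE = "simulate"  # member name assumed for illustration


def resolve_mode(value):
    """Accept either an enum member or its plain string value."""
    return Mode(value)


# Because the enum subclasses str, members compare equal to string literals.
print(Mode.OPTIMIZE == "optimize")
print(resolve_mode("audit") is Mode.AUDIT)
```

Subclassing `str` is what lets a plain literal such as `"optimize"` be passed anywhere the enum is expected, while typed code can still use the member.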
"audit": Pass-through mode. Messages are analyzed for waste signals and metrics are recorded, but nothing is modified. Use this to measure savings before enabling compression.
"optimize": Full pipeline mode. Messages are compressed by SmartCrusher (JSON), Kompress (text), CacheAligner (prefix), and the provider cache optimizer before being sent.
"simulate": Dry-run mode. The pipeline runs completely but no API call is made. Used internally by client.chat.completions.simulate() and client.messages.simulate().
HeadroomConfig
Top-level configuration for HeadroomClient. All fields are optional with production-ready defaults.
Storage URL for the metrics database. Supports sqlite:///path and jsonl:///path. When passed via HeadroomClient(store_url=...), that value overrides this field.
Default operating mode for all requests. Use HeadroomMode.OPTIMIZE for production workloads.
User-supplied overrides for model context windows (in tokens). Takes precedence over the provider’s built-in limits. Prefix matching is supported: "gpt-4" matches "gpt-4-turbo".
Configuration for the JSON/array compressor. See SmartCrusherConfig.
Configuration for the cache-prefix stability detector. See CacheAlignerConfig.
Configuration for provider-specific cache optimization (breakpoints, prefix stabilization). See CacheOptimizerConfig.
Configuration for Compress-Cache-Retrieve — reversible compression with hash-based retrieval. Enabled by default.
Configuration for cache-aware prefix freezing. Prevents the pipeline from invalidating already-cached prefixes.
Tokens reserved for the model’s output when computing how much of the input context can be compressed. Increase this for models with long outputs (e.g. code generation).
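As a rough illustration of how the context-window overrides and the output reservation described above might combine, the sketch below resolves a limit by prefix match and subtracts the reserved output tokens. The function names, the longest-prefix tie-break, and all numbers are assumptions made for this example, not Headroom's implementation.

```python
def resolve_context_limit(model, overrides, builtin_limits):
    """Return the context window for `model`, preferring user overrides.

    An override key matches when the model name starts with it. Picking the
    longest matching key is an assumption made for this sketch.
    """
    matches = [key for key in overrides if model.startswith(key)]
    if matches:
        return overrides[max(matches, key=len)]
    return builtin_limits[model]


def input_budget(context_limit, output_reserve):
    """Tokens left for input once the output reservation is set aside."""
    return max(context_limit - output_reserve, 0)


# Made-up values, for illustration only.
overrides = {"gpt-4": 128_000}
builtin = {"gpt-4-turbo": 8_192, "other-model": 32_000}

limit = resolve_context_limit("gpt-4-turbo", overrides, builtin)
budget = input_budget(limit, output_reserve=4_000)
print(limit, budget)
```

Raising the reservation shrinks the input budget one-for-one, which is why long-output workloads such as code generation call for a larger value.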
Enable tool-result interceptors (e.g. ast-grep Read outline). Opt-in. Also controllable via the environment variable HEADROOM_INTERCEPT_ENABLED=1.
When True, each TransformResult includes a DiffArtifact with per-transform token deltas. Useful for debugging which transform caused the most savings.
List of PipelineExtension instances to attach to the canonical pipeline lifecycle. Extensions receive PipelineEvent objects at each stage.
When True, Headroom discovers and loads PipelineExtension implementations registered under the headroom.pipeline_extension entry-point group.
SmartCrusherConfig
Controls the statistical JSON and array compressor. SmartCrusher is the primary tool for reducing large tool outputs: it preserves errors, anomalies, and query-relevant items while dropping redundant entries.
Enable or disable SmartCrusher. When False, all JSON/array tool outputs pass through unmodified.
Minimum array length before statistical analysis runs. Arrays shorter than this are left unchanged.
Only compress a tool output if it exceeds this many tokens. Prevents unnecessary analysis on small payloads.
Standard deviations above the mean required to flag a numeric value as an anomaly. Lower values catch more anomalies.
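A standard-deviation threshold of this kind is commonly implemented as a z-score test. The sketch below is one such test under stated assumptions: it uses the population standard deviation and flags deviations in either direction, neither of which is confirmed by the source. The function name is hypothetical.

```python
from statistics import mean, pstdev


def flag_anomalies(values, threshold):
    """Return indices whose distance from the mean exceeds `threshold` sigmas."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


latencies = [10] * 9 + [100]
print(flag_anomalies(latencies, threshold=2.0))
```

Lowering the threshold flags more values, which matches the note that lower values catch more anomalies.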
Fraction of unique values below which an array is considered “nearly constant”. Nearly-constant arrays use stricter deduplication.
String similarity threshold for clustering similar items. Items above this similarity may be grouped and represented by a single representative.
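To make the similarity threshold concrete, here is a minimal greedy clustering sketch built on the standard library's `difflib.SequenceMatcher`. The choice of similarity measure and the greedy first-match strategy are assumptions for illustration; the source does not say which measure SmartCrusher uses.

```python
from difflib import SequenceMatcher


def cluster_similar(items, threshold):
    """Group strings greedily; the first item of each cluster is its representative."""
    clusters = []
    for item in items:
        for cluster in clusters:
            # Compare against the representative only, to keep the sketch simple.
            if SequenceMatcher(None, cluster[0], item).ratio() >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


logs = [
    "connection timeout on host-1",
    "connection timeout on host-2",
    "disk full",
]
for cluster in cluster_similar(logs, threshold=0.8):
    print(cluster[0], f"(+{len(cluster) - 1} similar)")
```

A higher threshold produces more, smaller clusters, so fewer items are collapsed into a single representative.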
Target maximum number of items to keep after compression. The adaptive Kneedle algorithm may keep fewer when information saturation is detected earlier.
Keep items at significant data transitions (detected with a fixed 5-item window). Useful for time-series data where inflection points carry information.
Disabled — would modify the original JSON schema. Kept for forward compatibility.
Disabled — no AI-generated summary text is inserted. All output items come from the original array.
Use TOIN (Tool Output Intelligence Network) learned patterns to bias compression toward preserving historically-retrieved items.
Minimum TOIN confidence score for a hint to influence compression.
Prevent multiple preservation mechanisms from keeping duplicate copies of identical items.
Fraction of max_items_after_crush reserved for items at the start of the array.
Fraction of max_items_after_crush reserved for items at the end of the array.
Minimum byte-savings ratio for the lossless compaction path (CSV/JSON/markdown-kv) to be chosen over the lossy row-drop path. Must stay in lockstep with the Rust core default.
When True, lossless tabular compaction still runs but any path that would produce a CCR marker is skipped. Output is always marker-free and byte-recoverable.
Configuration for the relevance scorer that determines which items match the user’s query. See RelevanceScorerConfig.
Configuration for dynamic anchor allocation, which controls how position-based preservation slots are distributed (front-heavy for search results, back-heavy for logs, balanced for time-series). The anchor budget is a percentage of max_items_after_crush reserved for positional anchors; the rest goes to importance-scored items.
A field is considered “core” if it is present in at least this fraction of rows. Arrays with mostly non-core key sets are bucketed by a discriminator field rather than flattened.
When the fraction of rows sharing a common core is below this value, the array is treated as heterogeneous and bucketed rather than compacted with a shared header.
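The two fractions above can be read as a pair of checks: first find the core fields, then measure how many rows carry all of them. The sketch below follows that reading. The function and parameter names are hypothetical, and the exact comparison rules are assumptions.

```python
from collections import Counter


def core_fields(rows, core_fraction):
    """Keys present in at least `core_fraction` of the rows."""
    if not rows:
        return set()
    counts = Counter(key for row in rows for key in row)
    return {key for key, n in counts.items() if n / len(rows) >= core_fraction}


def is_heterogeneous(rows, core_fraction, min_shared_fraction):
    """True when too few rows share the common core to use one shared header."""
    core = core_fields(rows, core_fraction)
    if not core:
        return True
    sharing = sum(1 for row in rows if core.issubset(row))
    return sharing / len(rows) < min_shared_fraction


rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c", "extra": 1},
    {"id": 4, "error": "x"},
]
print(sorted(core_fields(rows, core_fraction=0.7)))
print(is_heterogeneous(rows, core_fraction=0.7, min_shared_fraction=0.8))
```

Here three of four rows share the `id`/`name` core, so the array counts as heterogeneous only when the required shared fraction is above 0.75.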
Maximum number of inner keys to inline when flattening nested objects during tabular compaction.
Minimum number of discriminator buckets used when compacting a heterogeneous array.
Maximum number of discriminator buckets. Prevents over-splitting sparse arrays.
CacheAlignerConfig
Controls the cache-prefix stability detector. CacheAligner scans system messages for volatile content (UUIDs, timestamps, JWTs, hex hashes) and logs warnings when instability is detected. It does not modify messages; it only emits warnings and cache metrics for observability.
Enable the CacheAligner. Disabled by default because prefix stability gains are marginal in most workloads. Enable explicitly when debugging cache-miss issues.
When True, uses the full DynamicContentDetector with 15+ structural patterns (UUIDs, API keys, JWTs, timestamps, hex hashes, version numbers, high-entropy strings). When False, falls back to legacy date-only regex patterns.
Detection tiers to run (only when use_dynamic_detector=True):
"regex": Fast structural/universal patterns (~0 ms). Recommended for production.
"ner": Named Entity Recognition via spaCy (~5–10 ms). Optional.
"semantic": Embedding similarity (~20–50 ms). Optional.
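For a sense of what a regex tier can look like, the sketch below scans text for three of the volatile content types named above. The patterns and the names `VOLATILE_PATTERNS` and `find_volatile` are illustrative assumptions and are not taken from DynamicContentDetector.

```python
import re

# Illustrative patterns only; the real detector's patterns are not shown in this page.
VOLATILE_PATTERNS = {
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    "iso_timestamp": re.compile(
        r"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?"
    ),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def find_volatile(text):
    """Return (pattern_name, matched_text) pairs for every volatile span found."""
    hits = []
    for name, pattern in VOLATILE_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


prompt = "Session 123e4567-e89b-12d3-a456-426614174000 started at 2024-01-15T10:30:00Z"
for name, value in find_volatile(prompt):
    print(name, value)
```

Any hit inside a system message means the prefix changes between requests, which is the instability the aligner warns about.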
Additional key names that hint their values are dynamic. For example, adding "session" will detect "session: abc123" and flag "abc123" as volatile.
Entropy threshold (0–1) for identifying random-looking strings. Higher values are more selective (only very random strings like UUIDs). Lower values are more aggressive.
Normalize whitespace in system messages to improve prefix stability. Caution: may break code blocks with significant indentation or ASCII art.
Collapse consecutive blank lines to single blank lines.
Separator marking where dynamic content begins in the system message. Content before this separator is the stable cacheable prefix; content after is dynamic.
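The entropy threshold, blank-line collapsing, and separator settings above can each be sketched in a few lines. Everything here is an assumption made for illustration: the entropy is Shannon entropy normalized by the maximum possible for the string's length, and the separator string `---DYNAMIC---` is hypothetical.

```python
import math
import re
from collections import Counter


def normalized_entropy(text):
    """Shannon entropy of `text`, scaled to the range 0-1 (assumed normalization)."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return entropy / math.log2(n)


def collapse_blank_lines(text):
    """Replace runs of blank lines with a single blank line."""
    return re.sub(r"\n{3,}", "\n\n", text)


def split_prefix(system_message, separator):
    """Split into (stable_prefix, dynamic_tail); the tail is empty if no separator."""
    stable, found, dynamic = system_message.partition(separator)
    if not found:
        return system_message, ""
    return stable, dynamic


message = "You are helpful.\n---DYNAMIC---\nToday is Monday."
stable, dynamic = split_prefix(message, "---DYNAMIC---")
print(repr(stable), repr(dynamic))
print(normalized_entropy("abcdefgh"), normalized_entropy("aaaaaaab"))
```

Only the stable part would need to stay byte-identical across requests for a provider-side prefix cache to hit.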
CacheOptimizerConfig
Controls provider-specific cache optimization: Anthropic cache_control breakpoints, OpenAI prefix stabilization, and Google CachedContent API lifecycle management.
Enable provider-specific cache optimization. Auto-detects the provider from the HeadroomClient provider instance.
Automatically select the cache optimizer implementation based on the provider name.
Minimum token count for a prefix to be considered cacheable. Provider may enforce a higher minimum.
Enable query-level semantic caching within the optimizer layer. Requires the semantic cache extra.
Minimum cosine similarity for a semantic cache hit.
Maximum number of entries in the semantic cache.
Time-to-live for semantic cache entries in seconds.
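The three semantic-cache settings above (similarity floor, entry cap, and time-to-live) interact as in the following toy cache. It is a sketch under assumptions, not Headroom's implementation: the class name, the oldest-first eviction policy, and the injectable clock are all made up for illustration.

```python
import math
import time
from collections import OrderedDict


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinySemanticCache:
    """Illustrative cache only; names and policies are assumptions."""

    def __init__(self, min_similarity, max_entries, ttl_seconds, clock=time.monotonic):
        self.min_similarity = min_similarity
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (embedding, response, stored_at)

    def put(self, key, embedding, response):
        self._entries[key] = (embedding, response, self._clock())
        self._entries.move_to_end(key)
        # Evict the oldest entries once the size cap is exceeded (assumed policy).
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def get(self, embedding):
        now = self._clock()
        expired = [
            key
            for key, (_, _, stored_at) in self._entries.items()
            if now - stored_at > self.ttl_seconds
        ]
        for key in expired:
            del self._entries[key]
        best_response, best_similarity = None, self.min_similarity
        for cached_embedding, response, _ in self._entries.values():
            similarity = cosine_similarity(embedding, cached_embedding)
            if similarity >= best_similarity:
                best_response, best_similarity = response, similarity
        return best_response
```

A lookup returns the most similar unexpired entry at or above the similarity floor, or `None`, in which case the caller would fall through to a real API call.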
RelevanceScorerConfig
Controls how SmartCrusher scores items by relevance to the user’s query. Available scoring tiers are BM25 (zero dependencies), embedding-based (requires headroom-ai[relevance]), and hybrid (recommended).
Scoring method. "hybrid" combines BM25 keyword matching with semantic embeddings and is the recommended default. Falls back to BM25 if sentence-transformers is not installed.
BM25 term-frequency saturation parameter.
BM25 length normalization parameter.
HuggingFace model name for the embedding scorer. Default is the Headroom-recommended sentence transformer.
BM25 weight in the hybrid scorer. The embedding weight is 1 - hybrid_alpha, so 0.5 gives both scorers equal weight.
Dynamically adjust hybrid_alpha based on query type (keyword-heavy vs. semantic).
Minimum relevance score for an item to be considered relevant and kept. Lower values are safer (more items are kept); higher values are more aggressive.
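The scoring parameters above fit together roughly as follows. The BM25 term weight uses the standard saturation formula with the IDF factor omitted for brevity, and the blend is a plain weighted sum. Function names are hypothetical, and the assumption that both scores are already normalized to the range 0 to 1 is mine, not the source's.

```python
def bm25_term_weight(tf, doc_len, avg_doc_len, k1, b):
    """Standard BM25 term-frequency component (IDF factor omitted)."""
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


def hybrid_score(bm25_score, embedding_score, alpha):
    """Weighted blend: alpha for BM25, 1 - alpha for the embedding score."""
    return alpha * bm25_score + (1.0 - alpha) * embedding_score


def keep_relevant(items, alpha, threshold):
    """items: (name, bm25_score, embedding_score), scores assumed to lie in 0-1."""
    return [
        name
        for name, bm25, emb in items
        if hybrid_score(bm25, emb, alpha) >= threshold
    ]


candidates = [("error row", 0.8, 0.4), ("routine row", 0.1, 0.1)]
print(keep_relevant(candidates, alpha=0.5, threshold=0.25))
```

Larger `k1` lets repeated terms keep adding weight for longer, while `b` controls how strongly long documents are penalized.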
Data Model Types
Block
Atomic unit of context analysis. Each message is parsed into one or more Block objects.
The semantic type of this block.
Text content of the block.
Estimated token count for this block.
Short hash of the block content for deduplication.
Position (index) of the originating message in the messages list.
Arbitrary flags set by analyzers (e.g. {"is_error": True}, {"is_anomaly": True}).
RequestMetrics
Comprehensive per-request metrics stored in the database after each call.
Unique identifier for this request.
UTC timestamp when the request was processed.
Model name used for this request.
Whether the response was streamed.
Operating mode: "audit", "optimize", or "simulate".
Input token count before compression.
Input token count after compression.
Output tokens from the model response. None for streaming requests (unknown at request time).
Token counts by block type (system, user, assistant, tool_result, etc.).
Detected waste by category. See WasteSignals.to_dict() for key names.
16-character hash of the stable cache prefix. Compare across requests to detect cache misses.
Score from 0.0 to 1.0 indicating how cache-friendly the prefix is.
Name of the cache optimizer that ran (e.g. "anthropic-cache-optimizer"), or None.
Strategy name used by the cache optimizer (e.g. "explicit_breakpoints").
Number of tokens eligible for provider-side caching.
Number of cache_control breakpoints inserted (Anthropic only).
Whether the prefix hash matched the previous request, suggesting a cache hit.
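Comparing prefix hashes across requests, as described above, can be sketched like this. The source states only that the hash is 16 characters long; the use of SHA-256 and both function names are assumptions for illustration.

```python
import hashlib


def prefix_hash(stable_prefix):
    """First 16 hex characters of a SHA-256 digest (hash function assumed)."""
    return hashlib.sha256(stable_prefix.encode("utf-8")).hexdigest()[:16]


def prefix_changed(previous_hash, current_prefix):
    """True when a previous hash exists and no longer matches the current prefix."""
    return previous_hash is not None and previous_hash != prefix_hash(current_prefix)


first = prefix_hash("You are a helpful assistant.")
print(prefix_changed(first, "You are a helpful assistant."))
print(prefix_changed(first, "You are a helpful assistant. Today is Monday."))
```

A changed hash between consecutive requests indicates that the provider-side cache entry for the old prefix will not be reused.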
Estimated percentage savings if the provider cache was hit.
Whether a semantic cache hit was returned instead of calling the API.
Names of all transforms that ran for this request.
Hash of the original messages for change detection.
Error message if the request failed, otherwise None.
TransformResult
Output of a pipeline or individual transform operation.
Messages after the transform was applied.
Token count before this transform.
Token count after this transform.
Names of every sub-transform that ran.
CCR retrieval markers that were injected into messages.
Non-fatal warnings emitted during the transform (e.g. detected volatile content in the prefix).
Per-transform diff details. Populated only when HeadroomConfig.generate_diff_artifact=True.
Cache prefix stability metrics from CacheAligner.
Wall-clock time in milliseconds per transform name.
Property: counted summary of transforms_applied. Example: {"router:tool_result:json": 4}.
SimulationResult
Returned by client.chat.completions.simulate() and client.messages.simulate(). Contains projected compression metrics without any API call.
Token count before compression.
Projected token count after compression.
Tokens saved, computed as tokens_before - tokens_after.
Transforms that would be applied.
Human-readable cost estimate per request, e.g. "$0.0042 per request".
The projected compressed messages.
Token counts by block type.
Waste by category.
16-character prefix hash after optimization.
Cache-friendliness score (0–1).
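The token arithmetic behind a simulation result is simple enough to show with a stand-in dataclass. `SimulatedTokens` and its `savings_percent` property are hypothetical; only the field names `tokens_before` and `tokens_after` and the difference between them come from the descriptions above.

```python
from dataclasses import dataclass


@dataclass
class SimulatedTokens:
    """Hypothetical stand-in holding only the two token counts."""

    tokens_before: int
    tokens_after: int

    @property
    def tokens_saved(self):
        return self.tokens_before - self.tokens_after

    @property
    def savings_percent(self):
        # Derived convenience value; not a documented field.
        if self.tokens_before == 0:
            return 0.0
        return 100.0 * self.tokens_saved / self.tokens_before


projection = SimulatedTokens(tokens_before=10_000, tokens_after=4_000)
print(projection.tokens_saved, projection.savings_percent)
```

Running a simulation first and inspecting these numbers is a low-risk way to judge whether enabling compression is worthwhile for a given workload.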