Python SDK Configuration Classes Reference

Headroom’s configuration layer is built from composable Python dataclasses. The top-level HeadroomConfig aggregates sub-configs for every subsystem — compression, cache alignment, cache optimization, CCR, and prefix freezing. All fields have sensible defaults; you only need to specify what you want to change.

from headroom import HeadroomClient, HeadroomConfig, SmartCrusherConfig, OpenAIProvider
from openai import OpenAI

config = HeadroomConfig(
    default_mode="optimize",
    smart_crusher=SmartCrusherConfig(max_items_after_crush=20),
    output_buffer_tokens=2000,
)

client = HeadroomClient(
    original_client=OpenAI(),
    provider=OpenAIProvider(),
    config=config,
)

HeadroomMode

HeadroomMode is a string enum that controls how the pipeline processes requests. It extends str so string literals work wherever the enum is expected.

from headroom import HeadroomMode

HeadroomMode.AUDIT     # "audit": observe only, never modify messages
HeadroomMode.OPTIMIZE  # "optimize": apply the full compression pipeline
HeadroomMode.SIMULATE  # "simulate": dry run, used internally by .simulate()

AUDIT

"audit"

Pass-through mode. Messages are analyzed for waste signals and metrics are recorded, but nothing is modified. Use this to measure savings before enabling compression.

OPTIMIZE

"optimize"

Full pipeline mode. Before the request is sent, messages are compressed by SmartCrusher (JSON) and Kompress (text), checked for prefix stability by CacheAligner (when enabled), and prepared for provider-side caching by the cache optimizer.

SIMULATE

"simulate"

Dry-run mode. The pipeline runs completely but no API call is made. Used internally by client.chat.completions.simulate() and client.messages.simulate().
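Because HeadroomMode extends str, a plain string such as "optimize" compares equal to the matching member. The sketch below uses a stand-in enum named `Mode` and a helper `resolve_mode` (both hypothetical, defined here so the snippet runs without Headroom installed) to illustrate that behavior.

```python
from enum import Enum


class Mode(str, Enum):
    """Stand-in mirroring the three documented HeadroomMode values (not the real class)."""

    AUDIT = "audit"
    OPTIMIZE = "optimize"
    SIMULATE = "simulate"


def resolve_mode(value: "str | Mode") -> Mode:
    # Mode("optimize") looks the member up by value; passing a member returns it unchanged.
    return Mode(value)


# A str-based enum member is itself a str, so equality with literals holds.
print(Mode.OPTIMIZE == "optimize")         # True
print(resolve_mode("audit") is Mode.AUDIT)  # True
```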

HeadroomConfig

Top-level configuration for HeadroomClient. All fields are optional. Note that default_mode is HeadroomMode.AUDIT, so compression stays off until you set it to HeadroomMode.OPTIMIZE.

from headroom import HeadroomConfig, HeadroomMode

config = HeadroomConfig(
    store_url="sqlite:///myapp.db",
    default_mode=HeadroomMode.OPTIMIZE,
    output_buffer_tokens=2000,
)

store_url

str

default: "sqlite:///headroom.db"

Storage URL for the metrics database. Supports sqlite:///path and jsonl:///path. When passed via HeadroomClient(store_url=...), that value overrides this field.

default_mode

HeadroomMode

default:"HeadroomMode.AUDIT"

Default operating mode for all requests. Use HeadroomMode.OPTIMIZE for production workloads.

model_context_limits

dict[str, int]

default:"{}"

User-supplied overrides for model context windows (in tokens). Takes precedence over the provider’s built-in limits. Prefix matching is supported — "gpt-4" matches "gpt-4-turbo".
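One way to picture the override lookup: an exact key wins, otherwise the longest configured prefix of the model name applies, and only if the overrides have no match is the provider's built-in table consulted. The helper `resolve_context_limit`, the longest-prefix tie-break, and the limit values are assumptions for illustration, not Headroom's documented algorithm.

```python
from typing import Dict, Optional


def _match(model: str, table: Dict[str, int]) -> Optional[int]:
    """Exact key first, then the longest key that is a prefix of the model name."""
    if model in table:
        return table[model]
    prefixes = [key for key in table if model.startswith(key)]
    if not prefixes:
        return None
    return table[max(prefixes, key=len)]


def resolve_context_limit(
    model: str, overrides: Dict[str, int], builtin: Dict[str, int]
) -> Optional[int]:
    # User overrides take precedence over the provider's built-in limits.
    limit = _match(model, overrides)
    return limit if limit is not None else _match(model, builtin)


overrides = {"gpt-4": 8192}                            # illustrative numbers
builtin = {"gpt-4-turbo": 128000, "gpt-4o": 128000}
print(resolve_context_limit("gpt-4-turbo", overrides, builtin))  # 8192
print(resolve_context_limit("gpt-4o-mini", {}, builtin))         # 128000
```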

smart_crusher

SmartCrusherConfig

default:"SmartCrusherConfig()"

Configuration for the JSON/array compressor. See SmartCrusherConfig.

cache_aligner

CacheAlignerConfig

default:"CacheAlignerConfig()"

Configuration for the cache-prefix stability detector. See CacheAlignerConfig.

cache_optimizer

CacheOptimizerConfig

default:"CacheOptimizerConfig()"

Configuration for provider-specific cache optimization (breakpoints, prefix stabilization). See CacheOptimizerConfig.

ccr

CCRConfig

default:"CCRConfig()"

Configuration for Compress-Cache-Retrieve — reversible compression with hash-based retrieval. Enabled by default.
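The Compress-Cache-Retrieve idea can be sketched as: store the original payload under a content hash, send a short marker in its place, and resolve the marker back when it is needed. The `CCRStore` class and the `[ccr:<hash>]` marker format are hypothetical; this section does not describe Headroom's actual marker syntax or storage backend.

```python
import hashlib


class CCRStore:
    """Toy reversible-compression store (hypothetical, not Headroom's CCR)."""

    def __init__(self) -> None:
        self._originals: dict = {}

    def compress(self, original: str, summary: str) -> str:
        # Key the original by a short content hash so identical payloads share one entry.
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()[:16]
        self._originals[key] = original
        return f"{summary} [ccr:{key}]"

    def retrieve(self, key: str) -> str:
        # Raises KeyError for unknown hashes rather than guessing.
        return self._originals[key]


store = CCRStore()
payload = '[{"id": 1}, {"id": 2}, {"id": 3}]'
compact = store.compress(payload, "3 items (ids 1-3)")
key = compact.rsplit("[ccr:", 1)[1].rstrip("]")
print(store.retrieve(key) == payload)  # True
```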

prefix_freeze

PrefixFreezeConfig

default:"PrefixFreezeConfig()"

Configuration for cache-aware prefix freezing. Prevents the pipeline from invalidating already-cached prefixes.

output_buffer_tokens

int

default:"4000"

Tokens reserved for the model’s output when computing how much of the input context can be compressed. Increase this for models with long outputs (e.g. code generation).
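As a rough worked example, assume (an assumption, not documented behavior) that the input budget is the model's context limit minus output_buffer_tokens, and that any prompt tokens above that budget are what compression must remove. The helpers `input_budget` and `tokens_to_remove` are hypothetical.

```python
def input_budget(context_limit: int, output_buffer_tokens: int = 4000) -> int:
    """Tokens left for the prompt after reserving room for the reply (assumed formula)."""
    return max(context_limit - output_buffer_tokens, 0)


def tokens_to_remove(
    prompt_tokens: int, context_limit: int, output_buffer_tokens: int = 4000
) -> int:
    # Zero when the prompt already fits inside the budget.
    return max(prompt_tokens - input_budget(context_limit, output_buffer_tokens), 0)


print(input_budget(128_000))                        # 124000
print(tokens_to_remove(130_000, 128_000))           # 6000
print(tokens_to_remove(130_000, 128_000, 16_000))   # 18000
```

Under this assumed formula, raising the buffer to 16,000 for a code-generation model grows the required reduction from 6,000 to 18,000 tokens.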

intercept_tool_results

bool

default:"false"

Enable tool-result interceptors (e.g. ast-grep Read outline). Opt-in. Also controllable via the environment variable HEADROOM_INTERCEPT_ENABLED=1.

generate_diff_artifact

bool

default:"false"

When True, each TransformResult includes a DiffArtifact with per-transform token deltas. Useful for debugging which transform caused the most savings.

pipeline_extensions

list[Any]

default:"[]"

List of PipelineExtension instances to attach to the canonical pipeline lifecycle. Extensions receive PipelineEvent objects at each stage.

discover_pipeline_extensions

bool

default:"true"

When True, Headroom discovers and loads PipelineExtension implementations registered under the headroom.pipeline_extension entry-point group.

SmartCrusherConfig

Controls the statistical JSON and array compressor. SmartCrusher is the primary tool for reducing large tool outputs — it preserves errors, anomalies, and query-relevant items while dropping redundant entries.

from headroom import SmartCrusherConfig

config = SmartCrusherConfig(
    min_tokens_to_crush=200,
    max_items_after_crush=20,
    variance_threshold=2.0,
    preserve_change_points=True,
)

enabled

bool

default:"true"

Enable or disable SmartCrusher. When False, all JSON/array tool outputs pass through unmodified.

min_items_to_analyze

int

default:"5"

Minimum array length before statistical analysis runs. Arrays shorter than this are left unchanged.

min_tokens_to_crush

int

default:"200"

Only compress a tool output if it exceeds this many tokens. Prevents unnecessary analysis on small payloads.

variance_threshold

float

default:"2.0"

Standard deviations above the mean required to flag a numeric value as an anomaly. Lower values catch more anomalies.
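A minimal z-score sketch of this threshold, assuming population standard deviation and a one-sided test (values above the mean only, as the description says). The helper `flag_anomalies` is hypothetical; SmartCrusher's real statistics may differ.

```python
from statistics import fmean, pstdev


def flag_anomalies(values: list, variance_threshold: float = 2.0) -> list:
    """Indices of values more than `variance_threshold` std devs above the mean."""
    if len(values) < 2:
        return []
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        return []  # a constant array has no outliers
    return [i for i, v in enumerate(values) if (v - mean) / std > variance_threshold]


latencies = [10, 10, 10, 10, 10, 10, 10, 10, 10, 100]
print(flag_anomalies(latencies))       # [9]  (z = 3.0)
print(flag_anomalies(latencies, 3.5))  # []
```

Lowering variance_threshold widens the net: the outlier above has a z-score of exactly 3.0, so any threshold below 3.0 keeps it.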

uniqueness_threshold

float

default:"0.1"

Fraction of unique values below which an array is considered “nearly constant”. Nearly-constant arrays use stricter deduplication.
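The uniqueness ratio is simply distinct values over total values. This hypothetical sketch assumes a strict "below" comparison and hashable values; how Headroom then applies its stricter deduplication is not modeled.

```python
def uniqueness_ratio(values: list) -> float:
    """Distinct values divided by total values (1.0 means every value differs)."""
    return len(set(values)) / len(values) if values else 0.0


def is_nearly_constant(values: list, uniqueness_threshold: float = 0.1) -> bool:
    return uniqueness_ratio(values) < uniqueness_threshold


statuses = ["ok"] * 29 + ["error"]
print(round(uniqueness_ratio(statuses), 3))   # 0.067
print(is_nearly_constant(statuses))           # True
print(is_nearly_constant(list(range(30))))    # False
```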

similarity_threshold

float

default:"0.8"

String similarity threshold for clustering similar items. Items above this similarity may be grouped and represented by a single representative.

max_items_after_crush

int

default:"15"

Target maximum number of items to keep after compression. The adaptive Kneedle algorithm may keep fewer when information saturation is detected earlier.

preserve_change_points

bool

default:"true"

Keep items at significant data transitions (detected with a fixed 5-item window). Useful for time-series data where inflection points carry information.

factor_out_constants

bool

default:"false"

Currently disabled, since factoring out constants would modify the original JSON schema. Kept for forward compatibility.

include_summaries

bool

default:"false"

Currently disabled: no AI-generated summary text is inserted, and all output items come from the original array.

use_feedback_hints

bool

default:"true"

Use TOIN (Tool Output Intelligence Network) learned patterns to bias compression toward preserving historically-retrieved items.

toin_confidence_threshold

float

default:"0.3"

Minimum TOIN confidence score for a hint to influence compression.

dedup_identical_items

bool

default:"true"

Prevent multiple preservation mechanisms from keeping duplicate copies of identical items.

first_fraction

float

default:"0.3"

Fraction of max_items_after_crush reserved for items at the start of the array.

last_fraction

float

default:"0.15"

Fraction of max_items_after_crush reserved for items at the end of the array.
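With the defaults, the positional budget works out as follows. This sketch assumes the fractional slot counts are rounded down; the helpers `positional_slots` and `keep_head_and_tail` are hypothetical, and the remaining slots (which go to importance-scored items) are not filled here.

```python
def positional_slots(
    max_items: int = 15, first_fraction: float = 0.3, last_fraction: float = 0.15
) -> tuple:
    """Split the keep-budget into (head slots, tail slots, remaining slots)."""
    first = int(max_items * first_fraction)  # rounding down is an assumption
    last = int(max_items * last_fraction)
    return first, last, max_items - first - last


def keep_head_and_tail(items: list, max_items: int = 15,
                       first_fraction: float = 0.3, last_fraction: float = 0.15) -> list:
    first, last, _ = positional_slots(max_items, first_fraction, last_fraction)
    if len(items) <= max_items:
        return list(items)
    # items[len(items) - last:] stays correct even when last == 0.
    return items[:first] + items[len(items) - last:]


print(positional_slots())                   # (4, 2, 9)
print(keep_head_and_tail(list(range(100))))  # [0, 1, 2, 3, 98, 99]
```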

lossless_min_savings_ratio

float

default:"0.15"

Minimum byte-savings ratio for the lossless compaction path (CSV/JSON/markdown-kv) to be chosen over the lossy row-drop path. Must stay in lockstep with the Rust core default.
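A byte-savings ratio of this kind can be computed as 1 minus compacted size over original size. The sketch below compares a JSON array against a CSV rendering of the same rows and picks the lossless path when the ratio meets the threshold; the CSV choice, the `>=` comparison, and the helpers `savings_ratio`, `to_csv`, and `choose_path` are assumptions, not the Rust core's logic. It assumes every row has the same keys as the first.

```python
import csv
import io
import json


def savings_ratio(original: str, compacted: str) -> float:
    """Fraction of bytes saved: 1 - compacted / original."""
    before = len(original.encode("utf-8"))
    after = len(compacted.encode("utf-8"))
    return 1 - after / before if before else 0.0


def to_csv(rows: list) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def choose_path(rows: list, lossless_min_savings_ratio: float = 0.15) -> str:
    if not rows:
        return "lossless"  # nothing to drop
    ratio = savings_ratio(json.dumps(rows), to_csv(rows))
    return "lossless" if ratio >= lossless_min_savings_ratio else "lossy"


rows = [{"id": i, "status": "ok", "latency_ms": 12} for i in range(50)]
print(round(savings_ratio(json.dumps(rows), to_csv(rows)), 2))
print(choose_path(rows))  # lossless
```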

lossless_only

bool

default:"false"

When True, lossless tabular compaction still runs but any path that would produce a CCR marker is skipped. Output is always marker-free and byte-recoverable.

relevance

RelevanceScorerConfig

default:"RelevanceScorerConfig()"

Configuration for the relevance scorer that determines which items match the user’s query. See RelevanceScorerConfig.

anchor

AnchorConfig

default:"AnchorConfig()"

Configuration for dynamic anchor allocation — controls how position-based preservation slots are distributed (front-heavy for search results, back-heavy for logs, balanced for time-series). The anchor budget is a percentage of max_items_after_crush reserved for positional anchors; the rest goes to importance-scored items.

compaction_core_field_fraction

float

default:"0.8"

A field is considered “core” if it is present in at least this fraction of rows. Arrays with mostly non-core key sets are bucketed by a discriminator field rather than flattened.

compaction_heterogeneous_core_ratio

float

default:"0.6"

When the fraction of rows sharing a common core is below this value, the array is treated as heterogeneous and bucketed rather than compacted with a shared header.
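The two compaction thresholds can be read together as: find the keys present in at least compaction_core_field_fraction of rows, then measure how many rows fit that core. This sketch assumes a row "shares the core" when every key it has is a core key; that definition and the helpers `core_fields` and `is_heterogeneous` are hypothetical, and bucketing by a discriminator is not shown.

```python
from collections import Counter


def core_fields(rows: list, core_field_fraction: float = 0.8) -> set:
    """Keys present in at least `core_field_fraction` of the rows."""
    if not rows:
        return set()
    counts = Counter(key for row in rows for key in row)
    return {key for key, n in counts.items() if n / len(rows) >= core_field_fraction}


def is_heterogeneous(rows: list, core_field_fraction: float = 0.8,
                     heterogeneous_core_ratio: float = 0.6) -> bool:
    if not rows:
        return False
    core = core_fields(rows, core_field_fraction)
    # Assumption: a row shares the core when all of its keys are core keys.
    sharing = sum(1 for row in rows if set(row) <= core)
    return sharing / len(rows) < heterogeneous_core_ratio


uniform = [{"id": i, "status": "ok"} for i in range(10)]
mixed = [{"type": "x", "a": 1}] * 5 + [{"type": "y", "b": 2}] * 5
print(is_heterogeneous(uniform))  # False
print(core_fields(mixed))         # {'type'}
print(is_heterogeneous(mixed))    # True
```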

compaction_max_flatten_inner_keys

int

default:"6"

Maximum number of inner keys to inline when flattening nested objects during tabular compaction.

compaction_min_buckets

int

default:"2"

Minimum number of discriminator buckets used when compacting a heterogeneous array.

compaction_max_buckets

int

default:"8"

Maximum number of discriminator buckets. Prevents over-splitting sparse arrays.

CacheAlignerConfig

Controls the cache-prefix stability detector. CacheAligner scans system messages for volatile content (UUIDs, timestamps, JWTs, hex hashes) and logs warnings when instability is detected. It does not modify messages — it only emits warnings and cache metrics for observability.

from headroom import CacheAlignerConfig

config = CacheAlignerConfig(
    enabled=True,
    use_dynamic_detector=True,
    detection_tiers=["regex"],
    entropy_threshold=0.7,
    normalize_whitespace=True,
    collapse_blank_lines=True,
)

enabled

bool

default:"false"

Enable the CacheAligner. Disabled by default because prefix stability gains are marginal in most workloads. Enable explicitly when debugging cache-miss issues.

use_dynamic_detector

bool

default:"true"

When True, uses the full DynamicContentDetector with 15+ structural patterns (UUIDs, API keys, JWTs, timestamps, hex hashes, version numbers, high-entropy strings). When False, falls back to legacy date-only regex patterns.
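A regex-tier check in the spirit of this detector can be sketched with two structural patterns. The pattern set and the helper `find_volatile` are illustrative assumptions (the real DynamicContentDetector has its own, larger set); like CacheAligner, the sketch only reports matches and does not modify the text.

```python
import re

# Illustrative patterns only, not the DynamicContentDetector's actual set.
VOLATILE_PATTERNS = {
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    "iso_timestamp": re.compile(
        r"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?"
    ),
}


def find_volatile(text: str) -> list:
    """Return (pattern_name, matched_text) pairs in order of appearance."""
    hits = [
        (m.start(), name, m.group())
        for name, pattern in VOLATILE_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return [(name, value) for _, name, value in sorted(hits)]


system = "Session 123e4567-e89b-12d3-a456-426614174000 started at 2024-05-01T12:00:00Z."
for name, value in find_volatile(system):
    print(name, value)
```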

detection_tiers

list[Literal["regex", "ner", "semantic"]]

default: ["regex"]

Detection tiers to run (only when use_dynamic_detector=True):

"regex" — Fast structural/universal patterns (~0 ms). Recommended for production.
"ner" — Named Entity Recognition via spaCy (~5–10 ms). Optional.
"semantic" — Embedding similarity (~20–50 ms). Optional.

extra_dynamic_labels

list[str]

default:"[]"

Additional key names that hint their values are dynamic. For example, adding "session" will detect "session: abc123" and flag "abc123" as volatile.
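The labeled-value hint can be sketched as a regex built from the configured key names. The helper `find_labeled_values` and the accepted `label:` / `label=` separators are assumptions for illustration; matching is case-sensitive here.

```python
import re


def find_labeled_values(text: str, labels: list) -> list:
    """Values that follow `label:` or `label=` for any of the given key names."""
    if not labels:
        return []
    names = "|".join(re.escape(label) for label in labels)
    pattern = re.compile(rf"\b(?:{names})\s*[:=]\s*(\S+)")
    return pattern.findall(text)


text = "session: abc123\nregion: us-east-1"
print(find_labeled_values(text, ["session"]))            # ['abc123']
print(find_labeled_values(text, ["session", "region"]))  # ['abc123', 'us-east-1']
```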

entropy_threshold

float

default:"0.7"

Entropy threshold (0–1) for identifying random-looking strings. Higher values are more selective (only very random strings like UUIDs). Lower values are more aggressive.

normalize_whitespace

bool

default:"true"

Normalize whitespace in system messages to improve prefix stability. Caution: may break code blocks with significant indentation or ASCII art.

collapse_blank_lines

bool

default:"true"

Collapse consecutive blank lines to single blank lines.

dynamic_tail_separator

str

default: "\n\n---\n[Dynamic Context]\n"

Separator marking where dynamic content begins in the system message. Content before this separator is the stable cacheable prefix; content after is dynamic.
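Splitting on the separator keeps the cacheable prefix byte-identical across requests even when the dynamic tail changes. The sketch hashes only the prefix to a 16-character string (the length documented for stable_prefix_hash); using SHA-256 and the helpers `split_system_message` and `prefix_hash` are assumptions.

```python
import hashlib

DYNAMIC_TAIL_SEPARATOR = "\n\n---\n[Dynamic Context]\n"  # the documented default


def split_system_message(text: str, separator: str = DYNAMIC_TAIL_SEPARATOR) -> tuple:
    """Return (stable_prefix, dynamic_tail); the tail is empty if the separator is absent."""
    prefix, _, tail = text.partition(separator)
    return prefix, tail


def prefix_hash(prefix: str) -> str:
    # 16 hex characters; the hash function Headroom actually uses is an assumption here.
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()[:16]


base = "You are a support agent for Acme."
monday = base + DYNAMIC_TAIL_SEPARATOR + "Date: Monday"
tuesday = base + DYNAMIC_TAIL_SEPARATOR + "Date: Tuesday"
print(prefix_hash(split_system_message(monday)[0]) == prefix_hash(split_system_message(tuesday)[0]))  # True
```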

CacheOptimizerConfig

Controls provider-specific cache optimization — Anthropic cache_control breakpoints, OpenAI prefix stabilization, and Google CachedContent API lifecycle management.

enabled

bool

default:"true"

Enable provider-specific cache optimization. Auto-detects the provider from the HeadroomClient provider instance.

auto_detect_provider

bool

default:"true"

Automatically select the cache optimizer implementation based on the provider name.

min_cacheable_tokens

int

default:"1024"

Minimum token count for a prefix to be considered cacheable. The provider may enforce a higher minimum.

enable_semantic_cache

bool

default:"false"

Enable query-level semantic caching within the optimizer layer. Requires the semantic cache extra.

semantic_cache_similarity

float

default:"0.95"

Minimum cosine similarity for a semantic cache hit.

semantic_cache_max_entries

int

default:"1000"

Maximum number of entries in the semantic cache.

semantic_cache_ttl_seconds

int

default:"300"

Time-to-live for semantic cache entries in seconds.
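The three semantic-cache settings map naturally onto a small data structure: a cosine-similarity threshold for hits, a size cap, and a per-entry TTL. This sketch is hypothetical (the `SemanticCache` class is not Headroom's implementation); it assumes LRU eviction, TTL measured from insertion, and embeddings supplied by the caller.

```python
import math
import time
from collections import OrderedDict
from typing import Callable, Optional


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy embedding cache with a similarity threshold, LRU size cap, and TTL."""

    def __init__(self, similarity: float = 0.95, max_entries: int = 1000,
                 ttl_seconds: float = 300, clock: Callable[[], float] = time.monotonic) -> None:
        self.similarity = similarity
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: OrderedDict = OrderedDict()  # id -> (embedding, response, created_at)
        self._next_id = 0

    def put(self, embedding: list, response: str) -> None:
        self._entries[self._next_id] = (embedding, response, self.clock())
        self._next_id += 1
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def get(self, embedding: list) -> Optional[str]:
        now = self.clock()
        best_key, best_sim = None, self.similarity
        for key, (vec, _, created) in list(self._entries.items()):
            if now - created > self.ttl_seconds:
                del self._entries[key]  # expired
                continue
            sim = cosine(embedding, vec)
            if sim >= best_sim:
                best_key, best_sim = key, sim
        if best_key is None:
            return None
        self._entries.move_to_end(best_key)  # mark as recently used
        return self._entries[best_key][1]


now = [0.0]
cache = SemanticCache(clock=lambda: now[0])
cache.put([1.0, 0.0], "cached answer")
print(cache.get([0.99, 0.05]))  # cached answer (similarity ~0.999)
print(cache.get([0.0, 1.0]))    # None
now[0] = 301.0
print(cache.get([1.0, 0.0]))    # None (expired)
```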

RelevanceScorerConfig

Controls how SmartCrusher scores items by relevance to the user’s query. Available scoring tiers are BM25 (zero dependencies), embedding-based (requires headroom-ai[relevance]), and hybrid (recommended).

from headroom import RelevanceScorerConfig

config = RelevanceScorerConfig(
    tier="hybrid",
    relevance_threshold=0.25,
    hybrid_alpha=0.5,
    adaptive_alpha=True,
)

tier

"bm25" | "embedding" | "hybrid"

default: "hybrid"

Scoring method. "hybrid" combines BM25 keyword matching with semantic embeddings and is the recommended default. Falls back to BM25 if sentence-transformers is not installed.

bm25_k1

float

default:"1.5"

BM25 term-frequency saturation parameter.

bm25_b

float

default:"0.75"

BM25 length normalization parameter.
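For reference, the standard Okapi BM25 formula shows where bm25_k1 and bm25_b enter. This sketch uses lowercase whitespace tokenization and the Lucene-style IDF `log(1 + (N - df + 0.5) / (df + 0.5))`; Headroom's tokenizer and IDF variant are assumptions, and `bm25_scores` is a hypothetical helper.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Okapi BM25 score of each document for the query."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    if n == 0:
        return []
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # k1 caps how much repeated terms help; b scales the penalty for long items.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


items = ["status ok", "status ok", "error timeout on db", "status ok"]
print(bm25_scores("timeout error", items))  # only index 2 scores above zero
```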

embedding_model

str

default:"ML_MODEL_DEFAULTS.sentence_transformer"

HuggingFace model name for the embedding scorer. Default is the Headroom-recommended sentence transformer.

hybrid_alpha

float

default:"0.5"

BM25 weight in the hybrid scorer; the embedding weight is 1 - hybrid_alpha, so 0.5 gives both equal weight.

adaptive_alpha

bool

default:"true"

Dynamically adjust hybrid_alpha based on query type (keyword-heavy vs. semantic).

relevance_threshold

float

default:"0.25"

Minimum relevance score for an item to be considered relevant and kept. Lower = safer (keeps more); higher = more aggressive.
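Putting hybrid_alpha and relevance_threshold together: a weighted sum of a normalized BM25 score and an embedding similarity, then a cutoff. The min-max normalization of BM25, the `>=` cutoff, and the helpers below are assumptions; adaptive_alpha is not modeled.

```python
def min_max(scores: list) -> list:
    """Rescale to [0, 1]; an all-equal list maps to zeros."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]


def hybrid_scores(bm25: list, embedding: list, hybrid_alpha: float = 0.5) -> list:
    # hybrid_alpha weights BM25; (1 - hybrid_alpha) weights the embedding similarity.
    return [hybrid_alpha * k + (1 - hybrid_alpha) * e for k, e in zip(min_max(bm25), embedding)]


def relevant_indices(scores: list, relevance_threshold: float = 0.25) -> list:
    return [i for i, s in enumerate(scores) if s >= relevance_threshold]


bm25 = [0.0, 0.0, 2.4, 0.6]
embedding = [0.10, 0.35, 0.90, 0.20]  # cosine similarities, assumed already in [0, 1]
scores = hybrid_scores(bm25, embedding)
print([round(s, 3) for s in scores])  # [0.05, 0.175, 0.95, 0.225]
print(relevant_indices(scores))       # [2]
```

Item 3 lands at 0.225, just under the 0.25 default; lowering relevance_threshold to 0.2 would keep it, which is the "lower = safer" trade-off described above.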

Data Model Types

Block

Atomic unit of context analysis. Each message is parsed into one or more Block objects.

kind

Literal["system", "user", "assistant", "tool_call", "tool_result", "rag", "unknown"]

The semantic type of this block.

text

str

Text content of the block.

tokens_est

int

Estimated token count for this block.

content_hash

str

Short hash of the block content for deduplication.

source_index

int

Position (index) of the originating message in the messages list.

flags

dict[str, Any]

Arbitrary flags set by analyzers (e.g. {"is_error": True}, {"is_anomaly": True}).

RequestMetrics

Comprehensive per-request metrics stored in the database after each call.

request_id

str

Unique identifier for this request.

timestamp

datetime

UTC timestamp when the request was processed.

model

str

Model name used for this request.

stream

bool

Whether the response was streamed.

mode

str

Operating mode: "audit", "optimize", or "simulate".

tokens_input_before

int

Input token count before compression.

tokens_input_after

int

Input token count after compression.

tokens_output

int | None

Output tokens from the model response. None for streaming requests (unknown at request time).

block_breakdown

dict[str, int]

Token counts by block type (system, user, assistant, tool_result, etc.).

waste_signals

dict[str, int]

Detected waste by category. See WasteSignals.to_dict() for key names.

stable_prefix_hash

str

16-character hash of the stable cache prefix. Compare across requests to detect cache misses.

cache_alignment_score

float

Score from 0.0 to 1.0 indicating how cache-friendly the prefix is.

cache_optimizer_used

str | None

Name of the cache optimizer that ran (e.g. "anthropic-cache-optimizer"), or None.

cache_optimizer_strategy

str | None

Strategy name used by the cache optimizer (e.g. "explicit_breakpoints").

cacheable_tokens

int

Number of tokens eligible for provider-side caching.

breakpoints_inserted

int

Number of cache_control breakpoints inserted (Anthropic only).

estimated_cache_hit

bool

Whether the prefix hash matched the previous request, suggesting a cache hit.

estimated_savings_percent

float

Estimated percentage savings if the provider cache was hit.

semantic_cache_hit

bool

Whether a semantic cache hit was returned instead of calling the API.

transforms_applied

list[str]

Names of all transforms that ran for this request.

messages_hash

str

Hash of the original messages for change detection.

error

str | None

Error message if the request failed, otherwise None.

TransformResult

Output of a pipeline or individual transform operation.

messages

list[dict[str, Any]]

Messages after the transform was applied.

tokens_before

int

Token count before this transform.

tokens_after

int

Token count after this transform.

transforms_applied

list[str]

Names of every sub-transform that ran.

markers_inserted

list[str]

CCR retrieval markers that were injected into messages.

warnings

list[str]

Non-fatal warnings emitted during the transform (e.g. detected volatile content in the prefix).

diff_artifact

DiffArtifact | None

Per-transform diff details. Populated only when HeadroomConfig.generate_diff_artifact=True.

cache_metrics

CachePrefixMetrics | None

Cache prefix stability metrics from CacheAligner.

timing

dict[str, float]

Wall-clock time in milliseconds, keyed by transform name.

transforms_summary

dict[str, int]

Property — counted summary of transforms_applied. Example: {"router:tool_result:json": 4}.
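A counted summary of this shape is what collections.Counter produces from the list of names. The helper `summarize_transforms` is hypothetical; it only mirrors the documented example output.

```python
from collections import Counter


def summarize_transforms(transforms_applied: list) -> dict:
    """Count how many times each transform name appears."""
    return dict(Counter(transforms_applied))


print(summarize_transforms(["router:tool_result:json"] * 4))
# {'router:tool_result:json': 4}
```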

SimulationResult

Returned by client.chat.completions.simulate() and client.messages.simulate(). Contains projected compression metrics; no API call is made.

tokens_before

int

Token count before compression.

tokens_after

int

Projected token count after compression.

tokens_saved

int

Equal to tokens_before - tokens_after.

transforms

list[str]

Transforms that would be applied.

estimated_savings

str

Human-readable cost estimate per request, e.g. "$0.0042 per request".
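The cost string is plain arithmetic on tokens_saved and an input price. The $3.00-per-million-token price and the helper `estimate_savings` are hypothetical; Headroom's own pricing table is not described in this section.

```python
def estimate_savings(tokens_before: int, tokens_after: int,
                     usd_per_million_input: float) -> tuple:
    """Return (tokens_saved, human-readable cost string) for one request."""
    tokens_saved = tokens_before - tokens_after
    dollars = tokens_saved * usd_per_million_input / 1_000_000
    return tokens_saved, f"${dollars:.4f} per request"


print(estimate_savings(5_000, 3_600, 3.00))  # (1400, '$0.0042 per request')
```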

messages_optimized

list[dict[str, Any]]

The projected compressed messages.

block_breakdown

dict[str, int]

Token counts by block type.

waste_signals

dict[str, int]

Waste by category.

stable_prefix_hash

str

16-character prefix hash after optimization.

cache_alignment_score

float

Cache-friendliness score (0–1).
