Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/headroomlabs-ai/headroom/llms.txt

Use this file to discover all available pages before exploring further.

Headroom can be configured via the SDK constructor, the headroom proxy command line, environment variables, or per-request overrides. Settings are applied in this order — later entries override earlier ones: built-in defaults → environment variables → SDK constructor arguments → per-request overrides.

SDK Modes

These modes apply to SDK usage via HeadroomClient(default_mode=...) or as a per-request override. They are not the same axis as the proxy --mode flag — each controls a different layer of the stack.
ModeBehaviorUse case
auditObserves and logs; no modificationsProduction monitoring, baseline measurement
optimizeApplies safe, deterministic transformsProduction optimization
simulateReturns the transform plan without making an API callTesting, cost estimation
Proxy --mode is a separate axis. headroom proxy --mode token maximizes compression by rewriting prior turns. --mode cache freezes prior turns to maximize provider prefix-cache hit rates. The proxy does not accept audit, optimize, or simulate.

SDK Configuration

from headroom import HeadroomClient, OpenAIProvider
from openai import OpenAI

client = HeadroomClient(
    original_client=OpenAI(),
    provider=OpenAIProvider(),

    # Mode: "audit" (observe only) or "optimize" (apply transforms)
    default_mode="optimize",

    # Enable provider-specific cache optimization
    enable_cache_optimizer=True,

    # Enable query-level semantic caching
    enable_semantic_cache=False,

    # Override default context limits per model
    model_context_limits={
        "gpt-4o": 128000,
        "gpt-4o-mini": 128000,
    },

    # Database location (defaults to temp directory)
    # store_url="sqlite:////absolute/path/to/headroom.db",
)

Python HeadroomClient parameters

ParameterTypeDefaultDescription
original_clientSDK clientrequiredThe underlying provider SDK client (OpenAI(), Anthropic(), etc.)
providerProviderrequiredProvider adapter (OpenAIProvider(), AnthropicProvider(), etc.)
default_modestr"audit"Default operating mode: audit, optimize, or simulate
enable_cache_optimizerboolTrueEnable provider-specific cache optimization
enable_semantic_cacheboolFalseEnable query-level semantic caching
model_context_limitsdict{}Per-model context window overrides
store_urlstrtemp SQLiteDatabase URL for the compression store

TypeScript HeadroomClient options

OptionTypeDefaultDescription
baseUrlstringhttp://localhost:8787Headroom proxy base URL (also reads HEADROOM_BASE_URL)
apiKeystringOptional API key for authenticated endpoints (also reads HEADROOM_API_KEY)
timeoutnumber30000Request timeout in milliseconds
fallbackbooleantruePass through unmodified if compression fails
retriesnumber2Number of retry attempts

HeadroomConfig Dataclass Fields

HeadroomConfig is the main configuration object passed to HeadroomClient in Python. All fields have defaults and can be overridden selectively.
from headroom.config import HeadroomConfig, HeadroomMode

config = HeadroomConfig(
    store_url="sqlite:///headroom.db",
    default_mode=HeadroomMode.OPTIMIZE,
    output_buffer_tokens=4000,
    intercept_tool_results=False,
    generate_diff_artifact=False,
    discover_pipeline_extensions=True,
)
FieldTypeDefaultDescription
store_urlstr"sqlite:///headroom.db"SQLAlchemy URL for the compression store
default_modeHeadroomModeAUDITDefault operating mode
model_context_limitsdict{}User overrides for model context limits
output_buffer_tokensint4000Output buffer reserved for the model’s response
intercept_tool_resultsboolFalseOpt in to tool-result interceptors (ast-grep Read outliner, etc.)
generate_diff_artifactboolFalseOpt-in per-transform diff artifact generation for debugging
discover_pipeline_extensionsboolTrueAuto-discover registered pipeline extensions
smart_crusherSmartCrusherConfigsee belowJSON compression settings
cache_alignerCacheAlignerConfigsee belowPrefix stabilization settings
cache_optimizerCacheOptimizerConfigenabledProvider-specific cache optimization
ccrCCRConfigenabledCompress-Cache-Retrieve settings
prefix_freezePrefixFreezeConfigenabledCache-aware prefix freezing

SmartCrusherConfig Fields

SmartCrusher is Headroom’s universal JSON array compressor. It uses statistical analysis to intelligently select which items to keep while preserving the original JSON schema.
from headroom.config import SmartCrusherConfig

config = SmartCrusherConfig(
    enabled=True,
    min_items_to_analyze=5,
    min_tokens_to_crush=200,
    max_items_after_crush=15,
    variance_threshold=2.0,
    uniqueness_threshold=0.1,
    similarity_threshold=0.8,
    first_fraction=0.3,
    last_fraction=0.15,
    preserve_change_points=True,
    factor_out_constants=False,
    include_summaries=False,
    lossless_only=False,
)
FieldDefaultDescription
enabledTrueEnable SmartCrusher (sole tool-output compressor by default)
min_items_to_analyze5Don’t analyze arrays smaller than this
min_tokens_to_crush200Only compress content above this token threshold
max_items_after_crush15Target maximum items in output
variance_threshold2.0Standard deviations for change-point detection
uniqueness_threshold0.1Below this = nearly constant (skip statistical analysis)
similarity_threshold0.8Similarity threshold for clustering similar strings
first_fraction0.3Fraction of K slots allocated to start of array
last_fraction0.15Fraction of K slots allocated to end of array
preserve_change_pointsTrueAlways keep items at statistical change points
factor_out_constantsFalseDisabled — preserves original JSON schema
include_summariesFalseDisabled — no generated text injected
lossless_onlyFalseWhen True, never emit CCR markers; only lossless compaction
lossless_min_savings_ratio0.15Minimum byte savings for lossless path to win over lossy
use_feedback_hintsTrueUse TOIN learned patterns to adjust compression
toin_confidence_threshold0.3Minimum TOIN confidence required to apply recommendations

CacheAlignerConfig Fields

CacheAligner stabilizes system-prompt prefixes so provider KV caches actually hit on repeated turns. It extracts dynamic content (dates, UUIDs, tokens) and moves it to a trailing section after a stable prefix.
from headroom.config import CacheAlignerConfig

config = CacheAlignerConfig(
    enabled=True,
    use_dynamic_detector=True,
    detection_tiers=["regex"],
    entropy_threshold=0.7,
    normalize_whitespace=True,
    collapse_blank_lines=True,
    dynamic_tail_separator="\n\n---\n[Dynamic Context]\n",
)
FieldDefaultDescription
enabledFalseDisabled by default — prefix-stability gains are marginal in most setups
use_dynamic_detectorTrueUse full DynamicContentDetector (15+ patterns) instead of legacy date regex
detection_tiers["regex"]Detection tiers: regex (fast, ~0ms), ner (spaCy, ~5–10ms), semantic (~20–50ms)
extra_dynamic_labels[]Extra KEY names that hint the VALUE is dynamic (e.g. "session")
entropy_threshold0.7Entropy threshold for random-string detection (0–1; higher = more selective)
normalize_whitespaceTrueNormalize whitespace in system prompts
collapse_blank_linesTrueCollapse multiple blank lines
dynamic_tail_separator"\n\n---\n[Dynamic Context]\n"Separator marking where dynamic content begins
date_patterns(4 patterns)Legacy date regex patterns used when use_dynamic_detector=False
CacheAligner is applied only to system messages — never to user, assistant, or tool content. Code blocks with significant indentation or ASCII art may be affected by whitespace normalization; test before enabling in production.

Proxy --mode Flag

The proxy’s --mode flag is a separate optimization axis from SDK modes:

token mode (default)

Prioritize compression. Prior turns may be rewritten for maximum token savings.
headroom proxy --mode token

cache mode

Freeze prior turns to maximize provider prefix-cache hit rate.
headroom proxy --mode cache
Legacy aliases token_mode, token_savings, token_headroom, cache_mode, and cost_savings are still accepted. Set via HEADROOM_MODE.

Per-Request Overrides

Override configuration on individual requests without changing global settings:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],

    # Override mode for this request only
    headroom_mode="audit",

    # Reserve more tokens for output
    headroom_output_buffer_tokens=8000,

    # Keep last N turns uncompressed
    headroom_keep_turns=5,

    # Skip compression for specific tools
    headroom_tool_profiles={
        "important_tool": {"skip_compression": True},
        "search_tool": {"max_items_after_crush": 25},
    },
)

Environment Variables

Core proxy settings

VariableDefaultDescription
HEADROOM_HOST127.0.0.1Proxy bind host
HEADROOM_PORT8787Proxy bind port
HEADROOM_MODEtokenProxy optimization mode: token or cache
HEADROOM_WORKERS1Uvicorn worker count
HEADROOM_BUDGETDaily budget limit in USD
HEADROOM_STATELESSfalseSet to true to disable all filesystem writes
HEADROOM_REQUEST_TIMEOUT300Request timeout in seconds

Output token reduction

VariableDefaultDescription
HEADROOM_OUTPUT_SHAPERoffSet to 1 to enable verbosity steering and effort routing (output token reduction)
HEADROOM_OUTPUT_HOLDOUTFraction of conversations left unshaped as a measured control group (e.g. 0.1 for 10%). Enables measured rather than estimated output savings.

Tool and context routing

VariableDefaultDescription
HEADROOM_CONTEXT_TOOLrtkCLI context tool for headroom wrap: rtk or lean-ctx
HEADROOM_RTK_GAIN_SCOPEglobalRTK savings scope: global or project
HEADROOM_PROTECT_TOOL_RESULTSComma-separated tool names whose results are never lossy-compressed

TLS and security

VariableDefaultDescription
HEADROOM_TLS_STRICT1Set to 0 to relax OpenSSL’s RFC 5280 strict CA-constraint check (required behind Zscaler/Netskope on Python 3.13+). Chain validation, hostname, and expiry checks remain enabled.
HEADROOM_PROXY_TOKENRequire this bearer token for non-loopback callers

Telemetry and updates

VariableDefaultDescription
HEADROOM_TELEMETRYoffSet to on to opt in to anonymous aggregate telemetry
HEADROOM_UPDATE_CHECKonSet to off to disable the daily PyPI update check (also skipped in --stateless and CI)

Embedder runtime

VariableDefaultDescription
HEADROOM_EMBEDDER_RUNTIMEautoSet to pytorch_mps to run the memory embedder on Apple GPU (MPS). Requires [pytorch-mps] extra.
HEADROOM_BINARIES_OFFLINEoffSet to 1 to disable all binary downloads (for air-gapped installs)

Filesystem roots

VariableDefaultDescription
HEADROOM_CONFIG_DIR~/.headroom/configRead-mostly config root. Derives models.json and plugin config paths.
HEADROOM_WORKSPACE_DIR~/.headroomRead-write state root. Derives savings, memory DB, logs, TOIN, and more.
HEADROOM_BASE_URLhttp://localhost:8787TypeScript SDK: proxy base URL
HEADROOM_API_KEYTypeScript SDK: optional API key for authenticated endpoints
Set HEADROOM_CONTEXT_TOOL=lean-ctx before headroom wrap to use lean-ctx for CLI context filtering instead of RTK. Both tools are supported; RTK is the default.

Provider-Specific Settings

from headroom import OpenAIProvider

provider = OpenAIProvider(
    enable_prefix_caching=True,
)

Custom Model Configuration

Configure context limits and pricing for new or fine-tuned models. Save as ~/.headroom/config/models.json or point HEADROOM_MODEL_LIMITS at a JSON string or file path:
{
  "anthropic": {
    "context_limits": {
      "claude-4-opus-20250301": 200000,
      "claude-custom-finetune": 128000
    },
    "pricing": {
      "claude-4-opus-20250301": {
        "input": 15.00,
        "output": 75.00,
        "cached_input": 1.50
      }
    }
  },
  "openai": {
    "context_limits": {
      "gpt-5": 256000,
      "ft:gpt-4o:my-org": 128000
    }
  }
}
Unknown models are automatically inferred from naming patterns:
PatternInferred contextTier
*opus*200KOpus pricing
*sonnet*200KSonnet pricing
*haiku*200KHaiku pricing
gpt-4o*128KGPT-4o pricing
o1*, o3*200KReasoning model pricing

Configuration Precedence

Settings are applied in this order (later overrides earlier):
1

Built-in defaults

Hardcoded values from HeadroomConfig and SmartCrusherConfig dataclass definitions.
2

Environment variables

HEADROOM_* variables read at process start. Applied before any SDK constructor call.
3

SDK constructor arguments

Values passed to HeadroomClient(...) or HeadroomConfig(...).
4

Per-request overrides

headroom_mode=, headroom_output_buffer_tokens=, headroom_tool_profiles=, etc. passed directly to the completion call.

Build docs developers (and LLMs) love