headroom proxy starts the Headroom optimization proxy — an HTTP server that sits between your AI agent and the upstream LLM provider. Every request that passes through it has its message context compressed before being forwarded, reducing token spend and latency with no changes to your agent code.
# Start on the default port 8787
headroom proxy

# OpenAI-compatible clients
OPENAI_BASE_URL=http://localhost:8787/v1 your-app

# Claude Code
ANTHROPIC_BASE_URL=http://localhost:8787 claude
For one-command setup that starts the proxy AND launches an agent, use headroom wrap <tool> instead.
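
As a rough illustration of the two base-URL conventions above, the Python sketch below builds an environment for a child process. The helper name proxy_env is hypothetical and not part of Headroom; only the two environment variables and the default port come from this page.

```python
import os


def proxy_env(host: str = "localhost", port: int = 8787) -> dict:
    """Return a copy of the current environment pointed at a local Headroom proxy.

    Hypothetical helper for illustration; not part of the Headroom package.
    """
    base = f"http://{host}:{port}"
    env = dict(os.environ)
    # OpenAI-compatible clients expect the /v1 suffix on the base URL.
    env["OPENAI_BASE_URL"] = f"{base}/v1"
    # Anthropic clients such as Claude Code take the bare origin.
    env["ANTHROPIC_BASE_URL"] = base
    return env


env = proxy_env()
print(env["OPENAI_BASE_URL"])     # http://localhost:8787/v1
print(env["ANTHROPIC_BASE_URL"])  # http://localhost:8787
```

The resulting dictionary could be passed as the env argument of subprocess.run when launching an agent from a script.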

Core options

--host
string
default:"127.0.0.1"
Host to bind the server to. Set to 0.0.0.0 for Docker/remote access. Env: HEADROOM_HOST.
--port / -p
integer
default:"8787"
Port to bind to (1–65535). Env: HEADROOM_PORT.
--workers
integer
default:"1"
Number of Uvicorn worker processes. Increase for high-concurrency deployments. Env: HEADROOM_WORKERS.
--mode
token | cache
Optimization mode. token (default) prioritizes maximum token reduction; prior turns may be rewritten. cache freezes prior turns to maximize provider prefix-cache hit rates. Legacy aliases token_mode, cache_mode, token_savings, cost_savings, and token_headroom are still accepted. Env: HEADROOM_MODE.
--no-optimize
flag
Disable all compression — run as a passthrough proxy only. Useful for debugging or measuring baseline usage.
--no-cache
flag
Disable the semantic response cache.
--target-ratio
float
Override the Kompress keep-ratio for prose/code compression. Lower is more aggressive (e.g. 0.4 keeps ~40% of tokens). Unset by default, in which case Kompress decides via its own importance threshold. Env: HEADROOM_TARGET_RATIO.
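
To make the keep-ratio arithmetic concrete, here is a minimal sketch of what a given --target-ratio implies for content handled by Kompress. The function name and the accepted range (0, 1] are assumptions for illustration. Real results depend on the content and on the other compression stages.

```python
def estimated_tokens_kept(tokens_before: int, target_ratio: float) -> int:
    """Estimate how many tokens remain at a given keep-ratio.

    Illustrative arithmetic only; the (0, 1] range is an assumption.
    """
    if not 0.0 < target_ratio <= 1.0:
        raise ValueError("target_ratio must be in (0, 1]")
    return round(tokens_before * target_ratio)


for ratio in (0.8, 0.4):
    kept = estimated_tokens_kept(10_000, ratio)
    print(f"ratio={ratio}: ~{kept} tokens kept, ~{10_000 - kept} removed")
```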

Logging

--log-file
string
Path to write request/response logs as JSONL. Each line is a JSON object with fields: timestamp, request_id, model, tokens_before, tokens_after, latency_ms, etc. Disabled in --stateless mode. Env: HEADROOM_LOG_FILE.
--log-messages
flag
Enable full message logging — request/response content is stored in the log file. Warning: may log sensitive data. Env: HEADROOM_LOG_MESSAGES.
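
Because the log is plain JSONL, it can be summarized with a few lines of Python. The sketch below aggregates token savings per model using the field names listed for --log-file. The sample records are made up, and the exact schema of a real log may include more fields.

```python
import json
from collections import defaultdict


def summarize(lines):
    """Aggregate request counts and token totals per model from JSONL lines."""
    totals = defaultdict(lambda: {"requests": 0, "before": 0, "after": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        t = totals[entry["model"]]
        t["requests"] += 1
        t["before"] += entry["tokens_before"]
        t["after"] += entry["tokens_after"]
    return dict(totals)


# Made-up sample records using the documented field names.
sample = [
    '{"request_id": "a1", "model": "model-a", "tokens_before": 1000, "tokens_after": 600, "latency_ms": 120}',
    '{"request_id": "a2", "model": "model-a", "tokens_before": 3000, "tokens_after": 1400, "latency_ms": 310}',
]

for model, t in summarize(sample).items():
    saved = 1 - t["after"] / t["before"] if t["before"] else 0.0
    print(f"{model}: {t['requests']} requests, {saved:.0%} tokens saved")
```

To run it against a real log, pass an open file handle instead of the sample list, since a file object iterates line by line.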

Budget & rate limiting

--budget
float
Spend cap in USD per --budget-period. Requests are rejected with HTTP 429 once the limit is reached. Env: HEADROOM_BUDGET.
--budget-period
hourly | daily | monthly
default:"daily"
Period the --budget limit applies to. hourly resets on a rolling hour; daily at local midnight; monthly on the 1st. Env: HEADROOM_BUDGET_PERIOD.
--no-rate-limit
flag
Disable per-minute rate limiting entirely.
--rpm
integer
Max requests per minute. Default: 60. Has no effect with --no-rate-limit. Env: HEADROOM_RPM.
--tpm
integer
Max tokens per minute. Default: 100,000. Has no effect with --no-rate-limit. Env: HEADROOM_TPM.
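
The reset rules for --budget-period can be sketched as a small date calculation. This is an illustration of the documented behavior, not Headroom's implementation. In particular, treating the hourly window as ending one hour after it opened is an assumption, and next_budget_reset is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_budget_reset(period: str, now: datetime,
                      window_start: Optional[datetime] = None) -> datetime:
    """Return when the current budget window would end (naive local time)."""
    if period == "hourly":
        # Assumption: a "rolling hour" ends one hour after the window opened.
        return (window_start or now) + timedelta(hours=1)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight + timedelta(days=1)
    if period == "monthly":
        first = midnight.replace(day=1)
        # Roll over to January when the current month is December.
        if first.month == 12:
            return first.replace(year=first.year + 1, month=1)
        return first.replace(month=first.month + 1)
    raise ValueError(f"unknown budget period: {period!r}")


now = datetime(2024, 12, 15, 8, 30)
print(next_budget_reset("daily", now))    # 2024-12-16 00:00:00
print(next_budget_reset("monthly", now))  # 2025-01-01 00:00:00
```

A client that receives HTTP 429 from a budget cap could use a calculation like this to decide how long to pause rather than retrying immediately.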

Upstream provider routing

--openai-api-url
string
Custom OpenAI API URL for passthrough endpoints. Env: OPENAI_TARGET_API_URL.
--anthropic-api-url
string
Custom Anthropic API URL for passthrough endpoints. Env: ANTHROPIC_TARGET_API_URL.
--gemini-api-url
string
Custom Gemini API URL for passthrough endpoints. Env: GEMINI_TARGET_API_URL.
--backend
string
default:"anthropic"
API backend: anthropic (direct), bedrock (AWS), openrouter, anyllm, or litellm-<provider> (e.g. litellm-vertex). Env: HEADROOM_BACKEND.
--bedrock-api-url
string
Custom Bedrock InvokeModel upstream. Point at a re-signing gateway, not raw AWS. Env: BEDROCK_TARGET_API_URL.
--region
string
default:"us-west-2"
Cloud region for Bedrock/Vertex/etc. Env: HEADROOM_REGION.

CCR (Compress-Cache-Retrieve)

--no-ccr-inject-tool
flag
Don’t inject the headroom_retrieve tool into requests. Use for streaming or non-MCP clients that can’t resolve the retrieve tool. Env: HEADROOM_NO_CCR_INJECT_TOOL.
--no-ccr-marker
flag
Don’t add CCR retrieval markers to compressed content. Env: HEADROOM_NO_CCR_MARKER.
--lossless
flag
No-CCR lossless mode: compress tool outputs with format-native lossless compaction without emitting any CCR retrieval marker or needing the MCP retrieve tool. Env: HEADROOM_LOSSLESS=1.
--intercept-tool-results
flag
Opt in to tool-result interceptors (AST-grep Read outliner, etc.). Off by default. Requires headroom-ai[tools] extras.

Kompress & tool protection

--disable-kompress
flag
Disable Kompress ML compression while keeping structural compression (ToolCrusher, SmartCrusher, CacheAligner) active. Env: HEADROOM_DISABLE_KOMPRESS=1.
--protect-tool-results
string
Comma-separated tool names whose results are never lossy-compressed. Merged with built-in defaults (e.g. Bash,WebFetch). Env: HEADROOM_PROTECT_TOOL_RESULTS.
--no-ccr-proactive-expansion
flag
Disable proactive expansion of previously compressed content. Env: HEADROOM_NO_CCR_PROACTIVE_EXPANSION.

Read lifecycle

--no-read-lifecycle
flag
Disable Read lifecycle management. By default the proxy compresses stale and superseded Read tool outputs to reclaim context.

Code-aware compression

--code-aware / --no-code-aware
flag
Enable or disable AST-based code compression. Requires pip install headroom-ai[code]. Default: disabled. Env: HEADROOM_CODE_AWARE_ENABLED=1.
--code-graph
flag
Index the current working directory and watch for file changes via codebase-memory-mcp. Useful when the proxy is started from a project root.

Memory

--memory
flag
Enable persistent memory. Auto-detects the provider (Anthropic, OpenAI, Gemini) and uses the appropriate tool format. By default each workspace gets its own SQLite database.
--memory-storage
project | user | global
default:"project"
Memory partitioning strategy. project (default): one DB per resolved workspace. user: one DB per x-headroom-user-id. global: a single shared DB (pre-existing behavior).
--memory-top-k
integer
default:"10"
Number of semantically relevant memories to inject as context (1–100). Env: HEADROOM_MEMORY_TOP_K.
--no-memory-tools
flag
Disable automatic injection of memory_save/memory_search tools. Env: HEADROOM_NO_MEMORY_TOOLS.
--no-memory-context
flag
Disable automatic injection of relevant past memories into the system prompt. Env: HEADROOM_NO_MEMORY_CONTEXT.

Traffic learning

--learn
flag
Enable live traffic learning: extract error→recovery patterns, environment facts, and user preferences from proxy traffic. Implies --memory. Learned patterns are saved to agent-native memory files (CLAUDE.md, .cursor/rules, AGENTS.md).
--no-learn
flag
Explicitly disable traffic learning even when --memory is set.
--min-evidence
integer
Minimum number of times a pattern must be observed before it is persisted. Default: 5. Higher values reduce one-shot noise. Env: HEADROOM_MIN_EVIDENCE.
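
The effect of --min-evidence can be pictured as a simple counting gate: a pattern is only persisted once it has been seen often enough. The class below is a hypothetical sketch of that idea. It does not reflect how Headroom stores or matches patterns.

```python
from collections import Counter


class EvidenceGate:
    """Count sightings of a pattern and signal when the threshold is first met."""

    def __init__(self, min_evidence: int = 5):
        self.min_evidence = min_evidence
        self.counts = Counter()

    def observe(self, pattern: str) -> bool:
        """Record one sighting; return True only on the sighting that reaches the threshold."""
        self.counts[pattern] += 1
        return self.counts[pattern] == self.min_evidence


gate = EvidenceGate(min_evidence=3)
print([gate.observe("missing-module -> pip install") for _ in range(4)])
# [False, False, True, False]
```

A higher threshold delays persistence, which is why larger values filter out one-off events.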

Connection tuning

--limit-concurrency
integer
default:"1000"
Maximum concurrent connections before Uvicorn returns 503. Env: HEADROOM_LIMIT_CONCURRENCY.
--max-connections
integer
default:"500"
Maximum upstream HTTP connections. Env: HEADROOM_MAX_CONNECTIONS.
--max-keepalive
integer
default:"100"
Maximum upstream keep-alive connections. Env: HEADROOM_MAX_KEEPALIVE.
--request-timeout-seconds
integer
default:"300"
Request timeout in seconds. Useful for slow local providers. Env: HEADROOM_REQUEST_TIMEOUT.
--retry-max-attempts
integer
default:"3"
Maximum upstream retry attempts for connect/read/5xx failures (1–10). Env: HEADROOM_RETRY_MAX_ATTEMPTS.

Deployment & security

--telemetry
flag
Opt in to anonymous usage telemetry (off by default). Env: HEADROOM_TELEMETRY=on.
--no-telemetry
flag
Explicitly disable telemetry. Env: HEADROOM_TELEMETRY=off.
--stateless
flag
Disable all filesystem writes — run purely in-memory. For containerized, read-only, or load-balanced deployments. Memory, TOIN, and log file persistence are all disabled. Env: HEADROOM_STATELESS=true.
Warning: When binding to a non-loopback address (e.g. --host 0.0.0.0) without setting HEADROOM_PROXY_TOKEN, all /v1/* endpoints are unauthenticated. Always set an inbound token in network-exposed deployments.
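
The exposure rule above lends itself to a preflight check in a deployment script. The sketch below uses Python's standard ipaddress module. The function names are hypothetical, and treating unknown hostnames as non-loopback is a conservative assumption.

```python
import ipaddress
import os


def is_loopback(host: str) -> bool:
    """Return True for localhost and loopback IP literals such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal: treat other hostnames as non-loopback to stay safe.
        return False


def exposed_without_token(host: str, env=os.environ) -> bool:
    """True when the bind address is reachable off-host and no inbound token is set."""
    return not is_loopback(host) and not env.get("HEADROOM_PROXY_TOKEN")


print(exposed_without_token("127.0.0.1", {}))  # False
print(exposed_without_token("0.0.0.0", {}))    # True
print(exposed_without_token("0.0.0.0", {"HEADROOM_PROXY_TOKEN": "example-token"}))  # False
```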

Examples

# Start with memory and traffic learning on port 9000
headroom proxy --port 9000 --memory --learn

# AWS Bedrock backend, us-east-1
headroom proxy --backend bedrock --region us-east-1

# Stateless container deployment (set HEADROOM_PROXY_TOKEN when binding to 0.0.0.0)
headroom proxy --stateless --host 0.0.0.0 --workers 4

# Cache mode (freezes prior turns for prefix-cache hits) with a $20 daily budget
headroom proxy --mode cache --budget 20 --budget-period daily

# OpenRouter backend
headroom proxy --backend openrouter

# Disable Kompress ML compression (structural compression only)
headroom proxy --disable-kompress

# Code-aware AST compression plus code-graph indexing of the working directory
headroom proxy --code-aware --code-graph
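
When the proxy is launched from a script, the command line can be assembled from a dictionary of options. The sketch below is a hypothetical helper, not part of Headroom. It uses only flags documented on this page and checks the numeric ranges stated in their descriptions.

```python
import shlex

# Ranges taken from the option descriptions on this page.
RANGES = {
    "--port": (1, 65535),
    "--memory-top-k": (1, 100),
    "--retry-max-attempts": (1, 10),
}


def build_command(options: dict) -> list:
    """Turn {flag: value} into an argv list for `headroom proxy`."""
    argv = ["headroom", "proxy"]
    for flag, value in options.items():
        if value is None or value is False:
            continue  # option not set
        if value is True:
            argv.append(flag)  # boolean flag, no argument
            continue
        if flag in RANGES:
            low, high = RANGES[flag]
            if not low <= int(value) <= high:
                raise ValueError(f"{flag} must be between {low} and {high}, got {value}")
        argv += [flag, str(value)]
    return argv


cmd = build_command({"--port": 9000, "--memory": True, "--learn": True})
print(shlex.join(cmd))  # headroom proxy --port 9000 --memory --learn
```

The returned list can be handed to subprocess.Popen directly, which avoids shell quoting issues.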
