Headroom’s failure mode is often silent: when a client is not routed through the proxy, everything still works, but you stop saving tokens. Start every investigation with headroom doctor, which correlates proxy liveness, client routing, version drift, and savings flow in one command.
Start Here: headroom doctor
Exit codes: 0 = all checks passed, 1 = warnings only, 2 = at least one failure.
headroom doctor checks: proxy liveness, version drift (proxy vs. installed package), Claude routing (~/.claude/settings.json), Codex routing (~/.codex/config.toml), current shell ANTHROPIC_BASE_URL/OPENAI_BASE_URL, savings flow (lifetime totals + last activity), and budget configuration. Fix anything flagged before digging deeper.
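The exit codes above lend themselves to scripting, for example in a shell hook or CI step. The following is a minimal sketch, assuming the headroom CLI is on PATH; the status labels and the function names describe_exit_code and run_doctor are illustrative and not part of Headroom.

```python
import subprocess

# Exit-code meanings as documented above; the labels are illustrative.
DOCTOR_STATUS = {0: "ok", 1: "warnings", 2: "failure"}


def describe_exit_code(code: int) -> str:
    """Map a `headroom doctor` exit code to a short status label."""
    return DOCTOR_STATUS.get(code, "unknown")


def run_doctor() -> str:
    """Run `headroom doctor` and return its status label.

    Returns "not-installed" when the CLI cannot be found on PATH.
    """
    try:
        result = subprocess.run(["headroom", "doctor"], check=False)
    except FileNotFoundError:
        return "not-installed"
    return describe_exit_code(result.returncode)
```

Calling run_doctor() before launching a client gives a quick signal on whether anything needs fixing first.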
Common Issues
Proxy not running or not routing traffic
headroom doctor reports proxy: ✗ fail — not reachable at http://127.0.0.1:8787.
Cause: The proxy process is not running, or it started on a different port.
Fix: headroom wrap <tool> handles this automatically. For manual setup, start the proxy and route your client through it.
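To confirm independently that something is listening on the expected address, a plain TCP connect is enough. This is a sketch using only the standard library; the default host and port are taken from the message above, and the helper name proxy_reachable is hypothetical. A successful connect shows only that a process accepts connections there, not that it is Headroom.

```python
import socket


def proxy_reachable(host: str = "127.0.0.1", port: int = 8787,
                    timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

If this returns False for the default port, check whether the proxy was started on a different one.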
CERTIFICATE_VERIFY_FAILED during installation (SSL inspection)
pip install "headroom-ai[all]" fails with a CERTIFICATE_VERIFY_FAILED error.
Cause: The source build (maturin) downloads rustup over a connection your TLS stack does not trust.
Fix: Install Rust first so the build does not need to fetch it, then re-run pip install "headroom-ai[all]".
Alternatively, use a prebuilt wheel, which avoids the Rust build entirely. Wheels are published for Windows (win_amd64), Linux (x86_64 / aarch64), and macOS (Apple Silicon). The Rust toolchain is only required for the platform-independent sdist fallback (e.g., Intel macOS).
Two runtime assets are also fetched over TLS. If they are blocked, trust your corporate CA via REQUESTS_CA_BUNDLE / SSL_CERT_FILE / CURL_CA_BUNDLE:
- cdn.pyke.io: ONNX Runtime for the Rust core. Pre-provide it with ORT_STRATEGY=system and ORT_LIB_LOCATION=/path/to/onnxruntime to skip the download.
- huggingface.co: the kompress-base compression model. Pre-download it and run with HF_HUB_OFFLINE=1, or set HF_ENDPOINT to a trusted mirror.
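A common reason the CA variables appear to have no effect is that they are unset in the current shell or point at a file that does not exist. The sketch below checks the three variables named above. The helper name ca_bundle_status and its status strings are illustrative, and it does not verify that the file actually contains your corporate root.

```python
import os
from pathlib import Path

# The three variables named in the text above.
CA_ENV_VARS = ("REQUESTS_CA_BUNDLE", "SSL_CERT_FILE", "CURL_CA_BUNDLE")


def ca_bundle_status(env=None) -> dict:
    """Report, per variable, one of 'unset', 'missing file', or 'ok'."""
    env = os.environ if env is None else env
    status = {}
    for name in CA_ENV_VARS:
        value = env.get(name)
        if not value:
            status[name] = "unset"
        elif not Path(value).is_file():
            status[name] = "missing file"
        else:
            status[name] = "ok"
    return status
```

Run it in the same shell that launches the installer or the proxy, since the variables are read per process.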
"Basic Constraints of CA cert not marked critical" (Zscaler / Netskope)
Cause: This is a different failure from CERTIFICATE_VERIFY_FAILED above. Python 3.13 + OpenSSL 3.x enable VERIFY_X509_STRICT by default, which enforces RFC 5280 §4.2.1.9: a CA certificate’s basicConstraints extension must be marked critical. Inspection roots like Zscaler set CA:TRUE without the critical bit, so the chain is rejected. Adding the CA to a bundle does not help: it is already found and trusted; it just fails strict validation.
Fix: Set HEADROOM_TLS_STRICT=0 to clear only the strict flag from every TLS context Headroom controls, including the huggingface_hub path used for model downloads. Chain validation, signature checking, certificate expiry, and hostname verification all remain enabled, so this is strictly narrower than disabling verification.
The ONNX Runtime download (cdn.pyke.io) uses a separate TLS stack (rustls / OS trust store) that HEADROOM_TLS_STRICT does not affect. On Windows, the corporate root must be in the machine certificate store. Alternatively, pre-provision ONNX Runtime with ORT_STRATEGY=system + ORT_LIB_LOCATION=/path/to/onnxruntime to skip the download entirely.
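To make "clearing only the strict flag" concrete, here is what that operation looks like on a standard-library ssl context. This is an illustration of the technique, not Headroom's actual implementation; the function name relaxed_strict_context is hypothetical.

```python
import ssl


def relaxed_strict_context() -> ssl.SSLContext:
    """Build a default client context with only VERIFY_X509_STRICT cleared."""
    ctx = ssl.create_default_context()
    # Remove the strict RFC 5280 checks; every other verification stays on.
    ctx.verify_flags &= ~ssl.VERIFY_X509_STRICT
    return ctx
```

The resulting context still requires a valid certificate chain (verify_mode is CERT_REQUIRED) and still checks hostnames, which is why this is narrower than turning verification off.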
Python 3.14+: dollar savings show $0.00
TypeScript SDK: needs proxy running
compress() calls fail or return uncompressed content. No headroom CLI command is available.
Cause: The npm headroom-ai package is a library only: it does not ship a CLI or a bundled compression engine. The TypeScript SDK routes compression requests to a running Headroom proxy.
Fix:
- Install the Python package to get the proxy and CLI.
- Start the proxy.
- Point the TypeScript SDK at it by setting HEADROOM_BASE_URL, or pass the proxy URL in the constructor.
The SDK reads HEADROOM_BASE_URL automatically, so setting the environment variable is sufficient for most setups.
pipx installs an older version or fails with Python version mismatch
pipx install headroom-ai or pipx upgrade headroom-ai installs an older version than what PyPI shows, or fails with No matching distribution found.
Cause: pipx resolves packages inside its own virtual environment. If that environment uses a Python version that Headroom does not publish wheels for, pip skips newer releases and chooses the newest compatible build it can find.
Diagnosis: Check which Python version pipx's environment uses and compare it with the versions Headroom publishes wheels for.
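One way to make that comparison is to print the interpreter's CPython wheel tag and look for it in the wheel filenames listed on PyPI. The sketch below is a generic diagnostic, not a Headroom tool; the helper name cpython_tag is hypothetical, and it assumes a CPython interpreter. Run it with the interpreter that pipx uses, not your default python.

```python
import sys


def cpython_tag(version_info=sys.version_info) -> str:
    """Return the CPython wheel tag (for example 'cp312') for a version."""
    return f"cp{version_info[0]}{version_info[1]}"
```

Printing sys.executable alongside cpython_tag() shows which interpreter produced the tag. If no wheel for the release you expect carries that tag, pip is falling back to an older release or to a source build.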
Build fails: Rust toolchain required for sdist
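pip install headroom-ai fails with a Rust/Cargo compilation error.
Cause: pip is building from the sdist, which requires a Rust toolchain for the build backend (maturin).
Fix: Install Rust first, then retry the install. Prebuilt wheels cover win_amd64, Linux x86_64 / aarch64, and Apple Silicon; only Intel macOS and unusual Linux configurations need a local Rust toolchain.
Known Limitations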
These behaviors are intentional, not bugs:
Code compression passes through — this is intentional
headroom perf shows 0% savings on Read tool outputs.
Why this is intentional: Headroom includes an AST-aware CodeCompressor (tree-sitter, 8 languages), but it is gated by safety protections that prevent it from firing on content you are actively working with:
- Recent code protection (protect_recent_code=4): Code in the last 4 messages is never compressed.
- Analysis intent protection (protect_analysis_context=True): If the most recent user message contains keywords like analyze, review, explain, fix, or debug, all code in the conversation is protected.
- Word count gate: Content under 50 words is skipped.
To opt out, set protect_analysis_context=False in ContentRouterConfig for aggressive code compression. This requires headroom-ai[code] for tree-sitter.
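The three gates can be sketched as a single decision function. This is a simplified model of the behavior described above, not Headroom's code: the function name protection_reason, the message shape (dicts with "role" and "content"), the order in which the gates are checked, and the substring keyword match are all assumptions made for illustration.

```python
# Keywords listed in the text above; the real list may be longer.
ANALYSIS_KEYWORDS = ("analyze", "review", "explain", "fix", "debug")


def protection_reason(messages, index, protect_recent_code=4,
                      protect_analysis_context=True, min_words=50):
    """Return why messages[index] would be left uncompressed, or None."""
    content = messages[index]["content"]
    # Word count gate: short content is skipped.
    if len(content.split()) < min_words:
        return "word_count"
    # Recent code protection: the last N messages are never compressed.
    if index >= len(messages) - protect_recent_code:
        return "recent_code"
    # Analysis intent protection: keyed off the most recent user message.
    if protect_analysis_context:
        last_user = next(
            (m for m in reversed(messages) if m["role"] == "user"), None
        )
        if last_user is not None:
            text = last_user["content"].lower()
            if any(word in text for word in ANALYSIS_KEYWORDS):
                return "analysis_intent"
    return None
```

In a typical coding session the latest user message often contains one of these keywords, which is why code frequently passes through uncompressed.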
grep results show 0% compression — this is intentional
Grep tool outputs show 0% compression in headroom perf.
Why this is intentional: grep results are already a compact structured format, where each line is a file path, line number, and matched text. There is no JSON array structure to statistically sample, no redundant boilerplate to strip, and no safe way to drop results without risking that the dropped line was the one the agent needed.
SmartCrusher’s minimum-items threshold (min_items_to_analyze=5) and token threshold (min_tokens_to_crush=200) also protect grep results from lossy compression.
Grep is in the default exclude_tools list (DEFAULT_EXCLUDE_TOOLS) alongside Read, Glob, Write, and Edit. These tools return exact content the agent needs for edits, so compressing them would break the edit workflow.
To protect additional tools, use --protect-tool-results or set the environment variable, for example HEADROOM_PROTECT_TOOL_RESULTS=Bash,WebFetch.
Other Known Issues
Claude Code context window grows larger through the proxy
When Claude Code runs through the proxy (custom ANTHROPIC_BASE_URL), /context all shows more tokens used than a direct session, especially in the “System tools” and “MCP tools” lines.
Cause: Claude Code normally defers most tool schemas behind its server-side Tool Search Tool: it sends only tool names and loads full schemas on demand. It enables this only when it believes it is talking directly to api.anthropic.com. With a custom ANTHROPIC_BASE_URL, Claude Code falls back to eagerly materializing every tool schema into the local context window. This happens client-side, before the request reaches the proxy.
Fix: Set ENABLE_TOOL_SEARCH so Claude Code keeps deferring tools through the proxy.
Windows: ML content detection hangs or silently falls back
ML content detection hangs at startup, or the log reports magika ONNX session init timed out. Compression quality may drop.
Cause: Without ORT_DYLIB_PATH pinned, the Windows DLL search resolves onnxruntime.dll to the Windows ML OS component (C:\Windows\System32\onnxruntime.dll, version 1.17.x), which deadlocks ONNX session initialization.
Fix: Headroom pins ORT_DYLIB_PATH automatically at import time to the DLL inside the onnxruntime pip package. Confirm the pinned path in the startup log.
HEADROOM_MAGIKA_INIT_TIMEOUT_SECS (default 5) bounds the init as a safety net: on timeout, detection degrades to non-ML tiers for the process lifetime.
Getting Help
Discord Community
GitHub Issues
- Headroom version: headroom --version
- Python version: python --version
- Provider: Anthropic, OpenAI, Bedrock, etc.
- headroom doctor --json output (redact API keys)
- Debug log output: run with headroom proxy --log-file ~/.headroom/logs/proxy.jsonl --log-messages and include the relevant lines
- Minimal reproduction: the smallest message array that triggers the issue
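Redacting keys by hand is error-prone, so a small filter can help before pasting output into an issue. The sketch below assumes keys look like "sk-" followed by at least eight letters, digits, underscores, or hyphens; that pattern is an assumption, not an exhaustive list of provider key formats, so review the result before sharing. The function name redact_keys is hypothetical.

```python
import re

# Assumed key shape: "sk-" followed by 8+ token characters.
# This also matches "sk-ant-..." style keys; other formats are not covered.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")


def redact_keys(text: str) -> str:
    """Replace anything that looks like an API key with a placeholder."""
    return KEY_PATTERN.sub("[REDACTED]", text)
```

Apply it to the saved headroom doctor --json output and to any log lines you attach.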