Register headroom_compress, headroom_retrieve, and headroom_stats with Claude Code or any MCP-compatible host using a single headroom mcp install command.
Headroom’s MCP server exposes compression, retrieval, and observability as tools that any MCP-compatible AI coding tool can call. Install it once and Claude Code (or any other MCP host) can compress large content on demand, retrieve originals by hash, and inspect live session stats — no proxy required for basic use.
    # Register with Claude Code (one-time)
    headroom mcp install
    # Start Claude Code; it now has headroom tools available
    claude
Claude Code can now call headroom_compress, headroom_retrieve, and headroom_stats on demand.
For automatic compression of all traffic (not just on-demand), also run the proxy:
    # Terminal 1
    headroom proxy
    # Terminal 2
    ANTHROPIC_BASE_URL=http://127.0.0.1:8787 claude
MCP tool calls are namespaced by Claude Code as mcp__headroom__headroom_compress, mcp__headroom__headroom_retrieve, and mcp__headroom__headroom_stats. The headroom doubling is normal MCP namespacing — not a bug. Compression markers in proxy output reference the bare tool name headroom_retrieve.
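The naming rule above can be sketched in a few lines of Python. This is illustrative only: it assumes the mcp__<server>__<tool> pattern described here and a server registered under the name headroom. The helpers namespaced and bare_name are hypothetical and not part of Headroom.

```python
# Illustrative only: models the mcp__<server>__<tool> naming described above.
# `namespaced` and `bare_name` are hypothetical helpers, not part of Headroom.

SERVER = "headroom"  # assumed server name as registered by `headroom mcp install`
TOOLS = ["headroom_compress", "headroom_retrieve", "headroom_stats"]


def namespaced(server: str, tool: str) -> str:
    """Build the name Claude Code shows for an MCP tool."""
    return f"mcp__{server}__{tool}"


def bare_name(full_name: str) -> str:
    """Recover the bare tool name, as referenced in proxy compression markers."""
    prefix, _server, tool = full_name.split("__", 2)
    if prefix != "mcp":
        raise ValueError(f"not an MCP-namespaced tool name: {full_name!r}")
    return tool


for tool in TOOLS:
    print(namespaced(SERVER, tool))
```

Splitting on the first two double underscores only keeps the single underscores inside the tool name intact, which is why the server prefix and the tool name can both contain "headroom" without ambiguity.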
headroom_compress compresses content on demand. The LLM calls this tool when it wants to shrink large content before reasoning over it: file listings, grep results, JSON blobs, or any large tool output.
Parameters:
| Parameter | Required | Description |
| --- | --- | --- |
| content | ✅ | Text to compress (files, JSON, logs, search results) |
Returns:
| Field | Description |
| --- | --- |
| compressed | Compressed text |
| hash | Key for retrieving the original later via headroom_retrieve |
| original_tokens | Token count of the input |
| compressed_tokens | Token count of the compressed output |
| savings_percent | Percentage of tokens removed |
| transforms | Which compression algorithms were applied |
Example invocation flow:
    Claude: Let me compress this large output to save context space.
    → headroom_compress(content="[5000 lines of grep results...]")
    ← {
        "compressed": "[key matches with context...]",
        "hash": "a1b2c3d4e5f6...",
        "original_tokens": 12000,
        "compressed_tokens": 3200,
        "savings_percent": 73.3,
        "transforms": ["router:search:0.27"]
      }
The original is stored locally for 1 hour. If Claude needs the full content later, it calls headroom_retrieve with the returned hash.
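As a quick sanity check on the example response, the sketch below recomputes savings_percent from the two token counts. The formula (tokens removed divided by original tokens, rounded to one decimal place) is an assumption inferred from the example numbers, not a documented guarantee of how Headroom computes the field.

```python
# Sanity-check the example response: savings_percent should follow from the
# two token counts. The formula is an assumption inferred from the example,
# not a documented Headroom guarantee.


def savings_percent(original_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed, rounded to one decimal place."""
    if original_tokens <= 0:
        return 0.0  # avoid dividing by zero on empty input
    removed = original_tokens - compressed_tokens
    return round(100.0 * removed / original_tokens, 1)


# Token counts copied from the example invocation above.
response = {
    "original_tokens": 12000,
    "compressed_tokens": 3200,
    "savings_percent": 73.3,
}

computed = savings_percent(response["original_tokens"], response["compressed_tokens"])
print(computed)
```

With the example values, 8,800 of 12,000 tokens are removed, which matches the reported 73.3.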
headroom_retrieve returns the original uncompressed content for a given hash. Retrieval checks the local store first, then falls back to the proxy's store, so hashes from either source work transparently.
Parameters:
| Parameter | Required | Description |
| --- | --- | --- |
| hash | ✅ | Hash key from a previous headroom_compress call or from a proxy compression marker |
Returns:
| Field | Description |
| --- | --- |
| original_content | Full original content |
| source | "local" or "proxy": where the content was retrieved from |
| original_item_count | Number of items in the original (for array content) |
| compressed_item_count | Number of items in the compressed form |
| retrieval_count | How many times this hash has been retrieved |
Example:
    Claude: I need the full file listing from earlier.
    → headroom_retrieve(hash="a1b2c3d4e5f6...")
    ← {
        "original_content": "[5000 lines of grep results...]",
        "source": "local"
      }
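The local-first lookup with proxy fallback can be modelled as a toy sketch. Everything here is hypothetical: TTLStore and retrieve are illustrative stand-ins, not Headroom's implementation, and the shape of the miss response is an assumption. The expiry windows (1 hour local, 5 minutes proxy) are the ones stated on this page.

```python
import time
from typing import Dict, Optional, Tuple

LOCAL_TTL_S = 60 * 60  # local entries expire after 1 hour
PROXY_TTL_S = 5 * 60   # proxy entries expire after 5 minutes


class TTLStore:
    """Toy in-memory store (hypothetical); entries expire ttl_s seconds after put."""

    def __init__(self, ttl_s: float) -> None:
        self.ttl_s = ttl_s
        self._items: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, content: str, now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        self._items[key] = (stored_at, content)

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        current = time.time() if now is None else now
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, content = entry
        if current - stored_at > self.ttl_s:
            del self._items[key]  # drop the expired entry
            return None
        return content


def retrieve(hash_key: str, local: TTLStore, proxy: TTLStore,
             now: Optional[float] = None) -> dict:
    """Check the local store first, then fall back to the proxy store."""
    for source, store in (("local", local), ("proxy", proxy)):
        content = store.get(hash_key, now)
        if content is not None:
            return {"original_content": content, "source": source}
    # Assumed miss shape; the message mirrors the error text quoted on this page.
    return {"error": "Entry not found or expired"}
```

Passing an explicit now makes the expiry behaviour easy to exercise without waiting: an entry stored at time 0 in the proxy store is still returned at 10 seconds but is reported as missing at 301 seconds.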
    # Install into every detected agent (Claude Code, Cursor, Codex, ...)
    headroom mcp install
    # Install with a custom proxy URL
    headroom mcp install --proxy-url http://localhost:9000
    # Overwrite an existing configuration
    headroom mcp install --force
    # Restrict to a specific agent
    headroom mcp install --agent claude
    # Check installation status and proxy reachability
    headroom mcp status
    # Uninstall from all registered agents
    headroom mcp uninstall
    # Start the MCP server manually (useful for debugging)
    headroom mcp serve
    headroom mcp serve --debug
    headroom mcp serve --proxy-url http://127.0.0.1:8787
MCP
On-demand compression. Claude decides when to call headroom_compress on large content it has already received. Good for Claude Code subscription users without API key access who still want CCR (Compress–Cache–Retrieve).
    headroom mcp install
    claude
Proxy
Automatic compression. Every request routed through the proxy is compressed before the LLM ever sees the content. This covers all traffic, all tools, and all models, not just content Claude explicitly decides to compress.
    headroom proxy
    ANTHROPIC_BASE_URL=http://127.0.0.1:8787 claude
The two work together without conflict. When both are active, the proxy compresses HTTP-level traffic and the MCP tools handle on-demand compression of content the LLM already holds. headroom_retrieve checks the local MCP store first, then falls back to the proxy’s store.
For any MCP host that lets you configure a local stdio server, point it at headroom mcp serve. Pass the proxy URL explicitly if you also run the proxy.
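As a rough illustration, the snippet below builds a stdio server entry and prints it as JSON. The mcpServers wrapper with command and args keys is a convention used by several MCP hosts, not something this page specifies, so treat the layout as an assumption and check your host's documentation for the exact file and schema. The helper stdio_server_entry is hypothetical; the headroom mcp serve command and its --proxy-url flag are the ones listed above.

```python
import json
from typing import Optional


def stdio_server_entry(proxy_url: Optional[str] = None) -> dict:
    """Build a stdio server entry that launches `headroom mcp serve`.

    Hypothetical helper; the command/args layout is an assumed host convention.
    """
    args = ["mcp", "serve"]
    if proxy_url:
        # Pass the proxy URL explicitly when the proxy is also running.
        args += ["--proxy-url", proxy_url]
    return {"command": "headroom", "args": args}


# "mcpServers" is an assumed top-level key; your host may use a different one.
config = {"mcpServers": {"headroom": stdio_server_entry("http://127.0.0.1:8787")}}

print(json.dumps(config, indent=2))
```

Omitting the argument yields a plain headroom mcp serve entry, which is enough when you only want on-demand compression without the proxy.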
"MCP SDK not installed": Run pip install "headroom-ai[mcp]".
"Proxy not running": Start the proxy with headroom proxy in a separate terminal. Only needed for proxy-backed retrieval and stats.
"Entry not found or expired": Local content expires after 1 hour; proxy content after 5 minutes.
Claude doesn't see headroom tools: Run headroom mcp status, restart Claude Code, and verify with /mcp inside Claude Code.
command: "headroom" fails to start: The headroom executable must be on the PATH your MCP host sees at startup. If you installed into a project virtualenv, install Headroom globally instead.
Alternatively, replace "headroom" in the MCP config with the absolute path to the binary (command -v headroom on macOS/Linux, where headroom on Windows).
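If you prefer to look up the path programmatically, a small Python sketch using the standard library's shutil.which does the same job as command -v or where. The helper resolve_command is hypothetical; it only reflects the PATH of the process that runs it, which may differ from the PATH your MCP host sees at startup.

```python
import os
import shutil


def resolve_command(name: str = "headroom") -> str:
    """Return the absolute path to `name` if found on PATH, else `name` unchanged.

    Hypothetical helper: it inspects the PATH of the current process only.
    """
    path = shutil.which(name)
    if path is None:
        return name  # not found; leave the bare command name in place
    return os.path.abspath(path)


print(resolve_command())
```

Run it from the same environment where Headroom is installed, then paste the printed path into the command field of the MCP config.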
Claude Code’s /usage command may attribute a visible share of session tokens to the headroom MCP server in long-running or subagent-heavy workflows. This reflects MCP tool call/result context being kept in the active window — not direct overhead. Run headroom_stats to compare tokens_saved against MCP call count, and use /compact after large investigation steps to clear old MCP results from the active context.