Headroom’s CCR (Compress-Cache-Retrieve) architecture makes every compression operation fully reversible, a core design principle that separates Headroom from every other context compression tool. When content is compressed, the original is stored in a local cache keyed by a short hash. If the LLM determines the compressed view is insufficient, it calls the injected headroom_retrieve tool and gets the full original back: transparently, in the same request, without any code changes on your side.
Nothing is ever permanently lost. CCR guarantees that every piece of original data remains accessible within the configured TTL. You get 70–90% token savings with zero risk of permanent data loss.
The Problem CCR Solves
Traditional lossy compression forces an uncomfortable tradeoff: compress aggressively and risk losing data the LLM needs, or compress conservatively and leave most of the savings on the table. CCR eliminates that tradeoff: compress aggressively, retrieve on demand.
The Four-Phase Architecture
Phase 1: Compression Store
When SmartCrusher or ContentRouter compresses a tool output, two things happen simultaneously:
- The original content is stored in an LRU cache (CompressionStore) keyed by a deterministic hash.
- A retrieval marker is appended to the compressed output so the LLM knows a fuller version exists.
The marker format is configurable (CCRConfig.marker_template).
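The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not Headroom's implementation: `SketchCompressionStore`, `compress_with_marker`, the 12-character SHA-256 key, and the "keep the first few items" compression are all assumptions made for the example. Only the ideas of an LRU cache with a TTL, a deterministic hash key, and an appended marker come from the text.

```python
import hashlib
import json
import time
from collections import OrderedDict


class SketchCompressionStore:
    """Hypothetical stand-in for CompressionStore: an LRU cache with a TTL."""

    def __init__(self, max_entries=1000, ttl_seconds=300.0):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._entries = OrderedDict()  # hash -> (stored_at, original)

    @staticmethod
    def make_key(original: str) -> str:
        # Deterministic: identical content always yields the same short hash.
        return hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]

    def put(self, original: str) -> str:
        key = self.make_key(original)
        self._entries[key] = (time.monotonic(), original)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used
        return key

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, original = entry
        if time.monotonic() - stored_at > self.ttl_seconds:
            del self._entries[key]  # past the TTL, no longer retrievable
            return None
        self._entries.move_to_end(key)  # mark as recently used
        return original


def compress_with_marker(items: list, keep: int, store: SketchCompressionStore) -> str:
    """Store the original, then return a truncated view plus a retrieval marker."""
    key = store.put(json.dumps(items))
    # Placeholder compression: keep the first `keep` items (assumption).
    marker = f"[{len(items)} items compressed to {keep}. Retrieve more: hash={key}]"
    return json.dumps(items[:keep]) + "\n" + marker


store = SketchCompressionStore()
rows = [{"id": i} for i in range(50)]
compressed = compress_with_marker(rows, keep=3, store=store)
```

The compressed string carries the hash, so a later lookup with that key returns the full JSON as long as the entry has neither expired nor been evicted.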
Phase 2: Tool Injection
Headroom injects the headroom_retrieve tool into every request where compression occurred. The tool appears alongside your application’s existing tools, so the LLM can call it naturally.
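As a rough sketch of what injection could look like, the function below appends a retrieval tool to a request's tool list only when compression occurred. The tool schema (an OpenAI-style function definition with a single `hash` parameter), the request shape, and the names `RETRIEVE_TOOL` and `inject_retrieve_tool` are assumptions for illustration. The actual definition Headroom injects may differ.

```python
import copy

# Assumed schema for illustration only; not taken from Headroom's source.
RETRIEVE_TOOL = {
    "type": "function",
    "function": {
        "name": "headroom_retrieve",
        "description": "Fetch the full original content for a compressed block.",
        "parameters": {
            "type": "object",
            "properties": {"hash": {"type": "string"}},
            "required": ["hash"],
        },
    },
}


def inject_retrieve_tool(request: dict, compression_occurred: bool,
                         mcp_installed: bool = False) -> dict:
    """Return a copy of the request with the retrieval tool added when needed."""
    patched = copy.deepcopy(request)
    # Nothing to retrieve, or the MCP server already exposes the tool.
    if not compression_occurred or mcp_installed:
        return patched
    tools = patched.setdefault("tools", [])
    names = {tool.get("function", {}).get("name") for tool in tools}
    if "headroom_retrieve" not in names:  # avoid a duplicate definition
        tools.append(copy.deepcopy(RETRIEVE_TOOL))
    return patched


request = {"model": "example-model", "tools": [
    {"type": "function", "function": {"name": "search", "parameters": {}}}
]}
patched = inject_retrieve_tool(request, compression_occurred=True)
```

Working on a copy keeps the application's original request untouched, which matches the idea that no code changes are needed on the caller's side.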
When Headroom is installed as an MCP server (headroom mcp install), tool injection is automatically skipped to avoid duplicating the tool definition; the MCP server exposes headroom_retrieve directly instead.
Phase 3: Response Handler
When the LLM decides to call headroom_retrieve, the proxy’s CCRResponseHandler intercepts the tool call before it ever reaches your application code:
- The tool call is detected in the streaming or non-streaming response.
- The CompressionStore is queried by hash (local lookup, ~1ms).
- The full original content is returned as the tool result.
- The API call continues automatically — the LLM processes the full data and produces a final response.
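The interception steps above might look roughly like the following. The response shape (a `tool_calls` list with `id`, `name`, and JSON-encoded `arguments`), the plain dict used as the store, and the function name `handle_retrieve_calls` are simplifications assumed for this sketch. They are not CCRResponseHandler's actual interface.

```python
import json


def handle_retrieve_calls(response: dict, store: dict) -> list:
    """Build tool-result messages for any headroom_retrieve calls in a response."""
    results = []
    for call in response.get("tool_calls", []):
        if call.get("name") != "headroom_retrieve":
            continue  # leave the application's own tool calls alone
        args = json.loads(call.get("arguments") or "{}")
        original = store.get(args.get("hash"))  # local lookup by hash
        content = original if original is not None else "Entry not found or expired."
        results.append({
            "role": "tool",
            "tool_call_id": call.get("id"),
            "content": content,
        })
    return results


store = {"ab12": '[{"id": 0}, {"id": 1}, {"id": 2}]'}
response = {"tool_calls": [
    {"id": "call_1", "name": "headroom_retrieve", "arguments": '{"hash": "ab12"}'},
    {"id": "call_2", "name": "search", "arguments": '{"q": "logs"}'},
]}
tool_results = handle_retrieve_calls(response, store)
```

A proxy following this pattern would append `tool_results` to the conversation and re-issue the API call, so the application only ever sees the final answer.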
Phase 4: Context Tracker
Across multi-turn conversations, the ContextTracker maintains awareness of all compressed content in the session. It tracks:
- What was compressed in earlier turns (hash → content mapping)
- Which content types and queries are active in the current turn
- ExpansionRecommendation signals for proactively expanding relevant compressed data before the LLM has to ask
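One way to picture the tracking described above is a small session object that remembers what each hash refers to and suggests which entries to expand for a new query. The class `SketchContextTracker`, its fields, and the word-overlap heuristic are assumptions for illustration. The text does not say how ExpansionRecommendation signals are actually computed.

```python
from dataclasses import dataclass, field


@dataclass
class SketchContextTracker:
    """Hypothetical session-level tracker of compressed content."""

    # hash -> short description of what was compressed in an earlier turn
    compressed: dict = field(default_factory=dict)

    def record(self, content_hash: str, description: str) -> None:
        self.compressed[content_hash] = description

    def recommend_expansions(self, query: str) -> list:
        """Return hashes whose description shares at least one word with the query."""
        query_words = set(query.lower().split())
        return [
            content_hash
            for content_hash, description in self.compressed.items()
            if query_words & set(description.lower().split())
        ]


tracker = SketchContextTracker()
tracker.record("abc123", "deploy logs for payment service")
tracker.record("def456", "user table rows")
to_expand = tracker.recommend_expansions("why did the payment deploy fail")
```

Here only the first entry overlaps with the query, so it is the only one recommended for proactive expansion.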
CCR with MCP vs. Proxy Injection
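CCR works in three modes: proxy mode (default), MCP mode, and library mode.
In proxy mode, the proxy automatically injects headroom_retrieve and intercepts tool calls. No setup required: just run headroom proxy and point your client at it.
In MCP mode, the MCP server exposes headroom_retrieve directly, so the proxy skips tool injection.
The live TTL and cache stats can be checked at runtime.
CCR Configuration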
CCR is enabled by default. Fine-grained control is available via CCRConfig in HeadroomConfig.
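To make the shape of such a configuration concrete, here is a hypothetical mirror of it. Only `marker_template` is named in this page. The class `SketchCCRConfig`, the `enabled` and `ttl_seconds` fields, their defaults, and the template placeholder names are assumptions and should be checked against the real CCRConfig before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchCCRConfig:
    """Hypothetical mirror of CCRConfig; not the library's actual class."""

    enabled: bool = True      # the text states CCR is enabled by default
    ttl_seconds: int = 300    # assumed field name and default value
    # Assumed placeholder names, modeled on the SmartCrusher marker format.
    marker_template: str = (
        "[{original} items compressed to {kept}. Retrieve more: hash={hash}]"
    )

    def render_marker(self, original: int, kept: int, content_hash: str) -> str:
        """Fill the template with the counts and the cache key."""
        return self.marker_template.format(
            original=original, kept=kept, hash=content_hash
        )


default_marker = SketchCCRConfig().render_marker(100, 10, "ab12")
custom = SketchCCRConfig(marker_template="<{kept}/{original} shown, hash={hash}>")
custom_marker = custom.render_marker(100, 10, "ab12")
```

Keeping the marker as a template means the wording shown to the LLM can change without touching the store or the retrieval path.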
CCR-Enabled Components
| Component | What it caches | Marker format |
|---|---|---|
| SmartCrusher | JSON arrays (tool outputs) | [N items compressed to K. Retrieve more: hash=…] |
| ContentRouter | Code, logs, search results, text | Per-strategy hash marker |
Why Reversibility Matters
| Approach | Data loss risk | Token savings |
|---|---|---|
| No compression | None | 0% |
| Traditional lossy compression | Permanent | 70–90% |
| CCR compression | None (reversible) | 70–90% |