Headroom is the context compression layer for AI agents. It sits between your application and the LLM provider, compressing everything the model reads (tool outputs, logs, RAG results, files, and conversation history) before it reaches the LLM. You get the same answers at a fraction of the token cost.
Documentation Index
Fetch the complete documentation index at: https://mintlify.com/headroomlabs-ai/headroom/llms.txt
Use this file to discover all available pages before exploring further.
Quickstart
Get from zero to compressed LLM calls in under 5 minutes.
Installation
Install via pip, npm, or Docker with the right extras for your stack.
How Compression Works
SmartCrusher, CodeCompressor, Kompress, and ContentRouter explained.
API Reference
Full Python SDK, TypeScript SDK, CLI, and Proxy HTTP API reference.
Pick your integration path
Library
Call `compress(messages)` in Python or TypeScript. Drop into any LLM app, no infra required.
Proxy
Run `headroom proxy` and point any existing client at it. Zero code changes.
Agent Wrap
One command wraps Claude Code, Codex, Cursor, Aider, Cline, and more.
MCP Server
Install as an MCP tool for Claude Code, Cursor, or any MCP-compatible host.
Real savings on real workloads
| Workload | Tokens before | Tokens after | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| GitHub issue triage | 54,174 | 14,761 | 73% |
| Codebase exploration | 78,502 | 41,254 | 47% |
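The savings column can be reproduced from the other two. The sketch below assumes the before and after figures in the table are token counts and that savings are rounded to the nearest whole percent; `savings_percent` is a hypothetical helper written for this illustration, not part of Headroom.

```python
def savings_percent(before: int, after: int) -> int:
    """Share of tokens removed, rounded to the nearest whole percent."""
    if before <= 0:
        raise ValueError("before must be a positive token count")
    return round(100 * (before - after) / before)


# Figures copied from the table above: (before, after).
workloads = {
    "Code search (100 results)": (17_765, 1_408),
    "SRE incident debugging": (65_694, 5_118),
    "GitHub issue triage": (54_174, 14_761),
    "Codebase exploration": (78_502, 41_254),
}

for name, (before, after) in workloads.items():
    print(f"{name}: {savings_percent(before, after)}% saved")
```

The same function can be applied to your own before and after counts to compare a workload against these figures.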
Key features
Reversible Compression (CCR)
Originals are cached locally. The LLM calls `headroom_retrieve` when it needs the full data, so nothing is permanently lost.
ContentRouter
Auto-detects JSON, code, logs, plain text, and images — routes each to the best compressor automatically.
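To illustrate the detect-then-route idea, here is a toy sketch using only the Python standard library. It is not Headroom's ContentRouter: the heuristics, the function names (`detect_content_type`, `route`), and the per-type "compressors" are all assumptions made up for this example, and image handling is left out.

```python
import json
import re

# Heuristic patterns (assumptions for this sketch only).
LOG_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")
CODE_HINT = re.compile(
    r"^\s*(def |class |import |function |const |#include )", re.MULTILINE
)


def detect_content_type(text: str) -> str:
    """Classify text as 'json', 'logs', 'code', or 'text'."""
    stripped = text.strip()
    if stripped[:1] in ("{", "["):
        try:
            json.loads(stripped)
            return "json"
        except ValueError:
            pass  # Looked like JSON but did not parse; keep checking.
    lines = [ln for ln in stripped.splitlines() if ln.strip()]
    # Treat as logs when at least half the lines start with a timestamp.
    if lines and sum(bool(LOG_LINE.match(ln)) for ln in lines) >= len(lines) / 2:
        return "logs"
    if CODE_HINT.search(stripped):
        return "code"
    return "text"


def minify_json(text: str) -> str:
    return json.dumps(json.loads(text), separators=(",", ":"))


def drop_repeated_lines(text: str) -> str:
    out = []
    for line in text.splitlines():
        if not out or line != out[-1]:  # Skip consecutive duplicates.
            out.append(line)
    return "\n".join(out)


def strip_blank_lines(text: str) -> str:
    return "\n".join(ln.rstrip() for ln in text.splitlines() if ln.strip())


def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


# One deliberately simple compressor per detected type.
ROUTES = {
    "json": minify_json,
    "logs": drop_repeated_lines,
    "code": strip_blank_lines,
    "text": collapse_whitespace,
}


def route(text: str) -> str:
    """Detect the content type and apply the matching compressor."""
    return ROUTES[detect_content_type(text)](text)


print(route('{"status": "ok", "items": [1, 2, 3]}'))
```

The point of the design is that each content type gets a compressor suited to its structure, with plain text as the fallback when nothing more specific matches.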
Persistent Memory
Hierarchical, temporal memory across conversations and agents. Zero extra latency — extraction happens inline.
Failure Learning
`headroom learn` mines past sessions and writes corrections to CLAUDE.md, AGENTS.md, or GEMINI.md.
Cache Optimization
CacheAligner stabilizes prefixes so Anthropic and OpenAI KV caches actually hit on repeated calls.
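Provider-side prompt caches match on the leading portion of a request, so anything volatile near the top of a prompt limits how much can be reused. The sketch below shows that general effect with plain strings; it is not CacheAligner's implementation, and `naive_prompt`, `aligned_prompt`, and `common_prefix_len` are hypothetical names used only here.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Long, unchanging instructions that would ideally be served from cache.
STATIC_INSTRUCTIONS = "You are a helpful SRE assistant. " * 20


def naive_prompt(timestamp: str, question: str) -> str:
    # Volatile timestamp first: the prefix changes on every call.
    return f"Current time: {timestamp}\n{STATIC_INSTRUCTIONS}\nQuestion: {question}"


def aligned_prompt(timestamp: str, question: str) -> str:
    # Static text first, volatile values last: the prefix stays identical.
    return f"{STATIC_INSTRUCTIONS}\nCurrent time: {timestamp}\nQuestion: {question}"


call_1 = ("2024-05-01T10:00:00", "Why is latency up?")
call_2 = ("2024-05-01T10:00:07", "Which pod restarted?")

naive_shared = common_prefix_len(naive_prompt(*call_1), naive_prompt(*call_2))
aligned_shared = common_prefix_len(aligned_prompt(*call_1), aligned_prompt(*call_2))

print("shared prefix, naive:  ", naive_shared)
print("shared prefix, aligned:", aligned_shared)
```

With the volatile timestamp moved after the static instructions, the two calls share the whole instruction block as a common prefix instead of diverging within the first line.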
Output Token Reduction
Verbosity steering and effort routing also cut what the model writes back, which matters because output tokens cost 5× as much per token as input tokens on Opus-class models.
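The reversible compression idea from the feature list above (keep the original locally, send the model a shorter stand-in plus a key, and let it fetch the original on demand) can be sketched in a few lines. This is a toy model under stated assumptions, not Headroom's CCR: `OriginalStore`, `compress_tool_output`, and `retrieve` are hypothetical names, the store is an in-memory dict rather than a persistent cache, and the "compression" simply keeps the first few rows.

```python
import hashlib
import json


class OriginalStore:
    """In-memory stand-in for a local cache of uncompressed originals."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def put(self, content: str) -> str:
        # Content-addressed key: the same original always maps to the same key.
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
        self._items[key] = content
        return key

    def get(self, key: str) -> str:
        return self._items[key]


def compress_tool_output(rows: list[dict], store: OriginalStore, keep: int = 2) -> str:
    """Cache the full output, then return a short summary carrying its key."""
    key = store.put(json.dumps(rows))
    summary = {
        "kept": rows[:keep],
        "omitted": max(len(rows) - keep, 0),
        "retrieve_key": key,
    }
    return json.dumps(summary)


def retrieve(key: str, store: OriginalStore) -> str:
    """Hypothetical counterpart of a retrieval tool: return the cached original."""
    return store.get(key)


store = OriginalStore()
rows = [{"id": i, "message": f"result {i}"} for i in range(100)]

compressed = compress_tool_output(rows, store)
key = json.loads(compressed)["retrieve_key"]

print(len(json.dumps(rows)), "characters before,", len(compressed), "after")
print("round trip ok:", json.loads(retrieve(key, store)) == rows)
```

Because the summary carries the key, the model can work from the short version by default and request the full original only when the omitted rows turn out to matter.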
Integrations
Headroom works with every major Python and TypeScript LLM framework:
OpenAI SDK
`withHeadroom(new OpenAI())`
Anthropic SDK
`withHeadroom(new Anthropic())`
LangChain
`HeadroomChatModel`
Vercel AI SDK
`headroomMiddleware()`
Agno
`HeadroomAgnoModel`
LiteLLM
`HeadroomCallback()`
Headroom is local-first. Your data never leaves your machine. The compression pipeline runs entirely on your hardware using local models and a local SQLite store.