Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/headroomlabs-ai/headroom/llms.txt

Use this file to discover all available pages before exploring further.

Headroom is the context compression layer for AI agents. It sits between your application and the LLM provider, compressing everything the model reads — tool outputs, logs, RAG results, files, and conversation history — before it reaches the LLM. You get the same answers at a fraction of the token cost.

Quickstart

Get from zero to compressed LLM calls in under 5 minutes.

Installation

Install via pip, npm, or Docker with the right extras for your stack.

How Compression Works

SmartCrusher, CodeCompressor, Kompress, and ContentRouter explained.

API Reference

Full Python SDK, TypeScript SDK, CLI, and Proxy HTTP API reference.

Pick your integration path

Library

Call compress(messages) in Python or TypeScript. Drop into any LLM app, no infra required.

Proxy

Run headroom proxy and point any existing client at it. Zero code changes.

Agent Wrap

One command wraps Claude Code, Codex, Cursor, Aider, Cline, and more.

MCP Server

Install as an MCP tool for Claude Code, Cursor, or any MCP-compatible host.

Real savings on real workloads

WorkloadBeforeAfterSavings
Code search (100 results)17,7651,40892%
SRE incident debugging65,6945,11892%
GitHub issue triage54,17414,76173%
Codebase exploration78,50241,25447%
Accuracy is preserved. GSM8K math benchmark: ±0.000 delta. SQuAD v2 QA: 97% at 19% compression. BFCL tool-use: 97% at 32% compression.

Key features

Reversible Compression (CCR)

Originals are cached locally. The LLM calls headroom_retrieve when it needs the full data — nothing is permanently lost.

ContentRouter

Auto-detects JSON, code, logs, plain text, and images — routes each to the best compressor automatically.

Persistent Memory

Hierarchical, temporal memory across conversations and agents. Zero extra latency — extraction happens inline.

Failure Learning

headroom learn mines past sessions and writes corrections to CLAUDE.md, AGENTS.md, or GEMINI.md.

Cache Optimization

CacheAligner stabilizes prefixes so Anthropic and OpenAI KV caches actually hit on repeated calls.

Output Token Reduction

Verbosity steering and effort routing also cut what the model writes back — at 5× the per-token cost on Opus-class models.

Integrations

Headroom works with every major Python and TypeScript LLM framework:

OpenAI SDK

withHeadroom(new OpenAI())

Anthropic SDK

withHeadroom(new Anthropic())

LangChain

HeadroomChatModel

Vercel AI SDK

headroomMiddleware()

Agno

HeadroomAgnoModel

LiteLLM

HeadroomCallback()
Headroom is local-first. Your data never leaves your machine. The compression pipeline runs entirely on your hardware using local models and a local SQLite store.

Build docs developers (and LLMs) love