This guide takes you from a fresh install to a compressed LLM call with measured savings. You will install the package, compress a realistic message thread containing a large tool output, send the result to your LLM, and inspect how many tokens were removed. If you prefer zero code changes, the final section covers proxy mode — point any existing client at http://localhost:8787 and compression happens automatically.
The installation provides the headroom CLI, the compress() library function, the local proxy, the Kompress-v2-base model, and all compressors. It requires Python 3.10+.
If you prefer pipx or uv, use an explicit Python 3.13 interpreter to unlock the full savings dashboard. The Proxy $ Saved tile requires LiteLLM, which does not yet support Python 3.14+.
The TypeScript SDK is a library you import; it does not include the headroom CLI. The SDK sends messages to a local Headroom proxy for compression, so start the proxy (for example, with headroom proxy --port 8787) before using the SDK.
Pass your message list to compress(). Headroom returns the same list in the same format, with tool outputs, logs, and repeated content stripped down to their essential information.
Python

```python
from headroom import compress
import json

messages = [
    {"role": "system", "content": "You analyze search results."},
    {"role": "user", "content": "Search for Python tutorials."},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "search", "arguments": '{"q": "python"}'},
        }],
    },
    {
        "role": "tool",
        "tool_call_id": "call_1",
        "content": json.dumps({
            "results": [
                {"title": f"Result {i}", "snippet": f"Description {i}", "score": 100 - i}
                for i in range(500)
            ]
        }),
    },
    {"role": "user", "content": "What are the top 3 results?"},
]

result = compress(messages, model="gpt-4o")
```

TypeScript

```typescript
import { compress } from 'headroom-ai';

const messages = [
  { role: 'system' as const, content: 'You analyze search results.' },
  { role: 'user' as const, content: 'Search for Python tutorials.' },
  {
    role: 'assistant' as const,
    content: null,
    tool_calls: [{
      id: 'call_1',
      type: 'function' as const,
      function: { name: 'search', arguments: '{"q": "python"}' },
    }],
  },
  {
    role: 'tool' as const,
    tool_call_id: 'call_1',
    content: JSON.stringify({
      results: Array.from({ length: 500 }, (_, i) => ({
        title: `Result ${i}`,
        snippet: `Description ${i}`,
        score: 100 - i,
      })),
    }),
  },
  { role: 'user' as const, content: 'What are the top 3 results?' },
];

const result = await compress(messages, {
  model: 'gpt-4o',
  baseUrl: 'http://localhost:8787',
});
```
Use result.messages exactly as you would the originals. The compressed messages are in the same format — you do not need to change any other part of your call.
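As a minimal sketch of that hand-off, the helper below forwards a message list to any client that exposes an OpenAI-style `chat.completions.create(model=..., messages=...)` method. The helper name `send_compressed` is hypothetical and not part of Headroom. The snippet assumes `result` came from a prior compress() call.

```python
from typing import Any


def send_compressed(client: Any, messages: list[dict], model: str = "gpt-4o") -> Any:
    """Forward (compressed) messages to an OpenAI-style chat completions client.

    `client` is assumed to expose chat.completions.create(model=..., messages=...),
    as the official OpenAI Python client does. Nothing else in the call changes.
    """
    return client.chat.completions.create(model=model, messages=messages)


# Usage, assuming the `openai` package is installed and `result` came from compress(...):
#   from openai import OpenAI
#   response = send_compressed(OpenAI(), result.messages)
#   print(response.choices[0].message.content)
```

Because the helper only passes the list through, swapping the original messages for result.messages should be the only change needed at the call site.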
The compression_ratio field expresses the fraction of tokens removed, not the fraction kept. A value of 0.9 means 90% of tokens were eliminated. A value of 0.35 means 35% were removed, so 65% of the original tokens remain (1 - 0.35).
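To make the arithmetic concrete, here is a small worked sketch that computes the removed fraction from before and after token counts. The function name and the token counts are illustrative assumptions, not fields or values documented by Headroom.

```python
def fraction_removed(tokens_before: int, tokens_after: int) -> float:
    """Fraction of the original tokens that compression eliminated."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return (tokens_before - tokens_after) / tokens_before


# Hypothetical counts: a 10,000-token thread compressed down to 1,000 tokens.
ratio = fraction_removed(10_000, 1_000)
print(f"{ratio:.2f} removed, {1 - ratio:.2f} kept")  # 0.90 removed, 0.10 kept
```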
If you do not want to modify any existing code, run Headroom as a local HTTP proxy and point your client at it. Every request flows through the compression pipeline automatically.
```bash
# Start the proxy
headroom proxy --port 8787

# Point Claude Code at it
ANTHROPIC_BASE_URL=http://localhost:8787 claude

# Any OpenAI-compatible client
OPENAI_BASE_URL=http://localhost:8787/v1 your-app
```
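If you launch your application from a Python script rather than a shell, the same two environment variables can be set programmatically. The sketch below is one possible approach; the helper name `proxy_env` is hypothetical, and it assumes the proxy is already running on the port shown above.

```python
import os


def proxy_env(port: int = 8787, host: str = "localhost") -> dict[str, str]:
    """Return a copy of the current environment with base URLs pointing at the proxy."""
    base = f"http://{host}:{port}"
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = base       # Anthropic clients use the bare URL
    env["OPENAI_BASE_URL"] = f"{base}/v1"  # OpenAI-compatible clients use the /v1 path
    return env


# Usage (your-app is a placeholder, as in the shell example):
#   import subprocess
#   subprocess.run(["your-app"], env=proxy_env())
```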
Check cumulative savings at any time:
```bash
curl http://localhost:8787/stats
# {"requests_total": 42, "tokens_saved_total": 125000, ...}

headroom perf        # pretty-printed savings summary
headroom dashboard   # open live dashboard in browser
```
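The stats payload can also be read from a script. The following sketch fetches the endpoint with the standard library and derives an average per request. It relies only on the two fields visible in the sample response (requests_total and tokens_saved_total); the helper names are hypothetical, and any other fields in the payload are not assumed.

```python
import json
from urllib.request import urlopen


def fetch_stats(url: str = "http://localhost:8787/stats") -> dict:
    """Fetch and parse the proxy's stats JSON (requires a running proxy)."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def average_saved_per_request(stats: dict) -> float:
    """Mean tokens saved per request; 0.0 when no requests have been recorded."""
    requests_total = stats.get("requests_total", 0)
    if requests_total == 0:
        return 0.0
    return stats.get("tokens_saved_total", 0) / requests_total


# Using the sample payload from the curl example instead of a live proxy:
sample = {"requests_total": 42, "tokens_saved_total": 125000}
print(round(average_saved_per_request(sample)))  # 2976
```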
To wrap a coding agent in one command (starts the proxy and injects the correct environment):
```bash
headroom wrap claude   # Claude Code
headroom wrap codex    # OpenAI Codex
headroom wrap aider    # Aider
headroom wrap cursor   # Cursor (prints base URLs for manual setup)
```
Headroom auto-detects content type and routes each block to the best compressor. No configuration is needed — the biggest savings come automatically from tool outputs, which are almost always over-verbose JSON or log files.
| Content type | Compressor | Typical savings |
| --- | --- | --- |
| JSON arrays | SmartCrusher | 70–90% |
| Source code | CodeCompressor | 40–70% |
| Build / test logs | LogCompressor | 80–95% |
| Search results | SearchCompressor | 60–80% |
| Plain text | Kompress | 30–50% |
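For rough capacity planning, the typical ranges above can be turned into a back-of-the-envelope estimate. This sketch is only arithmetic on the published ranges, not a call into Headroom, and the function name is hypothetical. Actual savings depend on your content.

```python
# Typical savings ranges (percent), copied from the table above.
TYPICAL_SAVINGS = {
    "JSON arrays": (70, 90),
    "Source code": (40, 70),
    "Build / test logs": (80, 95),
    "Search results": (60, 80),
    "Plain text": (30, 50),
}


def estimate_tokens_saved(content_type: str, tokens: int) -> tuple[int, int]:
    """Return the (low, high) number of tokens typically saved for this content type."""
    low_pct, high_pct = TYPICAL_SAVINGS[content_type]
    return tokens * low_pct // 100, tokens * high_pct // 100


# A 10,000-token JSON tool output would typically shed 7,000 to 9,000 tokens.
print(estimate_tokens_saved("JSON arrays", 10_000))  # (7000, 9000)
```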
Messages shorter than 250 tokens are left unchanged by default. This threshold is configurable via CompressConfig(min_tokens_to_compress=...) — lower it for voice agents with short turns.
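The effect of the threshold can be illustrated with a small stand-alone sketch. It does not call Headroom and is not its implementation: `should_compress` is a hypothetical helper that mirrors the rule stated above, and the 40-token turn and the lowered threshold of 32 are made-up values.

```python
DEFAULT_MIN_TOKENS = 250  # default threshold described above


def should_compress(token_count: int, min_tokens: int = DEFAULT_MIN_TOKENS) -> bool:
    """Messages shorter than min_tokens are left unchanged; longer ones are eligible."""
    return token_count >= min_tokens


# A short voice-agent turn (hypothetical 40 tokens) is skipped by default...
print(should_compress(40))                 # False
# ...but becomes eligible once the threshold is lowered.
print(should_compress(40, min_tokens=32))  # True
```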