Headroom’s library mode is the most direct integration path: importDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/headroomlabs-ai/headroom/llms.txt
Use this file to discover all available pages before exploring further.
compress, pass your message list, and get back a compressed copy alongside token-savings metrics — no proxy process, no infrastructure changes, and no dependency on a running server. The Python package runs the full pipeline locally; the TypeScript SDK delegates to a local proxy HTTP endpoint.
Installation
One-function API
- Python
- TypeScript
Import Pass the compressed messages straight into any SDK:LiteLLM and any HTTP client work the same way — just replace
compress directly from the headroom package. It accepts any list of messages in Anthropic or OpenAI format and returns a CompressResult.compressed.messages wherever you build the request body.CompressConfig / CompressOptions parameters
- Python — CompressConfig
- TypeScript — CompressOptions
Pass a
CompressConfig instance to compress() for fine-grained control, or use keyword arguments as shorthand — they override the config object.| Field | Default | Description |
|---|---|---|
compress_user_messages | False | Compress user messages too (default: skip for coding agents) |
compress_system_messages | True | Compress system messages |
protect_recent | 4 | Don’t compress the last N messages |
protect_analysis_context | True | Detect analyze/review intent and protect code |
target_ratio | None | Keep ratio for Kompress text compression (e.g. 0.5 keeps 50 %) |
min_tokens_to_compress | 250 | Minimum token count before a message is compressed |
kompress_model | None | Override the Kompress HuggingFace model ID; "disabled" skips ML entirely |
savings_profile | None | Named high-savings profile, e.g. "agent-90" |
HeadroomClient wrapper (Python)
HeadroomClient wraps an existing provider SDK client and automatically compresses every outbound request. It is the drop-in option when you want compression without changing call sites throughout your codebase.
ASGI middleware
For FastAPI, Starlette, or any other ASGI application, addCompressionMiddleware to compress all LLM requests that pass through your app layer:
Integrations quick reference
| Your stack | Import |
|---|---|
| Any Python app | from headroom import compress |
| Any TypeScript app | import { compress } from 'headroom-ai' |
| Anthropic SDK | withHeadroom(new Anthropic()) |
| OpenAI SDK | withHeadroom(new OpenAI()) |
| Vercel AI SDK | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| LiteLLM | litellm.callbacks = [HeadroomCallback()] |
| LangChain | HeadroomChatModel(your_llm) |
| Agno | HeadroomAgnoModel(your_model) |
| ASGI apps | app.add_middleware(CompressionMiddleware) |
| Multi-agent | SharedContext().put / .get |
When to use library vs proxy vs agent-wrap
Library
You control the call site and want compression inline — Python or TypeScript app, LangChain chain, custom agent loop. No extra process to manage.
Proxy
You want zero code changes. Any language, any client. Point
ANTHROPIC_BASE_URL or OPENAI_BASE_URL at the proxy and compression is automatic.Agent Wrap
You use a CLI coding agent (Claude Code, Codex, Aider, etc.) and want one-command setup.
headroom wrap claude starts the proxy and launches the agent with the right env vars pre-set.