Headroom’s library mode is the most direct integration path: import compress, pass your message list, and get back a compressed copy alongside token-savings metrics. The Python package runs the full pipeline in-process, so it needs no proxy process, no infrastructure changes, and no running server. The TypeScript SDK instead delegates to a local proxy HTTP endpoint.

Installation

# Python — ships the headroom CLI and full compression pipeline
pip install "headroom-ai[all]"

# TypeScript SDK (library only — no headroom CLI)
npm install headroom-ai

One-function API

Import compress directly from the headroom package. It accepts any list of messages in Anthropic or OpenAI format and returns a CompressResult.
from headroom import compress

messages = [
    {"role": "user", "content": "Analyze this output"},
    {"role": "tool", "content": big_tool_output},
]

result = compress(messages, model="gpt-4o")

# Use the compressed messages with any client
print(result.messages)           # same structure, fewer tokens
print(result.tokens_saved)       # e.g. 11 500
print(result.compression_ratio)  # e.g. 0.65  (65% removed)
print(result.tokens_before)      # original token count
print(result.tokens_after)       # post-compression token count
print(result.transforms_applied) # e.g. ["router:smart_crusher:0.35"]
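The fields printed above are related by simple arithmetic. As a minimal sketch, assuming compression_ratio is the fraction of tokens removed (as the "65% removed" comment suggests), the helper below recomputes the derived figures from the two raw counts. savings_summary is a hypothetical helper for illustration, not part of the headroom package.

```python
def savings_summary(tokens_before: int, tokens_after: int) -> dict:
    """Recompute savings figures from raw token counts.

    Assumption: the ratio is the fraction of tokens removed, matching the
    "65% removed" comment above; headroom's exact definition may differ.
    """
    if tokens_before <= 0:
        # Nothing to compress, so nothing was saved.
        return {"tokens_saved": 0, "fraction_removed": 0.0}
    saved = tokens_before - tokens_after
    return {
        "tokens_saved": saved,
        "fraction_removed": round(saved / tokens_before, 4),
    }


summary = savings_summary(tokens_before=20_000, tokens_after=7_000)
print(summary)  # {'tokens_saved': 13000, 'fraction_removed': 0.65}
```

A check like this can be handy when forwarding the numbers to your own metrics pipeline, since it makes the meaning of each figure explicit.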
Pass the compressed messages straight into any SDK:
from anthropic import Anthropic
from headroom import compress

client = Anthropic()
messages = [{"role": "user", "content": huge_tool_output}]

compressed = compress(messages, model="claude-sonnet-4-5-20250929")
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,  # required by the Anthropic Messages API
    messages=compressed.messages,
)
LiteLLM and any HTTP client work the same way: pass compressed.messages wherever you would otherwise pass the original messages when building the request body.
import litellm
from headroom import compress

messages = [...]
compressed = compress(messages, model="bedrock/claude-sonnet")
response = litellm.completion(
    model="bedrock/claude-sonnet",
    messages=compressed.messages,
)
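For a plain HTTP client, the substitution happens when the JSON body is built. The sketch below uses only the standard library and targets OpenAI's chat completions endpoint. build_chat_request is a hypothetical helper, the API key is a placeholder, and the literal message list stands in for compressed.messages. It builds the request without sending it.

```python
import json
import urllib.request

OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(messages: list, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_CHAT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Stand-in for compressed.messages from the examples above.
compressed_messages = [{"role": "user", "content": "Summarize the log"}]
request = build_chat_request(compressed_messages, "gpt-4o", api_key="sk-placeholder")
# urllib.request.urlopen(request) would send it.
```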

CompressConfig / CompressOptions parameters

Pass a CompressConfig instance to compress() for fine-grained control, or use keyword arguments as shorthand — they override the config object.
from headroom import compress
from headroom.compress import CompressConfig

# Financial document: compress everything, keep 50 % of tokens
result = compress(
    messages,
    model="claude-opus-4-20250514",
    config=CompressConfig(
        compress_user_messages=True,
        target_ratio=0.5,
        protect_recent=0,
    ),
)

# Aggressive log compression
result = compress(messages, model="gpt-4o", target_ratio=0.2)

# Conservative: protect the last 8 messages, skip ML compression
result = compress(
    messages,
    model="gpt-4o",
    protect_recent=8,
    kompress_model="disabled",
)
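One way to picture the "keyword arguments override the config object" rule is a merge over a dataclass. The sketch below is an illustration only: SketchConfig and resolve_config are hypothetical names, the three fields and their defaults are copied from the parameter table, and headroom's actual merge logic may differ.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in holding three of the documented fields."""
    compress_user_messages: bool = False
    protect_recent: int = 4
    target_ratio: Optional[float] = None


def resolve_config(config: Optional[SketchConfig] = None, **overrides) -> SketchConfig:
    """Start from the config object (or the defaults), then apply keyword overrides."""
    base = config if config is not None else SketchConfig()
    # dataclasses.replace raises TypeError for unknown field names,
    # which surfaces misspelled keyword arguments early.
    return replace(base, **overrides)


base = SketchConfig(target_ratio=0.5, protect_recent=0)
effective = resolve_config(base, target_ratio=0.2)
print(effective)
# SketchConfig(compress_user_messages=False, protect_recent=0, target_ratio=0.2)
```

Fields that are not overridden keep the value from the config object, so a shared base config can be tightened or loosened per call.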
| Field | Default | Description |
| --- | --- | --- |
| compress_user_messages | False | Compress user messages too (default: skip for coding agents) |
| compress_system_messages | True | Compress system messages |
| protect_recent | 4 | Don’t compress the last N messages |
| protect_analysis_context | True | Detect analyze/review intent and protect code |
| target_ratio | None | Keep ratio for Kompress text compression (e.g. 0.5 keeps 50 %) |
| min_tokens_to_compress | 250 | Minimum token count before a message is compressed |
| kompress_model | None | Override the Kompress HuggingFace model ID; "disabled" skips ML entirely |
| savings_profile | None | Named high-savings profile, e.g. "agent-90" |
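To make the interaction between protect_recent, the role flags, and min_tokens_to_compress concrete, here is a rough sketch of how such settings could select which messages are candidates for compression. eligible_indices is a hypothetical function, the token counts are supplied by hand, and messages exactly at the threshold are assumed to qualify. It ignores protect_analysis_context and is not headroom's actual selection logic.

```python
def eligible_indices(
    messages: list,
    token_counts: list,
    protect_recent: int = 4,
    compress_user_messages: bool = False,
    compress_system_messages: bool = True,
    min_tokens_to_compress: int = 250,
) -> list:
    """Return indices of messages a compressor might touch under these settings."""
    cutoff = len(messages) - protect_recent  # indices >= cutoff are protected
    picked = []
    for i, (msg, tokens) in enumerate(zip(messages, token_counts)):
        if i >= cutoff:
            continue  # one of the last N messages
        if tokens < min_tokens_to_compress:
            continue  # too small to be worth compressing
        if msg["role"] == "user" and not compress_user_messages:
            continue
        if msg["role"] == "system" and not compress_system_messages:
            continue
        picked.append(i)
    return picked


roles = ["system", "user", "tool", "assistant", "user", "tool"]
messages = [{"role": r, "content": "..."} for r in roles]
counts = [900, 400, 5000, 120, 30, 3000]  # hypothetical token counts

print(eligible_indices(messages, counts, protect_recent=2))
# [0, 2]
print(eligible_indices(messages, counts, protect_recent=0, compress_user_messages=True))
# [0, 1, 2, 5]
```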

HeadroomClient wrapper (Python)

HeadroomClient wraps an existing provider SDK client and automatically compresses every outbound request. It is the drop-in option when you want compression without changing call sites throughout your codebase.
from headroom.client import HeadroomClient
from headroom.providers.openai import OpenAIProvider
from openai import OpenAI

client = HeadroomClient(
    original_client=OpenAI(),
    provider=OpenAIProvider(),
    default_mode="optimize",
)

# Use exactly like the OpenAI client — compression is transparent
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)

# Inspect session-level savings
stats = client.get_stats()
print(stats)

# Check that routing and compression are working
result = client.validate_setup()
print(result)
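The wrapper idea itself is a small delegation pattern: intercept the call, compress the messages, forward everything else, and keep running totals. The sketch below shows that pattern with stand-ins only. FakeCompletions, CompressingCompletions, and toy_compress are hypothetical names, and the truncation "compressor" is a placeholder; none of this is HeadroomClient's implementation.

```python
class FakeCompletions:
    """Stand-in for a provider SDK's completions resource."""

    def create(self, *, model, messages):
        return {"model": model, "received": messages}


def toy_compress(messages, limit=40):
    """Toy stand-in for real compression: truncate long string contents."""
    return [{**m, "content": m["content"][:limit]} for m in messages]


def total_chars(messages):
    return sum(len(m["content"]) for m in messages)


class CompressingCompletions:
    """Forward create() calls to the wrapped object after compressing messages."""

    def __init__(self, inner, compress_fn):
        self._inner = inner
        self._compress = compress_fn
        self._calls = 0
        self._chars_saved = 0

    def create(self, *, model, messages, **kwargs):
        compressed = self._compress(messages)
        self._calls += 1
        self._chars_saved += total_chars(messages) - total_chars(compressed)
        # Everything except the message list passes through unchanged.
        return self._inner.create(model=model, messages=compressed, **kwargs)

    def stats(self):
        return {"calls": self._calls, "chars_saved": self._chars_saved}


wrapped = CompressingCompletions(FakeCompletions(), toy_compress)
response = wrapped.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "x" * 100}],
)
print(wrapped.stats())  # {'calls': 1, 'chars_saved': 60}
```

Because call sites only see the same create() signature, swapping the wrapped object in or out does not require touching them.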

ASGI middleware

For FastAPI, Starlette, or any other ASGI application, add CompressionMiddleware to compress all LLM requests that pass through your app layer:
from fastapi import FastAPI
from headroom.integrations.asgi import CompressionMiddleware

app = FastAPI()
app.add_middleware(CompressionMiddleware)
Every request the middleware intercepts is compressed before being forwarded, and the response is returned unchanged to the caller.
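For readers unfamiliar with how an ASGI middleware can rewrite a request, the following pure-ASGI sketch buffers an incoming JSON body, rewrites its "messages" list, and replays the result to the wrapped app. It assumes that "intercepts" means requests arriving at your app; JsonBodyRewriteMiddleware, echo_app, and drop_empty are hypothetical names, and this is not how CompressionMiddleware is necessarily implemented.

```python
import asyncio
import json


class JsonBodyRewriteMiddleware:
    """Buffer the JSON request body, rewrite "messages", then replay it downstream."""

    def __init__(self, app, rewrite_messages):
        self.app = app
        self.rewrite_messages = rewrite_messages

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        # Drain the whole request body from the client.
        body, more = b"", True
        while more:
            message = await receive()
            body += message.get("body", b"")
            more = message.get("more_body", False)

        try:
            payload = json.loads(body)
            if isinstance(payload, dict) and isinstance(payload.get("messages"), list):
                payload["messages"] = self.rewrite_messages(payload["messages"])
                body = json.dumps(payload).encode("utf-8")
        except ValueError:
            pass  # not JSON: forward the original bytes untouched

        # Keep Content-Length consistent with the (possibly shorter) body.
        headers = [(k, v) for k, v in scope.get("headers", []) if k.lower() != b"content-length"]
        headers.append((b"content-length", str(len(body)).encode("ascii")))
        scope = {**scope, "headers": headers}

        replayed = False

        async def replay():
            nonlocal replayed
            if not replayed:
                replayed = True
                return {"type": "http.request", "body": body, "more_body": False}
            return await receive()

        await self.app(scope, replay, send)


async def echo_app(scope, receive, send):
    """Tiny downstream app that echoes the request body it receives."""
    message = await receive()
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": message["body"]})


def drop_empty(messages):
    """Toy stand-in for compression: drop messages with empty content."""
    return [m for m in messages if m.get("content")]


async def demo():
    raw = json.dumps(
        {"messages": [{"role": "user", "content": "hi"}, {"role": "tool", "content": ""}]}
    ).encode("utf-8")
    incoming = [{"type": "http.request", "body": raw, "more_body": False}]
    sent = []

    async def receive():
        return incoming.pop(0)

    async def send(message):
        sent.append(message)

    app = JsonBodyRewriteMiddleware(echo_app, drop_empty)
    await app({"type": "http", "headers": []}, receive, send)
    return json.loads(sent[-1]["body"])


result = asyncio.run(demo())
print(result)  # {'messages': [{'role': 'user', 'content': 'hi'}]}
```

The response messages pass through the original send callable untouched, which mirrors the "response is returned unchanged" behavior described above.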

Integrations quick reference

| Your stack | Import |
| --- | --- |
| Any Python app | from headroom import compress |
| Any TypeScript app | import { compress } from 'headroom-ai' |
| Anthropic SDK | withHeadroom(new Anthropic()) |
| OpenAI SDK | withHeadroom(new OpenAI()) |
| Vercel AI SDK | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| LiteLLM | litellm.callbacks = [HeadroomCallback()] |
| LangChain | HeadroomChatModel(your_llm) |
| Agno | HeadroomAgnoModel(your_model) |
| ASGI apps | app.add_middleware(CompressionMiddleware) |
| Multi-agent | SharedContext().put / .get |

When to use library vs proxy vs agent-wrap

Library

You control the call site and want compression inline — Python or TypeScript app, LangChain chain, custom agent loop. No extra process to manage.

Proxy

You want zero code changes. Any language, any client. Point ANTHROPIC_BASE_URL or OPENAI_BASE_URL at the proxy and compression is automatic.

Agent Wrap

You use a CLI coding agent (Claude Code, Codex, Aider, etc.) and want one-command setup. headroom wrap claude starts the proxy and launches the agent with the right env vars pre-set.
Start with headroom wrap or the proxy for zero-effort savings. Switch to the library when you need programmatic control over compression config, want to embed metrics in your own observability pipeline, or are building a framework adapter.
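Pointing a client at the proxy comes down to setting a base-URL environment variable before the client process starts. The sketch below builds such an environment for a child process. The proxy address is a made-up placeholder and proxied_env is a hypothetical helper; use the host and port your proxy actually listens on, and check whether your client expects a path suffix on the URL.

```python
import os

# Hypothetical address, for illustration only.
PROXY_URL = "http://127.0.0.1:8787"


def proxied_env(proxy_url, base_env=None, variables=("ANTHROPIC_BASE_URL",)):
    """Return a copy of the environment with the given base-URL variables set."""
    env = dict(os.environ if base_env is None else base_env)
    for name in variables:
        env[name] = proxy_url
    return env


env = proxied_env(PROXY_URL, base_env={"PATH": "/usr/bin"})
print(env["ANTHROPIC_BASE_URL"])  # http://127.0.0.1:8787
# subprocess.run([...], env=env) would launch a client with these settings.
```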
