Headroom integrates with the Anthropic SDK through withHeadroom(), a single wrapper that intercepts messages.create() calls and compresses the conversation history before the request reaches Claude. The adapter handles the full Anthropic message format, including content blocks, tool use, and tool results. The format conversion is lossless, so apart from the compressed history, requests and responses behave as they would with an unwrapped client.

Installation

pip install "headroom-ai"

Quick start

In Python, use HeadroomClient with AnthropicProvider to wrap your Anthropic instance. AnthropicProvider enables accurate token counting against Claude’s exact context limits:
from anthropic import Anthropic
from headroom import HeadroomClient, AnthropicProvider

client = HeadroomClient(Anthropic(), provider=AnthropicProvider())

# long_conversation is your existing list of Anthropic-format messages
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=long_conversation,
    max_tokens=1024,
)
You can also use compress() directly before passing messages to the Anthropic client:
from anthropic import Anthropic
from headroom import compress

anthropic = Anthropic()

# large_content is any long string, such as a log dump or a document
messages = [{"role": "user", "content": large_content}]
compressed = compress(messages, model="claude-sonnet-4-5-20250929")

response = anthropic.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=compressed.messages,
    max_tokens=1024,
)

print(f"Saved {compressed.tokens_saved} tokens")

How it works

withHeadroom() returns a proxy around your Anthropic client that intercepts messages.create():
1. Convert to OpenAI format. Converts Anthropic-format messages to OpenAI format (the compression engine’s native format).
2. Compress. Runs the Headroom pipeline against the messages and the target model’s context limit.
3. Convert back and forward. Converts the compressed messages back to Anthropic format, then forwards the full request to Anthropic as normal.
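
The interception pattern described above can be sketched in plain Python. This is a minimal illustration, not Headroom's implementation: ClientProxy, keep_last_two, and FakeClient are hypothetical names, and the toy transform stands in for the whole convert, compress, convert-back sequence.

```python
from typing import Any, Callable

Messages = list[dict[str, Any]]


class _MessagesProxy:
    """Wraps a `messages` resource and rewrites `messages=` before forwarding."""

    def __init__(self, inner: Any, transform: Callable[[Messages], Messages]):
        self._inner = inner
        self._transform = transform

    def create(self, **kwargs: Any) -> Any:
        # Rewrite only the conversation; every other argument passes through.
        kwargs["messages"] = self._transform(kwargs["messages"])
        return self._inner.create(**kwargs)


class ClientProxy:
    """Hypothetical stand-in for a wrapper that intercepts messages.create()."""

    def __init__(self, client: Any, transform: Callable[[Messages], Messages]):
        self._client = client
        self.messages = _MessagesProxy(client.messages, transform)

    def __getattr__(self, name: str) -> Any:
        # Anything not overridden is delegated to the wrapped client.
        return getattr(self._client, name)


def keep_last_two(messages: Messages) -> Messages:
    """Toy transform (NOT Headroom's algorithm): keeps only the last two turns."""
    return messages[-2:]


class FakeMessages:
    """Test double that echoes back the arguments it would have sent."""

    def create(self, **kwargs: Any) -> dict[str, Any]:
        return kwargs


class FakeClient:
    """Test double standing in for an SDK client."""

    api_key = "test"

    def __init__(self) -> None:
        self.messages = FakeMessages()
```

Because only the messages argument is rewritten, parameters such as model, max_tokens, and stream reach the underlying client unchanged, which is the property the wrapper relies on.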

Message format conversion

The adapter handles the full Anthropic message format including content blocks:
| Anthropic format | OpenAI format |
| --- | --- |
| { type: "text", text: "..." } | { role: "user", content: "..." } |
| { type: "tool_use", id, name, input } | { tool_calls: [{ id, function: { name, arguments } }] } |
| { type: "tool_result", tool_use_id, content } | { role: "tool", tool_call_id, content } |
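
To make the mapping concrete, here is a hedged Python sketch of the Anthropic-to-OpenAI direction. The function name anthropic_to_openai is hypothetical and the logic is an assumption based only on the table above; the real adapter may handle more block types and edge cases.

```python
import json
from typing import Any


def anthropic_to_openai(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Convert one Anthropic-format message into OpenAI-format messages."""
    role, content = message["role"], message["content"]
    if isinstance(content, str):
        # Plain string content maps across unchanged.
        return [{"role": role, "content": content}]

    out: list[dict[str, Any]] = []
    text_parts: list[str] = []
    tool_calls: list[dict[str, Any]] = []
    for block in content:
        if block["type"] == "text":
            text_parts.append(block["text"])
        elif block["type"] == "tool_use":
            # OpenAI carries tool arguments as a JSON string, not an object.
            tool_calls.append({
                "id": block["id"],
                "type": "function",
                "function": {
                    "name": block["name"],
                    "arguments": json.dumps(block["input"]),
                },
            })
        elif block["type"] == "tool_result":
            result = block["content"]
            out.append({
                "role": "tool",
                "tool_call_id": block["tool_use_id"],
                # Non-string results are serialized here (an assumption).
                "content": result if isinstance(result, str) else json.dumps(result),
            })

    if text_parts or tool_calls:
        converted: dict[str, Any] = {
            "role": role,
            "content": "\n".join(text_parts) or None,
        }
        if tool_calls:
            converted["tool_calls"] = tool_calls
        out.append(converted)
    return out
```

One Anthropic message can expand into several OpenAI messages, because each tool_result block becomes its own message with role "tool".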

Options

In TypeScript, pass compression options as the second argument to withHeadroom():
import { withHeadroom } from 'headroom-ai/anthropic';
import Anthropic from '@anthropic-ai/sdk';

const client = withHeadroom(new Anthropic(), {
  model: 'claude-sonnet-4-5-20250929',
  baseUrl: 'http://localhost:8787',
});

Streaming

Compression happens before the request is sent, so streaming responses work exactly as normal:
import { withHeadroom } from 'headroom-ai/anthropic';
import Anthropic from '@anthropic-ai/sdk';

const client = withHeadroom(new Anthropic());

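const stream = await client.messages.create({
  model: 'claude-sonnet-4-5-20250929',
  messages: longConversation,
  max_tokens: 1024,
  stream: true,
});

// Consume the stream as you would with an unwrapped client.
for await (const event of stream) {
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}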

Tool use

Tool results are where compression has the biggest impact. Large JSON payloads from tool calls are compressed automatically:
import { withHeadroom } from 'headroom-ai/anthropic';
import Anthropic from '@anthropic-ai/sdk';

const client = withHeadroom(new Anthropic());

const response = await client.messages.create({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'What went wrong?' },
    {
      role: 'assistant',
      content: [
        {
          type: 'tool_use',
          id: 'toolu_1',
          name: 'get_logs',
          input: { service: 'api' },
        },
      ],
    },
    {
      role: 'user',
      content: [
        {
          type: 'tool_result',
          tool_use_id: 'toolu_1',
          content: hugeLogOutput, // Compressed automatically
        },
      ],
    },
  ],
  tools: [
    {
      name: 'get_logs',
      description: 'Get service logs',
      input_schema: { type: 'object', properties: {} },
    },
  ],
});

Cache optimization

Headroom’s CacheAligner stabilizes prompt prefixes so that Anthropic’s prompt cache hits on repeated calls. When compression rearranges message content, CacheAligner keeps the stable prefix (system prompt, earlier turns) in a consistent position to maximize the cache hit rate.
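
Prompt caching matches on an identical prefix, so one way to check whether a prefix stayed stable between two calls is to fingerprint it. The sketch below is an illustration only: prefix_fingerprint and its stable_turns parameter are hypothetical and are not part of CacheAligner's API.

```python
import hashlib
import json
from typing import Any


def prefix_fingerprint(
    system: str, messages: list[dict[str, Any]], stable_turns: int
) -> str:
    """Hash the system prompt plus the first `stable_turns` messages."""
    payload = json.dumps(
        {"system": system, "messages": messages[:stable_turns]},
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # nor must whitespace
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

If the fingerprint of the first N turns is the same before and after compression, later turns can change freely without invalidating the cached prefix.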
The AnthropicProvider used by Headroom knows Claude’s exact context limits per model: 200,000 tokens for claude-3-5-sonnet and above, 100,000 for claude-3-haiku. Because the limit is known per model, compression activates only when you actually need it.
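
The idea of gating compression on the context limit can be sketched as follows. Everything here is an assumption for illustration: estimate_tokens is a crude character-based heuristic and not a real tokenizer, and needs_compression is a hypothetical helper whose threshold rule may differ from what Headroom applies.

```python
from typing import Any


def estimate_tokens(text: str) -> int:
    """Very rough heuristic (about 4 characters per token); not a tokenizer."""
    return max(1, len(text) // 4)


def needs_compression(
    messages: list[dict[str, Any]], context_limit: int, max_tokens: int
) -> bool:
    """True when the estimated prompt plus reserved output exceeds the limit."""
    prompt_tokens = sum(estimate_tokens(str(m["content"])) for m in messages)
    return prompt_tokens + max_tokens > context_limit
```

With a limit of 200,000 tokens, a short conversation passes through untouched, while a single message of about a million characters would be flagged for compression.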

Supported models

Headroom supports claude-opus-4, claude-sonnet-4-5-20250929, claude-3-5-haiku, and all claude-3 variants. Context limits are auto-detected per model ID.

Prompt caching

CacheAligner keeps your stable prefixes pinned so Anthropic’s prompt cache keeps hitting even after Headroom compresses later turns.
