Headroom integrates with the OpenAI SDK through a single wrapper: withHeadroom() in TypeScript, or HeadroomClient in Python. The wrapper intercepts chat.completions.create() calls and compresses messages before they reach OpenAI. All other methods (embeddings, images, audio) pass through unchanged, so your existing code keeps working without modification.

Installation

pip install "headroom-ai"

Quick start

In Python, use HeadroomClient to wrap your OpenAI instance. The resulting client has the same API surface as the native OpenAI client:
from openai import OpenAI
from headroom import HeadroomClient, OpenAIProvider

client = HeadroomClient(OpenAI(), provider=OpenAIProvider())

# Messages are compressed automatically before sending
response = client.chat.completions.create(
    model="gpt-4o",
    messages=long_conversation,
)

You can also use compress() directly before passing messages to the OpenAI client:
from openai import OpenAI
from headroom import compress

openai_client = OpenAI()

messages = [{"role": "user", "content": large_tool_output}]
compressed = compress(messages, model="gpt-4o")

response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=compressed.messages,
)

print(f"Saved {compressed.tokens_saved} tokens")

How it works

withHeadroom() returns a proxy around your OpenAI client that intercepts chat.completions.create():
1. Extract messages: Pulls the messages array from the request parameters.
2. Compress: Runs the Headroom pipeline (SmartCrusher, CodeCompressor, Kompress-v2-base), applying each according to content type.
3. Forward: Replaces the original messages with the compressed result and forwards the full request to OpenAI as normal.
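
The three steps above can be sketched in Python with a small stand-in wrapper. This illustrates the interception pattern only and is not Headroom's implementation: InterceptingClient, toy_compress, and FakeOpenAI are hypothetical names, and the truncation rule merely stands in for the real compression pipeline.

```python
from types import SimpleNamespace


def toy_compress(messages, max_chars=200):
    """Hypothetical stand-in for the real pipeline: truncate long string content."""
    compressed = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, str) and len(content) > max_chars:
            message = {**message, "content": content[:max_chars] + "...[truncated]"}
        compressed.append(message)
    return compressed


class InterceptingClient:
    """Wraps a client and rewrites only chat.completions.create()."""

    def __init__(self, inner, compress_fn=toy_compress):
        self._inner = inner
        self._compress = compress_fn
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, **params):
        messages = params.get("messages", [])       # 1. extract
        compressed = self._compress(messages)       # 2. compress
        return self._inner.chat.completions.create( # 3. forward
            **{**params, "messages": compressed}
        )

    def __getattr__(self, name):
        # Only reached for attributes not set above, so every other
        # method (embeddings, images, audio) goes straight to the inner client.
        return getattr(self._inner, name)


class FakeOpenAI:
    """Hypothetical offline client that records what it was sent."""

    def __init__(self):
        self.sent = None
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )
        self.embeddings = SimpleNamespace(
            create=lambda **kw: {"input": kw["input"]}
        )

    def _create(self, **params):
        self.sent = params
        return {"ok": True}


fake = FakeOpenAI()
client = InterceptingClient(fake)
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "x" * 500}],
)
print(len(fake.sent["messages"][0]["content"]))  # shorter than the original 500
print(client.embeddings.create(input="Hello world"))
```

Because only the messages key is replaced, every other request parameter (model, tools, stream, and so on) reaches the inner client exactly as the caller wrote it.
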
All other client methods are untouched:
import { withHeadroom } from 'headroom-ai/openai';
import OpenAI from 'openai';

const client = withHeadroom(new OpenAI());

// These pass through unchanged
const embedding = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Hello world',
});

Options

Pass compression options as the second argument: model selects which model's context limit to target, and baseUrl selects which proxy endpoint to use:
import { withHeadroom } from 'headroom-ai/openai';
import OpenAI from 'openai';

const client = withHeadroom(new OpenAI(), {
  model: 'gpt-4o',
  baseUrl: 'http://localhost:8787',
});

Streaming

Compression happens before the request is sent, so streaming responses work exactly as normal:
import { withHeadroom } from 'headroom-ai/openai';
import OpenAI from 'openai';

const client = withHeadroom(new OpenAI());

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: longConversation,
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
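
Why streaming is unaffected can be seen from the order of events. The following Python sketch uses hypothetical stand-ins (toy_compress, fake_create, precompressed), not Headroom's or OpenAI's actual code, and records each step so the ordering is visible: compression finishes before the request is sent, and the stream object is handed back untouched.

```python
events = []


def toy_compress(messages):
    """Hypothetical compressor: records that it ran and trims content."""
    events.append("compress")
    return [{**m, "content": m["content"][:50]} for m in messages]


def fake_create(**params):
    """Hypothetical backend: returns a generator of chunks when stream=True."""
    events.append("send")

    def chunks():
        for piece in ("Hello", " ", "world"):
            events.append("chunk")
            yield piece

    return chunks() if params.get("stream") else "Hello world"


def precompressed(create, compress):
    """Wrap create() so messages are compressed before the call is made."""

    def wrapped(**params):
        params = {**params, "messages": compress(params["messages"])}
        # The return value (a stream or a full response) is passed back as-is.
        return create(**params)

    return wrapped


create = precompressed(fake_create, toy_compress)
stream = create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hi " * 100}],
    stream=True,
)
text = "".join(stream)
print(events)
print(text)
```
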

Tool calling

Tool call messages and tool results are compressed like any other message content. Large tool outputs — JSON arrays, log dumps, search results — see the biggest savings (often 70–92%):
import { withHeadroom } from 'headroom-ai/openai';
import OpenAI from 'openai';

const client = withHeadroom(new OpenAI());

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'Search for recent errors' },
    {
      role: 'assistant',
      content: null,
      tool_calls: [
        {
          id: 'call_1',
          type: 'function',
          function: { name: 'search', arguments: '{"q":"errors"}' },
        },
      ],
    },
    {
      role: 'tool',
      tool_call_id: 'call_1',
      content: hugeJsonResult, // Compressed automatically
    },
  ],
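  tools: [
    {
      type: 'function',
      function: {
        name: 'search',
        // Schema matches the {"q": ...} arguments used in the tool call above
        parameters: {
          type: 'object',
          properties: { q: { type: 'string' } },
          required: ['q'],
        },
      },
    },
  ],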
});

Tool outputs are where Headroom delivers the most savings: JSON arrays and log files can compress by 70–92% while preserving the information the model needs.
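
To build intuition for why repetitive JSON arrays shrink so much, here is a toy Python sketch that rewrites a list of same-shaped objects as one header plus value rows. It is not Headroom's algorithm (the source does not describe how SmartCrusher works), the function names columnar and expand are hypothetical, the sample rows are synthetic, and it measures characters rather than tokens.

```python
import json


def columnar(rows):
    """Rewrite a list of same-shaped dicts as one header plus value rows."""
    if not rows:
        return rows
    keys = list(rows[0].keys())
    if any(list(row.keys()) != keys for row in rows):
        return rows  # mixed shapes: leave the data untouched
    return {"columns": keys, "rows": [[row[k] for k in keys] for row in rows]}


def expand(packed):
    """Inverse of columnar(), showing that the rewrite loses nothing."""
    if isinstance(packed, list):
        return packed
    return [dict(zip(packed["columns"], values)) for values in packed["rows"]]


# Synthetic log-like tool output: every row repeats the same four keys.
rows = [
    {
        "timestamp": f"2024-01-01T00:00:{i:02d}Z",
        "level": "ERROR",
        "service": "api",
        "message": f"timeout #{i}",
    }
    for i in range(50)
]

before = len(json.dumps(rows))
after = len(json.dumps(columnar(rows)))
print(f"{before} -> {after} characters ({1 - after / before:.0%} smaller)")
```

The repeated keys are the cheapest redundancy to remove. Real tool outputs also repeat values and patterns across rows, which is where a purpose-built compressor has more room to work than this sketch does.
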
