Headroom integrates with the Vercel AI SDK through three patterns: a one-liner wrapper, composable middleware, and standalone message compression. All three compress messages before they reach the underlying language model, with no changes to streaming, tool calling, or structured output behavior.

Installation

```bash
npm install headroom-ai ai @ai-sdk/openai
```

The TypeScript SDK sends messages to a local Headroom proxy for compression. Start the proxy before using the SDK:

```bash
pip install "headroom-ai[proxy]"
headroom proxy
```

withHeadroom() one-liner

The simplest integration: it wraps any Vercel AI SDK language model with automatic compression and works with any provider (@ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, etc.):
```typescript
import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const model = withHeadroom(openai('gpt-4o'));

const { text } = await generateText({
  model,
  messages: [
    { role: 'user', content: 'Summarize these results...' },
  ],
});
```
Under the hood, withHeadroom() calls wrapLanguageModel() with headroomMiddleware().

headroomMiddleware() for composition

Use the middleware directly when you need to compose it with other middleware or when you want explicit control over the wrapping:
```typescript
import { headroomMiddleware } from 'headroom-ai/vercel-ai';
import { wrapLanguageModel } from 'ai';
import { openai } from '@ai-sdk/openai';

const model = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: headroomMiddleware(),
});
```
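When headroomMiddleware() is one of several middleware, its position decides what the others see. The sketch below is a self-contained simulation, not the AI SDK itself: SimpleModel, SimpleMiddleware, wrapOne, wrap, compressStub, and demo are hypothetical names, and the stub keeps only the last two messages in place of real compression. It assumes the AI SDK's behavior for an array passed as middleware to wrapLanguageModel, where [first, second] is applied as first(second(model)), so the first entry's transformParams runs first.

```typescript
// Simplified stand-ins (hypothetical): only the transformParams hook is modeled.
type Message = { role: 'system' | 'user' | 'assistant'; content: string };
type CallParams = { prompt: Message[] };

interface SimpleModel {
  doGenerate(params: CallParams): Promise<string>;
}

interface SimpleMiddleware {
  transformParams?: (params: CallParams) => Promise<CallParams>;
}

// Wrap a model with one middleware: transform the params, then delegate.
function wrapOne(model: SimpleModel, middleware: SimpleMiddleware): SimpleModel {
  return {
    async doGenerate(params) {
      const next = middleware.transformParams
        ? await middleware.transformParams(params)
        : params;
      return model.doGenerate(next);
    },
  };
}

// Assumed array semantics: [first, second] is applied as first(second(model)),
// so the first middleware's transformParams runs first.
function wrap(model: SimpleModel, middleware: SimpleMiddleware[]): SimpleModel {
  return [...middleware]
    .reverse()
    .reduce<SimpleModel>((wrapped, mw) => wrapOne(wrapped, mw), model);
}

// Hypothetical stand-in for compression: keep only the last two messages.
const compressStub: SimpleMiddleware = {
  async transformParams(params) {
    return { ...params, prompt: params.prompt.slice(-2) };
  },
};

const base: SimpleModel = {
  async doGenerate(params) {
    return `model saw ${params.prompt.length} messages`;
  },
};

const history: Message[] = [
  { role: 'user', content: 'a' },
  { role: 'assistant', content: 'b' },
  { role: 'user', content: 'c' },
  { role: 'assistant', content: 'd' },
];

// Records how many messages a counting middleware observes in each ordering.
async function demo(): Promise<number[]> {
  const seen: number[] = [];
  const counter: SimpleMiddleware = {
    async transformParams(params) {
      seen.push(params.prompt.length);
      return params;
    },
  };
  await wrap(base, [compressStub, counter]).doGenerate({ prompt: history });
  await wrap(base, [counter, compressStub]).doGenerate({ prompt: history });
  return seen; // first entry: after compression; second entry: before it
}
```

Under that ordering assumption, a logging or caching middleware listed after headroomMiddleware() observes the compressed prompt, while one listed before it observes the original.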
Pass options to control compression behavior:
```typescript
import { headroomMiddleware } from 'headroom-ai/vercel-ai';

const middleware = headroomMiddleware({
  model: 'gpt-4o',
  baseUrl: 'http://localhost:8787',
});
```

Middleware configuration options

model

Target model ID for token counting and context-limit enforcement. Defaults to the wrapped model's own modelId.

baseUrl

Headroom proxy URL. Defaults to http://localhost:8787.
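The two defaults can be summarized as a small resolution step. This is a hypothetical helper, not part of headroom-ai: resolveHeadroomOptions and its return shape are assumptions, and it builds the compress endpoint URL from the /v1/compress path described under How it works. Trimming a trailing slash from baseUrl is an extra choice of this sketch.

```typescript
// Hypothetical option shapes; only `model` and `baseUrl` come from the docs.
interface HeadroomOptions {
  model?: string;
  baseUrl?: string;
}

interface ResolvedHeadroomOptions {
  model: string;
  baseUrl: string;
  compressUrl: string;
}

const DEFAULT_BASE_URL = 'http://localhost:8787';

// Apply the documented defaults: `model` falls back to the wrapped model's
// modelId, and `baseUrl` falls back to the local proxy.
function resolveHeadroomOptions(
  options: HeadroomOptions,
  wrappedModelId: string,
): ResolvedHeadroomOptions {
  const model = options.model ?? wrappedModelId;
  // Assumption of this sketch: tolerate a trailing slash in baseUrl.
  const baseUrl = (options.baseUrl ?? DEFAULT_BASE_URL).replace(/\/+$/, '');
  return { model, baseUrl, compressUrl: `${baseUrl}/v1/compress` };
}

// Example: proxy on another host, model ID taken from the wrapped model.
const resolved = resolveHeadroomOptions(
  { baseUrl: 'http://headroom.internal:8787/' },
  'gpt-4o',
);
console.log(resolved.compressUrl); // http://headroom.internal:8787/v1/compress
```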

compressVercelMessages() standalone

Compress Vercel-format messages directly without wrapping a model. Useful for custom pipelines where you want to compress before passing to any downstream consumer:
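```typescript
import { compressVercelMessages } from 'headroom-ai/vercel-ai';
import type { ModelMessage } from 'ai';

const messages: ModelMessage[] = [
  { role: 'user', content: 'Summarize these results...' },
];

const result = await compressVercelMessages(messages, {
  model: 'gpt-4o',
});

console.log(`Saved ${result.tokensSaved} tokens`);
// result.messages is in Vercel format, ready for the AI SDK
```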
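If the proxy is not running, compression cannot happen. This page does not say whether compressVercelMessages throws in that case, so the sketch below assumes it does and falls back to the uncompressed messages. compressOrPassThrough, CompressFn, and the simplified Message type are hypothetical; the CompressResult shape is assumed from the messages and tokensSaved fields used above. In an application you would pass a function that calls compressVercelMessages.

```typescript
// Simplified stand-in; the real message type is the AI SDK's ModelMessage.
type Message = { role: string; content: string };

// Shape assumed from the example above: compressed messages plus tokens saved.
interface CompressResult {
  messages: Message[];
  tokensSaved: number;
}

type CompressFn = (
  messages: Message[],
  options: { model: string },
) => Promise<CompressResult>;

// Hypothetical wrapper: if compression fails (for example, the proxy is down),
// pass the original messages through unchanged and report zero savings.
async function compressOrPassThrough(
  compress: CompressFn,
  messages: Message[],
  model: string,
): Promise<CompressResult> {
  try {
    return await compress(messages, { model });
  } catch (err) {
    console.warn(`Headroom compression skipped: ${String(err)}`);
    return { messages, tokensSaved: 0 };
  }
}
```

Falling back keeps requests working at the cost of the savings. If your prompts sit close to the model's context limit, failing loudly may be the better choice.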

Streaming with streamText

Compression happens before the request is sent, so streaming responses are unaffected:
```typescript
import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { streamText, type ModelMessage } from 'ai';

// Your existing conversation history, defined elsewhere in your app.
declare const longConversation: ModelMessage[];

const model = withHeadroom(openai('gpt-4o'));

const result = streamText({
  model,
  messages: longConversation,
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```

Structured output with compressed context

Compression also works with structured output. Only the messages are compressed, so the schema and output format are not affected:
```typescript
import { withHeadroom } from 'headroom-ai/vercel-ai';
import { openai } from '@ai-sdk/openai';
import { generateText, Output, type ModelMessage } from 'ai';
import { z } from 'zod';

// Your existing conversation history, defined elsewhere in your app.
declare const largeConversationHistory: ModelMessage[];

const model = withHeadroom(openai('gpt-4o'));

const { output } = await generateText({
  model,
  output: Output.object({
    schema: z.object({
      summary: z.string(),
      severity: z.enum(['low', 'medium', 'high']),
    }),
  }),
  messages: largeConversationHistory,
});
```
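Structured output survives compression because only the prompt needs to change. A minimal sketch of that idea, assuming Headroom's transformParams hook replaces the prompt and copies every other call parameter through: CallParams is a simplified stand-in for the AI SDK's call options, and transformPromptOnly and the compress callback are hypothetical.

```typescript
// Simplified stand-in for the AI SDK's call options (hypothetical subset).
type Message = { role: string; content: string };

interface CallParams {
  prompt: Message[];
  responseFormat?: { type: 'text' } | { type: 'json'; schema?: unknown };
  tools?: { name: string }[];
  temperature?: number;
}

// Replace only the prompt. The spread copies every other field (response
// format and schema, tools, sampling settings) into a new params object
// unchanged, and the input params object is not mutated.
async function transformPromptOnly(
  params: CallParams,
  compress: (prompt: Message[]) => Promise<Message[]>,
): Promise<CallParams> {
  return { ...params, prompt: await compress(params.prompt) };
}
```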

How it works

1. Convert to OpenAI format. Messages are converted from Vercel's ModelMessage[] format to OpenAI format.
2. Compress via proxy. The Headroom proxy compresses the messages through its /v1/compress endpoint.
3. Convert back to Vercel format. The compressed messages are converted back to Vercel's message format.
4. Pass to model. The original model receives the smaller prompt. All other model behavior (tool calling, structured output, streaming) is unchanged.
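The four steps can be sketched as a text-only round trip. This simplification assumes messages contain only text (no images, files, or tool parts), which the real conversion must also handle. compressOpenAI is a hypothetical stand-in for the call to the proxy's /v1/compress endpoint, whose request and response format is not shown here; toOpenAI, toVercel, and compressForModel are likewise illustrative names.

```typescript
// Text-only subset of the AI SDK's ModelMessage (assumption of this sketch).
type TextPart = { type: 'text'; text: string };
type VercelMessage =
  | { role: 'system'; content: string }
  | { role: 'user' | 'assistant'; content: string | TextPart[] };

// OpenAI chat format, text only.
type OpenAIMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Step 1: Vercel -> OpenAI. Text parts are joined into a single string.
function toOpenAI(messages: VercelMessage[]): OpenAIMessage[] {
  return messages.map((m): OpenAIMessage => ({
    role: m.role,
    content:
      typeof m.content === 'string'
        ? m.content
        : m.content.map((part) => part.text).join(''),
  }));
}

// Step 3: OpenAI -> Vercel. Non-system content becomes a single text part.
function toVercel(messages: OpenAIMessage[]): VercelMessage[] {
  return messages.map((m): VercelMessage =>
    m.role === 'system'
      ? { role: 'system', content: m.content }
      : { role: m.role, content: [{ type: 'text', text: m.content }] },
  );
}

// Steps 1-3 in order; step 4 is the caller handing the result to the model.
async function compressForModel(
  messages: VercelMessage[],
  compressOpenAI: (m: OpenAIMessage[]) => Promise<OpenAIMessage[]>,
): Promise<VercelMessage[]> {
  const openaiMessages = toOpenAI(messages); // step 1
  const compressed = await compressOpenAI(openaiMessages); // step 2
  return toVercel(compressed); // step 3
}
```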
The middleware integrates at the transformParams level, which means it runs before every doGenerate and doStream call. This covers generateText, streamText, generateObject, streamObject, and any other Vercel AI SDK function that calls the wrapped model.
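A minimal simulation of why a single hook covers both paths. The shapes are simplified assumptions loosely modeled on the AI SDK's middleware interface, where transformParams receives a type of 'generate' or 'stream'; in the real SDK, doStream resolves to an object containing a stream rather than returning an async iterable directly. Model, Middleware, and wrap are hypothetical names.

```typescript
// Simplified shapes (assumption): prompts are plain strings here.
type Params = { prompt: string[] };

interface Model {
  doGenerate(p: Params): Promise<string>;
  doStream(p: Params): AsyncIterable<string>;
}

interface Middleware {
  transformParams(opts: { type: 'generate' | 'stream'; params: Params }): Promise<Params>;
}

// Both entry points route through transformParams before reaching the model.
function wrap(model: Model, mw: Middleware): Model {
  return {
    async doGenerate(params) {
      return model.doGenerate(await mw.transformParams({ type: 'generate', params }));
    },
    async *doStream(params) {
      // Params are transformed once, before the first chunk is requested;
      // the chunks themselves are passed through untouched.
      const next = await mw.transformParams({ type: 'stream', params });
      yield* model.doStream(next);
    },
  };
}
```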
