TheDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/headroomlabs-ai/headroom/llms.txt
Use this file to discover all available pages before exploring further.
compress() function is the simplest entry point into Headroom. Pass a list of messages, get a compressed list back — no proxy, no configuration, no client wrapping needed. It works identically whether you’re using the Anthropic SDK, OpenAI SDK, LiteLLM, or a raw HTTP client.
Function Signature
Parameters
List of messages in Anthropic or OpenAI format. Each message should have at minimum a
role key and a content key.Model name used for token counting and determining the context window limit. Pass the exact model string you will use when calling the LLM (e.g.
"gpt-4o", "claude-opus-4-20250514").Context window size in tokens for the target model. Headroom uses this to decide how aggressively to compress when the context is nearly full.
Set to
False to disable compression entirely and return the original messages unchanged. Useful for A/B testing or gradual rollout.Optional
CompressionHooks instance. Hooks let you inject pre-processing, per-message bias overrides, and post-compression observability callbacks.Full compression configuration object. Individual
**kwargs override fields inside this object; you can mix both.Shorthand for any
CompressConfig field — passed directly as keyword arguments. Valid keys: compress_user_messages, compress_system_messages, protect_recent, protect_analysis_context, target_ratio, min_tokens_to_compress, kompress_model, savings_profile.CompressConfig
CompressConfig controls what gets compressed, how aggressively, and with which model variant. Pass it as config= or use the shorthand **kwargs form.
Compress user messages as well as tool outputs. Default is
False because coding agents need to see exact user instructions. Set True for document pipelines, RAG, or when user messages contain large tool outputs.Compress system messages. Set
False to preserve system prompts exactly, for example in voice agents where tool definitions must not be altered.Number of trailing messages to leave uncompressed. These are the active conversation turns. Set
0 to compress the entire context.Detect
analyze / review intent in the conversation and protect code blocks from compression when found.Keep ratio for the text (Kompress) compressor.
None lets the model decide (~15% kept, aggressive). 0.5 keeps 50% (safe for documents). Only affects text compression — SmartCrusher (JSON) uses its own statistical logic.Minimum token count for a message to be eligible for compression. Messages shorter than this threshold are left untouched.
HuggingFace model ID for the Kompress text compressor.
None uses the default (chopratejas/kompress-v2-base). Set to "disabled" to skip ML text compression entirely — only SmartCrusher and CacheAligner will run.Named high-savings preset. For example,
"agent-90" applies settings tuned for Codex/Claude/Cursor coding agents targeting 90% token reduction.CompressResult
compress() always returns a CompressResult — it never raises on failure, instead reverting to the original messages and logging a warning.
The compressed messages in the same format as the input. Drop-in replacement for the original
messages list.Token count before compression.
0 if compression was skipped (empty input, optimize=False, or a failure that triggered the safety guard).Token count after compression.
Tokens removed:
tokens_before - tokens_after.Fraction of tokens saved.
0.0 means nothing was saved; 0.65 means 65% of tokens were removed.Internal names of every transform that ran, e.g.
["router:tool_result:json", "smart_crusher", "cache_aligner"]. Useful for debugging. When the inflation guard fires, the list contains ["inflation_guard:reverted"].If compression would inflate the token count (a rare edge case), Headroom automatically reverts to the original messages. The returned
CompressResult will have tokens_saved=0 and transforms_applied=["inflation_guard:reverted"].Usage Examples
Document and Financial Pipelines
For documents, RAG pipelines, or contexts where user messages contain large data payloads, enable user-message compression and adjusttarget_ratio:
Using Hooks for Observability
Pass aCompressionHooks subclass to instrument compression events:
compress_spreadsheet()
compress_spreadsheet() compresses .xlsx and .xls files by rendering each sheet as CSV text and running the full compression pipeline per sheet. Requires the spreadsheet extra: pip install headroom-ai[spreadsheet].
Parameters
Filesystem path to an
.xlsx or .xls file.Model name for token counting and context limit determination.
Model context window size in tokens.
Forwarded to
compress(). For example, pass target_ratio=0.3 to compress each sheet to 30% of its original size.Example
Each sheet becomes its own
user message. The tabular compressor (CSV → SmartCrusher) runs per sheet, applying lossless column/row compaction first, falling back to lossy row-drop with CCR markers when lossless savings are insufficient.Handling the Result
compress() is safe to call unconditionally — it never throws on compression failure: