Headroom automatically detects what kind of content you’re sending and routes it to the right compressor — no configuration required. Every request flows through a two-stage pipeline: CacheAligner stabilizes prefixes so provider KV caches hit, and ContentRouter dispatches each message block to the optimal compressor based on its detected content type. The result is 40–95% fewer tokens with the same answers, and full reversibility via the CCR architecture.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/headroomlabs-ai/headroom/llms.txt
Use this file to discover all available pages before exploring further.
The Two-Stage Pipeline
Content Type Detection
The router detects content type by analyzing structure and patterns. Detection uses Magika (a Google ML model) when the[ml] extra is installed, falling back to fast heuristics when it isn’t. No manual hints are needed.
| Content Type | Detection Signal | Compressor | Typical Savings |
|---|---|---|---|
| JSON arrays | Valid JSON with array elements | SmartCrusher | 70–90% |
| Source code | Syntax patterns, indentation, keywords | CodeAwareCompressor | 40–70% |
| Build / test logs | Timestamps, log levels, pytest / npm markers | LogCompressor | 80–95% |
| Search results | file:line:content format | SearchCompressor | 60–80% |
| Plain text | Fallback for prose and unstructured output | Kompress-v2-base | 30–50% |
| Images | Binary content | ML image router | 40–90% |
Code compression via
CodeAwareCompressor is opt-in and disabled by default.
Enable it with the [code] extra: pip install "headroom-ai[code]".SmartCrusher — Statistical JSON Compression
SmartCrusher is the primary compressor for JSON arrays produced by tool calls. It performs field-level statistical analysis to identify which items carry unique information and which are redundant.How it works
Parse
The input is parsed as JSON. SmartCrusher handles arrays of dicts, arrays of strings, arrays of numbers, flat objects with many keys, and nested objects (compressed recursively).
Statistical analysis
For each field across all array items, SmartCrusher computes variance, uniqueness ratios, and change-point detection (using a fixed window of 5 items). Fields below the
uniqueness_threshold (default 0.1) are flagged as near-constant.Representative sampling
The Kneedle algorithm runs on cumulative bigram coverage to find the knee — the point where adding more items yields diminishing new information. This determines
K, the target item count. By default: 30% of K from the array start, 15% from the end, and 55% by importance score (errors, anomalies, relevance to the user query).Anomaly and error preservation
Items are always kept regardless of the K budget if they contain error indicators or are statistical anomalies (more than 2 standard deviations from the numeric mean). This ensures debugging signals survive compression.
Safety guarantees
SmartCrusher never produces output that changes the JSON schema — no wrappers, no generated text, no metadata fields. The output is always a valid subset of the original array.CodeAwareCompressor — AST-Aware Source Code
CodeAwareCompressor uses tree-sitter to parse source code into an AST before compressing it, guaranteeing syntactically valid output.
Supported languages
Tier 1 — Full support
Python, JavaScript, TypeScript
Tier 2 — Structural support
Go, Rust, Java, C, C++, Perl
Compression strategy
- Parse source into an AST with tree-sitter (thread-local parsers — no shared mutable state)
- Extract and unconditionally preserve: imports, function signatures, type annotations, class definitions, error handlers
- Rank functions by semantic importance
- Compress function bodies while keeping signatures intact
- Reassemble into syntactically valid code
Kompress-v2-base — ML Text Compression
For plain prose, READMEs, documentation, and other unstructured text, Headroom uses Kompress-v2-base — a ModernBERT model fine-tuned on agentic traces and published on HuggingFace.- Model:
chopratejas/kompress-v2-base - Runtime: ONNX INT8 inference (weight-only quantization, 261 MB). No GPU required.
- Must-keep tokens: numbers, ALLCAPS identifiers (
SIGILL,HTTP), dotted paths (libsystem_kernel.dylib), unix paths, file extensions, CLI flags, and CamelCase names are always preserved regardless of model score — these carry semantic meaning agents cannot reconstruct from context. - Target ratio: Configurable via
target_ratio(e.g.0.5keeps 50% of tokens). Default is aggressive (~15%kept).
LogCompressor — Shell, Build, and Test Logs
LogCompressor targets the verbose, repetitive output produced by build systems, test runners, and shell commands — the highest-savings category Headroom handles.
It detects logs via timestamps, log levels (INFO, WARN, ERROR), and pytest/npm/cargo markers. Compression clusters repeated log patterns and drops redundant lines while always preserving:
- Lines containing error keywords (
error,fatal,exception,traceback) - Stack traces
- Statistical anomalies in numeric fields (e.g. abnormally high durations)
- The first and last lines of each logical section
Image Compression
Images embedded in messages are routed to a trained ML router that selects the optimal compression strategy per image. Reduction ranges from 40–90% depending on content type (screenshots compress more than photographs). Requires the[image] extra.
How Transforms Chain Together
Each compressor is independent and fails gracefully — if a compressor errors, the original content is returned unchanged. Transforms are applied in message order and their results are logged inCompressResult.transforms_applied:
ContentRouter follow the format router:<message_role>:<detected_type>, making it straightforward to audit exactly what was compressed and why.