Everything in Headroom’s core pipeline shrinks the prompt you send. But you also pay for every token the model writes back — and on Opus-class models, output costs 5× input. A lot of that output is waste: “Great, let me…” preambles, re-printing code you just showed it, and deep “thinking” on routine steps like reading a file. Headroom can trim that too, from the proxy, without you changing any code.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/headroomlabs-ai/headroom/llms.txt
Use this file to discover all available pages before exploring further.
Two Mechanisms
Verbosity Steering
Appends a short “be terse, don’t restate context” note to the end of the system prompt — after the existing prompt, so your provider’s prefix cache still hits.
Effort Routing
When a turn is just the model resuming after a tool result (a file read, a passing test), it dials the model’s thinking effort down. New questions and errors keep full effort.
Enable
HEADROOM_OUTPUT_SHAPER=1 before starting (or wrapping) enables them together.
Learn the Right Terseness Level Automatically
People don’t say how terse they want answers — they show it: they interrupt long replies or move on before they could have read them.headroom learn --verbosity mines those behavioral signals from your past sessions and picks the level automatically.
--apply, Headroom hot-enables the output shaper on the running proxy — no restart needed.
See Your Savings Estimate
Output savings are counterfactual — Headroom never sees what the model would have written — so it reports an honest estimate with a confidence range, never a made-up number:Get a Measured Number Instead
Leave 10% of conversations unshaped as a control group:measured rather than estimated, with a tighter confidence band derived from the actual control group.
Hot-Sync to a Running Proxy
All output-shaper env vars are read live on every request. If you need to change settings without restarting the proxy, send them directly via the admin endpoint:headroom wrap calls this endpoint automatically when it reuses an already-running proxy, so your settings always take effect immediately.
On a shared proxy, runtime overrides are global — the last explicit setting wins. Be intentional when multiple developers share a single proxy instance.
End-to-End Example
Configuration Reference
| Variable | Default | Purpose |
|---|---|---|
HEADROOM_OUTPUT_SHAPER | 0 | Master switch — enables verbosity steering and effort routing |
HEADROOM_OUTPUT_HOLDOUT | 0 | Fraction of conversations left unshaped for a measured control group (e.g. 0.1 = 10%) |
HEADROOM_VERBOSITY_LEVEL | (from verbosity.json) | Override the learned level directly (1–4) |