The Headroom proxy exposes an HTTP API on http://127.0.0.1:8787 by default. Most clients interact with it through the provider-compatible endpoints (/v1/chat/completions, /v1/messages) by setting environment variables. Additional endpoints provide stats, health checks, and admin controls.
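As a rough sketch of the environment-variable approach, the helper below builds an environment for a child process that points provider SDKs at the proxy. The variable names OPENAI_BASE_URL and ANTHROPIC_BASE_URL are the ones commonly read by the OpenAI and Anthropic SDKs, not names defined on this page, and `headroom_env` is a hypothetical helper; check which variable your client actually honors.

```python
import os

HEADROOM_URL = "http://127.0.0.1:8787"  # default proxy address from this page


def headroom_env(base_url: str = HEADROOM_URL) -> dict[str, str]:
    """Return a copy of the current environment that routes SDK traffic through the proxy."""
    env = dict(os.environ)
    root = base_url.rstrip("/")
    # Assumption: the OpenAI SDK expects its base URL to include the /v1 prefix.
    env["OPENAI_BASE_URL"] = f"{root}/v1"
    # Assumption: the Anthropic SDK appends /v1/messages to its base URL itself.
    env["ANTHROPIC_BASE_URL"] = root
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching a client, so the override stays scoped to that one process.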
Provider endpoints
These endpoints are drop-in replacements for the upstream provider APIs. Set the base URL in your client to route through Headroom.
POST /v1/chat/completions
OpenAI-compatible endpoint. Accepts the same request body as POST https://api.openai.com/v1/chat/completions, compresses the messages, forwards to the upstream provider, and returns the response unchanged.
Request headers:
Authorization: Bearer <your-api-key> (forwarded to the upstream provider)
Content-Type: application/json
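To make the request shape concrete, here is a minimal sketch that builds (but does not send) a chat completion request aimed at the proxy, using only the standard library. The function name `build_chat_request` and the model name in the usage note are placeholders, not values taken from this page.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8787"  # default proxy address


def build_chat_request(api_key: str, model: str, messages: list[dict]) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request that targets the proxy."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # forwarded to the upstream provider
            "Content-Type": "application/json",
        },
    )
```

With a proxy running, something like `urllib.request.urlopen(build_chat_request(key, "example-model", msgs))` would send it; the response body should have the same shape the upstream provider returns.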
POST /v1/messages
Anthropic-compatible endpoint. Accepts the same request body as POST https://api.anthropic.com/v1/messages.
Request headers:
x-api-key: <your-anthropic-key> (forwarded to Anthropic)
anthropic-version: 2023-06-01
Content-Type: application/json
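A comparable sketch for the Anthropic-compatible endpoint follows. It only constructs the request; `build_messages_request` is a hypothetical helper, and the `max_tokens` default is an arbitrary illustrative value (the Anthropic Messages API requires the field).

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8787"  # default proxy address


def build_messages_request(api_key: str, model: str, messages: list[dict],
                           max_tokens: int = 256) -> urllib.request.Request:
    """Build an Anthropic-style messages request that targets the proxy."""
    body = json.dumps(
        {"model": model, "max_tokens": max_tokens, "messages": messages}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=body,
        method="POST",
        headers={
            "x-api-key": api_key,  # forwarded to Anthropic
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
    )
```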
POST /v1/compress
Direct compression endpoint (loopback only). Compresses a messages array without forwarding to any provider. Returns the compressed messages and savings metadata.
POST /v1/retrieve
Direct CCR retrieval endpoint (loopback only). Retrieves a previously compressed and cached original by hash.
Stats and observability
GET /stats
Returns session statistics as JSON. Available to any client (not loopback-restricted). The response includes the following fields:
Total number of requests processed since the proxy started.
Requests served from the semantic cache.
Requests that resulted in an error.
Request counts keyed by provider name (e.g. anthropic, openai).
Total input tokens sent to the upstream provider (after compression).
Total tokens saved across all compression layers (proxy + CLI context tool).
Tokens saved by proxy compression alone (excludes CLI context-tool savings).
All-layers savings as a percentage of original token count.
Combined tokens saved by all layers (proxy compression + CLI filtering).
Canonical persisted display-session metrics for the dashboard — includes human-readable summaries and per-project breakdowns.
Human-readable session summary used by the savings dashboard.
Pass ?cached=true to return the last cached stats payload without recomputing (faster, slightly stale).
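A small sketch of querying this endpoint from Python, with the cached variant as an option. The helper names are hypothetical, and because the exact field names of the payload are not listed here, the sketch returns the decoded JSON as-is rather than reading specific keys.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8787"  # default proxy address


def stats_url(cached: bool = False, base_url: str = BASE_URL) -> str:
    """Return the /stats URL, adding ?cached=true when a slightly stale payload is fine."""
    url = f"{base_url.rstrip('/')}/stats"
    if cached:
        url += "?" + urllib.parse.urlencode({"cached": "true"})
    return url


def fetch_stats(cached: bool = False, timeout: float = 5.0) -> dict:
    """GET /stats and decode the JSON body. Requires a running proxy."""
    with urllib.request.urlopen(stats_url(cached), timeout=timeout) as resp:
        return json.load(resp)
```

A dashboard that polls frequently would probably prefer `fetch_stats(cached=True)` and fall back to the uncached call only when fresh numbers matter.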
GET /stats-history
Returns durable compression history plus display-session state. Supports JSON and CSV output. Parameters:
Response format. csv returns a downloadable file attachment; json returns the full history payload.
Which time-series aggregation to return. history returns the raw per-request history; the others return pre-bucketed roll-ups.
How much detail to include in each history entry. compact includes key savings fields; full includes all fields; none omits the history array (returns only the display-session summary).
GET /metrics
Returns Prometheus-compatible metrics.
GET /transformations/feed
Returns the most recent transformation events (compressed requests) for the live dashboard feed. Loopback only.
Health checks
GET /health
General health check. Returns 200 OK with a JSON body when the proxy is running and its upstream connection is healthy.
GET /livez
Kubernetes liveness probe. Returns 200 OK as long as the process is alive.
GET /readyz
Kubernetes readiness probe. Returns 200 OK once the proxy has warmed up and is ready to serve traffic.
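Outside Kubernetes, a script that starts the proxy may want to wait for readiness before sending traffic. The sketch below polls /readyz until it answers 200 OK; the function names, attempt count, and delay are illustrative choices, not part of the proxy.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8787"  # default proxy address


def probe(path: str, timeout: float = 2.0) -> bool:
    """Return True if GET <path> on the proxy answers 200 OK."""
    try:
        with urllib.request.urlopen(f"{BASE_URL}{path}", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not OK".
        return False


def wait_until_ready(check: Callable[[], bool] = lambda: probe("/readyz"),
                     attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll the readiness check until it passes or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The `check` parameter is injectable so the same loop can poll /health or /livez instead, for example `wait_until_ready(lambda: probe("/health"))`.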
Admin endpoints
Admin endpoints are restricted to loopback connections (127.0.0.1 or localhost). They cannot be called from remote hosts.
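Because these endpoints reject remote callers, a client script can fail fast by checking its configured base URL before issuing an admin request. The following is a client-side sketch only: `is_loopback_url` is a hypothetical helper, and how the proxy itself enforces the restriction is not described here.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname: treat as remote rather than resolving it.
        return False
```

Hostnames other than localhost are treated as remote on purpose, since resolving them would add a DNS dependency to what is meant to be a cheap pre-check.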
POST /admin/runtime-env
Hot-syncs environment variable overrides to the running proxy without a restart. Used by headroom wrap to propagate settings like HEADROOM_OUTPUT_SHAPER to an already-running proxy.
On a shared proxy, these overrides are global — the last explicit setting wins.
POST /stats/reset
Resets all session statistics counters to zero. Loopback only.
POST /cache/clear
Clears the semantic cache. Loopback only.
GET /admin/upstream
Returns the currently configured upstream provider URL and backend. Loopback only.
CCR retrieval endpoints
GET /v1/retrieve/
Retrieves a cached original by hash key. Loopback only.
GET /v1/retrieve/stats
Returns statistics about the CCR cache (size, hit rate, evictions). Loopback only.
POST /v1/retrieve/tool_call
Handles a headroom_retrieve tool call from an LLM response. Called internally by the proxy’s response handler; you do not need to call this directly.