Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/headroomlabs-ai/headroom/llms.txt

Use this file to discover all available pages before exploring further.

The Headroom proxy exposes an HTTP API on http://127.0.0.1:8787 by default. Most clients interact with it through the provider-compatible endpoints (/v1/chat/completions, /v1/messages) by setting environment variables. Additional endpoints provide stats, health checks, and admin controls.

Provider endpoints

These endpoints are drop-in replacements for the upstream provider APIs. Set the base URL in your client to route through Headroom.

POST /v1/chat/completions

OpenAI-compatible endpoint. Accepts the same request body as POST https://api.openai.com/v1/chat/completions, compresses the messages, forwards to the upstream provider, and returns the response unchanged.
# Point any OpenAI-compatible client at the proxy
export OPENAI_BASE_URL=http://localhost:8787/v1
Headers:
  • Authorization: Bearer <your-api-key> — forwarded to the upstream provider
  • Content-Type: application/json

POST /v1/messages

Anthropic-compatible endpoint. Accepts the same request body as POST https://api.anthropic.com/v1/messages.
# Point Claude Code at the proxy
export ANTHROPIC_BASE_URL=http://localhost:8787
Headers:
  • x-api-key: <your-anthropic-key> — forwarded to Anthropic
  • anthropic-version: 2023-06-01
  • Content-Type: application/json

POST /v1/compress

Direct compression endpoint (loopback only). Compresses a messages array without forwarding to any provider. Returns the compressed messages and savings metadata.
curl -X POST http://localhost:8787/v1/compress \
  -H "Content-Type: application/json" \
  -d '{"messages": [...], "model": "gpt-4o"}'

POST /v1/retrieve

Direct CCR retrieval endpoint (loopback only). Retrieves a previously compressed and cached original by hash.
curl -X POST http://localhost:8787/v1/retrieve \
  -H "Content-Type: application/json" \
  -d '{"hash": "abc123"}'

Stats and observability

GET /stats

Returns session statistics as JSON. Available to any client (not loopback-restricted).
curl http://localhost:8787/stats
Selected response fields (the full payload is deeply nested — these are the most commonly accessed paths):
requests.total
integer
Total number of requests processed since the proxy started.
requests.cached
integer
Requests served from the semantic cache.
requests.failed
integer
Requests that resulted in an error.
requests.by_provider
object
Request counts keyed by provider name (e.g. anthropic, openai).
tokens.input
integer
Total input tokens sent to the upstream provider (after compression).
tokens.saved
integer
Total tokens saved across all compression layers (proxy + CLI context tool).
tokens.proxy_compression_saved
integer
Tokens saved by proxy compression alone (excludes CLI context-tool savings).
tokens.savings_percent
float
All-layers savings as a percentage of original token count.
savings.total_tokens
integer
Combined tokens saved by all layers (proxy compression + CLI filtering).
display_session
object
Canonical persisted display-session metrics for the dashboard — includes human-readable summaries and per-project breakdowns.
summary
object
Human-readable session summary used by the savings dashboard.
Pass ?cached=true to return the last cached stats payload without recomputing (faster, slightly stale):
curl "http://localhost:8787/stats?cached=true"

GET /stats-history

Returns durable compression history plus display-session state. Supports JSON and CSV output.
curl "http://localhost:8787/stats-history"
curl "http://localhost:8787/stats-history?series=daily&format=csv"
format
json | csv
default:"json"
Response format. csv returns a downloadable file attachment; json returns the full history payload.
series
history | hourly | daily | weekly | monthly
default:"history"
Which time-series aggregation to return. history returns the raw per-request history; the others return pre-bucketed roll-ups.
history_mode
compact | full | none
default:"compact"
How much detail to include in each history entry. compact includes key savings fields; full includes all fields; none omits the history array (returns only the display-session summary).

GET /metrics

Returns Prometheus-compatible metrics.
curl http://localhost:8787/metrics

GET /transformations/feed

Returns the most recent transformation events (compressed requests) for the live dashboard feed. Loopback only.
curl http://localhost:8787/transformations/feed?limit=20

Health checks

GET /health

General health check. Returns 200 OK with a JSON body when the proxy is running and its upstream connection is healthy.
curl http://localhost:8787/health
# {"status": "ok", "version": "0.28.0", "upstream": "healthy"}

GET /livez

Kubernetes liveness probe. Returns 200 OK as long as the process is alive.

GET /readyz

Kubernetes readiness probe. Returns 200 OK once the proxy has warmed up and is ready to serve traffic.

Admin endpoints

Admin endpoints are restricted to loopback connections (127.0.0.1 or localhost). They cannot be called from remote hosts.

POST /admin/runtime-env

Hot-syncs environment variable overrides to the running proxy without a restart. Used by headroom wrap to propagate settings like HEADROOM_OUTPUT_SHAPER to an already-running proxy.
curl -X POST http://localhost:8787/admin/runtime-env \
  -H "Content-Type: application/json" \
  -d '{"HEADROOM_OUTPUT_SHAPER": "1"}'
On a shared proxy, these overrides are global — the last explicit setting wins.

POST /stats/reset

Resets all session statistics counters to zero. Loopback only.
curl -X POST http://localhost:8787/stats/reset

POST /cache/clear

Clears the semantic cache. Loopback only.
curl -X POST http://localhost:8787/cache/clear

GET /admin/upstream

Returns the currently configured upstream provider URL and backend. Loopback only.
curl http://localhost:8787/admin/upstream

CCR retrieval endpoints

GET /v1/retrieve/

Retrieves a cached original by hash key. Loopback only.
curl http://localhost:8787/v1/retrieve/abc123def456

GET /v1/retrieve/stats

Returns statistics about the CCR cache (size, hit rate, evictions). Loopback only.

POST /v1/retrieve/tool_call

Handles an headroom_retrieve tool call from an LLM response. Called internally by the proxy’s response handler; you do not need to call this directly.

Dashboard

GET /dashboard

Returns the HTML live savings dashboard. Open in a browser while the proxy is running.
open http://localhost:8787/dashboard
# or
headroom dashboard

Build docs developers (and LLMs) love