MonoRelay tracks usage statistics in two layers: a fast in-memory counter that persists to data/stats.json on every update, and a SQLite-backed log database that enables richer historical queries. The stats API surfaces both layers in a single response and also exposes per-client usage tracking from the dedicated usage tracker.

Authentication

All stats endpoints require a valid JWT token:
Authorization: Bearer <jwt>
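As a minimal sketch of the header above, the following Python builds an authenticated request without sending it. The host and port are taken from the curl examples on this page; `build_stats_request` is a hypothetical helper name, and how the JWT is obtained is outside the scope of this page.

```python
import urllib.request

BASE_URL = "http://localhost:8787"  # host and port from the curl examples


def build_stats_request(token: str, path: str = "/api/stats") -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the JWT bearer header."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = build_stats_request("<jwt>")
# To send it, pass req to urllib.request.urlopen() and parse the JSON body.
```

The same request object can be reused for any of the endpoints below by changing `path`.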

Endpoints

GET /api/stats

The primary stats endpoint. Returns a composite object combining in-memory global counters, persistent database aggregates, per-key statistics, and per-model detail breakdowns.
curl http://localhost:8787/api/stats \
  -H "Authorization: Bearer <jwt>"
data.in_memory
object
In-memory stats accumulated since the last restart or reset:
  • total_requests — total number of proxied requests
  • total_errors — number of requests that returned an error
  • error_rate — total_errors / total_requests
  • total_tokens_in — cumulative input tokens
  • total_tokens_out — cumulative output tokens
  • total_cache_hit_tokens — tokens served from upstream prompt cache
  • total_tokens — combined input and output tokens
  • estimated_total_cost — estimated USD spend
  • requests_by_provider — request counts by provider name, e.g. {"openrouter": 142, "anthropic": 58}
  • requests_by_model — request counts by model name, e.g. {"claude-opus-4-5": 34, ...}
  • errors_by_provider — error counts by provider name
data.persistent
object
Aggregates computed from the SQLite request log:
  • total_requests — all logged requests (survives restarts)
  • total_cost — total estimated cost from log records
  • avg_latency_ms — average end-to-end latency
  • input_tokens — total input tokens across all log entries
  • output_tokens — total output tokens
  • cache_hit_tokens — total cache-hit tokens from log entries
data.keys
object
Per-provider, per-key health data from the key manager: request counts, failure counts, cooldown state, and quota usage.
data.models
object
Per-model detail from the stats tracker. Each key is a model name:
  • requests — total requests using that model
  • errors — error count for that model
  • total_tokens_in / total_tokens_out — token totals
  • total_cache_hit_tokens — cache-hit tokens for that model
  • avg_first_token_ms — exponential-decay weighted average time to first token (TTFT), in ms
  • avg_speed_tps — exponential-decay weighted average output tokens per second
  • streaming_requests — number of streaming requests
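The two exponential-decay weighted averages in the list above can be illustrated with a common update rule, sketched here in Python. The smoothing factor `alpha = 0.2` and the choice to seed the average with the first sample are assumptions for illustration; this page does not state the factor or seeding MonoRelay actually uses.

```python
from typing import Optional


def ewma_update(current: Optional[float], sample: float, alpha: float = 0.2) -> float:
    """Fold one new sample into an exponentially weighted average.

    alpha is a hypothetical smoothing factor; the first sample seeds the average.
    """
    if current is None:
        return sample
    # Recent samples get weight alpha; older history decays by (1 - alpha).
    return alpha * sample + (1.0 - alpha) * current


avg_first_token_ms = None
for ttft_ms in (600.0, 700.0, 500.0):
    avg_first_token_ms = ewma_update(avg_first_token_ms, ttft_ms)
```

With these three samples the running value moves from 600 to 620 and then to 596, so a single slow or fast request shifts the average only partially.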
Example response:
{
  "success": true,
  "data": {
    "in_memory": {
      "total_requests": 200,
      "total_errors": 5,
      "error_rate": 0.025,
      "total_tokens_in": 84200,
      "total_tokens_out": 31500,
      "total_cache_hit_tokens": 12300,
      "total_tokens": 115700,
      "estimated_total_cost": 0.0423,
      "requests_by_provider": {"anthropic": 120, "openrouter": 80},
      "requests_by_model": {"claude-opus-4-5": 90, "gpt-4o-mini": 80},
      "errors_by_provider": {"openrouter": 5}
    },
    "persistent": {
      "total_requests": 200,
      "total_cost": 0.042,
      "avg_latency_ms": 1284.3,
      "input_tokens": 84200,
      "output_tokens": 31500,
      "cache_hit_tokens": 12300
    },
    "models": {
      "claude-opus-4-5": {
        "requests": 90,
        "errors": 0,
        "total_tokens_in": 51000,
        "total_tokens_out": 18200,
        "total_cache_hit_tokens": 12300,
        "avg_first_token_ms": 621.4,
        "avg_speed_tps": 38.7,
        "streaming_requests": 74
      }
    }
  }
}
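A few derived ratios can be computed client-side from this response. The sketch below uses the in_memory values from the example; `summarize` is a hypothetical helper, and treating cache-hit tokens as a share of input tokens is an assumption about how the counters relate.

```python
def summarize(stats: dict) -> dict:
    """Derive simple ratios from the in_memory block of a GET /api/stats response."""
    mem = stats["data"]["in_memory"]
    requests = mem["total_requests"]
    tokens_in = mem["total_tokens_in"]
    return {
        "error_rate": mem["total_errors"] / requests if requests else 0.0,
        # Assumption: cache-hit tokens are counted as part of the input tokens.
        "cache_hit_share": mem["total_cache_hit_tokens"] / tokens_in if tokens_in else 0.0,
        "cost_per_request": mem["estimated_total_cost"] / requests if requests else 0.0,
    }


example = {
    "data": {
        "in_memory": {
            "total_requests": 200,
            "total_errors": 5,
            "total_tokens_in": 84200,
            "total_cache_hit_tokens": 12300,
            "estimated_total_cost": 0.0423,
        }
    }
}
summary = summarize(example)
```

The recomputed error rate matches the error_rate field in the example (5 / 200 = 0.025).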

GET /api/stats/enhanced

Returns an expanded stats object that includes per-provider breakdowns enriched with provider configuration (cost rates, key counts) and a flat key health inventory useful for dashboards.
data.provider_breakdown
object
One entry per enabled provider:
  • enabled — provider enabled state
  • total_requests — requests routed to this provider
  • total_errors — errors from this provider
  • keys.total / keys.enabled — key inventory counts
  • cost_per_m_input / cost_per_m_output — configured pricing rates
data.key_health
array
Flat list of all keys across all providers, each entry containing provider, label, enabled, total_requests, total_failures, is_available, cooldown_until, quota_limit, quota_used, rate_limit_rps, and expires_at.
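Because key_health is a flat list, a dashboard can scan it directly. The following sketch flags keys that are close to their quota. The sample entries are invented for illustration, the 0.8 threshold is arbitrary, and treating a missing, null, or zero quota_limit as "no quota" is an assumption rather than documented behavior.

```python
def keys_near_quota(key_health: list, threshold: float = 0.8) -> list:
    """Return (provider, label) pairs whose quota_used / quota_limit >= threshold."""
    flagged = []
    for key in key_health:
        limit = key.get("quota_limit")
        if not limit:  # assumption: missing, null, or zero means no quota applies
            continue
        used = key.get("quota_used") or 0
        if used / limit >= threshold:
            flagged.append((key["provider"], key["label"]))
    return flagged


# Hypothetical entries using a subset of the documented field names.
sample = [
    {"provider": "openrouter", "label": "primary", "quota_limit": 1000, "quota_used": 950},
    {"provider": "anthropic", "label": "backup", "quota_limit": None, "quota_used": 0},
    {"provider": "anthropic", "label": "main", "quota_limit": 500, "quota_used": 100},
]
near_limit = keys_near_quota(sample)
```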

GET /api/stats/file

Return the raw stats.json file content as a string. Useful for backup or cross-instance synchronization.
{"content": "{ \"total_requests\": 200, ... }"}

PUT /api/stats/file

Replace the stats.json file content and immediately reload the in-memory stats from the new content.
content
string
required
JSON string to write as the new stats file content.
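Since content is itself a JSON string, the stats object ends up JSON-encoded twice: once as the file content and once as the request or response body. A minimal sketch of both directions follows. The helper names are hypothetical, and sending the PUT body as a JSON object with a single content field is an assumption based on the GET response shape.

```python
import json


def encode_stats_file_body(stats: dict) -> bytes:
    """Wrap a stats object as {"content": "<json string>"} for PUT /api/stats/file."""
    inner = json.dumps(stats)  # file content, serialized as a string
    return json.dumps({"content": inner}).encode("utf-8")


def decode_stats_file_response(raw: bytes) -> dict:
    """Unwrap a GET /api/stats/file response body back into a stats object."""
    return json.loads(json.loads(raw)["content"])


original = {"total_requests": 200, "total_errors": 5}
body = encode_stats_file_body(original)
restored = decode_stats_file_response(body)
```

Round-tripping through both helpers returns the original object, which is a quick check before overwriting the file on another instance.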

POST /api/stats/reset

Clears all in-memory and persisted statistics, deletes the stats.json file, and clears the SQLite request log.
curl -X POST http://localhost:8787/api/stats/reset \
  -H "Authorization: Bearer <jwt>"
{"success": true, "message": "统计数据已清空"}
The message string is Chinese for "Statistics have been cleared."
This operation is irreversible. Both the stats file and the full request log database will be cleared. Export or back up data before calling this endpoint.

Per-client usage stats

GET /api/usage/stats

Return usage statistics broken down by client identity (the username or access-key prefix that made each request). Pass client_id as a query parameter to retrieve stats for a single client.
# All clients
curl http://localhost:8787/api/usage/stats \
  -H "Authorization: Bearer <jwt>"

# Single client
curl "http://localhost:8787/api/usage/stats?client_id=alice" \
  -H "Authorization: Bearer <jwt>"
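When client identities contain spaces or slashes, the query string needs URL-encoding. The sketch below builds both URL forms; `usage_stats_url` is a hypothetical helper and the base URL is taken from the curl examples.

```python
from typing import Optional
from urllib.parse import urlencode


def usage_stats_url(client_id: Optional[str] = None,
                    base_url: str = "http://localhost:8787") -> str:
    """Build the usage stats URL, adding client_id only when one is given."""
    url = f"{base_url}/api/usage/stats"
    if client_id is not None:
        url += "?" + urlencode({"client_id": client_id})
    return url


all_clients = usage_stats_url()
one_client = usage_stats_url("alice")
```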

POST /api/usage/clear

Reset per-client usage counters. Pass client_id to clear only that client’s data, or omit it to clear all clients.

Analytics endpoints

GET /api/analytics/overview

Aggregate cost and usage by provider and model over a date range. Defaults to the last 7 days.
Query parameters:
  • start_date — Start date in YYYY-MM-DD format
  • end_date — End date in YYYY-MM-DD format
data.total_requests
integer
Total requests in the date range.
data.total_cost
number
Estimated total cost in USD.
data.total_tokens
object
{"input": 84200, "output": 31500} — token totals for the period.
data.by_provider
object
Per-provider request counts and costs.
data.by_model
object
Per-model request counts, costs, and token totals.
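To request an explicit range instead of relying on the default, the two date parameters can be generated as shown in this sketch. `overview_query` is a hypothetical helper, and whether the server treats the end date as inclusive is not stated on this page.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def overview_query(end: date, days: int = 7) -> str:
    """Build a start_date/end_date query string covering `days` days up to `end`."""
    start = end - timedelta(days=days)
    # date.isoformat() yields the YYYY-MM-DD format the endpoint expects.
    return urlencode({"start_date": start.isoformat(), "end_date": end.isoformat()})


query = overview_query(date(2025, 1, 15))
# Append to the endpoint path: /api/analytics/overview?<query>
```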

GET /api/analytics/slow-queries

List requests that exceeded a latency threshold, ordered by first-token latency descending.
Query parameters:
  • threshold_ms (default: 2000) — Minimum first-token latency in ms
  • start_date (default: 7 days ago) — Start of date range
  • end_date (default: today) — End of date range
  • limit (default: 100) — Maximum results

GET /api/analytics/cost-distribution

Return cost broken down by provider and by model as percentage shares, useful for pie-chart visualizations in dashboards.
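The response shape for this endpoint is not documented here, so the following is only a sketch of the underlying arithmetic: converting a mapping of absolute costs into percentage shares. The input mapping and the `percentage_shares` helper are hypothetical.

```python
def percentage_shares(costs: dict) -> dict:
    """Convert absolute costs per provider (or model) into percentage shares."""
    total = sum(costs.values())
    if total == 0:
        # Avoid division by zero when nothing has been spent yet.
        return {name: 0.0 for name in costs}
    return {name: round(100.0 * cost / total, 2) for name, cost in costs.items()}


# Hypothetical per-provider costs in USD.
shares = percentage_shares({"anthropic": 0.03, "openrouter": 0.01})
```

Here anthropic accounts for 75% of the spend and openrouter for 25%, which maps directly onto pie-chart slices.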

Dashboard integration

The MonoRelay dashboard reads all of these endpoints automatically. The stats overview panel uses GET /api/stats, the model breakdown table uses data.models, and the real-time activity feed uses GET /api/logs/stream. You can build your own monitoring dashboards using the same API surface that the built-in UI relies on.
