Accessing the dashboard
Start the stack
Run Docker Compose from the repository root. This starts the gateway, Redis, Qdrant, Prometheus, and Grafana together.
Open Grafana
Navigate to http://localhost:3000 in your browser.
The default admin password is set via GF_SECURITY_ADMIN_PASSWORD=admin in docker-compose.yml. Change this value before deploying to any shared or production environment.
How provisioning works
Grafana is configured entirely through volume mounts defined in docker-compose.yml. No manual setup steps are needed.
docker-compose.yml (grafana service)
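As a rough sketch, the grafana service might look something like the fragment below. The image name, the host-side paths, and the depends_on entry are assumptions made for illustration; the container paths follow Grafana's default provisioning location and the dashboard directory named in this section. Check the repository's docker-compose.yml for the actual values.

```yaml
# Illustrative sketch only, not the repository's actual file.
services:
  grafana:
    image: grafana/grafana            # assumed image; pin a version in practice
    ports:
      - "3000:3000"                   # serves http://localhost:3000
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin   # change before sharing or deploying
    volumes:
      # Provisioning config (datasources/ and dashboards/ subdirectories)
      - ./grafana/provisioning:/etc/grafana/provisioning
      # Dashboard JSON files, including draft-thinker.json
      - ./grafana/dashboards:/var/lib/grafana/dashboards
    depends_on:
      - prometheus                    # assumed service name, matching http://prometheus:9090
```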
- grafana/provisioning/datasources/ configures the Prometheus datasource at http://prometheus:9090 and sets it as the default datasource.
- grafana/provisioning/dashboards/ tells Grafana to load dashboard JSON files from /var/lib/grafana/dashboards, which maps to grafana/dashboards/ in the repository.
The dashboard is defined in grafana/dashboards/draft-thinker.json. To add or modify panels, edit that file directly and restart the Grafana container, or use the Grafana UI (the dashboard is marked editable: true).
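For orientation, the two provisioning files could look roughly like this. The file names, the datasource name, and the provider name are hypothetical; the keys themselves are standard Grafana provisioning fields, and the URL and dashboard path come from the description above.

```yaml
# Hypothetical file: grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus              # assumed display name
    type: prometheus
    access: proxy                 # Grafana's backend issues the queries
    url: http://prometheus:9090
    isDefault: true
---
# Hypothetical file: grafana/provisioning/dashboards/dashboards.yml
apiVersion: 1
providers:
  - name: default                 # assumed provider name
    type: file
    options:
      path: /var/lib/grafana/dashboards   # where the dashboard JSON is mounted
```

The two documents are shown in one block for brevity; in practice each lives in its own file under the matching provisioning subdirectory.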
Prometheus scrape configuration
Prometheus is configured to scrape the gateway's /metrics endpoint every 15 seconds.
prometheus.yml
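A minimal scrape configuration consistent with this description might look like the following. The job name is an assumption, and the real file may set the interval globally instead of per job.

```yaml
# Illustrative sketch only; see the repository's prometheus.yml for actual values.
scrape_configs:
  - job_name: gateway             # assumed job name
    scrape_interval: 15s          # scrape every 15 seconds
    metrics_path: /metrics        # Prometheus default, written out for clarity
    static_configs:
      - targets: ["gateway:8080"] # Compose service name and port
```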
gateway:8080 resolves to the gateway container over the Docker Compose internal network.
Dashboard panels
The dashboard is organized into four rows, each grouping related panels.
Overview
Four summary stats shown at the top of the dashboard:
- Request rate: sum(rate(draftthinker_requests_total[5m])), in requests per second
- Draft acceptance rate: fraction of routing decisions where the drafter's response was accepted, shown as a gauge with green/yellow/red thresholds
- Cost reduction: calibrated value (91.6%) at threshold T=2.0 on 518 prompts
- Cache hit rate: fraction of requests served from the semantic cache, shown as a gauge
Latency
Three time-series panels showing P50, P95, and P99 percentiles:
- Upstream latency by provider: draftthinker_upstream_latency_seconds, broken out by drafter and heavyweight
- Cache lookup latency: draftthinker_cache_lookup_latency_seconds, end-to-end including embedding and vector search
- Speculative latency saved: draftthinker_speculative_latency_saved_seconds, P50 and P95
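Percentile panels like these are typically computed with histogram_quantile over bucket rates. The rule file below is a sketch of that technique, not the dashboard's actual queries: it assumes the latency metrics are Prometheus histograms (exposing _bucket series) and that the drafter/heavyweight split is carried in a label named provider. The group and record names are hypothetical.

```yaml
# Hypothetical recording rules; load via rule_files in prometheus.yml.
groups:
  - name: draftthinker_latency
    rules:
      # P50 upstream latency per provider over a 5-minute window
      - record: draftthinker:upstream_latency_seconds:p50
        expr: >-
          histogram_quantile(0.50,
            sum by (le, provider) (rate(draftthinker_upstream_latency_seconds_bucket[5m])))
      # P95 upstream latency per provider
      - record: draftthinker:upstream_latency_seconds:p95
        expr: >-
          histogram_quantile(0.95,
            sum by (le, provider) (rate(draftthinker_upstream_latency_seconds_bucket[5m])))
      # P99 upstream latency per provider
      - record: draftthinker:upstream_latency_seconds:p99
        expr: >-
          histogram_quantile(0.99,
            sum by (le, provider) (rate(draftthinker_upstream_latency_seconds_bucket[5m])))
```

The le label must be kept in the aggregation, since histogram_quantile needs the bucket boundaries to interpolate a quantile.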
Routing
Two time-series panels showing decision flow and errors:
- Routing decisions over time: draftthinker_routing_decisions_total, stacked by accept, escalate, and cache_hit
- Error rate by type: draftthinker_errors_total, broken out by the type label (invalid_request, routing_error, upstream_error, upstream_timeout, stream_error, internal_error)
Entropy and speculative
Three panels covering the routing engine internals:
- Entropy distribution: draftthinker_entropy_distribution, rendered as a heatmap over time, showing the per-token Shannon entropy spread in bits
- Speculative trigger rate: rate(draftthinker_speculative_triggers_total[5m]), in triggers per second
- Speculative cancellation ratio: draftthinker_speculative_cancellations_total / draftthinker_speculative_triggers_total, as a gauge that is green below 10%, yellow up to 30%, and red above
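If the cancellation ratio is computed from the raw counters as written, it reflects totals accumulated since the gateway started and reacts slowly to recent changes. One possible variant, offered as a sketch and not as what the dashboard uses, compares the two rates over a window. The group name, record name, and 5-minute window are assumptions.

```yaml
# Hypothetical recording rule; load via rule_files in prometheus.yml.
groups:
  - name: draftthinker_speculative
    rules:
      # Share of speculative triggers cancelled over the last 5 minutes.
      # sum() on both sides drops labels so the two series always match.
      - record: draftthinker:speculative_cancellation_ratio:rate5m
        expr: >-
          sum(rate(draftthinker_speculative_cancellations_total[5m]))
          /
          sum(rate(draftthinker_speculative_triggers_total[5m]))
```

When no speculative triggers occurred in the window, the division yields NaN, which a Grafana gauge generally renders as no data.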