The full observability page (metrics, violations, alerts, drift detection) requires a Pro plan or above. The dashboard overview is available on all plans.
Architecture
Every agent run produces signals that flow from the SDK through the backend and into the dashboard in real time.
- Your agents call the Drako SDK for trust evaluation, audit logging, and policy checks.
- The backend processes each request, tracks metrics in Postgres/Redis, and emits Prometheus counters.
- Grafana Alloy scrapes /metrics every 30 seconds and pushes to Grafana Cloud.
- The dashboard fetches aggregated data via REST API and receives live updates via WebSocket.
- All data is tenant-isolated via PostgreSQL Row-Level Security.
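To make the scrape step concrete, here is a minimal sketch of what a counter looks like in the Prometheus text exposition format that a /metrics endpoint serves. The `Counter` class, the metric name `policy_blocks_total`, and the `tenant` label are illustrative assumptions, not Drako's actual metric names or implementation.

```python
from collections import defaultdict


class Counter:
    """Tiny in-memory counter keyed by label values (illustrative only)."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        # Counters are monotonic: they may only go up.
        if amount < 0:
            raise ValueError("counters can only increase")
        self._values[tuple(sorted(labels.items()))] += amount

    def render(self):
        """Render the counter in Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{self.name}{suffix} {value:g}")
        return "\n".join(lines) + "\n"


# Hypothetical metric name and label, chosen for this example only.
blocks = Counter("policy_blocks_total", "Actions blocked by governance policies")
blocks.inc(tenant="acme")
blocks.inc(tenant="acme")
print(blocks.render())
```

A scraper such as Grafana Alloy reads this text on each scrape and computes rates from the difference between successive counter values.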
Dashboard overview
The command center at /dashboard gives you a real-time snapshot of your governance posture.
Metric cards
| Metric | Description |
|---|---|
| Audit entries | Total audit log entries in the current period |
| Agents verified | Number of agents with completed trust evaluation |
| Policy blocks | Actions blocked by governance policies |
| Avg trust score | Fleet-wide average trust score (0.0–1.0) |
Quota usage
A horizontal progress bar shows your current plan usage. The bar turns yellow at 70% and red above 90%.
Governance score trend
A 30-day time-series chart of your governance score progression, sourced from GET /dashboard/score-progression.
Tool health grid
A visual grid of your tools’ circuit breaker states:
- Green (CLOSED) — Tool is healthy and operating normally
- Yellow (HALF_OPEN) — Tool is recovering; limited traffic allowed
- Red (OPEN) — Tool is circuit-broken; requests are being rejected
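The state-to-color mapping above can be sketched as a small lookup. The `BreakerState` enum, the `summarize` helper, and the tool names are hypothetical and exist only for this illustration; they are not part of the Drako SDK.

```python
from enum import Enum


class BreakerState(Enum):
    CLOSED = "CLOSED"        # healthy, operating normally
    HALF_OPEN = "HALF_OPEN"  # recovering, limited traffic allowed
    OPEN = "OPEN"            # circuit-broken, requests rejected


# Grid colors as described in the list above.
GRID_COLORS = {
    BreakerState.CLOSED: "green",
    BreakerState.HALF_OPEN: "yellow",
    BreakerState.OPEN: "red",
}


def summarize(tool_states):
    """Count how many tools fall under each grid color."""
    counts = {"green": 0, "yellow": 0, "red": 0}
    for state in tool_states.values():
        counts[GRID_COLORS[state]] += 1
    return counts


# Hypothetical tool names.
states = {
    "web_search": BreakerState.CLOSED,
    "send_email": BreakerState.OPEN,
    "sql_query": BreakerState.CLOSED,
}
print(summarize(states))
```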
Activity feed
Real-time stream of the latest audit log entries with auto-refresh every 30 seconds, connected to the WebSocket for instant updates.
Key metrics
Health grade
A–F composite score combining latency, error rate, and governance overhead. Sources: GET /observability/insights/health.
Latency
P50, P95, and P99 percentiles with time-series visualization. Sources: GET /observability/metrics. Updated every 30 seconds.
Violation heatmap
A 7×24 grid (days × hours) where cell intensity represents violation count. Reveals patterns like batch-job spikes at 2 AM.
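As a rough illustration of how such a grid can be derived, the sketch below buckets violation timestamps by weekday and hour. The row order (Monday first) and the use of each timestamp's own clock rather than a tenant time zone are assumptions; the dashboard may bucket differently.

```python
from datetime import datetime


def build_heatmap(timestamps):
    """Return a 7x24 grid: rows are weekdays (Monday=0), columns are hours."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        grid[ts.weekday()][ts.hour] += 1
    return grid


# Made-up sample data: two violations on a Monday at 2 AM, one on Tuesday.
violations = [
    datetime(2024, 1, 1, 2, 15),
    datetime(2024, 1, 1, 2, 40),
    datetime(2024, 1, 2, 14, 5),
]
grid = build_heatmap(violations)
print(grid[0][2])  # Monday, 02:00-02:59
```

A recurring batch job would show up as one column that stays dark across most rows.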
Drift detection
Automatic identification of behavioral drift across your fleet. When an agent’s behavior deviates significantly from its historical pattern, drift is flagged. Sources: GET /observability/insights/drift. Updated every 5 minutes.
Observability page
The full observability page at /observability is organized into four tabs: Overview, Metrics, Violations, and Alerts.
The Overview tab presents a unified health assessment combining multiple signals:
| Component | What it measures |
|---|---|
| Health grade | A–F composite grade: latency + error rate + governance overhead |
| P50 latency | Median request latency across all endpoints |
| P95 latency | 95th percentile latency (tail performance) |
| Active alerts | Number of currently firing alert rules |
| Drift status | Whether behavioral drift has been detected in the fleet |
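For readers unfamiliar with the P50 and P95 rows in the table above, here is a minimal sketch of a percentile computation using the nearest-rank method. The source does not say which percentile method the backend uses, so treat the method and the sample latencies as assumptions.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up request latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 240, 14, 13, 18, 16, 17, 19]
print(percentile(latencies_ms, 50))  # median
print(percentile(latencies_ms, 95))  # tail
```

With this sample the median stays low while P95 lands on the outlier, which is why tail percentiles are tracked separately from the median.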
Configuring alert rules
Define alert rules in .drako.yaml. Each rule specifies a metric, a threshold condition, and one or more notification channels.
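The three parts of a rule (metric, threshold condition, channels) can be modeled as below. This is a conceptual sketch only: the `AlertRule` class, its field names, the metric names, and the channel names are hypothetical and do not reflect the actual .drako.yaml schema.

```python
import operator
from dataclasses import dataclass

# Comparison operators a threshold condition might use (assumed set).
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str       # which metric to watch
    op: str           # comparison operator, one of the keys in _OPS
    threshold: float  # value the metric is compared against
    channels: tuple   # where to send the notification

    def fires(self, metrics):
        """Return True when the metric is present and breaches the threshold."""
        value = metrics.get(self.metric)
        if value is None:
            return False
        return _OPS[self.op](value, self.threshold)


def channels_to_notify(rules, metrics):
    """Map each firing rule's name to its notification channels."""
    return {r.name: r.channels for r in rules if r.fires(metrics)}


# Hypothetical rules and metric snapshot.
rules = [
    AlertRule("high-p95", "p95_latency_ms", ">", 500.0, ("slack",)),
    AlertRule("low-trust", "avg_trust_score", "<", 0.6, ("email", "slack")),
]
snapshot = {"p95_latency_ms": 620.0, "avg_trust_score": 0.82}
print(channels_to_notify(rules, snapshot))
```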
Session traces
Every agent session produces a full span tree — tool calls, policy checks, latency breakdowns, and audit references. Session traces are accessible from the Agents page and link directly to the corresponding audit log entries.
Exporting to external systems
OTEL export (Datadog, Grafana, New Relic)
Drako supports OpenTelemetry export. Pipe traces and metrics to your existing observability stack:
- Datadog — traces via OTLP exporter
- Grafana — metrics via Grafana Alloy, already scraped from /metrics
- New Relic — traces via OTLP exporter
SIEM export (Splunk, ELK)
Security events are exportable to SIEM platforms via STIX 2.1 or CEF format:
- Splunk — ingest via HTTP Event Collector using CEF-formatted events
- ELK (Elasticsearch) — ingest via Logstash pipeline using STIX 2.1 bundles
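To show what a CEF-formatted event looks like on the wire, the sketch below builds one line following the standard CEF layout (`CEF:Version|Vendor|Product|Device Version|Event Class ID|Name|Severity|Extension`). The vendor, product, version, event class, and field values are placeholders; the exact fields Drako emits are not documented here.

```python
def _escape_header(value):
    # In CEF header fields, backslash and pipe must be escaped.
    return value.replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value):
    # In CEF extension values, backslash, equals sign and newline must be escaped.
    return value.replace("\\", "\\\\").replace("=", "\\=").replace("\n", "\\n")


def to_cef(vendor, product, version, event_class_id, name, severity, extensions):
    """Format one event as a CEF version 0 line."""
    header_fields = (vendor, product, version, event_class_id, name, severity)
    header = "|".join(_escape_header(str(f)) for f in header_fields)
    ext = " ".join(
        f"{key}={_escape_extension(str(val))}" for key, val in extensions.items()
    )
    return f"CEF:0|{header}|{ext}"


# Placeholder values; "suser" and "act" are standard CEF extension keys.
line = to_cef(
    "Drako", "Governance", "1.0", "policy_block", "Tool call blocked", 7,
    {"suser": "agent-42", "act": "blocked"},
)
print(line)
```

A line like this can be posted as the event payload to a collector that is configured to parse CEF.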
Plan availability
| Feature | Free | Starter | Pro | Enterprise |
|---|---|---|---|---|
| Dashboard overview | Yes | Yes | Yes | Yes |
| Audit trail | 7 days | 30 days | 90 days | Custom |
| Agent trust scores | Yes | Yes | Yes | Yes |
| Governance score trend | — | Yes | Yes | Yes |
| Tool health grid | — | — | Yes | Yes |
| Observability (full) | — | — | Yes | Yes |
| Alert rules | — | — | Yes | Yes |
| Violation heatmap | — | — | Yes | Yes |
| Drift detection | — | — | Yes | Yes |
| OTEL export | — | — | Yes | Yes |
| Custom metrics | — | — | — | Yes |
Real-time updates
The dashboard connects to a WebSocket at wss://api.getdrako.com/ws for live updates. The connection indicator in the header shows:
- Green dot (pulsing) — Connected, receiving live data
- Yellow dot — Reconnecting
- No dot — Disconnected (data still refreshes every 30 seconds via polling)