FANGS Prometheus Metrics Reference and Alert Runbook

The FANGS orchestrator exposes a Prometheus-compatible /metrics endpoint on the same address it listens on (default http://127.0.0.1:8443/metrics). Metrics are enabled by default and cover the full operational picture: sensor event throughput, ring-buffer health, deviations by severity, baseline promotions, webhook delivery status, and runner availability. Go runtime and process collectors (goroutines, GC, memory, CPU) are also registered automatically.

Endpoint and Enabling

# Default — metrics enabled on the orchestrator's listen address
GET http://127.0.0.1:8443/metrics

# Disable metrics (not recommended for production)
fangs-orchestrator -metrics=false

The metrics endpoint is bound to the same address as the API and UI (default 127.0.0.1:8443). It is localhost-only by default. If you expose the orchestrator behind a reverse proxy, make sure /metrics is either blocked from external access or protected by your proxy’s authentication layer.
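
For ad-hoc checks outside Prometheus, the endpoint can be read with a few lines of Python. The sketch below is illustrative and not part of FANGS: `parse_metrics` and `fetch_metrics` are hypothetical helper names, and the parser assumes the standard Prometheus text exposition format without escaped quotes inside label values.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, then a numeric value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore anything that is not a sample line
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://127.0.0.1:8443/metrics", timeout=5.0):
    """Fetch and parse the endpoint. The URL is the documented default."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` on the orchestrator host returns every exposed sample, which can then be filtered by name, for example to confirm that `fangs_runners_registered` is above 0.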

Metrics Reference

Gauges

These reflect the current state at scrape time.

`fangs_orchestrator_info`

fangs_orchestrator_info{version="<build_version>"} 1

Always 1. Carries the build version as a label — useful for tracking rollouts and correlating behavioral changes with version upgrades. The version label is the only label on this series.

`fangs_runners_registered`

fangs_runners_registered 2

Count of currently-registered runners that have sent a recent heartbeat. Runners that have been evicted by the pruner are not counted. A drop to 0 means no runner is available to execute sandbox scans.

Counters

All counters reset to 0 on orchestrator restart. Query them with rate() or increase() in PromQL rather than reading raw values; both functions compensate for counter resets.
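
To see why raw counter values mislead after a restart, the sketch below reproduces the reset handling that PromQL applies: any decrease between consecutive samples is treated as a reset to 0. It is a simplification (Prometheus also extrapolates to the window boundaries), and `counter_increase` is a hypothetical helper name.

```python
def counter_increase(samples):
    """Total increase across samples, treating any decrease as a reset to 0."""
    total = 0.0
    previous = None
    for value in samples:
        if previous is not None:
            if value >= previous:
                total += value - previous
            else:
                # The counter went backwards: assume a restart, so the new
                # value is the whole increase since the reset.
                total += value
        previous = value
    return total


# Four scrapes with an orchestrator restart between the second and third.
scrapes = [100, 120, 5, 15]
naive_delta = scrapes[-1] - scrapes[0]      # negative, which is meaningless
reset_aware = counter_increase(scrapes)     # 20 + 5 + 10
```

Subtracting the first raw value from the last gives a negative number here, while the reset-aware total counts the events on both sides of the restart.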

`fangs_events_received_total`

fangs_events_received_total{type="file_access"} 184201
fangs_events_received_total{type="exec"} 3402
fangs_events_received_total{type="net_connect"} 9812
fangs_events_received_total{type="dns_query"} 4771
fangs_events_received_total{type="tls_sni"} 4309

Sensor events received by the orchestrator, partitioned by eBPF probe type. Use the rate of this counter alongside fangs_events_dropped_total to understand pipeline throughput.

`fangs_events_dropped_total`

fangs_events_dropped_total 0

Events dropped because the in-kernel ring buffer was full before the consumer could read them. The runner reports the drop count in each ScanResult, and the orchestrator accumulates it here. A sustained non-zero rate indicates the ring buffer is too small or the orchestrator's event consumer is too slow.

`fangs_scans_queued_total`

fangs_scans_queued_total 412

Number of sandbox scans dispatched to a runner. Includes both watcher-triggered scans (new release detected) and manual `fangs scan submit` invocations.

`fangs_deviations_written_total`

fangs_deviations_written_total{severity="crit"} 3
fangs_deviations_written_total{severity="warn"} 27
fangs_deviations_written_total{severity="info"} 204

Deviations emitted by the differ after comparing a run’s fingerprints against the baseline. Partitioned by severity. This counter increments at analysis time — before any human triage or suppression.

`fangs_baseline_promoted_total`

fangs_baseline_promoted_total{trigger="auto"}   388
fangs_baseline_promoted_total{trigger="manual"}  24

Runs merged into the baseline, partitioned by how they were promoted:

| Label | Meaning |
| --- | --- |
| `auto` | Run produced zero deviations and was auto-promoted by the differ (D38) |
| `manual` | Operator ran `fangs baseline promote <run-id>` |

`fangs_notifications_total`

fangs_notifications_total{notifier="soc-slack",status="sent"}      38
fangs_notifications_total{notifier="soc-slack",status="failed"}      2
fangs_notifications_total{notifier="siem",status="sent"}            35
fangs_notifications_total{notifier="siem",status="permanent"}        1

Webhook delivery attempts, partitioned by notifier name and the outcome of each attempt. Status values:

| Value | Meaning |
| --- | --- |
| `sent` | HTTP 2xx received; delivery confirmed |
| `failed` | Transient failure (5xx, network error, 408/429); retried |
| `permanent` | 4xx other than 408/429; the request will not succeed on retry |
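
The status mapping can be expressed as a small function. This is a sketch of the rules as documented here, not the orchestrator's actual code: `classify_delivery` is a hypothetical name, a network error is represented as `None`, and status codes the table does not cover (1xx, 3xx) raise an error instead of being guessed.

```python
def classify_delivery(status_code):
    """Map one webhook attempt to its `status` label value.

    `status_code` is the HTTP response code, or None when the request
    failed before a response arrived (network error).
    """
    if status_code is None:
        return "failed"        # network error: transient, retried
    if 200 <= status_code <= 299:
        return "sent"          # delivery confirmed
    if status_code in (408, 429) or 500 <= status_code <= 599:
        return "failed"        # transient: retried
    if 400 <= status_code <= 499:
        return "permanent"     # will not succeed on retry
    # Assumption: the documented statuses do not cover other codes.
    raise ValueError(f"status {status_code} is not covered by the documented statuses")
```

For example, a 429 from a rate-limited webhook counts as `failed` and is retried, while a 404 from a mistyped webhook URL counts as `permanent`.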

Go Runtime and Process Metrics

Standard Go runtime and process collectors are registered automatically. These include:

| Series or prefix | What it covers |
| --- | --- |
| `go_goroutines` | Current goroutine count |
| `go_gc_*` | GC pause duration and frequency |
| `go_memstats_*` | Heap, stack, and GC memory statistics |
| `process_cpu_seconds_total` | Cumulative CPU usage |
| `process_resident_memory_bytes` | Resident set size (RSS) |
| `process_open_fds` | Open file descriptors |

Prometheus Scrape Configuration

scrape_configs:
  - job_name: fangs
    static_configs:
      - targets: ['127.0.0.1:8443']

If the orchestrator is running with TLS enabled, set the appropriate scheme and tls_config:

scrape_configs:
  - job_name: fangs
    scheme: https
    tls_config:
      ca_file: /etc/fangs/ca.crt
      cert_file: /etc/fangs/scraper.crt
      key_file: /etc/fangs/scraper.key
    static_configs:
      - targets: ['fangs.internal:8443']

Recommended Alerts

The following Prometheus alerting rule fires when the ring buffer drops events for a sustained period:

- alert: FANGSRingbufOverflow
  expr: rate(fangs_events_dropped_total[5m]) > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "FANGS ring buffer is dropping events"
    description: >
      fangs_events_dropped_total is increasing. The eBPF ring buffer
      cannot keep up with event volume. Increase the ring buffer size
      or reduce scanner concurrency.

Example Grafana Queries

# Scan throughput (scans per minute)
rate(fangs_scans_queued_total[5m]) * 60

# Deviation severity breakdown (last hour)
increase(fangs_deviations_written_total[1h])

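# Notification delivery success rate per notifier
# (aggregate by notifier so the "sent" series divides by all statuses)
sum by (notifier) (rate(fangs_notifications_total{status="sent"}[5m]))
  /
sum by (notifier) (rate(fangs_notifications_total[5m]))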

# Event drop ratio (should be 0)
# (sum both sides so the per-type received series match the unlabeled drop counter)
sum(rate(fangs_events_dropped_total[5m]))
  /
sum(rate(fangs_events_received_total[5m]))
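
To sanity-check a dashboard panel, the per-notifier success ratio can be recomputed directly from counter samples. The sketch below uses the sample values from the `fangs_notifications_total` section and works on lifetime totals, not on a 5-minute rate; `success_ratio_by_notifier` is a hypothetical helper name.

```python
from collections import defaultdict


def success_ratio_by_notifier(samples):
    """Return {notifier: sent / all attempts} from (labels, value) pairs."""
    sent = defaultdict(float)
    total = defaultdict(float)
    for labels, value in samples:
        notifier = labels["notifier"]
        total[notifier] += value          # every status counts as an attempt
        if labels["status"] == "sent":
            sent[notifier] += value
    # Skip notifiers with no attempts to avoid dividing by zero.
    return {name: sent[name] / total[name] for name in total if total[name] > 0}


samples = [
    ({"notifier": "soc-slack", "status": "sent"}, 38),
    ({"notifier": "soc-slack", "status": "failed"}, 2),
    ({"notifier": "siem", "status": "sent"}, 35),
    ({"notifier": "siem", "status": "permanent"}, 1),
]
ratios = success_ratio_by_notifier(samples)
```

Grouping by notifier before dividing is the important step: the numerator and denominator must cover the same notifier, with the denominator summed across all statuses.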
