Prometheus Setup

Prometheus is the time-series database that stores and queries metrics collected by gNMIc.

Overview

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability:

Pull-based metric collection (scraping)
Powerful query language (PromQL)
Time-series data storage
Built-in alerting capabilities
Service discovery support

Lab Configuration:

Container: prometheus
Management IP: 10.77.1.13
Web UI Port: 9090
Config File: configs/prometheus/prometheus.yml

Configuration File

The Prometheus configuration is minimal and focused:

global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "gnmic"
    static_configs:
      - targets: ["gnmic:9273"]

Configuration Breakdown

Global Settings

global:
  scrape_interval: 5s

scrape_interval: How often Prometheus scrapes each metrics endpoint

Lab value: 5 seconds (Prometheus uses 1m when the setting is omitted)
Matches gNMIc’s sample-interval for consistent data
Lower values = more data points but higher storage/CPU usage
Higher values = less granular but more efficient

Scrape Configs

scrape_configs:
  - job_name: "gnmic"
    static_configs:
      - targets: ["gnmic:9273"]

job_name: Name of the scrape job; Prometheus attaches it as the job label to every metric scraped by this job
targets: List of host:port endpoints to scrape

gnmic:9273 - gNMIc Prometheus exporter endpoint
Uses Docker DNS for name resolution
Port 9273 is the listen port configured for gNMIc’s Prometheus output in this lab
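The scrape_interval trade-off can be put into numbers. The sketch below is a back-of-the-envelope calculation that assumes each series exposed by gNMIc produces exactly one sample per scrape; the 2,000-series figure in the usage note is hypothetical, not a measurement from this lab.

```python
# Back-of-the-envelope sample counts for a scrape interval.
# Assumption: each exposed series yields exactly one sample per scrape.

SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_series_per_day(scrape_interval_s: float) -> int:
    """Samples a single series accumulates in one day."""
    return int(SECONDS_PER_DAY // scrape_interval_s)


def ingestion_rate(series_count: int, scrape_interval_s: float) -> float:
    """Samples per second Prometheus must ingest for one job."""
    return series_count / scrape_interval_s


if __name__ == "__main__":
    for interval in (5, 10, 15):
        print(f"{interval:>2}s interval: {samples_per_series_per_day(interval)} samples/series/day")
```

Halving the interval from 10s to 5s doubles the samples per series (8,640 to 17,280 per day). With a hypothetical 2,000 series, ingestion_rate(2000, 5) gives 400 samples per second.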

Data Retention

Prometheus stores data with default retention settings:

| Setting | Default | Description |
|---|---|---|
| Retention time | 15 days | How long to keep data |
| Retention size | No limit | Maximum storage size |
| Storage path | `/prometheus` | Data directory |

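To customize retention, modify the container command in lab.yml. Because cmd replaces the image's default arguments, keep --config.file and --storage.tsdb.path in the list. If both a time and a size limit are set, whichever is reached first triggers deletion of the oldest data.

prometheus:
  kind: linux
  image: prom/prometheus
  cmd: >
    --config.file=/etc/prometheus/prometheus.yml
    --storage.tsdb.path=/prometheus
    --storage.tsdb.retention.time=30d
    --storage.tsdb.retention.size=10GB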
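To pick a retention.size that fits the retention.time, the Prometheus storage documentation gives a rough formula: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with about 1 to 2 bytes per sample after compression. A minimal sketch of that estimate follows; the series count is a hypothetical placeholder, and the result ignores WAL and head-block overhead.

```python
# Rough disk estimate: retention_seconds * samples_per_second * bytes_per_sample.
# bytes_per_sample defaults to the pessimistic end of the documented 1-2 byte range.
# The series count passed in is an assumption, not a lab measurement.

SECONDS_PER_DAY = 24 * 60 * 60


def estimate_disk_bytes(retention_days: float, series_count: int,
                        scrape_interval_s: float, bytes_per_sample: float = 2.0) -> float:
    retention_s = retention_days * SECONDS_PER_DAY
    samples_per_s = series_count / scrape_interval_s
    return retention_s * samples_per_s * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical: 2,000 series scraped every 5s, kept for 30 days.
    size = estimate_disk_bytes(30, 2_000, 5)
    print(f"~{size / 1e9:.2f} GB")
```

For the hypothetical inputs this comes to roughly 2 GB, comfortably below a 10GB size limit, so the 30d time limit would normally be the one that triggers first.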

Access Prometheus Web UI

Prometheus includes a built-in web interface:

http://localhost:9090

Key Web UI Features

The web UI is organized into the following areas:

Graph: execute PromQL queries and visualize results

Navigate to the Graph tab
Enter a query in the expression box
Click Execute
View the results as a table or graph

Example queries:

# All CPU metrics
system_cpu

# Interface statistics for a specific device
port_statistics_in_packets{device="bng1"}

# Per-second rate, averaged over 5 minutes
rate(port_statistics_in_packets[5m])

Targets: monitor scrape target health

Navigate to Status → Targets
Shows all configured scrape jobs
Displays:
- Target endpoint
- State (UP/DOWN)
- Labels
- Last scrape time and duration
- Errors (if any)

Expected status: gnmic (1/1 up)

Service Discovery: inspect discovered targets

Navigate to Status → Service Discovery
Shows each target's labels before and after relabeling

Configuration: view the active configuration

Navigate to Status → Configuration
Displays the loaded prometheus.yml content, including default values Prometheus fills in for omitted settings

PromQL Query Language

Prometheus Query Language (PromQL) is used to query and aggregate metrics.

Basic Queries

# Latest sample of every series of a metric
system_cpu

# Filter by label
system_cpu{device="bng1"}

# Multiple label filters
port_statistics_in_packets{device="bng1",port_id="1/1/c1/1"}

Common Functions

rate() - Calculate per-second rate

# Packets per second
rate(port_statistics_in_packets[5m])

# Bits per second (octet counter * 8)
rate(port_statistics_in_octets[5m]) * 8

Use rate() only on counters (values that only increase); it compensates automatically when a counter resets, for example after a device reboot.
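Conceptually, rate() adds up the increases between consecutive samples in the window, treating any drop in value as a counter reset, and divides by the elapsed time. The sketch below is a simplified model under that description, not Prometheus's implementation: it skips the extrapolation to the window edges that real rate() performs, so its numbers will differ slightly.

```python
from typing import List, Tuple

# (unix timestamp in seconds, counter value)
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter over the samples (no edge extrapolation)."""
    if len(samples) < 2:
        raise ValueError("rate needs at least two samples in the window")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset; the new value is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    # Hypothetical 5s-spaced samples with a reset between t=10 and t=15.
    print(simple_rate([(0, 100), (5, 150), (10, 200), (15, 20)]))
```

The reset contributes 20 rather than a negative 180, so the total increase is 50 + 50 + 20 = 120 over 15 seconds.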

irate() - Instant rate

# Instantaneous rate using last two samples
irate(port_statistics_in_packets[5m])

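More responsive than rate() but can be volatile. The Prometheus documentation recommends irate() only for graphing volatile, fast-moving counters, and rate() for alerts and slow-moving counters.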
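irate() looks only at the last two samples in the range. This minimal sketch mirrors that behavior, including treating a drop in value as a counter reset; the sample values are made up to show how a late burst dominates irate() while barely moving a window-wide average.

```python
from typing import List, Tuple

# (unix timestamp in seconds, counter value)
Sample = Tuple[float, float]


def simple_irate(samples: List[Sample]) -> float:
    """Per-second rate from the last two samples only."""
    if len(samples) < 2:
        raise ValueError("irate needs at least two samples in the window")
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    # On a counter reset, the last value itself is taken as the increase.
    delta = v2 - v1 if v2 >= v1 else v2
    return delta / (t2 - t1)


def window_average(samples: List[Sample]) -> float:
    """First-to-last average for comparison (assumes no counter reset)."""
    return (samples[-1][1] - samples[0][1]) / (samples[-1][0] - samples[0][0])


if __name__ == "__main__":
    burst = [(0, 0), (5, 10), (10, 20), (15, 120)]
    print(simple_irate(burst), window_average(burst))
```

Here irate reports 20 per second from the final jump, while the window average is 8 per second.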

sum() - Aggregate values

# Total packets across all interfaces
sum(port_statistics_in_packets)

# Total per device
sum by (device) (port_statistics_in_packets)

# Total per device and port
sum by (device, port_id) (port_statistics_in_packets)
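`sum by (...)` partitions series by the listed labels and adds the values within each group; `sum()` with no labels collapses everything into a single group. A sketch of that grouping over hypothetical (labels, value) pairs:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Labels = Dict[str, str]


def sum_by(series: Iterable[Tuple[Labels, float]],
           by: Tuple[str, ...]) -> Dict[Tuple[str, ...], float]:
    """Add values per group of the given label names, like PromQL `sum by (...)`."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for labels, value in series:
        # A label missing from a series groups as the empty string.
        key = tuple(labels.get(name, "") for name in by)
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical counter values for two devices.
    series = [
        ({"device": "bng1", "port_id": "1/1/c1/1"}, 100.0),
        ({"device": "bng1", "port_id": "1/1/c2/1"}, 50.0),
        ({"device": "bng2", "port_id": "1/1/c1/1"}, 25.0),
    ]
    print(sum_by(series, ("device",)))
    print(sum_by(series, ()))
```

Passing an empty label tuple reproduces plain sum(): one group holding the grand total.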

avg() - Average values

# Average CPU across all devices
avg(system_cpu)

# Average per device
avg by (device) (system_cpu)

increase() - Total increase

# Total packets in last 5 minutes
increase(port_statistics_in_packets[5m])

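Like rate(), but returns the total change over the window instead of a per-second value. Because the result is extrapolated to the window boundaries, it can be a non-integer even for integer counters.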
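The Prometheus documentation describes increase() as syntactic sugar for rate() multiplied by the number of seconds in the range, so either value can be derived from the other given the window length. The helper below is a sketch that only handles single-unit ranges such as 5m, a simplification of PromQL's duration syntax.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def window_seconds(window: str) -> int:
    """Convert a single-unit range such as '5m' to seconds."""
    value, unit = int(window[:-1]), window[-1]
    return value * UNIT_SECONDS[unit]


def increase_from_rate(per_second_rate: float, window: str) -> float:
    """increase(v[window]) expressed as rate(v[window]) * window length in seconds."""
    return per_second_rate * window_seconds(window)


if __name__ == "__main__":
    # A rate of 8 packets/s over [5m] corresponds to an increase of 2400 packets.
    print(increase_from_rate(8.0, "5m"))
```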

Example Queries for BNG Lab

# CPU utilization by device
system_cpu{device=~"bng.*"}

# Memory usage percentage
(system_memory_pools_in_use / system_memory_pools_available) * 100

# Devices with high CPU
system_cpu > 80

Querying from Command Line

You can query Prometheus using its HTTP API:

# Query current value
curl -G http://localhost:9090/api/v1/query \
  --data-urlencode 'query=system_cpu{device="bng1"}'
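The same instant query can be issued from Python using only the standard library. This is a sketch that assumes the lab's http://localhost:9090 mapping; build_query_url, parse_vector, and instant_query are illustrative names, and the response shape (status, data.resultType, and data.result entries with [timestamp, "value"] pairs) follows the Prometheus HTTP API.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

PROMETHEUS_URL = "http://localhost:9090"  # assumption: the lab's published port


def build_query_url(base_url: str, promql: str) -> str:
    """Instant-query URL; the expression is URL-encoded like curl --data-urlencode."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) pairs from an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Values arrive as strings inside [timestamp, "value"] pairs.
    return [(r["metric"], float(r["value"][1])) for r in data["result"]]


def instant_query(promql: str, base_url: str = PROMETHEUS_URL) -> List[Tuple[Dict[str, str], float]]:
    """Run an instant query against a live Prometheus server."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

With the lab running, instant_query('system_cpu{device="bng1"}') should return one (labels, value) pair per matching series.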

Monitoring Prometheus

Check Scrape Health

# View Prometheus logs
sudo docker logs prometheus

# Check targets via API
curl http://localhost:9090/api/v1/targets | jq

# Expected output: "health": "up" for gnmic target
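For scripted checks, the /api/v1/targets response lists each active target with its labels, health, and lastError fields. A sketch that reports every target that is not up, assuming the lab's port mapping; fetch_targets and unhealthy_targets are hypothetical names:

```python
import json
import urllib.request
from typing import List


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the raw /api/v1/targets response (assumes the lab's port mapping)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def unhealthy_targets(payload: dict) -> List[str]:
    """Describe each active target whose health is not 'up'."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target.get("health") != "up":
            labels = target.get("labels", {})
            reason = target.get("lastError") or target.get("health")
            problems.append(f"{labels.get('job')} {labels.get('instance')}: {reason}")
    return problems
```

From a Python shell with the lab running, unhealthy_targets(fetch_targets()) should return an empty list when the gnmic target is healthy.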

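Query Statistics

Prometheus exposes metrics about itself at http://localhost:9090/metrics. The lab configuration scrapes only gnmic, so the queries below return data only after you add a scrape job for Prometheus itself (target localhost:9090).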

# Prometheus metrics about itself
prometheus_tsdb_head_series          # Number of time series
prometheus_tsdb_head_samples_appended_total  # Samples added
rate(prometheus_tsdb_head_samples_appended_total[5m])  # Sample rate

# Storage metrics
prometheus_tsdb_storage_blocks_bytes  # Storage size
prometheus_tsdb_head_chunks          # Active chunks in memory

Verify Data Collection

# Test gNMIc endpoint from Prometheus container
sudo docker exec prometheus wget -O- http://gnmic:9273/metrics | head

# Count exposed sample lines (one per series; # HELP/# TYPE comments are skipped)
sudo docker exec prometheus wget -O- http://gnmic:9273/metrics | grep -c "^[a-z]"
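Because the grep counts sample lines, it reports series rather than distinct metric names. A small parser over the exposition text can report both; this is an illustrative sketch that handles the plain text format only, and the sample input is hypothetical.

```python
import re
from typing import Set, Tuple

# Metric name at the start of a sample line: [a-zA-Z_:][a-zA-Z0-9_:]*
SAMPLE_LINE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def count_names_and_series(exposition: str) -> Tuple[int, int]:
    """Return (distinct metric names, sample lines) from exposition text."""
    names: Set[str] = set()
    series = 0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_LINE.match(line)
        if match:
            names.add(match.group(1))
            series += 1
    return len(names), series


EXAMPLE = """\
# HELP system_cpu hypothetical help text
# TYPE system_cpu untyped
system_cpu{device="bng1"} 12
system_cpu{device="bng2"} 7
port_statistics_in_packets{device="bng1",port_id="1/1/c1/1"} 1234
"""

if __name__ == "__main__":
    print(count_names_and_series(EXAMPLE))
```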

Performance Tuning

Adjust Scrape Interval

global:
  scrape_interval: 10s  # Less frequent scraping
  
scrape_configs:
  - job_name: "gnmic"
    scrape_interval: 5s   # Override for specific job
    static_configs:
      - targets: ["gnmic:9273"]

Scrape Timeout

global:
  scrape_interval: 5s
  scrape_timeout: 5s   # Max time to wait for a scrape; must not exceed scrape_interval

scrape_configs:
  - job_name: "gnmic"
    scrape_timeout: 4s   # Job-specific override
    static_configs:
      - targets: ["gnmic:9273"]
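Prometheus refuses to load a configuration in which scrape_timeout exceeds scrape_interval, and `promtool check config` reports the same problem before a restart. The hypothetical helper below only illustrates that rule; its duration parser covers the common Prometheus units (ms, s, m, h, d, w, y) and combinations such as 1m30s.

```python
import re

# "ms" is listed before "m" and "s" so that "500ms" is not read as minutes.
_DURATION = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000,
            "d": 86_400_000, "w": 604_800_000, "y": 31_536_000_000}


def parse_duration_ms(text: str) -> int:
    """Parse a duration such as '5s' or '1m30s' into milliseconds."""
    parts = _DURATION.findall(text)
    # Reject input with leftover characters that the pattern did not consume.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * _UNIT_MS[u] for n, u in parts)


def check_timeout(scrape_interval: str, scrape_timeout: str) -> None:
    """Raise if the timeout is longer than the interval, as Prometheus would."""
    if parse_duration_ms(scrape_timeout) > parse_duration_ms(scrape_interval):
        raise ValueError(
            f"scrape_timeout {scrape_timeout} exceeds scrape_interval {scrape_interval}")
```

For example, check_timeout("5s", "10s") raises, which is exactly the combination this lab's 5s interval cannot accept.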

Relabeling

scrape_configs:
  - job_name: "gnmic"
    static_configs:
      - targets: ["gnmic:9273"]
    
    # Add custom labels
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: 'telemetry-collector'
    
    # Drop high-cardinality metrics
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'expensive_metric_.*'
        action: drop
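Relabel regexes in Prometheus are fully anchored, and source label values are joined with `;` before matching. The sketch below emulates the drop rule above with Python's re.fullmatch; it is an approximation, since Prometheus uses RE2 syntax, and the metric names are hypothetical. Note how my_expensive_metric_x survives because the pattern must match the whole name.

```python
import re
from typing import Dict, List


def apply_drop(series: List[Dict[str, str]], source_labels: List[str],
               regex: str, separator: str = ";") -> List[Dict[str, str]]:
    """Mimic a relabel rule with action: drop (anchored match on joined label values)."""
    pattern = re.compile(regex)
    kept = []
    for labels in series:
        value = separator.join(labels.get(name, "") for name in source_labels)
        if not pattern.fullmatch(value):
            kept.append(labels)
    return kept


if __name__ == "__main__":
    series = [{"__name__": "expensive_metric_a"},
              {"__name__": "system_cpu"},
              {"__name__": "my_expensive_metric_x"}]
    kept = apply_drop(series, ["__name__"], "expensive_metric_.*")
    print([s["__name__"] for s in kept])
```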

Troubleshooting

Target is DOWN

Problem: The gNMIc target shows as DOWN in Prometheus.

Checklist:

Verify gNMIc is running:
```
sudo docker ps | grep gnmic
```

Test connectivity from Prometheus:

sudo docker exec prometheus wget -O- http://gnmic:9273/metrics

Check Prometheus logs:

sudo docker logs prometheus | grep gnmic

Verify configuration:

sudo docker exec prometheus cat /etc/prometheus/prometheus.yml

No data for queries

Problem: Queries return empty results.

Solutions:

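Check whether the metrics exist (this assumes port 9273 is reachable from the host; otherwise use the docker exec wget command shown earlier):

curl http://localhost:9273/metrics | grep system_cpu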

Verify scraping is working (Targets page should show UP)

Check query syntax and label filters:

# Wrong (no such label)
system_cpu{host="bng1"}

# Correct
system_cpu{device="bng1"}

Ensure time range includes data points

High memory usage

Problem: The Prometheus container is using excessive memory.

Solutions:

Reduce retention:

--storage.tsdb.retention.time=7d
--storage.tsdb.retention.size=5GB

Increase scrape interval in prometheus.yml:
```
global:
  scrape_interval: 15s
```
Drop unused metrics in gNMIc or Prometheus config
Monitor series cardinality:
```
prometheus_tsdb_head_series
```

Slow queries

Problem: PromQL queries take too long.

Solutions:

Reduce query time range
Use recording rules for expensive queries (requires config reload)
Add more specific label filters
Use shorter range selectors for recent data (irate() still loads every sample in its range, so it is not inherently cheaper than rate())
Limit results with topk() or bottomk()

Integration with Grafana

Prometheus is pre-configured as a Grafana datasource:

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    uid: prometheus
    isDefault: true

Access in Grafana:

Navigate to Configuration → Data Sources (Connections → Data sources in Grafana 10 and later)
Select Prometheus
Click Test to verify connection

Grafana uses the same PromQL query language. Queries in Prometheus Web UI can be copied directly to Grafana panels.

Advanced Configuration

Multiple Scrape Targets

To scrape additional exporters:

scrape_configs:
  - job_name: "gnmic"
    static_configs:
      - targets: ["gnmic:9273"]
  
  - job_name: "node_exporter"
    static_configs:
      - targets: ["node-exporter:9100"]
  
  - job_name: "custom_exporter"
    static_configs:
      - targets:
          - "exporter1:8080"
          - "exporter2:8080"

Service Discovery

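For dynamic environments, Prometheus can discover containers through the Docker API. This requires mounting the Docker socket (/var/run/docker.sock) into the Prometheus container: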

scrape_configs:
  - job_name: "docker"
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      - source_labels: [__meta_docker_container_name]
        target_label: container

Recording Rules

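Pre-compute expensive queries by saving rules like these in a rules file, listing that file under rule_files in prometheus.yml, and reloading Prometheus: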

groups:
  - name: bng_metrics
    interval: 30s
    rules:
      - record: interface_bps_in
        expr: rate(port_statistics_in_octets[5m]) * 8
      
      - record: interface_bps_out
        expr: rate(port_statistics_out_octets[5m]) * 8

Next Steps

Grafana Dashboards

Available Metrics
