The Nokia BNG lab includes a comprehensive telemetry stack based on gNMI, Prometheus, and Grafana for real-time monitoring and observability.

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                   Telemetry Stack Flow                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Nokia Devices (BNG, Switch, OLT, TX)                      │
│          │                                                  │
│          │ gRPC/gNMI (port 57400)                         │
│          ▼                                                  │
│     gNMIc Collector (10.77.1.12:9273)                      │
│          │                                                  │
│          │ Prometheus Exposition                           │
│          ▼                                                  │
│   Prometheus TSDB (10.77.1.13:9090)                        │
│          │                                                  │
│          │ PromQL Queries                                  │
│          ▼                                                  │
│   Grafana Dashboards (10.77.1.14:3000)                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Grafana Dashboard Access

1. Access Grafana UI

Open your browser and navigate to:
http://localhost:3030
Credentials:
  • Username: admin
  • Password: admin

2. Verify Data Source

Navigate to Configuration → Data Sources to verify Prometheus connection:
  • Name: Prometheus
  • Type: prometheus
  • URL: http://prometheus:9090
  • UID: prometheus
  • Status: Should show green checkmark

3. Access Pre-configured Dashboards

Dashboards are auto-provisioned from /var/lib/grafana/dashboards:
  • Nokia SROS System Metrics
  • Interface Statistics
  • BNG Subscriber Sessions
  • Network Instance Status
Grafana is configured with anonymous access enabled (Editor role), so dashboards can be viewed and edited without logging in.

Prometheus Metrics

Access Prometheus UI

http://localhost:9090

Prometheus Configuration

The Prometheus server scrapes metrics from gNMIc:
# From configs/prometheus/prometheus.yml
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "gnmic"
    static_configs:
      - targets: ["gnmic:9273"]
All metrics are scraped at 5-second intervals. Adjust this if you experience performance issues or need different granularity.

Verify Metric Collection

1. Check gNMIc Metrics Endpoint

curl http://localhost:9273/metrics
You should see Prometheus-formatted metrics from all Nokia devices.

2. Query Metrics in Prometheus

Navigate to http://localhost:9090/graph and try:
# CPU usage across all devices
system_cpu_total

# Interface statistics
port_statistics_in_octets

# Operational state
port_oper_state

3. Check Target Health

Visit http://localhost:9090/targets and verify:
  • Target: gnmic:9273
  • State: UP
  • Last Scrape: < 5s ago
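
The checks above can also be scripted. The following is a minimal sketch, assuming the host-side URLs used on this page (http://localhost:9273/metrics and http://localhost:9090) are reachable. The helper names are hypothetical and not part of the lab. The parsing functions take already-fetched data, so they can be tried without the lab running.

```python
import json
import urllib.request


def metric_names(exposition_text):
    """Return the sorted, de-duplicated metric names in Prometheus text output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return sorted(names)


def unhealthy_targets(targets_payload):
    """List scrape URLs of active targets whose health is not "up".

    `targets_payload` is the decoded JSON body returned by the Prometheus
    /api/v1/targets endpoint.
    """
    active = targets_payload.get("data", {}).get("activeTargets", [])
    return [t.get("scrapeUrl", "") for t in active if t.get("health") != "up"]


def fetch_text(url, timeout=5):
    """Fetch a URL and return its body as text (requires the lab to be up)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def check_lab(gnmic_url="http://localhost:9273/metrics",
              prom_url="http://localhost:9090"):
    """Run both checks against the assumed lab URLs and return the results."""
    names = metric_names(fetch_text(gnmic_url))
    payload = json.loads(fetch_text(prom_url + "/api/v1/targets"))
    return names, unhealthy_targets(payload)
```

Calling check_lab() on the lab host should return the exported metric names and an empty list of unhealthy targets when the gnmic:9273 target is UP.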

gNMIc Telemetry Collector

Configuration Overview

The gNMIc collector automatically discovers and subscribes to Nokia devices using Docker labels.
# From configs/gnmic/config.yml
loader:
  type: docker
  address: unix:///run/docker.sock
  filters:
    # SR Linux nodes
    - containers:
        - label: clab-node-kind=nokia_srlinux
      network:
        label: containerlab
      port: "57400"
      config:
        username: admin
        password: lab123
        skip-verify: true
        encoding: proto
    
    # SR OS nodes (BNG, Switch, OLT)
    - containers:
        - label: clab-node-kind=nokia_srsim
      network:
        label: containerlab
      port: "57400"
      config:
        username: admin
        password: lab123
        insecure: true
        encoding: json

Active Subscriptions

SR Linux subscriptions (sample interval: 5 seconds):
  • srl_platform: CPU and memory usage
  • srl_apps: Application management
  • srl_if_stats: Interface statistics and operational state
  • srl_if_lag_stats: LAG member statistics
  • srl_net_instance: Network instance state and route tables
  • srl_bgp_stats: BGP protocol statistics
  • srl_event_handler_stats: Event handler metrics

SR OS subscriptions (sample interval: 5 seconds; 10 seconds for VPLS SAPs):
  • sros_ports_stats: Port operational state and statistics
  • sros_router_bgp: BGP statistics and routes per family
  • sros_router_interface: IPv4/IPv6 interface statistics
  • sros_router_isis: IS-IS protocol statistics
  • sros_router_route_table: Route table statistics
  • sros_system: CPU and memory pool usage
  • sros_service_stats: VPLS/VPRN service operational state
  • sros_ludb: Local user database (subscriber info)
  • sros_vpls_sap_all: VPLS SAP statistics
  • sros_temperature_stats: Hardware temperature sensors
  • sros_fan_stats: Chassis fan speeds

View gNMIc Logs

# Real-time log streaming
docker logs -f clab-lab-gnmic

# Last 100 lines
docker logs --tail 100 clab-lab-gnmic

# Logs with timestamps
docker logs -t clab-lab-gnmic
The gNMIc collector logs all subscription activities, connection status, and metric processing. Use these logs to debug telemetry issues.

Key Metrics to Monitor

System Health Metrics

# SR OS CPU utilization (1-second sample)
system_cpu_total{source="bng1"}

# SR Linux CPU usage
platform_control_cpu_total{source="tx"}
Alert if CPU usage exceeds 80% for more than 5 minutes.
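
To make the "for more than 5 minutes" condition concrete, here is a minimal sketch of a sustained-threshold check on timestamped samples. The function name and defaults are illustrative assumptions. In the lab this condition would more likely be written as a Prometheus alerting rule than in Python.

```python
def sustained_above(samples, threshold=80.0, duration=300.0):
    """Return True if values stay above `threshold` for at least `duration` seconds.

    `samples` is an iterable of (unix_timestamp, value) pairs sorted by time.
    """
    breach_start = None  # timestamp at which the current breach began
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= duration:
                return True
        else:
            # Any sample at or below the threshold resets the timer.
            breach_start = None
    return False
```

A single dip below the threshold restarts the timer, which is why short CPU spikes do not trigger the alert.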
# SR OS memory pools
system_memory_pools_summary_total

# SR Linux memory
platform_control_memory_physical
platform_control_memory_utilized

# Port operational state (1=up, 0=down)
port_oper_state

# SR Linux interface state
interface_oper_state
Critical interfaces should be monitored with alerts for state changes.

BNG-Specific Metrics

# Local user database entries
subscriber_mgmt_local_user_db_ipoe_host_session_count
subscriber_mgmt_local_user_db_ppp_session_count

# Session statistics by type
rate(subscriber_mgmt_local_user_db_ipoe_sessions_created[5m])

# Service operational state
service_vpls_oper_state{service_name="subscriber-vlan-150"}

# SAP statistics
service_vpls_sap_stats_ingress_octets
service_vpls_sap_stats_egress_octets

Network Performance Metrics

# Ingress traffic rate (bytes/sec)
rate(port_statistics_in_octets[1m])

# Egress traffic rate
rate(port_statistics_out_octets[1m])

# Packet rates
rate(port_statistics_in_packets[1m])
rate(port_statistics_out_packets[1m])

# Input errors
rate(port_statistics_in_errors[5m])

# Output errors
rate(port_statistics_out_errors[5m])

# Discards
rate(port_statistics_in_discards[5m])
Any non-zero error rate should be investigated immediately.
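
For readers unfamiliar with rate(), the sketch below shows the core idea on raw counter samples: sum the increases between consecutive samples, treat a decrease as a counter reset, and divide by the elapsed time. This is a simplification with a hypothetical function name. The real Prometheus rate() also extrapolates toward the edges of the range window, so its results can differ slightly.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter from (timestamp, value) pairs.

    Samples must be sorted by timestamp. A drop in value is treated as a
    counter reset, so the post-reset value is counted as the increase.
    """
    if len(samples) < 2:
        return None  # not enough data to compute a rate
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

With samples taken every 5 seconds, a counter that grows by 500 octets per sample yields 100 octets per second.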
# BGP established sessions
router_bgp_statistics_established_sessions

# Routes per address family
router_bgp_statistics_routes_per_family_active_routes{family="ipv4"}

RADIUS Accounting Logs

Access RADIUS Logs

1. View Authentication Logs

docker exec clab-lab-radius tail -f /var/log/radius/radius.log

2. View Accounting Logs

docker exec clab-lab-radius tail -f /var/log/radius/radacct/*/*

3. Search for Specific User

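docker exec clab-lab-radius grep "test@test.com" /var/log/radius/radius.log

File locations:
  • Main Log: /var/log/radius/radius.log
  • Accounting: /var/log/radius/radacct/
  • Configuration: /etc/raddb/

# Send a Status-Server request to check that the server is responding
# (radclient reads request attributes from stdin; Status-Server must be enabled on the server)
docker exec clab-lab-radius sh -c 'echo "Message-Authenticator = 0x00" | radclient localhost status testing123'

# Debug mode (verbose logging); stop the running radiusd first, otherwise its ports are already in use
docker exec clab-lab-radius radiusd -X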

Device Health Monitoring

Temperature Monitoring

# Card temperature sensors
card_hardware_data_temperature_current

# MDA temperature
card_mda_hardware_data_temperature_current

# Control module temperature
chassis_chassis_control_module_hardware_data_temperature_current
Temperature thresholds:
  • Normal: < 65°C
  • Warning: 65-75°C
  • Critical: > 75°C
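
The bands above translate directly into a small classifier. This is a minimal sketch with a hypothetical function name. The boundary values 65 and 75 are placed in the warning band, following the "65-75°C" range as written.

```python
def classify_temperature(celsius):
    """Map a temperature reading in °C to the bands listed above."""
    if celsius < 65:
        return "normal"
    if celsius <= 75:
        return "warning"
    return "critical"
```

It could be applied to values returned by queries such as card_hardware_data_temperature_current.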

Fan Speed Monitoring

# Fan speeds (RPM)
chassis_fan_speed_current
Fan speed should remain consistent. Sudden drops may indicate hardware issues.

Container Logs and Monitoring

View Container Logs

# BNG logs
docker logs clab-lab-bng1
docker logs clab-lab-bng2

# Switch and OLT
docker logs clab-lab-switch
docker logs clab-lab-olt

# Transit router
docker logs clab-lab-tx

# gNMIc collector
docker logs -f clab-lab-gnmic

# Prometheus
docker logs clab-lab-prometheus

# Grafana
docker logs clab-lab-grafana

# RADIUS server
docker logs clab-lab-radius

# Subscriber devices
docker logs clab-lab-ont1
docker logs clab-lab-ont2

Container Resource Usage

# Real-time stats for all containers
docker stats

# Stats for specific container
docker stats clab-lab-bng1

# One-time snapshot
docker stats --no-stream

Alerting Best Practices

Recommended Alerts:
  1. Device Down: port_oper_state == 0 on critical links
  2. High CPU: system_cpu_total > 80 for 5 minutes
  3. Memory Exhaustion: Memory utilization > 90%
  4. Interface Errors: Non-zero error rates
  5. BGP Session Down: Loss of established BGP peers
  6. Subscriber Session Failures: Failed authentication attempts
  7. Temperature Alert: Hardware temperature > 70°C
  8. License Expiry: Nokia license approaching expiration

Troubleshooting Monitoring Issues

If no metrics appear in Prometheus:
  1. Check gNMIc is running: docker ps | grep gnmic
  2. Verify gNMIc metrics endpoint: curl http://localhost:9273/metrics
  3. Check Prometheus targets: http://localhost:9090/targets
  4. Review gNMIc logs: docker logs clab-lab-gnmic

If Grafana dashboards show no data:
  1. Verify Prometheus data source connection in Grafana
  2. Check time range in dashboard (default: last 6 hours)
  3. Run test query in Prometheus UI first
  4. Ensure dashboards are using correct metric names

If gNMIc cannot connect to devices:
  1. Verify device gRPC port is accessible: netstat -tuln | grep 57400
  2. Check credentials: admin/lab123
  3. Confirm Docker socket is mounted (path as set in the gNMIc loader config): docker exec clab-lab-gnmic ls -l /run/docker.sock
  4. Review device labels: docker inspect clab-lab-bng1 | grep clab-node-kind
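
As an alternative to netstat for the gRPC port check, the sketch below tests whether a TCP connection to port 57400 can be opened. The function name is hypothetical, and the host argument is an assumption: use whatever address the device is reachable at from where the script runs. A successful connection only shows that the port is reachable. It does not validate gNMI credentials or TLS settings.

```python
import socket


def tcp_port_open(host, port=57400, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, tcp_port_open("10.77.1.2") would test a device at that address; the address here is only a placeholder.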

Performance Optimization

Reduce Resource Usage:
  1. Increase scrape interval from 5s to 10s or 15s
  2. Reduce retention period in Prometheus
  3. Disable unused subscriptions in gNMIc config
  4. Limit metric cardinality by filtering unnecessary labels
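
The effect of the first item is simple arithmetic: the number of samples Prometheus ingests scales inversely with the scrape interval. A minimal sketch follows. The function name is hypothetical, and the series count in the note below is an illustrative figure, not a measurement from this lab.

```python
SECONDS_PER_DAY = 86400


def samples_per_day(series_count, scrape_interval_s):
    """Samples ingested per day for `series_count` series at a given interval."""
    return series_count * SECONDS_PER_DAY / scrape_interval_s
```

Under the assumption of 10,000 active series, a 5-second interval gives 172,800,000 samples per day, and a 15-second interval cuts that to a third.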
