Dashboard Guide
This guide provides detailed information about the pre-configured Grafana dashboards for monitoring Nokia SROS and SR Linux devices.
Dashboard Access
To open a dashboard in Grafana:
- Click the Dashboards icon (four squares) in the left sidebar
- Select Browse
- Choose a dashboard:
  - SROS Dashboard - BNG monitoring
  - SR Linux Telemetry - Switch/OLT monitoring
Anonymous access is enabled - no login required for viewing. Use admin/admin for editing privileges.
SROS Dashboard
Comprehensive monitoring for Nokia SROS BNG routers (BNG1, BNG2).
File: configs/grafana/dashboards/SROS-Dashboard.json
Purpose: Monitor BNG system health, interfaces, subscriber sessions, and routing protocols
Dashboard Layout
The SROS dashboard is organized into collapsible rows:
- System Status
- Port Statistics
- BNG Sessions
- VPLS Services
- Routing Protocols
System Status: overview of device health and resources.
Panels:
- CPU Utilization (Bar Gauge)
  - Shows CPU percentage per device
  - Color-coded: Green (under 70%), Yellow (70-85%), Red (over 85%)
  - Query: system_cpu
- Memory Usage (Gauge)
  - Displays memory utilization percentage
  - Calculated: (in_use / available) * 100
  - Threshold warnings at 80% and 90%
- System Uptime (Stat)
  - Time since last restart
  - Useful for tracking reboots
- Temperature Sensors (Graph)
  - Card and module temperatures
  - Query: card_hardware_data_temperature
  - Alert threshold at 70°C
- Fan Speeds (Graph)
  - Chassis fan RPM monitoring
  - Query: chassis_fan_speed
  - Detects fan failures (speed = 0)
Use cases:
- Quick health check before maintenance
- Identify overheating or resource exhaustion
- Monitor system stability over time
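As a minimal sketch of how the System Status thresholds could be applied to raw readings, the Python snippet below classifies a CPU value into the green/yellow/red bands, computes the memory percentage with the formula given for the gauge, and flags stopped fans. The function names and sample readings are hypothetical; only the threshold values (70%, 85%, speed = 0) and the formula come from the panel descriptions, and the treatment of the exact boundary values 70 and 85 is an assumption.

```python
def cpu_color(cpu_percent: float) -> str:
    """Map a CPU percentage to the bar-gauge color band."""
    if cpu_percent > 85:
        return "red"
    if cpu_percent >= 70:  # assumption: 70 and 85 themselves count as yellow
        return "yellow"
    return "green"


def memory_percent(in_use: float, available: float) -> float:
    """Memory utilization as described for the gauge: (in_use / available) * 100."""
    return in_use / available * 100


def failed_fans(fan_rpm: dict) -> list:
    """Return the names of fans reporting a speed of 0 (treated as failed)."""
    return sorted(name for name, rpm in fan_rpm.items() if rpm == 0)


if __name__ == "__main__":
    # Made-up readings, for illustration only.
    print(cpu_color(42.0))
    print(memory_percent(in_use=2.0, available=8.0))
    print(failed_fans({"fan-1": 4200, "fan-2": 0}))
```

Keeping each rule in its own small function makes it easy to adjust a single threshold if the dashboard JSON uses different values.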
SROS Dashboard Variables
The dashboard includes dynamic variables:
| Variable | Type | Values | Usage |
|---|---|---|---|
| $device | Query | bng1, bng2, All | Filter by BNG device |
| $port | Query | (port IDs) | Filter by specific port |
| $interval | Custom | 5m, 15m, 1h | Rate calculation window |
| $service | Query | (VPLS names) | Filter VPLS services |
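To illustrate how a variable such as $device ends up inside a query, the sketch below builds a label matcher from a selection. It is a simplified imitation of what Grafana typically does for a Prometheus data source, where a multi-value or All selection becomes a regex alternation used with the `=~` operator; Grafana's real interpolation handles more cases (custom All values, additional escaping). The helper names are hypothetical, and the use of system_cpu with a device label is an assumption based on the panels described in this guide.

```python
import re


def device_matcher(selected: list, all_values: list) -> str:
    """Build a regex label matcher for the chosen devices.

    Choosing "All" expands to every known value, similar to a Grafana
    variable that has the All option enabled.
    """
    values = all_values if "All" in selected else selected
    pattern = "|".join(re.escape(v) for v in values)
    return f'device=~"{pattern}"'


def build_query(metric: str, selected: list, all_values: list) -> str:
    """Wrap the matcher in a selector for the given metric."""
    return f"{metric}{{{device_matcher(selected, all_values)}}}"


if __name__ == "__main__":
    devices = ["bng1", "bng2"]
    print(build_query("system_cpu", ["All"], devices))
    print(build_query("system_cpu", ["bng2"], devices))
```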
SROS Dashboard Time Ranges
Recommended time ranges:
- Real-time monitoring: Last 5 minutes (5s refresh)
- Active troubleshooting: Last 1 hour (10s refresh)
- Performance review: Last 24 hours
- Capacity planning: Last 7 days
- Incident analysis: Custom range around event
SR Linux Dashboard
Monitoring for Nokia SR Linux switches and routers (Switch, OLT, TX).
File: configs/grafana/dashboards/srlinux-telemetry-lite.json
Purpose: Monitor platform health, interface statistics, and network instances
Dashboard Layout
- Platform Overview
- Interface Statistics
- Subinterfaces
- LAG Status
- Network Instances
Platform Overview: control plane resources and system health.
Panels:
- CPU Usage per Slot (Time Series)
  - CPU percentage for each control module
  - Query: platform_control_cpu_total
  - Labels: slot, index
- Memory Utilization (Gauge)
  - Physical memory usage percentage
  - Query:
  - Thresholds: 70% (yellow), 85% (red)
- Free Memory (Graph)
  - Available memory in GB over time
  - Query: platform_control_memory_free / 1073741824
  - Trend monitoring for memory leaks
- Application Resource Usage (Table)
  - Per-application CPU and memory
  - Query: system_app_management_application_*
  - Columns: App Name, CPU%, Memory (MB)
  - Identifies resource-hungry applications
Use cases:
- Monitor control plane health
- Detect memory leaks
- Identify misbehaving applications
- Resource capacity planning
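The Free Memory panel divides a byte count by 1073741824 (1024³) and is watched for a downward trend. A rough sketch of both steps in Python follows: converting bytes to the unit shown on the panel, then fitting a least-squares slope to spot steadily shrinking free memory. The sample values and the idea of using a straight-line fit are illustrative assumptions, not something the dashboard itself computes.

```python
BYTES_PER_GIB = 1073741824  # 1024 ** 3, the divisor used in the Free Memory query


def to_gib(free_bytes: float) -> float:
    """Convert a byte count to the unit plotted by the Free Memory panel."""
    return free_bytes / BYTES_PER_GIB


def slope_per_hour(samples: list) -> float:
    """Least-squares slope of (unix_seconds, value) samples, per hour."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den * 3600


if __name__ == "__main__":
    # Made-up samples: free memory shrinking by half a GiB every hour.
    raw = [(0, 8 * BYTES_PER_GIB), (3600, 7.5 * BYTES_PER_GIB), (7200, 7 * BYTES_PER_GIB)]
    series = [(t, to_gib(b)) for t, b in raw]
    print(f"free memory trend: {slope_per_hour(series):+.2f} GiB/hour")
```

A persistently negative slope over a long window is only a hint of a leak; caches and normal workload changes can produce the same shape.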
SR Linux Dashboard Variables
| Variable | Type | Values | Usage |
|---|---|---|---|
| $device | Query | switch, olt, tx, All | Filter by device |
| $interface | Query | (interface names) | Filter specific interface |
| $network_instance | Query | (VRF names) | Filter network instance |
Dashboard Usage Tips
Comparing Metrics Across Devices
- Set the $device variable to All
- Use color coding to distinguish devices
- Enable legend for identification
- Use a query that returns CPU for all devices, each series carrying a device label
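As a hedged illustration of comparing one metric across devices outside Grafana, the sketch below parses a response in the shape returned by the Prometheus HTTP API for an instant query and groups the values by the device label. The sample response is made up, and it assumes the bare metric system_cpu (named in the System Status panel) is exported with a device label; no network call is made.

```python
import json

# Made-up response in the Prometheus instant-query format (resultType "vector").
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "system_cpu", "device": "bng1"}, "value": [1700000000, "23.5"]},
      {"metric": {"__name__": "system_cpu", "device": "bng2"}, "value": [1700000000, "31.0"]}
    ]
  }
}
"""


def cpu_by_device(response_text: str) -> dict:
    """Return {device: value} from an instant-query response body."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    values = {}
    for series in payload["data"]["result"]:
        device = series["metric"].get("device", "unknown")
        _timestamp, value = series["value"]  # sample values arrive as strings
        values[device] = float(value)
    return values


if __name__ == "__main__":
    for device, cpu in sorted(cpu_by_device(SAMPLE_RESPONSE).items()):
        print(f"{device}: {cpu:.1f}%")
```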
Correlating Events
- Select a custom time range around the incident
- Open multiple dashboards in tabs
- Use the same time range across all dashboards
- Look for correlations:
  - CPU spike + BGP route loss?
  - Session drops + interface errors?
  - Memory growth + application restart?
Zooming In on Issues
- Click and drag on graph to zoom time range
- Click legend entry to hide/show series
- Shift+click legend to isolate single series
- Double-click graph to reset zoom
Exporting Data
- Click panel title → Inspect → Data
- View raw data table
- Click Download CSV to export
- Use for reporting or external analysis
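For external analysis of an exported file, a short Python sketch like the one below can summarize each series. It assumes a CSV with a Time column followed by one column per series, which is one common layout of a panel export; the actual column names depend on the panel and Grafana version, and the sample rows are invented.

```python
import csv
import io

# Invented sample in the assumed layout: a Time column plus one column per series.
SAMPLE_CSV = """Time,bng1,bng2
2024-01-01 10:00:00,41.5,38.0
2024-01-01 10:00:30,47.0,39.5
2024-01-01 10:01:00,52.5,
"""


def column_stats(csv_text: str) -> dict:
    """Return min, max and mean for every non-Time column, skipping empty cells."""
    columns = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, cell in row.items():
            if name == "Time" or cell in (None, ""):
                continue
            columns.setdefault(name, []).append(float(cell))
    return {
        name: {"min": min(vals), "max": max(vals), "mean": sum(vals) / len(vals)}
        for name, vals in columns.items()
    }


if __name__ == "__main__":
    for name, stats in column_stats(SAMPLE_CSV).items():
        print(name, stats)
```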
Sharing Dashboard Views
- Set the time range and variables as desired
- Click the Share dashboard icon (top-right)
- Options:
  - Link: Copy URL with current settings
  - Snapshot: Create static snapshot (if enabled)
  - Export: Download JSON
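A shared link carries the time range and variable values as URL query parameters (from, to, refresh, and var-<name>). The sketch below assembles such a link by hand; the host, dashboard UID, and slug are placeholders invented for illustration, so copy the real values from the Share dialog rather than relying on them.

```python
from urllib.parse import urlencode


def dashboard_link(base_url, uid, slug, time_from, time_to, variables, refresh=None):
    """Build a dashboard URL with a time range, variable values and optional refresh."""
    params = [("from", time_from), ("to", time_to)]
    params += [(f"var-{name}", value) for name, value in variables.items()]
    if refresh:
        params.append(("refresh", refresh))
    return f"{base_url}/d/{uid}/{slug}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host, UID and slug; not taken from the lab configuration.
    print(
        dashboard_link(
            "http://grafana.example:3000",
            "example-uid",
            "sros-dashboard",
            "now-1h",
            "now",
            {"device": "bng1"},
            refresh="10s",
        )
    )
```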
Creating Alert Annotations
Manually mark events on graphs:
- Ctrl+click on graph at event time
- Select Add annotation
- Enter description (e.g., “Config change deployed”)
- Annotation appears on all panels
Annotations are per-dashboard and not persistent in this lab configuration.
Performance Optimization
Dashboard Loading Slowly
Solutions:
- Reduce time range (e.g., 1h instead of 7d)
- Limit variable selections (specific device vs. All)
- Collapse unused rows
- Increase the $interval variable value
Too Many Series in Graph
Solutions:
- Use topk() to limit to top N
- Filter by specific labels
- Use aggregation
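The three techniques can be sketched as PromQL expressions generated by small Python helpers. The metric system_cpu and the device label are borrowed from elsewhere in this guide as stand-ins; the helper names and the chosen values (top 5, bng1, averaging) are illustrative assumptions, not the queries used by the shipped dashboards.

```python
def top_n(metric: str, n: int) -> str:
    """Keep only the n series with the highest current value."""
    return f"topk({n}, {metric})"


def filtered(metric: str, **labels: str) -> str:
    """Restrict the selector to series whose labels match exactly."""
    matchers = ", ".join(f'{key}="{value}"' for key, value in labels.items())
    return f"{metric}{{{matchers}}}"


def aggregated(metric: str, by: str, func: str = "avg") -> str:
    """Collapse many series into one per value of the grouping label."""
    return f"{func} by ({by}) ({metric})"


if __name__ == "__main__":
    print(top_n("system_cpu", 5))
    print(filtered("system_cpu", device="bng1"))
    print(aggregated("system_cpu", by="device"))
```

Each expression reduces the number of series a panel has to draw: topk() caps the count, a label filter narrows the selection, and aggregation merges series that share a label value.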
Query Taking Too Long
Solutions:
- Reduce query range (use [$interval] instead of hardcoded)
- Add more specific label filters
- Use irate() instead of rate() for recent data
- Consider creating recording rules in Prometheus
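To make the difference between rate() and irate() concrete, here is a deliberately simplified Python model of both: the first averages a counter over the whole window, the second looks only at the last two samples. It ignores counter resets and the window-boundary extrapolation that Prometheus applies, so treat it as an intuition aid rather than a reimplementation; the sample counter values are invented.

```python
def simple_rate(samples: list) -> float:
    """Average per-second increase between the first and last sample in the window."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


def simple_irate(samples: list) -> float:
    """Per-second increase between the last two samples only."""
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    return (v_last - v_prev) / (t_last - t_prev)


if __name__ == "__main__":
    # Invented counter samples as (seconds, value); traffic jumps in the last minute.
    window = [(0, 0), (60, 600), (120, 1200), (180, 3000)]
    print(f"rate-like:  {simple_rate(window):.2f} per second")
    print(f"irate-like: {simple_irate(window):.2f} per second")
```

Because irate() reacts only to the most recent samples, it shows short bursts clearly but is a poor fit for long time ranges or alerting, where the smoother rate() is usually preferred.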
Common Dashboard Workflows
Daily Health Check
- Open SROS Dashboard
- Check System Status row:
  - CPU < 80%
  - Memory < 85%
  - No temperature alarms
- Verify Port Statistics:
  - All expected ports UP
  - No error counters increasing
- Review BNG Sessions:
  - Session count within normal range
  - No failed session spikes
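Part of this checklist could be automated along the following lines. The sketch takes a snapshot of values read off the dashboard and lists every check that fails; the snapshot keys and the port names are hypothetical, and only the limits (CPU below 80%, memory below 85%, no temperature alarms, all expected ports up) come from the checklist.

```python
def daily_health_check(snapshot: dict) -> list:
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []
    if snapshot["cpu_percent"] >= 80:
        problems.append(f"CPU at {snapshot['cpu_percent']}% (limit 80%)")
    if snapshot["memory_percent"] >= 85:
        problems.append(f"Memory at {snapshot['memory_percent']}% (limit 85%)")
    if snapshot["temperature_alarms"] > 0:
        problems.append(f"{snapshot['temperature_alarms']} temperature alarm(s)")
    missing = sorted(snapshot["expected_ports"] - snapshot["up_ports"])
    if missing:
        problems.append("Ports not UP: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    # Hypothetical snapshot; port names are placeholders.
    snapshot = {
        "cpu_percent": 91,
        "memory_percent": 50,
        "temperature_alarms": 0,
        "expected_ports": {"port-a", "port-b"},
        "up_ports": {"port-a"},
    }
    for problem in daily_health_check(snapshot):
        print("WARN:", problem)
```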
Troubleshooting Subscriber Issue
- Open SROS Dashboard
- Set time range to incident window
- Check BNG Sessions row:
  - Session drops?
  - Failed authentications?
- Review Port Statistics:
  - Interface errors on subscriber-facing ports?
  - Traffic patterns abnormal?
- Check VPLS Services:
  - SAP down?
  - Service state issues?
Capacity Planning
- Set time range to Last 30 days
- SROS Dashboard → System Status:
  - CPU trend (linear regression)
  - Memory growth rate
- BNG Sessions:
  - Peak session count
  - Growth rate (sessions/day)
- Port Statistics:
  - Max interface utilization
  - 95th percentile traffic rates
- Export data for external analysis
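Once the data is exported, the capacity figures listed above can be derived with a few lines of Python. The sketch below computes a peak, an average growth per day, and a 95th percentile using the nearest-rank method; the choice of percentile method and all sample numbers are assumptions made for illustration, and other tools may interpolate and give slightly different results.

```python
import math


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def growth_per_day(daily_peaks: list) -> float:
    """Average day-over-day change between the first and last daily peak."""
    return (daily_peaks[-1] - daily_peaks[0]) / (len(daily_peaks) - 1)


if __name__ == "__main__":
    # Invented daily peak session counts and traffic samples (Mbit/s).
    sessions = [1000, 1010, 1030, 1060]
    traffic = list(range(1, 21))
    print("peak sessions:", max(sessions))
    print("growth (sessions/day):", growth_per_day(sessions))
    print("95th percentile traffic:", percentile(traffic, 95))
```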
Incident Analysis
- Identify incident time window
- Open all dashboards in separate tabs
- Set same custom time range on all
- Look for anomalies:
  - CPU/memory spikes
  - Route count changes
  - Interface state changes
  - Error rate increases
- Take screenshots for incident report
- Export relevant panel data
Next Steps
- Available Metrics: Explore all metrics used in dashboards
- Customize Grafana: Learn to create custom dashboards