Overview
The Monitoring feature provides Prometheus-compatible metrics endpoints for tracking system performance, request patterns, and operational health. Integrate with your observability stack for real-time monitoring and alerting.
Prometheus Compatible: Standard Prometheus text format metrics
Request Tracking: Total request counters and performance metrics
Metrics Endpoint
Expose metrics in Prometheus text format:
GET /api/monitoring/metrics
Response:
# HELP requests_total Total number of requests
# TYPE requests_total counter
requests_total 1547.0
The metrics endpoint does not require authentication. Keep it reachable from your monitoring infrastructure, and restrict public access as described under Security Considerations.
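To consume this response outside Prometheus (for example in a smoke test), the text format can be read line by line. The sketch below is a simplified, standard-library-only reader and not a full parser: the function name `parse_samples` is hypothetical, and it assumes label values contain no spaces.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Return {sample_name: value} from Prometheus text exposition output.

    Simplification (assumption): label values contain no spaces, so each
    sample line can be split on whitespace. Optional timestamps are ignored.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        samples[parts[0]] = float(parts[1])
    return samples


RESPONSE = """\
# HELP requests_total Total number of requests
# TYPE requests_total counter
requests_total 1547.0
"""

print(parse_samples(RESPONSE))  # {'requests_total': 1547.0}
```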
Available Metrics
Request Counter
REQUESTS = Counter("requests_total", "Total number of requests")
Metric Name: requests_total
Type: Counter
Description: Tracks the total number of requests to the monitoring health check endpoint
Source: ~/workspace/source/app/features/monitoring/presentation/routes.py:7
Prometheus Integration
Scrape Configuration
Add to your prometheus.yml:
scrape_configs:
  - job_name: 'water-quality-api'
    scrape_interval: 15s
    static_configs:
      - targets: ['api.example.com:443']
    metrics_path: '/api/monitoring/metrics'
    scheme: https
Example Queries
Total requests:
requests_total
Request rate (per second):
rate(requests_total[5m])
Request increase over time:
increase(requests_total[1h])
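To make the difference between these queries concrete, the following sketch approximates what `increase()` and `rate()` compute from raw counter samples. It is a simplification, not Prometheus's implementation: it handles counter resets but omits the extrapolation Prometheus applies at the window edges, and the function names and sample data are hypothetical.

```python
def counter_increase(samples: list[tuple[float, float]]) -> float:
    """Sum the growth of a counter over (timestamp_seconds, value) samples."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (for example a process restart),
        # so the new value is counted as growth from zero.
        total += cur - prev if cur >= prev else cur
    return total


def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second growth across the sampled window."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed if elapsed > 0 else 0.0


# Four scrapes 15 seconds apart, with a restart before the third scrape.
window = [(0, 100), (15, 130), (30, 10), (45, 40)]
print(counter_increase(window))  # 70.0
print(counter_rate(window))      # 70 requests over 45 seconds
```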
Grafana Dashboard
Create visualizations for your metrics:
{
  "panels": [
    {
      "title": "Total Requests",
      "targets": [
        {
          "expr": "requests_total",
          "legendFormat": "Total Requests"
        }
      ],
      "type": "stat"
    },
    {
      "title": "Request Rate",
      "targets": [
        {
          "expr": "rate(requests_total[5m])",
          "legendFormat": "Requests/sec"
        }
      ],
      "type": "graph"
    }
  ]
}
Health Check
Basic health check endpoint:
GET /api/monitoring/
Response:
{
  "message": "Monitoring Home"
}
This endpoint increments the requests_total counter on each call.
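One way to confirm this behaviour is a small smoke check that reads the counter, calls the health check, and reads the counter again. This is a sketch under stated assumptions: `BASE_URL` and the helper names are hypothetical, and the HTTP getter is passed in as a parameter so the logic can be exercised without a live server.

```python
import json
from urllib.request import urlopen

# Assumption: host taken from the scrape configuration, path from the metrics URL.
BASE_URL = "https://api.example.com/api/monitoring"


def http_get(url: str) -> str:
    """Fetch a URL and return the body as text."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def read_counter(metrics_text: str, name: str = "requests_total") -> float:
    """Extract an unlabelled sample value from Prometheus text output."""
    for line in metrics_text.splitlines():
        if line.startswith(name + " "):
            return float(line.split()[1])
    raise KeyError(name)


def health_check_increments(get=http_get, base_url: str = BASE_URL) -> bool:
    """Return True if one health check call raises requests_total by at least 1."""
    before = read_counter(get(f"{base_url}/metrics"))
    body = json.loads(get(f"{base_url}/"))
    after = read_counter(get(f"{base_url}/metrics"))
    # ">=" rather than "==" because other clients may call the endpoint too.
    return body.get("message") == "Monitoring Home" and after >= before + 1
```

Calling `health_check_increments()` with no arguments performs three real HTTP requests against `BASE_URL`; in tests, pass a fake `get` callable instead.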
Implementation Details
Prometheus Client
The monitoring feature uses the official Prometheus Python client:
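from fastapi import APIRouter, Response
from prometheus_client import Counter, generate_latest

# Router definition as used in the complete example later on this page.
monitoring_router = APIRouter(prefix="/monitoring", tags=["Monitoring"])

REQUESTS = Counter("requests_total", "Total number of requests")

@monitoring_router.get("/metrics")
async def metrics():
    # Serialize all metrics in the default registry to Prometheus text format.
    data = generate_latest()
    return Response(content=data, media_type="text/plain")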
Custom Metrics
Extend monitoring by adding custom metrics:
from prometheus_client import Counter, Histogram, Gauge

# Counter: monotonically increasing value
ALERTS_SENT = Counter(
    "alerts_sent_total",
    "Total number of alerts sent",
    ["alert_type"]  # Labels
)

# Histogram: distribution of values
ANALYSIS_DURATION = Histogram(
    "analysis_duration_seconds",
    "Time spent processing analysis",
    ["analysis_type"]
)

# Gauge: value that can go up or down
ACTIVE_METERS = Gauge(
    "active_meters_count",
    "Number of currently active meters"
)

# Usage
ALERTS_SENT.labels(alert_type="dangerous").inc()

with ANALYSIS_DURATION.labels(analysis_type="prediction").time():
    # Process analysis
    pass

ACTIVE_METERS.set(42)
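The histogram is the least obvious of the three types, so here is a toy model of how observations end up in cumulative `le` buckets. It is a standard-library illustration only, not the prometheus_client implementation; the class name and the bucket bounds are hypothetical.

```python
import math
from bisect import bisect_left


class ToyHistogram:
    """Stdlib-only model of a Prometheus histogram (illustration only)."""

    def __init__(self, upper_bounds):
        # A final +Inf bucket catches every observation.
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self._counts = [0] * len(self.upper_bounds)
        self.sum = 0.0

    def observe(self, value: float) -> None:
        # A value lands in the first bucket whose bound is >= value ("le").
        self._counts[bisect_left(self.upper_bounds, value)] += 1
        self.sum += value

    def cumulative_buckets(self):
        """Return [(le, count)] the way the *_bucket series expose them."""
        running, out = 0, []
        for bound, count in zip(self.upper_bounds, self._counts):
            running += count
            out.append((bound, running))
        return out


hist = ToyHistogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    hist.observe(seconds)
print(hist.cumulative_buckets())  # [(0.1, 1), (0.5, 3), (1.0, 3), (inf, 4)]
```

Each bucket counts every observation at or below its bound, which is why the counts never decrease from one bucket to the next.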
Best Practices
Meaningful Names: Use descriptive metric names following Prometheus naming conventions
Appropriate Types: Choose the right metric type (Counter, Gauge, Histogram, Summary)
Strategic Labels: Add labels for dimensions but avoid high cardinality
Regular Scraping: Configure reasonable scrape intervals (10-60 seconds)
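Two of these practices can be checked mechanically. The sketch below validates a metric name against the traditional Prometheus character set and the `_total` counter convention, and estimates how many time series a set of labels can produce. The helper names and the example label counts are hypothetical.

```python
import re
from math import prod

# Traditional character set for Prometheus metric names.
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def check_metric_name(name: str, metric_type: str) -> list[str]:
    """Return a list of naming problems (empty when the name looks fine)."""
    problems = []
    if not METRIC_NAME_RE.match(name):
        problems.append("name contains characters outside [a-zA-Z0-9_:]")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counter names conventionally end in _total")
    return problems


def estimated_series(label_values: dict[str, int]) -> int:
    """Upper bound on time series: product of distinct values per label."""
    return prod(label_values.values())


print(check_metric_name("alerts_sent_total", "counter"))  # []
print(estimated_series({"alert_type": 4}))                # 4
# A per-meter label multiplies the series count by the number of meters.
print(estimated_series({"alert_type": 4, "meter_id": 5000}))  # 20000
```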
Alerting Rules
Example Prometheus alerting rules:
groups:
  - name: water_quality_api
    interval: 30s
    rules:
      - alert: HighRequestRate
        expr: rate(requests_total[5m]) > 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High request rate detected"
          description: "Request rate is {{ $value }} requests/sec"
      # Fires only while the target is still being scraped; a target that is
      # down produces no samples, so pair this with a rule on `up == 0`.
      - alert: NoRecentRequests
        expr: rate(requests_total[10m]) == 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "No requests received"
          description: "No requests counted in the last 10 minutes; traffic may not be reaching the API"
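The `for` clause in these rules delays firing until the expression has held continuously for the given duration. The following sketch simulates that behaviour for a simple threshold rule; it is an approximation of the pending/firing lifecycle, and the function name and sample evaluations are hypothetical.

```python
def alert_states(evaluations, threshold: float, for_seconds: float) -> list[str]:
    """Classify each (timestamp_seconds, value) evaluation of `value > threshold`."""
    active_since = None
    states = []
    for timestamp, value in evaluations:
        if value > threshold:
            if active_since is None:
                active_since = timestamp  # condition just became true
            held_for = timestamp - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            # Any evaluation below the threshold resets the timer.
            active_since = None
            states.append("inactive")
    return states


# Evaluations every 30 seconds against "> 100" with a 60 second `for` clause.
evaluations = [(0, 50), (30, 120), (60, 130), (90, 140), (120, 80), (150, 150)]
print(alert_states(evaluations, threshold=100, for_seconds=60))
# ['inactive', 'pending', 'pending', 'firing', 'inactive', 'pending']
```

A short dip below the threshold, as at the 120 second mark, puts the alert back to inactive and restarts the wait.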
Expanding Metrics
Recommended additional metrics for production:
Request latency by endpoint
Error rate and types
Active WebSocket connections
Database query performance
Alerts sent by type
Analysis created/completed
Active users and workspaces
Meters registered and connected
Memory usage
CPU utilization
Database connections
Cache hit rates
Average sensor values by type
Anomaly detection rates
Data collection frequency
Missing data percentages
Security Considerations
Metrics endpoints should be accessible to monitoring infrastructure but protected from public access. Consider:
Network-level access controls
VPN or private network access
IP allowlisting
Separate authentication for metrics
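As an illustration of IP allowlisting, the check below decides whether a client address belongs to a permitted network. It is a minimal sketch using only the standard library; the network ranges and the function name are hypothetical, and in practice this check is often enforced at a firewall or reverse proxy rather than in application code.

```python
from ipaddress import ip_address, ip_network

# Hypothetical ranges: a private monitoring subnet and one scraper host.
ALLOWED_NETWORKS = [ip_network("10.0.0.0/8"), ip_network("192.0.2.10/32")]


def is_scraper_allowed(client_ip: str, allowed=ALLOWED_NETWORKS) -> bool:
    """Return True when client_ip falls inside one of the allowed networks."""
    try:
        addr = ip_address(client_ip)
    except ValueError:
        # Malformed addresses are rejected rather than raising.
        return False
    return any(addr in network for network in allowed)


print(is_scraper_allowed("10.1.2.3"))     # True
print(is_scraper_allowed("203.0.113.5"))  # False
```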
Example: Complete Monitoring Setup
# extended_monitoring.py
from prometheus_client import Counter, Histogram, Gauge, generate_latest
from fastapi import APIRouter, Response
import time

monitoring_router = APIRouter(prefix="/monitoring", tags=["Monitoring"])

# Define metrics
REQUESTS = Counter("requests_total", "Total requests")
ALERTS = Counter("alerts_sent_total", "Total alerts sent", ["type"])
ANALYSIS = Counter("analysis_created_total", "Total analyses created", ["type"])
ANALYSIS_DURATION = Histogram(
    "analysis_duration_seconds",
    "Analysis processing time",
    ["type"]
)
ACTIVE_METERS = Gauge("active_meters", "Currently connected meters")
ACTIVE_CONNECTIONS = Gauge("websocket_connections", "Active WebSocket connections")

@monitoring_router.get("/")
async def home():
    REQUESTS.inc()
    return {"message": "Monitoring Home", "timestamp": time.time()}

@monitoring_router.get("/metrics")
async def metrics():
    data = generate_latest()
    return Response(content=data, media_type="text/plain")

# Helper functions for other parts of the application
def track_alert(alert_type: str):
    ALERTS.labels(type=alert_type).inc()

def track_analysis(analysis_type: str, duration: float):
    ANALYSIS.labels(type=analysis_type).inc()
    ANALYSIS_DURATION.labels(type=analysis_type).observe(duration)

def update_active_meters(count: int):
    ACTIVE_METERS.set(count)

def update_websocket_connections(count: int):
    ACTIVE_CONNECTIONS.set(count)
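A helper such as `track_analysis` needs a measured duration from its caller. One possible way to supply it is a small context manager that times a block and reports the result. This is a sketch, not part of the module above: `timed_analysis` is a hypothetical name, and the tracking function is passed in as a parameter so the example runs without the application.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_analysis(analysis_type: str, track):
    """Time the enclosed block and report it via track(analysis_type, seconds)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Reported even when the block raises, so failed runs are still timed.
        track(analysis_type, time.perf_counter() - start)


# Stand-in for track_analysis that records the calls it receives.
recorded = []
with timed_analysis("prediction", lambda kind, seconds: recorded.append((kind, seconds))):
    sum(range(1000))  # placeholder for the real analysis work

print(recorded[0][0])  # prediction
```

Inside the application, the call would pass the real helper instead: `with timed_analysis("prediction", track_analysis): ...`.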
Monitoring Dashboard Example
Visualize your metrics with this Grafana dashboard JSON:
{
  "dashboard": {
    "title": "Water Quality API Monitoring",
    "panels": [
      {
        "title": "Request Rate",
        "targets": [{ "expr": "rate(requests_total[5m])" }],
        "type": "graph"
      },
      {
        "title": "Active Meters",
        "targets": [{ "expr": "active_meters" }],
        "type": "stat"
      },
      {
        "title": "Alerts Sent by Type",
        "targets": [{ "expr": "rate(alerts_sent_total[1h])" }],
        "type": "graph"
      },
      {
        "title": "Analysis Duration (p95)",
        "targets": [{
          "expr": "histogram_quantile(0.95, rate(analysis_duration_seconds_bucket[5m]))"
        }],
        "type": "graph"
      }
    ]
  }
}
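The p95 panel relies on `histogram_quantile`, which estimates a quantile from bucket counts rather than from raw observations. The sketch below shows the linear interpolation idea on cumulative buckets. It is a simplified approximation of that function, assuming positive bucket bounds with a final +Inf bucket; the function name and bucket data are hypothetical.

```python
import math


def bucket_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by bound and end with a +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, count_below = 0.0, 0.0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            in_bucket = cumulative - count_below
            # Assume observations are spread evenly inside the bucket.
            return lower_bound + (upper_bound - lower_bound) * (rank - count_below) / in_bucket
        lower_bound, count_below = upper_bound, cumulative
    return lower_bound


buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (2.5, 100), (math.inf, 100)]
print(bucket_quantile(0.95, buckets))  # approximately 1.75 seconds
```

Because the estimate interpolates within a bucket, its accuracy depends on how closely the bucket bounds bracket the latencies of interest.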