Overview
CloudGaming provides multi-layer monitoring across WebRTC streaming, signaling infrastructure, and host health. This guide covers all available metrics, health checks, and monitoring best practices.
WebRTC Statistics
Real-Time Transport Metrics
The Go/Pion WebRTC implementation tracks comprehensive transport statistics:
Packet Loss and Retransmission
// Tracked in gortc_main/main.go:236-245
type WebRTCStats struct {
    nackCount        uint32 // NACK (negative acknowledgment) count
    pliCount         uint32 // Picture Loss Indication count
    twccCount        uint32 // Transport-wide congestion control count
    pacerQueueLength uint32 // Pacer queue depth
    sendBitrateKbps  uint32 // Estimated send bitrate
}
Available Metrics:
- Packet Loss - Percentage of lost RTP packets
- RTT (Round-Trip Time) - Network latency in milliseconds
- Jitter - Packet arrival time variance
- NACK Count - Number of retransmission requests
- PLI Count - Number of keyframe requests
- Send Bitrate - Current video bitrate in kbps
- Pacer Queue Length - Number of frames waiting to send
Stats Monitoring Implementation
// Stats updated every 500ms (gortc_main/main.go:248-268)
func startStatsMonitoring() {
    ticker := time.NewTicker(500 * time.Millisecond)
    defer ticker.Stop()
    audioHealthTicker := time.NewTicker(5 * time.Second)
    defer audioHealthTicker.Stop()
    for {
        select {
        case <-ticker.C:
            updatePacerQueueLength()
        case <-audioHealthTicker.C:
            reportAudioQueueHealth()
        }
    }
}
RTCP Feedback
RTCP (RTP Control Protocol) provides real-time feedback:
// Callback signature (gortc_main/main.go:16-19)
typedef void (*WebRTCStatsCallback)(
    double packetLoss, double rtt, double jitter,
    unsigned int nackCount, unsigned int pliCount,
    unsigned int twccCount, unsigned int pacerQueueLength,
    unsigned int sendBitrateKbps
);
Log Example:
[Go/Pion] WebRTC Stats: loss=0.5%, rtt=28ms, jitter=2.1ms,
nack=12, pli=1, bitrate=8500kbps, queue=2
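For log-based dashboards, the stats line above can be turned into structured data. The following is a minimal sketch, assuming the log format matches the example exactly; `parseWebRTCStatsLine` and its field names are hypothetical and not part of the CloudGaming codebase.

```javascript
// Hypothetical helper: parses a "WebRTC Stats" log line in the format shown
// in the log example. Returns null when the line does not match.
function parseWebRTCStatsLine(line) {
  const pattern =
    /loss=([\d.]+)%,\s*rtt=([\d.]+)ms,\s*jitter=([\d.]+)ms,\s*nack=(\d+),\s*pli=(\d+),\s*bitrate=(\d+)kbps,\s*queue=(\d+)/;
  const match = pattern.exec(line);
  if (match === null) {
    return null;
  }
  return {
    packetLossPct: Number(match[1]),
    rttMs: Number(match[2]),
    jitterMs: Number(match[3]),
    nackCount: Number(match[4]),
    pliCount: Number(match[5]),
    sendBitrateKbps: Number(match[6]),
    pacerQueueLength: Number(match[7]),
  };
}
```

The `\s*` between fields lets the pattern match whether the log line is wrapped or printed on a single line.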
Audio Queue Monitoring
Queue Health Metrics
Audio queue depth indicates network congestion:
// Reported every 5 seconds (gortc_main/main.go:486-532)
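func reportAudioQueueHealth() {
    avgDepth := getAverageAudioQueueDepth()
    currentDepth := len(audioSendQueue)

    // The second check overrides the first, so CRITICAL wins above 2.8.
    healthStatus := "GOOD"
    if avgDepth > 2.0 {
        healthStatus = "WARNING"
    }
    if avgDepth > 2.8 {
        healthStatus = "CRITICAL"
    }

    // Simplified excerpt: the full log line also reports min, max, and sample count.
    log.Printf("[Go/Pion] Audio Queue Health [%s]: current=%d, avg=%.1f",
        healthStatus, currentDepth, avgDepth)
}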
Health Status Thresholds:
- GOOD: Average queue depth ≤ 2.0 packets
- WARNING: Average queue depth > 2.0 and ≤ 2.8 packets
- CRITICAL: Average queue depth > 2.8 packets
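The same thresholds can be applied on the monitoring side, for example when processing parsed log lines. This is a minimal sketch; `averageDepth` and `classifyAudioQueueHealth` are hypothetical helper names, and the averaging window is an assumption.

```javascript
// Hypothetical helper: mean of the recent queue-depth samples.
function averageDepth(samples) {
  if (samples.length === 0) {
    return 0;
  }
  const total = samples.reduce((sum, depth) => sum + depth, 0);
  return total / samples.length;
}

// Maps an average depth to the status labels used in the host logs.
// The most severe threshold is checked first.
function classifyAudioQueueHealth(avgDepth) {
  if (avgDepth > 2.8) {
    return 'CRITICAL';
  }
  if (avgDepth > 2.0) {
    return 'WARNING';
  }
  return 'GOOD';
}
```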
Log Example:
[Go/Pion] Audio Queue Health [WARNING]: current=3, avg=2.4,
min=1, max=4, samples=10
[Go/Pion] ⚠️ Audio queue consistently congested -
consider bitrate reduction
Buffer Pool Health
Memory Management Monitoring
The tiered buffer pool tracks allocation efficiency:
// Health check (gortc_main/main.go:310-344)
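func checkBufferPoolHealth() {
    // sum stands in for a helper that adds up the per-tier counters.
    totalHits := sum(sampleBufPool.hits)
    totalMisses := sum(sampleBufPool.misses)
    total := totalHits + totalMisses
    if total == 0 {
        return // No requests yet; avoid dividing by zero.
    }
    // Express the hit rate as a percentage so it matches the threshold and the log format.
    hitRate := 100 * float64(totalHits) / float64(total)
    if hitRate < 85.0 {
        log.Printf("⚠️ Low hit rate %.1f%%", hitRate)
    }
}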
Performance Indicators:
- Hit Rate 95%+: Excellent - minimal allocations
- Hit Rate 90-95%: Good - some allocations expected
- Hit Rate 80-90%: Moderate - consider pool tuning
- Hit Rate below 80%: Poor - high GC pressure
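The bands above translate directly into a small rating function. This is a sketch under the assumption that each band includes its lower bound; `hitRatePercent` and `rateBufferPool` are hypothetical names.

```javascript
// Hypothetical helper: hit rate as a percentage, or null when there were
// no requests (avoids dividing by zero).
function hitRatePercent(hits, misses) {
  const total = hits + misses;
  return total === 0 ? null : (100 * hits) / total;
}

// Maps a hit-rate percentage to the performance bands listed above.
function rateBufferPool(hitRate) {
  if (hitRate >= 95) {
    return 'Excellent';
  }
  if (hitRate >= 90) {
    return 'Good';
  }
  if (hitRate >= 80) {
    return 'Moderate';
  }
  return 'Poor';
}
```

For example, 1523 hits and 12 misses give a hit rate of about 99.2%, which rates as Excellent.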
Log Example:
[Go/Pion] Buffer Pool Statistics:
Tier 4 (4096 bytes): 1523 hits, 12 misses, 12 allocs (99.2% hit rate)
Tier 7 (32768 bytes): 8901 hits, 45 misses, 45 allocs (99.5% hit rate)
Overall: 15234 requests, 98.7% hit rate, 89 total allocations
✅ Excellent performance - minimal allocations
Signaling Server Metrics
Prometheus Metrics Endpoint
The signaling server exposes metrics at /metrics:
curl http://localhost:3002/metrics
Available Metrics
Connection Metrics:
# Active WebSocket connections
signaling_active_connections 42
# Rooms with local connections
signaling_local_rooms 8
Message Processing:
# Total messages forwarded
signaling_messages_forwarded_total 15234
# Schema validation rejections
signaling_schema_rejections_total 3
# Rate limit drops
signaling_rate_limit_drops_total 12
# Backpressure connection closes
signaling_backpressure_closes_total 0
Redis Health:
# Redis connection status (1=up, 0=down)
signaling_redis_up 1
# Circuit breaker status (1=open, 0=closed)
signaling_circuit_breaker_open 0
# Redis command latency histogram
signaling_redis_cmd_latency_seconds_bucket{le="0.005"} 1234
signaling_redis_cmd_latency_seconds_bucket{le="0.01"} 1240
signaling_redis_cmd_latency_seconds_bucket{le="0.025"} 1245
Fanout Performance:
# Local message fanout latency
signaling_fanout_latency_seconds_bucket{le="0.001"} 5678
signaling_fanout_latency_seconds_bucket{le="0.005"} 5690
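When a full Prometheus server is not available, the `/metrics` output can be read directly. The following is a minimal sketch of a parser for the text format shown above, assuming samples carry no trailing timestamp and label values contain no spaces; `parsePrometheusText` is a hypothetical helper.

```javascript
// Hypothetical helper: parses Prometheus text exposition output into a Map
// keyed by "name" or "name{labels}". Comment lines (starting with #) and
// blank lines are skipped.
function parsePrometheusText(text) {
  const samples = new Map();
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) {
      continue;
    }
    // The value is the last space-separated token on the line.
    const lastSpace = line.lastIndexOf(' ');
    if (lastSpace === -1) {
      continue;
    }
    const key = line.slice(0, lastSpace).trim();
    const value = Number(line.slice(lastSpace + 1));
    if (!Number.isNaN(value)) {
      samples.set(key, value);
    }
  }
  return samples;
}
```

A caller could fetch `http://localhost:3002/metrics`, pass the response body to this function, and read `samples.get('signaling_redis_up')`.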
Implementation Reference
See Server/metrics.js:1-117 for the complete metrics implementation.
Matchmaker Monitoring
Host Health Tracking
The matchmaker monitors host heartbeats:
Heartbeat Endpoint:
POST /api/host/heartbeat
Authorization: Bearer <HOST_SECRET>
{
  "hostId": "550e8400-e29b-41d4-a716-446655440000",
  "roomId": "game-room-1",
  "region": "us-west",
  "status": "idle",
  "capacity": 1,
  "availableSlots": 1
}
Response:
{
  "success": true,
  "ttl": 30
}
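A host-side client might assemble the heartbeat request as follows. This is a sketch only: `buildHeartbeatRequest` is a hypothetical helper, the base URL is a placeholder, and the `Content-Type` header is an assumption not confirmed by the matchmaker source.

```javascript
// Hypothetical helper: builds the URL and fetch options for a heartbeat.
// baseUrl is a placeholder for wherever the matchmaker is deployed.
function buildHeartbeatRequest(baseUrl, hostSecret, host) {
  return {
    url: new URL('/api/host/heartbeat', baseUrl).toString(),
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${hostSecret}`,
        'Content-Type': 'application/json', // assumed; the body is JSON
      },
      body: JSON.stringify({
        hostId: host.hostId,
        roomId: host.roomId,
        region: host.region,
        status: host.status,
        capacity: host.capacity,
        availableSlots: host.availableSlots,
      }),
    },
  };
}
```

The result can be passed to `fetch(url, options)`. Since the response reports a TTL of 30 seconds, sending heartbeats at an interval well below that keeps the host from expiring.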
Host TTL Monitoring
Response:
[
  {
    "hostId": "550e8400-e29b-41d4-a716-446655440000",
    "ttlSeconds": 28
  }
]
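A response in this shape can be filtered for hosts that are close to expiring. This sketch uses a hypothetical helper, `hostsNearExpiry`, with a default threshold of 10 seconds chosen to match the warning alert threshold listed under Monitoring Best Practices.

```javascript
// Hypothetical helper: returns the IDs of hosts whose remaining TTL is
// below the given threshold (in seconds).
function hostsNearExpiry(entries, minTtlSeconds = 10) {
  return entries
    .filter((entry) => entry.ttlSeconds < minTtlSeconds)
    .map((entry) => entry.hostId);
}
```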
Stale Host Pruning:
// Runs every 10 seconds (mm_server/Matchmaker.js:189-206)
async function pruneStaleIdleHosts() {
  const stale = [];
  const ids = await redisClient.sMembers('idle_hosts');
  for (const id of ids) {
    const ttl = await redisClient.ttl(`host:${id}`);
    if (ttl === -2) { // -2 means the key no longer exists (expired or deleted)
      stale.push(id);
    }
  }
  if (stale.length > 0) {
    await redisClient.sRem('idle_hosts', stale);
  }
}
Health Check Endpoints
Signaling Server
Liveness Probe:
GET /healthz
# Returns: 200 OK
Readiness Probe:
GET /readyz
# Returns: 200 "ready" if Redis is connected
# Returns: 503 "not-ready" if Redis is down or draining
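The readiness rule described above amounts to a small decision function. This is a sketch of that rule, not the server's actual handler; `readinessResponse` and its `redisUp` and `draining` inputs are hypothetical names.

```javascript
// Hypothetical helper: the server is ready only when Redis is connected
// and the process is not draining.
function readinessResponse({ redisUp, draining }) {
  if (redisUp && !draining) {
    return { status: 200, body: 'ready' };
  }
  return { status: 503, body: 'not-ready' };
}
```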
Matchmaker
Health Endpoints:
GET /healthz # Liveness
GET /readyz # Readiness
GET /health # General health
GET / # Returns "ok"
All return 200 OK immediately to prevent Railway from killing the container.
Host Configuration Monitoring
Monitor these key settings from config.json:
Video Configuration
{
  "video": {
    "fps": 60,
    "bitrateStart": 8000000,
    "bitrateMin": 8000000,
    "bitrateMax": 12000000,
    "preset": "p2",
    "rc": "cbr"
  }
}
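A monitoring script could sanity-check the bitrate settings before a host starts. The sketch below assumes the intended invariant is `bitrateMin` ≤ `bitrateStart` ≤ `bitrateMax`, which the example values satisfy but the source does not state explicitly; `checkVideoBitrates` is a hypothetical helper.

```javascript
// Hypothetical helper: returns a list of problems with the video bitrate
// settings. The min <= start <= max invariant is an assumption.
function checkVideoBitrates(video) {
  const problems = [];
  const { bitrateStart, bitrateMin, bitrateMax } = video;
  if (bitrateMin > bitrateMax) {
    problems.push('bitrateMin exceeds bitrateMax');
  }
  if (bitrateStart < bitrateMin || bitrateStart > bitrateMax) {
    problems.push('bitrateStart is outside [bitrateMin, bitrateMax]');
  }
  return problems;
}
```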
Capture Settings
{
  "capture": {
    "mmcss": { "enable": true, "priority": 4 },
    "maxQueueDepth": 2,
    "skipUnchanged": true
  }
}
Audio Configuration
{
  "audio": {
    "bitrate": 80000,
    "frameSizeMs": 10,
    "enableFec": true,
    "latency": {
      "enforceSingleFrameBuffering": true,
      "targetOneWayLatencyMs": 40
    }
  }
}
Redis Monitoring
Circuit Breaker
Protects against Redis failures:
// Server/ScalableSignalingServer.js:68-84
function noteRedisFailure() {
  redisFailureCount += 1;
  if (redisFailureCount >= config.cbErrorThreshold) {
    redisCircuitOpenUntil = Date.now() + config.cbOpenMs;
    setCircuitBreakerOpen(true);
  }
}
When circuit opens:
- New connections are rejected with 1013 Service unavailable
- Existing connections continue working
- Circuit auto-closes after timeout
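The open and auto-close behavior can be illustrated with a self-contained sketch. This is not the code in `ScalableSignalingServer.js`: `createCircuitBreaker`, the injectable `now` clock, and the reset-on-success rule are assumptions made for illustration.

```javascript
// Illustrative circuit breaker. Opens after errorThreshold failures and
// closes again once openMs milliseconds have passed.
function createCircuitBreaker({ errorThreshold, openMs, now = Date.now }) {
  let failureCount = 0;
  let openUntil = 0; // 0 means the circuit is closed

  return {
    noteFailure() {
      failureCount += 1;
      if (failureCount >= errorThreshold) {
        openUntil = now() + openMs;
      }
    },
    // Assumption: a successful command resets the failure counter.
    noteSuccess() {
      failureCount = 0;
    },
    isOpen() {
      if (openUntil !== 0 && now() >= openUntil) {
        // The timeout has elapsed: close the circuit and start counting again.
        openUntil = 0;
        failureCount = 0;
      }
      return openUntil !== 0;
    },
  };
}
```

A connection handler would call `isOpen()` before accepting a new WebSocket and reject the connection while it returns `true`. Passing a fake `now` function makes the timeout easy to test.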
Connection Status
// Check Redis connectivity
const pong = await redisClient.ping();
if (pong === 'PONG') {
  // Redis is healthy
}
Monitoring Best Practices
Alerting Thresholds
Critical Alerts:
- WebRTC packet loss > 5%
- RTT > 150ms for sustained period
- Audio queue depth > 2.8 (CRITICAL)
- Buffer pool hit rate < 80%
- Redis circuit breaker open
- Signaling server Redis disconnected
Warning Alerts:
- WebRTC packet loss > 2%
- RTT > 100ms
- Audio queue depth > 2.0 (WARNING)
- Buffer pool hit rate < 90%
- Host heartbeat TTL < 10 seconds
- Rate limit drops increasing
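Several of these thresholds can be evaluated against a single metrics sample. The sketch below covers packet loss, RTT, audio queue depth, and buffer pool hit rate; `evaluateAlerts` and the sample field names are hypothetical, and the sketch checks one sample at a time, so the "sustained period" condition for RTT would need extra state.

```javascript
// Thresholds taken from the alert lists above (higher values are worse).
const UPPER_THRESHOLDS = {
  packetLossPct: { warning: 2, critical: 5 },
  rttMs: { warning: 100, critical: 150 },
  audioQueueAvgDepth: { warning: 2.0, critical: 2.8 },
};

// Hypothetical helper: returns the alerts triggered by one sample.
function evaluateAlerts(sample) {
  const alerts = [];
  for (const [metric, limits] of Object.entries(UPPER_THRESHOLDS)) {
    const value = sample[metric];
    if (typeof value !== 'number') {
      continue; // metric missing from this sample
    }
    if (value > limits.critical) {
      alerts.push({ metric, severity: 'critical', value });
    } else if (value > limits.warning) {
      alerts.push({ metric, severity: 'warning', value });
    }
  }
  // Buffer pool hit rate is the one metric where lower values are worse.
  const hitRate = sample.bufferPoolHitRatePct;
  if (typeof hitRate === 'number') {
    if (hitRate < 80) {
      alerts.push({ metric: 'bufferPoolHitRatePct', severity: 'critical', value: hitRate });
    } else if (hitRate < 90) {
      alerts.push({ metric: 'bufferPoolHitRatePct', severity: 'warning', value: hitRate });
    }
  }
  return alerts;
}
```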
Log Aggregation
Key Log Patterns:
# WebRTC stats
grep "WebRTC Stats" logs.txt
# Audio health
grep "Audio Queue Health" logs.txt
# Buffer pool performance
grep "Buffer Pool" logs.txt
# Redis errors
grep "Redis" logs.txt | grep -E "error|failed"
# Connection issues
grep -E "ICE|connection state" logs.txt
Grafana Dashboard Example
Panels to Include:
- Active connections (signaling_active_connections)
- Message throughput (rate(signaling_messages_forwarded_total[1m]))
- Redis latency (signaling_redis_cmd_latency_seconds)
- WebRTC packet loss percentage
- Audio queue depth over time
- Buffer pool hit rate
- Host heartbeat count