Overview

Viction provides comprehensive metrics collection for monitoring node health, performance, and resource usage. The metrics system is based on the go-metrics library and can export data to various monitoring systems.

Enabling Metrics

Basic Metrics

Enable standard metrics collection:
tomo --metrics
This enables basic health and performance metrics with minimal overhead.

Expensive Metrics

Enable detailed metrics including resource-intensive measurements:
tomo --metrics --metrics.expensive
Expensive metrics can impact node performance. Only enable on nodes with sufficient resources and when detailed monitoring is required.

HTTP Metrics Endpoint

Expose metrics via HTTP for scraping by monitoring systems:
tomo --metrics --metrics.addr 127.0.0.1 --metrics.port 6060
Access metrics:
curl http://127.0.0.1:6060/debug/metrics
Options:
  • --metrics.addr: HTTP server listening interface (default: 127.0.0.1)
  • --metrics.port: HTTP server listening port (default: 6060)
Never expose the metrics endpoint to the public internet. Always bind to localhost or use firewall rules to restrict access to trusted monitoring systems.
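The endpoint returns a single flat JSON object mapping metric names to values. A minimal sketch of pulling one value out of such a response with standard tools (the response body below is illustrative; real output contains many more keys):

```shell
# Illustrative response body; a real response from /debug/metrics
# contains hundreds of metric-name/value pairs.
RESPONSE='{"system/cpu/sysload":12.5,"system/memory/used":1834520576}'

# Extract a single metric's value with sed (jq works too, if installed):
echo "$RESPONSE" | sed -n 's/.*"system\/memory\/used":\([0-9.]*\).*/\1/p'
```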

System Metrics

Viction automatically collects system-level metrics every 3 seconds:

CPU Metrics

Metric | Description
system/cpu/sysload | System-wide CPU load
system/cpu/syswait | System-wide CPU wait time
system/cpu/procload | Process CPU load
system/cpu/threads | Number of OS threads
system/cpu/goroutines | Number of Go goroutines

Memory Metrics

Metric | Description
system/memory/pauses | GC pause time
system/memory/allocs | Memory allocations
system/memory/frees | Memory deallocations
system/memory/held | Memory held by heap
system/memory/used | Memory currently in use

Disk Metrics

Metric | Description
system/disk/readcount | Number of disk reads
system/disk/readdata | Bytes read from disk
system/disk/readbytes | Total bytes read (counter)
system/disk/writecount | Number of disk writes
system/disk/writedata | Bytes written to disk
system/disk/writebytes | Total bytes written (counter)

Runtime Metrics

Go runtime metrics provide insight into application performance:

Memory Statistics

Metric | Description
runtime.MemStats.Alloc | Bytes of allocated heap objects
runtime.MemStats.TotalAlloc | Cumulative bytes allocated
runtime.MemStats.Sys | Total bytes from OS
runtime.MemStats.Mallocs | Number of heap allocations
runtime.MemStats.Frees | Number of heap deallocations
runtime.MemStats.HeapAlloc | Heap bytes allocated and in use
runtime.MemStats.HeapSys | Heap bytes from OS
runtime.MemStats.HeapIdle | Idle heap bytes
runtime.MemStats.HeapInuse | In-use heap bytes
runtime.MemStats.HeapReleased | Heap bytes released to OS
runtime.MemStats.HeapObjects | Number of heap objects

Garbage Collection

Metric | Description
runtime.MemStats.NumGC | Number of completed GC cycles
runtime.MemStats.PauseNs | GC pause durations (histogram)
runtime.MemStats.PauseTotalNs | Cumulative GC pause time
runtime.MemStats.GCCPUFraction | Fraction of CPU used by GC
runtime.MemStats.LastGC | Time of last GC
runtime.MemStats.NextGC | Target heap size for next GC

Runtime Details

Metric | Description
runtime.NumGoroutine | Number of goroutines
runtime.NumThread | Number of OS threads
runtime.NumCgoCall | Number of CGO calls
runtime.ReadMemStats | Time to read memory stats

Blockchain Metrics

Monitor blockchain-specific operations:

Block Processing

Track block import and processing:
  • Block import time
  • Block size
  • Transaction count per block
  • Gas used per block
  • Uncle rate

Transaction Pool

Monitor mempool activity:
  • Pending transactions
  • Queued transactions
  • Transaction replacement rate
  • Pool size limits
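Pending and queued counts can also be read over JSON-RPC via txpool_status (this requires the txpool API to be enabled on the node). A sketch of decoding the hex counts; the response body shown is illustrative:

```shell
# Against a live node with the txpool API enabled, you would run:
# curl -s -X POST -H "Content-Type: application/json" \
#   --data '{"jsonrpc":"2.0","method":"txpool_status","params":[],"id":1}' \
#   http://localhost:8545
# A typical result object looks like this (values illustrative):
RESULT='{"pending":"0x1a4","queued":"0x7"}'

# Counts are hex quantities; decode with shell base-16 arithmetic:
PENDING=$(echo "$RESULT" | sed -n 's/.*"pending":"0x\([0-9a-fA-F]*\)".*/\1/p')
echo "pending: $((16#$PENDING))"
```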

P2P Networking

Network connectivity metrics:
  • Active peer count
  • Peer connect/disconnect events
  • Inbound/outbound connections
  • Data sent/received per peer
  • Protocol handshake success rate

Consensus

PoSV consensus metrics:
  • Validator status
  • Block signing success rate
  • Missed blocks
  • Checkpoint events

Metric Types

Viction uses several metric types:

Counter

Monotonically increasing values:
counter.Inc(1)  // Increment by 1

Gauge

Values that can increase or decrease:
gauge.Update(42)  // Set to specific value

Meter

Measures rate of events:
meter.Mark(n)  // Record n events
Provides:
  • Count
  • Mean rate
  • 1-, 5-, and 15-minute moving averages

Timer

Measures duration and rate:
timer.UpdateSince(startTime)
Provides:
  • Count
  • Mean/min/max duration
  • Percentiles (50th, 75th, 95th, 99th, 99.9th)
  • Rate metrics

Histogram

Measures distribution of values:
histogram.Update(value)
Provides:
  • Count
  • Mean/min/max
  • Percentiles

Monitoring Integrations

Prometheus

Export metrics to Prometheus:
  1. Enable the metrics HTTP endpoint (bind to 0.0.0.0 only when Prometheus runs on a different host; otherwise keep the default 127.0.0.1 and restrict access with firewall rules):
tomo --metrics --metrics.addr 0.0.0.0 --metrics.port 6060
  2. Configure Prometheus scraping (prometheus.yml):
scrape_configs:
  - job_name: 'viction'
    static_configs:
      - targets: ['localhost:6060']
    metrics_path: '/debug/metrics/prometheus'
    scrape_interval: 15s
  3. Query metrics in Prometheus:
rate(system_disk_writebytes[5m])
rate(system_memory_allocs[1m])

Grafana

Visualize metrics with Grafana:
  1. Add Prometheus data source
  2. Import Viction dashboard template
  3. Create custom dashboards for your needs
Example dashboard panels:
  • CPU and memory usage over time
  • Block height and sync status
  • Transaction pool size
  • Peer count
  • Disk I/O rates

InfluxDB

The metrics library supports InfluxDB export for time-series storage.

Graphite

Export to Graphite for legacy monitoring systems.

Monitoring Best Practices

Alert Thresholds

Set alerts for critical conditions.
High priority:
  • Node not syncing (block height not increasing)
  • Peer count below minimum (< 3)
  • Disk space below 10%
  • Memory usage above 90%
  • Masternode missing blocks
Medium priority:
  • High CPU usage (> 80% sustained)
  • Large transaction pool (> 1000 pending)
  • Slow block processing
  • GC pause time increasing
Low priority:
  • Peer churn rate high
  • Transaction replacement rate high
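The high-priority conditions above can be encoded as Prometheus alerting rules. A sketch, assuming the metric names produced by the /debug/metrics/prometheus endpoint; verify the exact names against your node's own output before relying on them:

```yaml
# alerts.yml - illustrative rule group; the metric names (p2p_peers,
# system_memory_used, system_memory_held) are assumptions and should be
# checked against your node's /debug/metrics/prometheus output.
groups:
  - name: viction-node
    rules:
      - alert: LowPeerCount
        expr: p2p_peers < 3
        for: 5m
        labels:
          severity: high
        annotations:
          summary: "Peer count below minimum on {{ $labels.instance }}"
      - alert: HighMemoryUsage
        expr: system_memory_used > 0.9 * system_memory_held
        for: 10m
        labels:
          severity: high
```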

Monitoring Queries

Check sync status:
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://localhost:8545
Check peer count:
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' \
  http://localhost:8545
Check block number:
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545
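eth_blockNumber (like net_peerCount above) returns a hex-encoded quantity. A quick way to decode it in the shell, with an illustrative value:

```shell
# A result field from eth_blockNumber (value illustrative):
BLOCK_HEX="0x5b8d80"

# Strip the 0x prefix and convert base-16 to decimal:
echo $((16#${BLOCK_HEX#0x}))
```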

Resource Planning

Monitor trends to plan capacity:
  1. Database growth rate: Track disk usage over time
  2. Memory requirements: Monitor peak memory usage
  3. CPU utilization: Identify bottlenecks
  4. Network bandwidth: Plan for peak loads
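For item 1, two size samples are enough to estimate how long the current disk will last. A sketch with illustrative numbers (sample your actual chaindata directory with du in practice):

```shell
# Two daily chaindata size samples, in bytes (illustrative values):
DAY1=250000000000
DAY2=252000000000
DISK=1000000000000   # total disk capacity: 1 TB

GROWTH=$((DAY2 - DAY1))   # bytes added per day
FREE=$((DISK - DAY2))     # bytes remaining
echo "growth/day: $GROWTH"
echo "days until full: $((FREE / GROWTH))"
```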

Health Checks

Implement automated health checks:
#!/bin/bash
# health-check.sh

# Check if process is running
if ! pgrep -x "tomo" > /dev/null; then
    echo "CRITICAL: Node process not running"
    exit 2
fi

# Check if syncing
SYNC=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://localhost:8545 | jq -r '.result')

if [ "$SYNC" != "false" ]; then
    echo "WARNING: Node is syncing"
    exit 1
fi

# Check peer count
PEERS=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' \
  http://localhost:8545 | jq -r '.result')

PEER_COUNT=$((16#${PEERS#0x}))

if [ "$PEER_COUNT" -lt 3 ]; then
    echo "WARNING: Low peer count: $PEER_COUNT"
    exit 1
fi

echo "OK: Node healthy, $PEER_COUNT peers"
exit 0

Log Analysis

Monitor logs for important events.
Successful block creation (masternode):
grep "Successfully sealed new block" /var/log/viction/node.log
Block import:
grep "Imported new chain segment" /var/log/viction/node.log
Peer connections:
grep -E "Peer connected|Peer disconnected" /var/log/viction/node.log
Errors:
grep -i error /var/log/viction/node.log
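Greps like these can be turned into a numeric signal (e.g. errors per check interval) by counting matches. The inline sample below stands in for /var/log/viction/node.log:

```shell
# Inline sample standing in for the node log file:
LOG='INFO Imported new chain segment
ERROR Bad block
WARN low peer count'

# Count error lines (case-insensitive), suitable for feeding an alert:
echo "$LOG" | grep -ci error
```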

Performance Tuning

Use metrics to optimize performance:
  1. High GC pause time: Increase the GOGC environment variable
  2. High memory usage: Reduce cache sizes
  3. Slow disk I/O: Use faster storage (SSD/NVMe)
  4. CPU bottlenecks: Increase worker threads
  5. Network saturation: Adjust max peers
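For item 1, GOGC is Go's garbage-collection target percentage (default 100); raising it lets the heap grow more between collections, trading memory for fewer GC cycles. A sketch:

```shell
# Raise the GC target so collections run less often (uses more memory):
export GOGC=150
echo "GOGC=$GOGC"

# Then restart the node so the setting takes effect, e.g.:
# GOGC=150 tomo --metrics --metrics.addr 127.0.0.1 --metrics.port 6060
```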

Monitoring Checklist

  • Metrics collection enabled
  • HTTP metrics endpoint secured
  • Prometheus/monitoring system configured
  • Grafana dashboards created
  • Critical alerts configured
  • Health check script deployed
  • Log aggregation configured
  • Capacity planning metrics tracked
  • Alert escalation procedures documented
  • On-call rotation established
