Overview

Chronoverse provides monitoring through integration with the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir) and through built-in health checks for its services and databases.

Grafana LGTM Stack

The observability stack runs as a containerized service and collects telemetry from all components.

Configuration

lgtm:
  image: grafana/otel-lgtm:0.11.10
  container_name: lgtm
  environment:
    GF_AUTH_ANONYMOUS_ENABLED: "true"
    GF_AUTH_ANONYMOUS_ORG_ROLE: Admin
  ports:
    - "3000:3000"  # Grafana UI
    - "4317:4317"  # OTLP gRPC endpoint

Accessing Grafana

In development mode, access Grafana at http://localhost:3000. In production, the port is not exposed externally.
Anonymous authentication is enabled by default, and the configuration above grants anonymous users the Admin role. For production deployments, disable anonymous access or configure proper authentication.
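
To confirm that Grafana is up before opening the UI, a small client can call Grafana's /api/health endpoint. The following is a minimal sketch, assuming the development URL above; the names grafanaHealth, parseHealth, and checkGrafana are illustrative and are not part of Chronoverse.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// grafanaHealth holds the fields this sketch reads from /api/health.
type grafanaHealth struct {
	Database string `json:"database"`
	Version  string `json:"version"`
}

// parseHealth decodes a health response and rejects any database status other than "ok".
func parseHealth(body []byte) (grafanaHealth, error) {
	var h grafanaHealth
	if err := json.Unmarshal(body, &h); err != nil {
		return h, fmt.Errorf("decode health response: %w", err)
	}
	if h.Database != "ok" {
		return h, fmt.Errorf("grafana database status is %q", h.Database)
	}
	return h, nil
}

// checkGrafana requests <baseURL>/api/health and parses the response.
func checkGrafana(ctx context.Context, baseURL string) (grafanaHealth, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/api/health", nil)
	if err != nil {
		return grafanaHealth{}, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return grafanaHealth{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return grafanaHealth{}, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return grafanaHealth{}, err
	}
	return parseHealth(body)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	h, err := checkGrafana(ctx, "http://localhost:3000") // development URL from this page
	if err != nil {
		fmt.Println("Grafana not healthy:", err)
		return
	}
	fmt.Println("Grafana healthy, version", h.Version)
}
```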

Health Checks

gRPC Service Health Checks

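All gRPC services implement health check probes using grpc-health-probe. The example below is the probe for users-service; the flags it passes depend on whether GRPC_TLS_ENABLED is set to true: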
healthcheck:
  test: |
    if [ "$GRPC_TLS_ENABLED" = "true" ]; then
      /bin/grpc-health-probe -addr=localhost:50051 \
        -connect-timeout 250ms \
        -rpc-timeout 250ms \
        -tls \
        -tls-ca-cert certs/ca/ca.crt \
        -tls-client-cert certs/users-service/users-service.crt \
        -tls-client-key certs/users-service/users-service.key \
        -tls-server-name=users-service \
        -rpc-header=Audience:grpc_probe \
        -rpc-header=Role:admin
    else
      /bin/grpc-health-probe -addr=localhost:50051 \
        -connect-timeout 250ms \
        -rpc-timeout 250ms \
        -service=users-service \
        -rpc-header=Audience:grpc_probe \
        -rpc-header=Role:admin
    fi
  interval: 10s
  timeout: 5s
  retries: 5
  start_period: 30s
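
The branching in the probe script can be easier to follow as a function that builds the argument list. The sketch below mirrors the two branches exactly as written above. The probeArgs helper is hypothetical, and it assumes that other services keep their certificates under certs/<service>/ in the same way users-service does.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// probeArgs returns grpc-health-probe arguments for one service,
// following the TLS and plaintext branches of the healthcheck script.
func probeArgs(service string, port int, tlsEnabled bool) []string {
	args := []string{
		fmt.Sprintf("-addr=localhost:%d", port),
		"-connect-timeout", "250ms",
		"-rpc-timeout", "250ms",
	}
	if tlsEnabled {
		// Assumption: every service uses the certs/<service>/<service>.{crt,key} layout.
		certDir := "certs/" + service
		args = append(args,
			"-tls",
			"-tls-ca-cert", "certs/ca/ca.crt",
			"-tls-client-cert", certDir+"/"+service+".crt",
			"-tls-client-key", certDir+"/"+service+".key",
			"-tls-server-name="+service,
		)
	} else {
		args = append(args, "-service="+service)
	}
	// Both branches send the same RPC headers.
	return append(args, "-rpc-header=Audience:grpc_probe", "-rpc-header=Role:admin")
}

func main() {
	tlsEnabled := os.Getenv("GRPC_TLS_ENABLED") == "true"
	args := probeArgs("users-service", 50051, tlsEnabled)
	fmt.Println("/bin/grpc-health-probe", strings.Join(args, " "))
}
```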

Service Health Check Endpoints

Service                  Port    Protocol
users-service            50051   gRPC
workflows-service        50052   gRPC
jobs-service             50053   gRPC
notifications-service    50054   gRPC
analytics-service        50055   gRPC

Database Health Checks

PostgreSQL

healthcheck:
  test:
    [
      "CMD-SHELL",
      "psql 'host=0.0.0.0 user=primary dbname=postgres sslmode=verify-full sslrootcert=/certs/ca/ca.crt sslcert=/certs/clients/client.crt sslkey=/certs/clients/client.key' -c 'SELECT 1;'",
    ]
  interval: 10s
  timeout: 5s
  retries: 5
  start_period: 5s

ClickHouse

healthcheck:
  test:
    [
      "CMD-SHELL",
      "clickhouse-client --secure --host=localhost --port=9440 --user=chronoverse-client --password=chronoverse --query 'SELECT 1'",
    ]
  interval: 10s
  timeout: 5s
  retries: 10
  start_period: 5s

Redis

healthcheck:
  test:
    [
      "CMD-SHELL",
      "redis-cli --tls --cert /certs/clients/client.crt --key /certs/clients/client.key --cacert /certs/ca/ca.crt -p 6379 ping | grep PONG",
    ]
  interval: 10s
  timeout: 5s
  retries: 10
  start_period: 5s
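
The interval, retries, and start_period fields interact in a way that is easy to misjudge. The sketch below replays a sequence of probe results against a simplified model of Docker's health state machine: probe i runs at (i+1) × interval, timeouts are ignored, and failures inside the start period do not count until a probe has passed once. The type and variable names are illustrative, and the model is an approximation rather than Docker's exact scheduling.

```go
package main

import (
	"fmt"
	"time"
)

// healthcheck holds the timing fields used in the Compose healthcheck blocks.
type healthcheck struct {
	Interval    time.Duration
	StartPeriod time.Duration
	Retries     int
}

// Values copied from the users-service and Redis healthchecks on this page.
var (
	usersServiceCheck = healthcheck{Interval: 10 * time.Second, StartPeriod: 30 * time.Second, Retries: 5}
	redisCheck        = healthcheck{Interval: 10 * time.Second, StartPeriod: 5 * time.Second, Retries: 10}
)

// status replays probe results (true means the probe passed) and returns
// "starting", "healthy", or "unhealthy" under the simplified model.
func (h healthcheck) status(results []bool) string {
	state := "starting"
	failures := 0
	succeeded := false
	for i, ok := range results {
		at := time.Duration(i+1) * h.Interval
		if ok {
			state, failures, succeeded = "healthy", 0, true
			continue
		}
		// Failures inside the start period are ignored until a probe has passed.
		if at <= h.StartPeriod && !succeeded {
			continue
		}
		failures++
		if failures >= h.Retries {
			state = "unhealthy"
		}
	}
	return state
}

func main() {
	failing := make([]bool, 10) // ten consecutive failed probes
	fmt.Println("redis after 10 failed probes:", redisCheck.status(failing))
	fmt.Println("users-service after 7 failed probes:", usersServiceCheck.status(failing[:7]))
}
```

Under this model, the 5s start period on the database healthchecks expires before the first probe at 10s, so every failed probe counts toward retries.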

Service Dependencies

Chronoverse uses Docker Compose’s dependency management to ensure services start in the correct order:
users-service:
  depends_on:
    postgres:
      condition: service_healthy
    redis:
      condition: service_healthy
    lgtm:
      condition: service_started
    init-certs:
      condition: service_completed_successfully
    init-database-migration:
      condition: service_completed_successfully
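A service starts only after each of its dependencies satisfies the declared condition: service_healthy waits for a passing health check, service_started waits only for the container to start, and service_completed_successfully waits for the init container to exit with code 0. Ensure adequate start_period values for services with longer initialization times.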
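
The three dependency conditions can be expressed as a small predicate. The following sketch models the users-service dependencies listed above and reports which of them still block startup. The depState type and the blockedBy helper are hypothetical and only approximate how Docker Compose evaluates depends_on.

```go
package main

import (
	"fmt"
	"sort"
)

// Conditions used under depends_on in the Compose file.
const (
	serviceStarted   = "service_started"
	serviceHealthy   = "service_healthy"
	serviceCompleted = "service_completed_successfully"
)

// depState is a simplified snapshot of one dependency's container.
type depState struct {
	Started  bool // container has been started
	Healthy  bool // healthcheck is currently passing
	Exited   bool // container has exited
	ExitCode int
}

// usersServiceDeps mirrors the depends_on block for users-service.
var usersServiceDeps = map[string]string{
	"postgres":                serviceHealthy,
	"redis":                   serviceHealthy,
	"lgtm":                    serviceStarted,
	"init-certs":              serviceCompleted,
	"init-database-migration": serviceCompleted,
}

// satisfied reports whether a dependency meets its condition.
func satisfied(condition string, s depState) bool {
	switch condition {
	case serviceStarted:
		return s.Started
	case serviceHealthy:
		return s.Started && s.Healthy
	case serviceCompleted:
		return s.Exited && s.ExitCode == 0
	default:
		return false
	}
}

// blockedBy returns, in sorted order, the dependencies whose condition is not yet met.
func blockedBy(dependsOn map[string]string, states map[string]depState) []string {
	var blocked []string
	for name, condition := range dependsOn {
		if !satisfied(condition, states[name]) {
			blocked = append(blocked, name)
		}
	}
	sort.Strings(blocked)
	return blocked
}

func main() {
	states := map[string]depState{
		"postgres":                {Started: true, Healthy: true},
		"redis":                   {Started: true}, // started, healthcheck not passing yet
		"lgtm":                    {Started: true}, // no health requirement
		"init-certs":              {Started: true, Exited: true},
		"init-database-migration": {Started: true, Exited: true},
	}
	fmt.Println("users-service waits for:", blockedBy(usersServiceDeps, states))
}
```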

Monitoring Best Practices

1. Configure OTLP Endpoint

Ensure all services point to the LGTM stack:
OTEL_EXPORTER_OTLP_ENDPOINT=http://lgtm:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
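
A service can validate these two variables at startup so that a misconfigured endpoint fails fast. The sketch below is illustrative only: otlpTarget is a hypothetical helper, and requiring an explicit port is a choice made for this example rather than a rule of OpenTelemetry.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// otlpTarget validates the endpoint and protocol and returns the host:port to dial.
func otlpTarget(endpoint, protocol string) (string, error) {
	switch protocol {
	case "grpc", "http/protobuf", "http/json": // protocol values defined by OpenTelemetry
	default:
		return "", fmt.Errorf("unsupported OTLP protocol %q", protocol)
	}
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", fmt.Errorf("parse endpoint: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("endpoint %q needs an http or https scheme", endpoint)
	}
	// Assumption for this sketch: the port must be explicit, as in http://lgtm:4317.
	if u.Hostname() == "" || u.Port() == "" {
		return "", fmt.Errorf("endpoint %q needs a host and a port", endpoint)
	}
	return u.Host, nil
}

func main() {
	target, err := otlpTarget(
		os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
		os.Getenv("OTEL_EXPORTER_OTLP_PROTOCOL"),
	)
	if err != nil {
		fmt.Println("invalid OTLP configuration:", err)
		return
	}
	fmt.Println("exporting telemetry to", target)
}
```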

2. Set Up Grafana Dashboards

Access Grafana and import dashboards for:
  • Service latency and throughput
  • Error rates
  • Resource utilization
  • Database connection pools

3. Configure Alerts

Set up alerts for:
  • Service health check failures
  • High error rates (>5%)
  • Resource exhaustion
  • Database connection pool saturation
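
The 5% error-rate threshold above amounts to a ratio of failed requests to total requests over some window. A minimal sketch of that arithmetic follows; the function names and the idea of passing raw counters are assumptions, since the page does not specify how the rate is measured.

```go
package main

import "fmt"

// errorRateThreshold is the 5% limit from the alert list.
const errorRateThreshold = 0.05

// errorRate returns failed/total as a fraction, or 0 when there was no traffic.
func errorRate(failed, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return float64(failed) / float64(total)
}

// shouldAlert reports whether the rate is strictly above the threshold.
func shouldAlert(failed, total uint64) bool {
	return errorRate(failed, total) > errorRateThreshold
}

func main() {
	fmt.Println("12 failures in 200 requests:", shouldAlert(12, 200))
	fmt.Println("4 failures in 200 requests:", shouldAlert(4, 200))
}
```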

4. Monitor Logs

Use Loki to query logs across all services:
{service_name="users-service"} |= "error"
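
The same LogQL query can be sent programmatically through Loki's query_range HTTP API. The sketch below only builds the query string and request URL. The base URL http://loki.example:3100 is a placeholder, because this page does not expose a Loki port, and logQL and queryRangeURL are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// logQL builds a line-filter query for one service, such as the example above.
func logQL(service, needle string) string {
	return fmt.Sprintf(`{service_name=%q} |= %q`, service, needle)
}

// queryRangeURL builds a Loki query_range URL; start and end are Unix nanoseconds.
func queryRangeURL(base, query string, startNs, endNs int64, limit int) string {
	params := url.Values{}
	params.Set("query", query)
	params.Set("start", strconv.FormatInt(startNs, 10))
	params.Set("end", strconv.FormatInt(endNs, 10))
	params.Set("limit", strconv.Itoa(limit))
	return base + "/loki/api/v1/query_range?" + params.Encode()
}

func main() {
	now := time.Now()
	q := logQL("users-service", "error")
	// Placeholder base URL: replace it with wherever Loki is reachable in your setup.
	u := queryRangeURL("http://loki.example:3100", q, now.Add(-15*time.Minute).UnixNano(), now.UnixNano(), 100)
	fmt.Println(q)
	fmt.Println(u)
}
```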

Port Reference

Exposed Ports (Development)

Service        Port    Purpose
Grafana        3000    Monitoring UI
OTLP           4317    Telemetry ingestion
PostgreSQL     5432    Database
ClickHouse     9440    Analytics database
Redis          6379    Cache
Kafka          9094    Message broker
MeiliSearch    7700    Search engine
In production mode, only port 80 (nginx) is exposed. All other services communicate over the internal Docker network.
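
In development, a quick TCP dial against each exposed port shows which services are reachable from the host. This is a rough sketch: a successful connection only proves that something is listening, not that the service is healthy, and the helper names are illustrative.

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"strconv"
	"time"
)

// devPorts lists the ports exposed in development, taken from the table above.
var devPorts = map[string]int{
	"Grafana":     3000,
	"OTLP":        4317,
	"PostgreSQL":  5432,
	"ClickHouse":  9440,
	"Redis":       6379,
	"Kafka":       9094,
	"MeiliSearch": 7700,
}

// addrFor returns the localhost address for a known service.
func addrFor(service string) (string, bool) {
	port, ok := devPorts[service]
	if !ok {
		return "", false
	}
	return net.JoinHostPort("localhost", strconv.Itoa(port)), true
}

// reachable reports whether a TCP connection to addr succeeds within the timeout.
func reachable(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	names := make([]string, 0, len(devPorts))
	for name := range devPorts {
		names = append(names, name)
	}
	sort.Strings(names) // stable output order
	for _, name := range names {
		addr, _ := addrFor(name)
		fmt.Printf("%-12s %-16s reachable=%t\n", name, addr, reachable(addr, 500*time.Millisecond))
	}
}
```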

HTTP Server Monitoring

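The HTTP server wraps its router in compression, CORS, and OpenTelemetry middleware with built-in logging. The OpenTelemetry middleware is outermost, so it sees each request first: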
srv.httpServer.Handler = srv.withOtelMiddleware(
    srv.withCORSMiddleware(
        srv.withCompressionMiddleware(router),
    ),
)
Monitoring is enabled at:
  • internal/server/server.go:144
All HTTP requests are traced with OpenTelemetry for latency and error tracking.
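
The nesting order determines which middleware sees a request first. The self-contained sketch below reproduces the same wrapping shape with stand-in middleware that only record their names. The tag helper and the middleware bodies are hypothetical and do not reflect the real implementations in server.go.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// middleware wraps a handler with extra behavior.
type middleware func(http.Handler) http.Handler

// tag returns a stand-in middleware that records its name when a request enters it.
func tag(name string, trace *[]string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*trace = append(*trace, name)
			next.ServeHTTP(w, r)
		})
	}
}

// entryOrder builds the same nesting as the snippet above and returns
// the order in which one request passes through the layers.
func entryOrder() []string {
	var trace []string
	router := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		trace = append(trace, "router")
	})
	handler := tag("otel", &trace)(
		tag("cors", &trace)(
			tag("compression", &trace)(router),
		),
	)
	// Serve a single in-memory request; no network listener is needed.
	handler.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return trace
}

func main() {
	fmt.Println(entryOrder()) // prints [otel cors compression router]
}
```

Because the outermost wrapper runs first, placing the tracing layer outside CORS and compression lets it time the whole request, including the work done by the inner layers.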