Overview

Chronoverse is designed for horizontal scalability with multiple worker replicas, resource-based limits, and performance optimizations for production workloads.

Horizontal Scaling Architecture

Chronoverse uses Docker Compose’s deploy.replicas feature to run multiple instances of worker services:
┌─────────────────────┐
│   Load Balancer     │
│      (nginx)        │
└──────────┬──────────┘

     ┌─────┴─────┐
     ▼           ▼
  Server      Server
     │           │
     └─────┬─────┘

     ┌─────┴─────────────────┐
     ▼                       ▼
┌──────────────┐      ┌──────────────┐
│   Workers    │      │   Workers    │
│  (Replica 1) │      │  (Replica 2) │
└──────────────┘      └──────────────┘

Worker Replicas

Low-Resource Workers

Used for lightweight background tasks:
low-resources-workers-limit: &low-resources-workers-limit
  deploy:
    replicas: 2
    resources:
      limits:
        cpus: "0.5"    # 500m CPU
        memory: 2G     # 2 GB
      reservations:
        cpus: "0.25"   # 250m CPU
        memory: 1G     # 1 GB
Applied to:
  • scheduling-worker: Workflow scheduling and orchestration
  • workflow-worker: Workflow execution coordination
  • joblogs-processor: Log processing and indexing
  • analytics-processor: Analytics data aggregation

High-Resource Workers

Used for compute-intensive job execution:
high-resources-workers-limit: &high-resources-workers-limit
  deploy:
    replicas: 2
    resources:
      limits:
        cpus: "2"      # 2 CPU
        memory: 4G     # 4 GB
      reservations:
        cpus: "1"      # 1 CPU
        memory: 2G     # 2 GB
Applied to:
  • execution-worker: Docker container job execution
Source: compose.prod.yaml:28

Resource Limits

Service Resource Allocation

Services have conservative resource limits:
services-limit: &services-limit
  deploy:
    resources:
      limits:
        cpus: "0.25"   # 250m CPU
        memory: 256M   # 256 MB
      reservations:
        cpus: "0.1"    # 100m CPU
        memory: 128M   # 128 MB
Applied to:
  • users-service
  • workflows-service
  • jobs-service
  • notifications-service
  • analytics-service
  • server
  • docker-proxy
Source: compose.prod.yaml:16

Database Resource Allocation

Databases and other stateful datastores receive a higher resource allocation:
database-limits: &database-limits
  deploy:
    resources:
      limits:
        cpus: "1"      # 1 CPU
        memory: 1G     # 1 GB
      reservations:
        cpus: "0.5"    # 500m CPU
        memory: 512M   # 512 MB
Applied to:
  • PostgreSQL
  • ClickHouse
  • Redis
  • Kafka
Source: compose.prod.yaml:5

Connection Pooling

PostgreSQL Connection Pool

Optimized for multiple service connections:
type Postgres struct {
    Host        string        `envconfig:"POSTGRES_HOST" default:"localhost"`
    Port        int           `envconfig:"POSTGRES_PORT" default:"5432"`
    User        string        `envconfig:"POSTGRES_USER" default:"postgres"`
    Password    string        `envconfig:"POSTGRES_PASSWORD" default:"postgres"`
    Database    string        `envconfig:"POSTGRES_DB" default:"chronoverse"`
    MaxConns    int32         `envconfig:"POSTGRES_MAX_CONNS" default:"10"`
    MinConns    int32         `envconfig:"POSTGRES_MIN_CONNS" default:"5"`
    MaxConnLife time.Duration `envconfig:"POSTGRES_MAX_CONN_LIFE" default:"1h"`
    MaxConnIdle time.Duration `envconfig:"POSTGRES_MAX_CONN_IDLE" default:"30m"`
    DialTimeout time.Duration `envconfig:"POSTGRES_DIAL_TIMEOUT" default:"5s"`
}
Source: internal/config/config.go:24

Production Recommendations:
  • POSTGRES_MAX_CONNS: 20-50 per service
  • POSTGRES_MIN_CONNS: 5-10 per service
  • POSTGRES_MAX_CONN_LIFE: 1h
  • POSTGRES_MAX_CONN_IDLE: 30m

ClickHouse Connection Pool

Configured for analytics workloads:
type ClickHouse struct {
    Hosts           []string      `envconfig:"CLICKHOUSE_HOSTS" default:"localhost:9000"`
    Database        string        `envconfig:"CLICKHOUSE_DATABASE" default:"default"`
    Username        string        `envconfig:"CLICKHOUSE_USERNAME" default:"default"`
    Password        string        `envconfig:"CLICKHOUSE_PASSWORD" default:""`
    MaxOpenConns    int           `envconfig:"CLICKHOUSE_MAX_OPEN_CONNS" default:"10"`
    MaxIdleConns    int           `envconfig:"CLICKHOUSE_MAX_IDLE_CONNS" default:"5"`
    ConnMaxLifetime time.Duration `envconfig:"CLICKHOUSE_CONN_MAX_LIFETIME" default:"1h"`
    DialTimeout     time.Duration `envconfig:"CLICKHOUSE_DIAL_TIMEOUT" default:"5s"`
}
Source: internal/config/config.go:44

Redis Connection Pool

Optimized for caching and session storage:
type Redis struct {
    Host                     string        `envconfig:"REDIS_HOST" default:"localhost"`
    Port                     int           `envconfig:"REDIS_PORT" default:"6379"`
    Password                 string        `envconfig:"REDIS_PASSWORD" default:""`
    DB                       int           `envconfig:"REDIS_DB" default:"0"`
    PoolSize                 int           `envconfig:"REDIS_POOL_SIZE" default:"10"`
    MinIdleConns             int           `envconfig:"REDIS_MIN_IDLE_CONNS" default:"5"`
    ReadTimeout              time.Duration `envconfig:"REDIS_READ_TIMEOUT" default:"5s"`
    WriteTimeout             time.Duration `envconfig:"REDIS_WRITE_TIMEOUT" default:"5s"`
    MaxMemory                string        `envconfig:"REDIS_MAX_MEMORY" default:"100mb"`
    EvictionPolicy           string        `envconfig:"REDIS_EVICTION_POLICY" default:"allkeys-lru"`
    EvictionPolicySampleSize int           `envconfig:"REDIS_EVICTION_POLICY_SAMPLE_SIZE" default:"5"`
}
Source: internal/config/config.go:62

Production Recommendations:
  • REDIS_POOL_SIZE: 20-50 per service
  • REDIS_MIN_IDLE_CONNS: 10-20 per service
  • REDIS_MAX_MEMORY: Based on available memory (e.g., 4gb)
  • REDIS_EVICTION_POLICY: allkeys-lru for cache workloads
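
As a rough illustration of what allkeys-lru means for cache workloads, the toy cache below evicts the least-recently-used key once capacity is exceeded. (Redis approximates LRU by sampling EvictionPolicySampleSize candidate keys rather than tracking exact order as this sketch does.)

```go
package main

import (
	"container/list"
	"fmt"
)

// lru is a toy cache illustrating allkeys-lru semantics: when over
// capacity, the least-recently-used key is evicted.
type lru struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> element holding that key
}

func newLRU(capacity int) *lru {
	return &lru{cap: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Touch inserts or refreshes a key, evicting the LRU key if needed.
// It returns the evicted key, or "" if nothing was evicted.
func (c *lru) Touch(key string) string {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return ""
	}
	c.items[key] = c.order.PushFront(key)
	if c.order.Len() <= c.cap {
		return ""
	}
	oldest := c.order.Back()
	c.order.Remove(oldest)
	evicted := oldest.Value.(string)
	delete(c.items, evicted)
	return evicted
}

func main() {
	c := newLRU(2)
	c.Touch("a")
	c.Touch("b")
	c.Touch("a")              // refresh "a"; "b" is now oldest
	fmt.Println(c.Touch("c")) // evicts "b"
}
```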

Kafka Scaling

Consumer Groups

Workers use consumer groups for load distribution:
workflow-worker:
  environment:
    KAFKA_BROKERS: kafka:9094
    KAFKA_CONSUMER_GROUP: workflow-worker
    # Multiple replicas share the same consumer group
  deploy:
    replicas: 2
Consumer groups:
  • workflow-worker: Workflow execution tasks
  • execution-worker: Job execution tasks
  • joblogs-processor: Log processing tasks
  • analytics-processor: Analytics aggregation tasks

Partition Strategy

Scale Kafka by increasing partition count:
# Create topic with multiple partitions
kafka-topics --create \
  --topic workflow-events \
  --partitions 10 \
  --replication-factor 1
Partition count should match or exceed the number of consumer replicas for optimal parallelism.
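
To see why, consider a round-robin assignment of partitions to the consumers in a group (Kafka's real assignor strategy is pluggable; this toy function only illustrates the counting):

```go
package main

import "fmt"

// assign distributes partition IDs across consumers round-robin.
// With fewer partitions than consumers, the extra consumers sit idle,
// which is why partition count should meet or exceed replica count.
func assign(partitions, consumers int) [][]int {
	out := make([][]int, consumers)
	for p := 0; p < partitions; p++ {
		c := p % consumers
		out[c] = append(out[c], p)
	}
	return out
}

func main() {
	fmt.Println(assign(10, 2)) // each of 2 replicas handles 5 partitions
	fmt.Println(assign(2, 4))  // 2 of 4 replicas receive no partitions
}
```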

Performance Tuning

HTTP Server Configuration

Optimized for production workloads:
type Config struct {
    Host              string
    Port              int
    RequestTimeout    time.Duration
    ReadTimeout       time.Duration
    ReadHeaderTimeout time.Duration
    WriteTimeout      time.Duration
    IdleTimeout       time.Duration
    ValidationConfig  *ValidationConfig
    HostURL           string
    AllowedOrigins    []string
    SameSiteMode      string
}
Source: internal/server/server.go:65

Recommended Values:
  • ReadTimeout: 30s
  • ReadHeaderTimeout: 10s
  • WriteTimeout: 60s
  • IdleTimeout: 120s

gRPC Request Timeout

type Grpc struct {
    Host           string        `envconfig:"GRPC_HOST" default:"localhost"`
    Port           int           `envconfig:"GRPC_PORT" required:"true"`
    RequestTimeout time.Duration `envconfig:"GRPC_REQUEST_TIMEOUT" default:"500ms"`
}
Source: internal/config/config.go:95

Production Recommendation:
  • GRPC_REQUEST_TIMEOUT: 2s-5s for most operations

Compression

HTTP responses use gzip compression:
srv.httpServer.Handler = srv.withOtelMiddleware(
    srv.withCORSMiddleware(
        srv.withCompressionMiddleware(router),
    ),
)
Source: internal/server/server.go:144

Scaling Strategies

1. Scale Worker Replicas

Increase replicas based on workload:
execution-worker:
  deploy:
    replicas: 5  # Increase from 2 to 5

2. Increase Resource Limits

Adjust CPU and memory based on metrics:
high-resources-workers-limit:
  deploy:
    resources:
      limits:
        cpus: "4"      # Increase from 2 to 4
        memory: 8G     # Increase from 4G to 8G

3. Scale Database Connections

Increase connection pools:
POSTGRES_MAX_CONNS=50
REDIS_POOL_SIZE=50
CLICKHOUSE_MAX_OPEN_CONNS=20

4. Add Kafka Partitions

Increase topic partitions for parallelism:
kafka-topics --alter \
  --topic workflow-events \
  --partitions 20

5. Scale Database Instances

For very high loads, run multiple database instances with read replicas:
postgres-replica:
  image: postgres:18.0-alpine3.22
  environment:
    POSTGRES_MASTER_HOST: postgres
    # Configure as read replica

Load Balancing

Nginx handles load balancing for the HTTP server:
http {
    server {
        listen 80;

        location /api/ {
            proxy_pass http://server:8080/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # Standard proxy settings with buffering enabled
            proxy_buffering on;
            proxy_cache off;

            # Standard API timeouts
            proxy_read_timeout 60s;
            proxy_send_timeout 60s;
            proxy_connect_timeout 10s;
        }
    }
}
Source: compose.prod.yaml:1343

Monitoring Scaling Metrics

Key Metrics to Monitor

CPU Utilization:
rate(container_cpu_usage_seconds_total{container="execution-worker"}[5m])
Memory Usage:
container_memory_usage_bytes{container="execution-worker"}
Connection Pool Saturation:
rate(postgres_connections_active[5m]) / postgres_connections_max
Kafka Consumer Lag:
kafka_consumer_lag{group="workflow-worker"}

Scaling Triggers

Scale Worker Replicas When:
  • CPU utilization > 70% sustained
  • Kafka consumer lag > 1000 messages
  • Job queue depth > 100
Scale Database Resources When:
  • Connection pool saturation > 80%
  • Query latency p99 > 100ms
  • Disk I/O wait > 20%
Scale Redis When:
  • Memory usage > 80%
  • Evictions > 100/sec
  • Connection pool saturation > 80%
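
The worker triggers above can be captured as a simple predicate. The type and field names here are hypothetical, for illustration only, not Chronoverse's metrics API:

```go
package main

import "fmt"

// WorkerMetrics holds the scaling signals listed above; the names are
// illustrative placeholders.
type WorkerMetrics struct {
	CPUUtilization float64 // 0.0-1.0, sustained
	ConsumerLag    int     // messages behind
	QueueDepth     int     // pending jobs
}

// shouldScaleWorkers applies the worker-replica thresholds above:
// any one trigger firing is enough to warrant scaling out.
func shouldScaleWorkers(m WorkerMetrics) bool {
	return m.CPUUtilization > 0.70 || m.ConsumerLag > 1000 || m.QueueDepth > 100
}

func main() {
	fmt.Println(shouldScaleWorkers(WorkerMetrics{CPUUtilization: 0.85})) // true
	fmt.Println(shouldScaleWorkers(WorkerMetrics{ConsumerLag: 500}))     // false
}
```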

Auto-Scaling (Kubernetes)

For Kubernetes deployments, use Horizontal Pod Autoscaler:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: execution-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: execution-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
Test scaling configurations in staging before applying to production. Monitor for resource contention and database connection exhaustion.
