BioAgents workers can scale horizontally across multiple servers with no coordination required. All workers connect to the same Redis queue and automatically share the workload.
Scaling Architecture
Multi-Server Deployment
Server Roles
API Servers: handle HTTP/WebSocket requests, enqueue jobs, broadcast notifications
Worker Servers: process jobs from the queue, execute AI workflows, update the database
Redis: central message broker for the job queue and pub/sub
Setup Strategy
1. Deploy Redis - use a managed service (Upstash, ElastiCache) for high availability
2. Deploy API servers - scale based on HTTP traffic and WebSocket connections
3. Deploy workers - scale based on queue depth and job processing needs
Worker Deployment
Prerequisites
Each worker server needs:
Docker 20.10+
Access to Redis (via REDIS_URL)
Access to Supabase database
LLM API keys (OpenAI, Anthropic, etc.)
Deploy to New Server
Install Docker
curl -fsSL https://get.docker.com | sh
Clone Repository
git clone https://github.com/bio-xyz/bioagents-agentkit.git
cd bioagents-agentkit
Configure Environment
cp .env.worker.example .env
nano .env
Required variables:
# External Redis (shared across all workers)
REDIS_URL=rediss://default:password@your-redis.upstash.io:6379

# Database
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_ANON_KEY=eyJ...

# LLM API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
Start Workers
# Start 2 worker containers
docker-compose -f docker-compose.worker.yml up -d --scale worker=2

# Verify workers are running
docker-compose -f docker-compose.worker.yml ps
Monitor Logs
docker-compose -f docker-compose.worker.yml logs -f
Look for: redis_publisher_connected
chat_queue_initialized
deep_research_queue_initialized
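To script this check, a minimal sketch like the following (assuming the log markers above and the compose file from this page) exits non-zero when any marker is missing:

#!/bin/bash
# Hypothetical readiness check: confirm each startup marker appears in recent worker logs.
LOGS=$(docker-compose -f docker-compose.worker.yml logs --tail=500 worker)
for marker in redis_publisher_connected chat_queue_initialized deep_research_queue_initialized; do
  if ! grep -q "$marker" <<<"$LOGS"; then
    echo "missing startup marker: $marker" >&2
    exit 1
  fi
done
echo "all startup markers present"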
Worker Configuration
docker-compose.worker.yml (a deploy script sketch follows the compose file):
services:
  worker:
    build: .
    command: ["bun", "run", "src/worker.ts"]
    environment:
      # Enable queue mode
      - USE_JOB_QUEUE=true
      # External Redis
      - REDIS_URL=${REDIS_URL}
      # Database
      - SUPABASE_URL=${SUPABASE_URL}
      - SUPABASE_ANON_KEY=${SUPABASE_ANON_KEY}
      # LLM API Keys
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
      # Worker concurrency
      - CHAT_QUEUE_CONCURRENCY=${CHAT_QUEUE_CONCURRENCY:-5}
      - DEEP_RESEARCH_QUEUE_CONCURRENCY=${DEEP_RESEARCH_QUEUE_CONCURRENCY:-3}
      # Production
      - NODE_ENV=production
    restart: unless-stopped
    # Allow long-running jobs to complete
    stop_grace_period: 8h
    # Resource limits
    deploy:
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 512M
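Deploy Script: a minimal sketch of a one-shot deploy to a fresh server, under the assumption that .env is already populated; the repository may ship its own script, so treat this as illustrative:

#!/bin/bash
# Hypothetical one-shot worker deploy for a fresh server.
set -euo pipefail

# Install Docker if it is not already present
command -v docker >/dev/null 2>&1 || curl -fsSL https://get.docker.com | sh

# Fetch or update the repository
[ -d bioagents-agentkit ] || git clone https://github.com/bio-xyz/bioagents-agentkit.git
cd bioagents-agentkit

# Build and start two workers (requires a populated .env next to the compose file)
docker-compose -f docker-compose.worker.yml up -d --build --scale worker=2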
Scaling Strategies
Scale Based on Queue Depth
Monitor queue depth and scale workers accordingly:
# Check waiting jobs (BullMQ stores the waiting list under the :wait key)
redis-cli -u $REDIS_URL LLEN bull:deep-research:wait
redis-cli -u $REDIS_URL LLEN bull:chat:wait
Scaling guidelines:
| Queue Depth | Recommended Workers | Response Time |
|---|---|---|
| 0-10 jobs | 2 workers | < 5 minutes |
| 10-30 jobs | 4 workers | < 10 minutes |
| 30-50 jobs | 6 workers | < 15 minutes |
| 50+ jobs | 8+ workers | < 20 minutes |
Each deep research worker can handle ~3 concurrent jobs and each chat worker ~5, so, for example, 12 waiting research jobs keep four workers fully busy.
Auto-Scaling with Monitoring
Implement auto-scaling based on queue metrics:
Auto-Scale Script (Python; a Kubernetes HPA equivalent follows the script):
import redis
import subprocess
import time

REDIS_URL = "redis://your-redis-host:6379"
MIN_WORKERS = 2
MAX_WORKERS = 10
SCALE_UP_THRESHOLD = 20
SCALE_DOWN_THRESHOLD = 5

def get_queue_depth():
    # BullMQ keeps waiting jobs in the bull:<queue>:wait list
    r = redis.from_url(REDIS_URL)
    chat_waiting = r.llen("bull:chat:wait")
    research_waiting = r.llen("bull:deep-research:wait")
    return chat_waiting + research_waiting

def get_current_workers():
    result = subprocess.run(
        ["docker-compose", "-f", "docker-compose.worker.yml", "ps", "-q"],
        capture_output=True,
        text=True,
    )
    ids = result.stdout.strip()
    return len(ids.split("\n")) if ids else 0

def scale_workers(count):
    count = max(MIN_WORKERS, min(MAX_WORKERS, count))
    subprocess.run([
        "docker-compose", "-f", "docker-compose.worker.yml",
        "up", "-d", "--scale", f"worker={count}",
    ])
    print(f"Scaled to {count} workers")

while True:
    depth = get_queue_depth()
    current = get_current_workers()
    if depth > SCALE_UP_THRESHOLD:
        scale_workers(current + 2)
    elif depth < SCALE_DOWN_THRESHOLD and current > MIN_WORKERS:
        scale_workers(current - 1)
    time.sleep(60)  # Check every minute
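Kubernetes HPA: if the workers run as a Deployment, the same policy can be written declaratively. A sketch, assuming queue depth is exported as an external metric named bioagents_queue_depth (for example via the Prometheus adapter); resource names here are illustrative:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: bioagents-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bioagents-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: bioagents_queue_depth
        target:
          type: Value
          value: "20" # scale out while waiting jobs exceed ~20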
Concurrency Tuning
Adjust concurrency per worker based on server resources:
Low Memory (1GB):
CHAT_QUEUE_CONCURRENCY=2
DEEP_RESEARCH_QUEUE_CONCURRENCY=1
2 chat jobs + 1 research job = ~1.5GB peak memory. Conservative but reliable.

Medium Memory (2GB):
CHAT_QUEUE_CONCURRENCY=5
DEEP_RESEARCH_QUEUE_CONCURRENCY=3
The default configuration. Balanced throughput and stability.

High Memory (4GB+):
CHAT_QUEUE_CONCURRENCY=10
DEEP_RESEARCH_QUEUE_CONCURRENCY=5
Maximum throughput. Requires monitoring to prevent OOM.
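These variables take effect inside the worker process. A minimal sketch of how they plausibly map to BullMQ worker options (an assumption, not the repo's actual src/worker.ts; the job handlers are stubs):

// Sketch: concurrency env vars applied to BullMQ workers.
const { Worker } = require('bullmq');
const IORedis = require('ioredis');

// BullMQ workers require maxRetriesPerRequest: null on the connection.
const connection = new IORedis(process.env.REDIS_URL, { maxRetriesPerRequest: null });

new Worker('chat', async (job) => { /* handle chat job */ }, {
  connection,
  concurrency: Number(process.env.CHAT_QUEUE_CONCURRENCY || 5),
});

new Worker('deep-research', async (job) => { /* handle research job */ }, {
  connection,
  concurrency: Number(process.env.DEEP_RESEARCH_QUEUE_CONCURRENCY || 3),
});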
Multi-Region Deployment
Deploy workers in multiple regions for global coverage:
Multi-region deployments require:
Low-latency Redis (use Upstash Global or regional replicas)
Database replication or read replicas
Careful handling of cross-region network latency
Resource Planning
Worker Server Sizing
Hetzner Cloud:
| Plan | vCPU | RAM | Workers | Cost/mo |
|---|---|---|---|---|
| CX22 | 2 | 4GB | 2 | $6 |
| CX32 | 4 | 8GB | 4 | $12 |
| CX42 | 8 | 16GB | 8 | $24 |
| CX52 | 16 | 32GB | 16 | $48 |

DigitalOcean:
| Droplet | vCPU | RAM | Workers | Cost/mo |
|---|---|---|---|---|
| Basic | 2 | 4GB | 2 | $24 |
| General Purpose | 4 | 8GB | 4 | $48 |
| CPU-Optimized | 8 | 16GB | 8 | $96 |

AWS EC2:
| Instance | vCPU | RAM | Workers | Cost/mo |
|---|---|---|---|---|
| t3.medium | 2 | 4GB | 2 | $30 |
| t3.large | 2 | 8GB | 4 | $60 |
| c6i.xlarge | 4 | 8GB | 4 | $122 |
| c6i.2xlarge | 8 | 16GB | 8 | $244 |
Cost Optimization
Use spot instances for burst capacity:

# AWS EC2 Spot (run-instances takes spot options via --instance-market-options)
aws ec2 run-instances \
  --instance-type c6i.xlarge \
  --instance-market-options 'MarketType=spot,SpotOptions={SpotInstanceType=one-time}' \
  --user-data file://worker-init.sh
Benefits:
60-90% cost savings
Good for non-critical workers
Risks:
Can be terminated with 2-minute notice
Workers should handle graceful shutdown
Reserve minimum capacity for predictable workloads:
1-year commitment: ~30% savings
3-year commitment: ~50% savings
Strategy:
Reserve minimum worker capacity (e.g., 2 workers)
Use on-demand/spot for scaling above baseline
Scale workers based on time of day:

# Cron job: Scale up during business hours
0 9 * * 1-5 /opt/bioagents/scale-workers.sh 8
# Scale down at night
0 18 * * 1-5 /opt/bioagents/scale-workers.sh 2
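The scale-workers.sh helper isn't reproduced in this section; a minimal sketch of what it could contain (the checkout path is an assumption):

#!/bin/bash
# Hypothetical /opt/bioagents/scale-workers.sh: scale the worker service to $1 replicas.
set -euo pipefail
COUNT="${1:?usage: scale-workers.sh <worker-count>}"
cd /opt/bioagents/bioagents-agentkit # assumed checkout location
docker-compose -f docker-compose.worker.yml up -d --scale "worker=${COUNT}"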
High Availability
Worker Redundancy
Always run at least 2 workers to prevent single point of failure:
# Minimum HA setup
docker-compose -f docker-compose.worker.yml up -d --scale worker=2
If one worker crashes, the other continues processing jobs. BullMQ automatically reassigns stalled jobs.
Graceful Shutdown
Workers use stop_grace_period: 8h to finish long-running jobs:
services:
  worker:
    stop_grace_period: 8h # Allow deep research jobs to complete
Shutdown behavior:
1. Docker sends SIGTERM to the worker
2. The worker stops accepting new jobs
3. The worker continues processing active jobs
4. After 8 hours, Docker sends SIGKILL (force stop)
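Inside the process, graceful shutdown plausibly looks like the sketch below (an assumption, not the repo's actual worker.ts): BullMQ's worker.close() stops pulling new jobs and resolves once in-flight jobs finish.

// Sketch: drain BullMQ workers on SIGTERM.
const { Worker } = require('bullmq');
const IORedis = require('ioredis');

const connection = new IORedis(process.env.REDIS_URL, { maxRetriesPerRequest: null });
const chatWorker = new Worker('chat', async (job) => { /* ... */ }, { connection });

process.on('SIGTERM', async () => {
  console.log('SIGTERM received, draining active jobs...');
  await chatWorker.close(); // waits for active jobs, accepts no new ones
  await connection.quit();
  process.exit(0);
});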
Never run docker-compose down without checking for active jobs; use Bull Board to verify the queues are empty first.
Redis Failover
Use managed Redis with automatic failover:
Upstash:
REDIS_URL=rediss://default:password@your-redis.upstash.io:6379
Features:
Multi-region replication
Automatic failover
TLS encryption
Pay-per-use pricing

AWS ElastiCache:
REDIS_URL=redis://master.cluster.abc123.use1.cache.amazonaws.com:6379
Features:
Automatic failover with Redis Cluster
Multi-AZ deployment
Automated backups
CloudWatch monitoring

Redis Sentinel:
REDIS_URL=redis://sentinel-1:26379,sentinel-2:26379,sentinel-3:26379
Features:
Self-hosted HA solution
Automatic failover
Lower cost than managed services
Requires more maintenance
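Note that ioredis (the Redis client BullMQ uses) does not parse a comma-separated multi-host URL; Sentinel is configured with explicit options instead. A sketch, with host names and the master group name ('mymaster') assumed:

const IORedis = require('ioredis');

// Connect through Sentinel; the client discovers and follows the current master.
const connection = new IORedis({
  sentinels: [
    { host: 'sentinel-1', port: 26379 },
    { host: 'sentinel-2', port: 26379 },
    { host: 'sentinel-3', port: 26379 },
  ],
  name: 'mymaster', // master group name as configured in Sentinel
  maxRetriesPerRequest: null, // required by BullMQ workers
});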
Monitoring & Observability
Queue Metrics
Export queue metrics to monitoring systems:
Prometheus Exporter (the resulting metrics can back a Grafana dashboard):
const client = require('prom-client');
const { getChatQueue, getDeepResearchQueue } = require('./queue/queues');

const queueDepthGauge = new client.Gauge({
  name: 'bioagents_queue_depth',
  help: 'Number of jobs waiting in queue',
  labelNames: ['queue', 'state'],
});

async function updateMetrics() {
  const chatQueue = getChatQueue();
  const researchQueue = getDeepResearchQueue();

  const chatCounts = await chatQueue.getJobCounts();
  const researchCounts = await researchQueue.getJobCounts();

  queueDepthGauge.set({ queue: 'chat', state: 'waiting' }, chatCounts.waiting);
  queueDepthGauge.set({ queue: 'chat', state: 'active' }, chatCounts.active);
  queueDepthGauge.set({ queue: 'deep-research', state: 'waiting' }, researchCounts.waiting);
  queueDepthGauge.set({ queue: 'deep-research', state: 'active' }, researchCounts.active);
}

setInterval(updateMetrics, 10000); // Every 10 seconds
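prom-client only collects; Prometheus still needs an HTTP endpoint to scrape. A minimal sketch using Node's built-in http module and the default registry from the snippet above (the port is an assumption):

const http = require('http');

// Serve the default prom-client registry on /metrics.
http.createServer(async (req, res) => {
  if (req.url === '/metrics') {
    res.setHeader('Content-Type', client.register.contentType);
    res.end(await client.register.metrics());
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(9464);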
Alerting
Set up alerts for queue health:
groups:
  - name: bioagents-queue
    rules:
      - alert: HighQueueDepth
        expr: bioagents_queue_depth{state="waiting"} > 50
        for: 10m
        annotations:
          summary: "Queue depth is high"
          description: "{{ $labels.queue }} has {{ $value }} waiting jobs"
      - alert: NoActiveWorkers
        expr: sum(up{job="bioagents-worker"}) == 0
        for: 1m
        annotations:
          summary: "No workers are running"
          description: "All workers are down - jobs will not be processed"
      - alert: HighJobFailureRate
        expr: rate(bioagents_job_failures_total[5m]) > 0.1
        for: 5m
        annotations:
          summary: "Job failure rate is high"
          description: "{{ $value }} jobs/sec are failing"
Troubleshooting
Workers Not Picking Up Jobs
Check Redis connection:
docker-compose -f docker-compose.worker.yml logs | grep -i redis
Expected output:
redis_publisher_connected
chat_queue_initialized
deep_research_queue_initialized
Verify Redis URL:
docker-compose -f docker-compose.worker.yml exec worker env | grep REDIS
Uneven Load Distribution
Symptom: Some workers process many jobs, others idle.
Cause: Different worker start times or concurrency settings.
Fix: Ensure all workers have identical configuration:
# Restart all workers simultaneously
docker-compose -f docker-compose.worker.yml down
docker-compose -f docker-compose.worker.yml up -d --scale worker=4
Memory Leaks
Monitor memory over time:
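For example, take a point-in-time snapshot per container (a sketch; adjust the grep to your compose project's container names):

docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}" | grep -i worker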
Implement periodic restarts:
# Cron job: Rolling restart every 24 hours
0 3 * * * /opt/bioagents/rolling-restart.sh
#!/bin/bash
# Restart workers one at a time to maintain capacity.
# Iterate over container IDs: with a scaled compose service, "worker_$i" is not
# a service name, so target each container directly.
for id in $(docker-compose -f docker-compose.worker.yml ps -q worker); do
  echo "Restarting worker $id..."
  docker restart "$id"
  sleep 60 # Wait 1 minute between restarts
done
Next Steps
Job Queue - learn about BullMQ architecture and configuration
Docker Setup - deploy with docker-compose