Scaling Aurora
Guidelines for scaling Aurora to handle increased load, traffic, and concurrent users.
Scaling Strategy
Aurora components fall into three categories:
Stateless Services (Horizontally Scalable)
These services can be scaled by increasing replica count:
aurora-server - REST API (handles HTTP requests)
celery-worker - Background tasks (RCA analysis, integrations)
chatbot - WebSocket server (chat interface)
frontend - Next.js UI (serves web pages)
searxng - Web search engine
t2v-transformers - ML embeddings
Stateful Services (Requires Special Configuration)
These services require additional setup for horizontal scaling:
postgres - Database (replication, read replicas)
redis - Cache and queue (Redis Cluster)
weaviate - Vector database (multi-node cluster)
vault - Secrets management (Raft HA)
Single-Instance Services
These MUST remain at 1 replica:
celery-beat - Task scheduler (multiple instances cause duplicate tasks)
Horizontal Scaling
Kubernetes (Helm)
Increase replica counts in values.generated.yaml:
replicaCounts:
  # Scale based on traffic
  server: 5          # API requests
  celeryWorker: 10   # Background tasks
  chatbot: 3         # WebSocket connections
  frontend: 3        # Web traffic
  # Scale for performance
  searxng: 2         # Web search
  transformers: 2    # ML embeddings
  # Keep at 1
  celeryBeat: 1      # DO NOT SCALE
Apply changes:
helm upgrade aurora-oss ./deploy/helm/aurora \
  --namespace aurora \
  -f values.generated.yaml
Docker Compose
Scale services manually:
# Scale specific service
docker compose up -d --scale celery_worker=5
# Scale multiple services
docker compose up -d \
  --scale celery_worker=5 \
  --scale aurora-server=3
Or edit docker-compose.yaml:
services:
  celery_worker:
    # ... existing config ...
    deploy:
      replicas: 5
Auto-Scaling
Horizontal Pod Autoscaler (HPA)
Automatically scale based on CPU/memory:
Enable metrics server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Create HPA for API server
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aurora-server-hpa
  namespace: aurora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-server
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
Apply: kubectl apply -f aurora-server-hpa.yaml
Create HPA for Celery workers
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aurora-celery-worker-hpa
  namespace: aurora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-celery-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
Monitor autoscaling
# Check HPA status
kubectl get hpa -n aurora
# Watch scaling events
kubectl get hpa -n aurora -w
# View scaling events
kubectl describe hpa aurora-server-hpa -n aurora
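To make the targets above easier to reason about, here is a minimal Python sketch of the replica calculation described in the Kubernetes HPA documentation: desired replicas are the current count scaled by the ratio of observed to target utilization, rounded up, with no change while the ratio stays within a tolerance (10% by default). It is a simplification: it ignores the stabilization windows and rate policies in the behavior section, and the function name is illustrative.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA rule: ceil(current * observed / target), clamped to bounds."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the autoscaler leaves the count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# aurora-server-hpa values from above: CPU target 70%, 3 to 10 replicas.
print(desired_replicas(3, 90, 70, 3, 10))   # 90% CPU on 3 pods
print(desired_replicas(8, 140, 70, 3, 10))  # demand beyond maxReplicas
```

When several metrics are configured, as with CPU and memory above, the HPA evaluates each one and uses the largest resulting replica count.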
Custom Metrics Autoscaling
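Scale based on application metrics such as queue depth. External metrics are only available when a metrics adapter (for example, Prometheus Adapter) serves the external metrics API, and the metric name used below must match what that adapter exposes. This HPA targets the same Deployment as aurora-celery-worker-hpa, so use one or the other, not both: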
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aurora-celery-queue-hpa
  namespace: aurora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-celery-worker
  minReplicas: 2
  maxReplicas: 50
  metrics:
    # Scale based on Celery queue length
    - type: External
      external:
        metric:
          name: redis_celery_queue_length
          selector:
            matchLabels:
              queue: celery
        target:
          type: AverageValue
          averageValue: "10"  # 10 tasks per worker
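With an AverageValue target, the autoscaler divides the total metric value by the per-pod target. The sketch below shows that arithmetic for the queue-length example, using the bounds from the manifest above. It leaves out the tolerance band and scaling behavior, and the function name is hypothetical.

```python
import math


def workers_for_queue(queue_length, tasks_per_worker=10,
                      min_replicas=2, max_replicas=50):
    """Replicas needed so each worker has at most tasks_per_worker queued tasks."""
    desired = math.ceil(queue_length / tasks_per_worker)
    return max(min_replicas, min(max_replicas, desired))


for queued in (0, 95, 101, 1000):
    print(queued, "queued tasks ->", workers_for_queue(queued), "workers")
```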
Vertical Scaling
Increase Resource Limits
For Kubernetes, update values.generated.yaml:
resources:
  server:
    requests:
      cpu: "1000m"     # Increased from 500m
      memory: "2Gi"    # Increased from 1Gi
    limits:
      cpu: "4000m"     # Increased from 2000m
      memory: "8Gi"    # Increased from 4Gi
  celeryWorker:
    requests:
      cpu: "500m"      # Increased from 200m
      memory: "4Gi"    # Increased from 2Gi
    limits:
      cpu: "2000m"     # Increased from 1000m
      memory: "16Gi"   # Increased from 8Gi
Apply changes:
helm upgrade aurora-oss ./deploy/helm/aurora \
  --namespace aurora \
  -f values.generated.yaml
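Larger requests multiply across replicas, so it is worth checking that the cluster can schedule the total. The following Python sketch adds up CPU and memory requests for the server and Celery worker, combining the replica counts from the Horizontal Scaling example with the requests above. The parsers handle only the quantity formats used in this guide (millicores and Ki/Mi/Gi), and all helper names are illustrative.

```python
MEMORY_UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_cpu_millicores(quantity):
    """'500m' -> 500, '2' -> 2000."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory_bytes(quantity):
    """'2Gi' -> 2147483648; a bare number is taken as bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-2]) * factor)
    return int(quantity)


def total_requests(replicas, requests):
    """Sum requests over all replicas; returns (millicores, bytes)."""
    cpu = sum(replicas[name] * parse_cpu_millicores(r["cpu"])
              for name, r in requests.items())
    memory = sum(replicas[name] * parse_memory_bytes(r["memory"])
                 for name, r in requests.items())
    return cpu, memory


replicas = {"server": 5, "celeryWorker": 10}
requests = {
    "server": {"cpu": "1000m", "memory": "2Gi"},
    "celeryWorker": {"cpu": "500m", "memory": "4Gi"},
}
cpu, memory = total_requests(replicas, requests)
print(f"{cpu / 1000} cores, {memory / 1024 ** 3} GiB requested")
```

The scheduler places pods based on requests, not limits, so this total is the figure to compare against allocatable node capacity.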
Vertical Pod Autoscaler (VPA)
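Automatically adjust resource requests. Avoid running VPA in "Auto" mode on a Deployment that an HPA already scales on CPU or memory, because both controllers would react to the same signal; setting updateMode to "Off" produces recommendations without applying them: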
Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
Create VPA for API server
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: aurora-server-vpa
  namespace: aurora
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-server
  updatePolicy:
    updateMode: "Auto"  # or "Recreate" or "Initial"
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
Monitor recommendations
kubectl get vpa aurora-server-vpa -n aurora
kubectl describe vpa aurora-server-vpa -n aurora
Database Scaling
PostgreSQL
Read Replicas
For read-heavy workloads:
Managed Database (Recommended):
AWS RDS: Create read replicas via console
GCP Cloud SQL: Enable read replicas
Azure Database: Add read replicas
Configure application:
config:
  POSTGRES_HOST: "aurora-primary.xyz.rds.amazonaws.com"
  POSTGRES_READ_REPLICA_HOST: "aurora-replica.xyz.rds.amazonaws.com"
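As an illustration of what read/write splitting looks like on the application side, here is a hypothetical Python sketch that picks a host from the two settings above. How Aurora itself consumes POSTGRES_READ_REPLICA_HOST is not described here, so treat the function and its routing rule as assumptions rather than the actual implementation.

```python
import os


def pick_host(sql, env=os.environ):
    """Send plain SELECTs to the replica and everything else to the primary."""
    primary = env["POSTGRES_HOST"]
    # Fall back to the primary when no replica is configured.
    replica = env.get("POSTGRES_READ_REPLICA_HOST") or primary
    statement = sql.lstrip().upper()
    # SELECT ... FOR UPDATE takes row locks, so it must run on the primary.
    is_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
    return replica if is_read else primary


settings = {
    "POSTGRES_HOST": "aurora-primary.xyz.rds.amazonaws.com",
    "POSTGRES_READ_REPLICA_HOST": "aurora-replica.xyz.rds.amazonaws.com",
}
print(pick_host("SELECT * FROM incidents", settings))
print(pick_host("UPDATE incidents SET status = 'closed'", settings))
```

Replicas lag slightly behind the primary, so reads that must observe a write made moments earlier should still go to the primary.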
Self-Managed:
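# Deploy read replicas
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-replica
spec:
  serviceName: postgres-replica
  replicas: 2
  # A selector and matching pod labels are required; the label value is illustrative
  selector:
    matchLabels:
      app: postgres-replica
  template:
    metadata:
      labels:
        app: postgres-replica
    spec:
      containers:
        - name: postgres
          image: postgres:15-alpine
          env:
            # The stock postgres image does not act on these two variables by itself;
            # they assume an init script or replication-aware image that sets up streaming replication
            - name: POSTGRES_PRIMARY_HOST
              value: "aurora-oss-postgres-0.aurora-oss-postgres"
            - name: POSTGRES_REPLICATION_MODE
              value: "slave"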
Connection Pooling
Use PgBouncer to reduce database connections:
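apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbouncer
  namespace: aurora
spec:
  replicas: 2
  # A selector and matching pod labels are required; the label value is illustrative
  selector:
    matchLabels:
      app: pgbouncer
  template:
    metadata:
      labels:
        app: pgbouncer
    spec:
      containers:
        - name: pgbouncer
          image: edoburu/pgbouncer:latest
          env:
            # In production, load this URL from a Secret instead of inlining the password
            - name: DATABASE_URL
              value: "postgresql://aurora:password@aurora-oss-postgres:5432/aurora_db"
            - name: POOL_MODE
              value: "transaction"
            - name: MAX_CLIENT_CONN
              value: "1000"
            - name: DEFAULT_POOL_SIZE
              value: "25"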
Update application:
config:
  POSTGRES_HOST: "pgbouncer"
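The pooling settings above can be sanity-checked with a little arithmetic. PgBouncer sizes its server-side pool per database/user pair, and each PgBouncer replica keeps its own pool, so the sketch below estimates the worst-case number of PostgreSQL connections and compares it with max_connections. The function names are illustrative; the reserve of 3 matches PostgreSQL's default superuser_reserved_connections.

```python
def server_connections(pgbouncer_replicas, default_pool_size, db_user_pairs=1):
    """Worst-case connections PgBouncer opens to PostgreSQL."""
    return pgbouncer_replicas * default_pool_size * db_user_pairs


def fits_postgres(needed, max_connections, reserved=3):
    """True if the pools fit under max_connections minus reserved slots."""
    return needed <= max_connections - reserved


needed = server_connections(pgbouncer_replicas=2, default_pool_size=25)
print(needed, "server connections;",
      "fits" if fits_postgres(needed, max_connections=100) else "does not fit")
```

With two replicas and a pool size of 25, up to 1000 client connections per PgBouncer instance share at most 50 PostgreSQL connections.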
Redis Scaling
Redis Cluster
For high availability and sharding:
services:
  redis:
    enabled: false  # Disable built-in Redis
config:
  REDIS_URL: "redis://redis-cluster:6379/0"
Deploy Redis Cluster:
helm install redis bitnami/redis-cluster \
  --namespace aurora \
  --set cluster.nodes=6 \
  --set cluster.replicas=1
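For background on what sharding means here: Redis Cluster assigns each key to one of 16384 hash slots using CRC16 of the key modulo 16384, and the slots are divided among the master nodes. The Python sketch below follows the algorithm in the Redis Cluster specification, including the hash tag rule (only the text between the first { and the next } is hashed when it is non-empty). The key names are made up for illustration.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Hash slot (0..16383) that a Redis Cluster key maps to."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Use the hash tag only if there is at least one byte between the braces.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Keys sharing a hash tag land in the same slot, so multi-key commands work on them.
print(key_slot("{tenant42}:queue"), key_slot("{tenant42}:results"))
```

Multi-key operations in cluster mode only succeed when all keys map to the same slot, which is worth verifying for any client or queue library before switching REDIS_URL to a cluster endpoint.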
Redis Sentinel
For failover without sharding:
helm install redis bitnami/redis \
  --namespace aurora \
  --set sentinel.enabled=true \
  --set master.persistence.size=20Gi \
  --set replica.replicaCount=2
Vector Database Scaling
Weaviate Clustering
For production, use Weaviate Cloud or multi-node cluster:
Weaviate Cloud (Recommended):
services:
  weaviate:
    enabled: false
config:
  WEAVIATE_HOST: "aurora-cluster.weaviate.network"
  WEAVIATE_PORT: "443"
  WEAVIATE_SCHEME: "https"
Self-Managed Cluster:
replicaCounts:
  weaviate: 3
weaviate:
  cluster:
    enabled: true
    replicas: 3
Load Balancing
Ingress Session Affinity
For WebSocket connections, enable session affinity:
ingress:
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "aurora-ws-affinity"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"
External Load Balancer
For cloud deployments:
AWS ALB:
ingress:
  className: "alb"
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/healthcheck-path: /health
GCP Load Balancer:
ingress:
  className: "gce"
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: "aurora-ip"
Monitoring Scaling
Key Metrics to Track
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    scrape_configs:
      - job_name: 'aurora-metrics'
        metrics_path: '/metrics'
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - aurora
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            action: keep
            regex: aurora-.*
Metrics to monitor:
Request rate (requests/sec)
Response time (p50, p95, p99)
Error rate (%)
CPU usage (%)
Memory usage (%)
Celery queue length
Database connections
Redis memory usage
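For readers who want to check what the latency percentiles and error rate in this list mean, here is a small Python sketch that computes them from raw samples with the standard library. In practice these values would come from Prometheus queries; the function names and the choice to count only 5xx responses as errors are assumptions.

```python
import statistics


def latency_percentiles(durations_ms):
    """p50/p95/p99 from a list of request durations in milliseconds."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def error_rate(status_codes):
    """Percentage of responses with a 5xx status."""
    errors = sum(1 for code in status_codes if code >= 500)
    return 100.0 * errors / len(status_codes)


samples = list(range(1, 102))  # 1 ms .. 101 ms, one request each
print(latency_percentiles(samples))
print(error_rate([200] * 98 + [500, 503]), "% errors")
```

Tail percentiles (p95, p99) usually degrade before the median does, which makes them a better early signal that more replicas are needed.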
Grafana Dashboard
Import Aurora dashboard:
kubectl create configmap grafana-dashboard-aurora \
  --from-file=aurora-dashboard.json \
  -n monitoring
Caching
Enable aggressive caching:
config:
  # Cloud provider API caching
  AURORA_SETUP_CACHE_ENABLED: "true"
  AURORA_SETUP_CACHE_TTL: "7200"   # 2 hours
  # Storage caching
  STORAGE_CACHE_ENABLED: "true"
  STORAGE_CACHE_TTL: "300"         # 5 minutes
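The TTL values above control how long a cached result is reused before it is fetched again. The sketch below shows that behavior with a minimal in-memory cache in Python. It is a generic illustration, not Aurora's cache implementation, and the TTLCache class and its methods are hypothetical.

```python
import time


class TTLCache:
    """Returns a cached value until it is ttl_seconds old, then reloads it."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (stored_at, value)

    def get(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no call to the backing service
        value = loader()
        self._entries[key] = (now, value)
        return value


storage_cache = TTLCache(ttl_seconds=300)  # mirrors STORAGE_CACHE_TTL
print(storage_cache.get("bucket-list", lambda: ["a", "b"]))
```

A longer TTL reduces calls to the backing service under load at the cost of serving staler data, which is why the slow-changing cloud provider setup data gets 2 hours and storage listings only 5 minutes.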
Cost Optimization
Reduce LLM costs during scale:
config:
  RCA_OPTIMIZE_COSTS: "true"      # Use cheaper models when possible
  AGENT_RECURSION_LIMIT: "120"    # Reduce from 240 for faster completion
Testing Scaling
Load Testing
Use k6 for load testing:
// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 10 },   // Ramp up to 10 users
    { duration: '5m', target: 50 },   // Ramp up to 50 users
    { duration: '10m', target: 100 }, // Ramp up to 100 users over 10 minutes
    { duration: '2m', target: 0 },    // Ramp down
  ],
};

export default function () {
  const res = http.get('https://api.aurora.example.com/health');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
  sleep(1); // One request per virtual user per second
}
Run test:
k6 run load-test.js
Stress Testing
export const options = {
  stages: [
    { duration: '5m', target: 1000 },  // Ramp to 1000 users
    { duration: '10m', target: 1000 }, // Stay at peak
  ],
};
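Each k6 stage ramps linearly from the previous target to its own target over the stage duration; load only holds steady when two consecutive stages share a target, as in the stress test. The Python sketch below models that to show how many virtual users are active at a given moment. It is an approximation for planning, not k6's scheduler, and the starting VU count is passed in explicitly as an assumption.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """'2m' -> 120. Handles single-unit durations only."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def vus_at(stages, t_seconds, start_vus=0):
    """Approximate virtual users at t_seconds, ramping linearly within each stage."""
    previous = start_vus
    elapsed = 0
    for duration, target in stages:
        length = parse_duration(duration)
        if t_seconds <= elapsed + length:
            fraction = (t_seconds - elapsed) / length
            return round(previous + (target - previous) * fraction)
        previous = target
        elapsed += length
    return previous  # after the last stage


load_test = [("2m", 10), ("5m", 50), ("10m", 100), ("2m", 0)]
stress_test = [("5m", 1000), ("10m", 1000)]
print(vus_at(load_test, 12 * 60))    # halfway through the 10-minute stage
print(vus_at(stress_test, 10 * 60))  # inside the hold at peak
```

In the load test the 10-minute stage climbs from 50 to 100 users, so the system spends no time at a sustained 100; adding a stage with the same target after it would hold the peak.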
Scaling Checklist
Before scaling to production:
Common Scaling Issues
Pod OOMKilled
Increase memory limits:
resources:
  celeryWorker:
    limits:
      memory: "16Gi"  # Increased from 8Gi
Database Connection Exhaustion
Add PgBouncer or increase connection limits:
postgres:
  config:
    max_connections: "500"  # Increased from 100
Redis Memory Issues
Increase Redis memory or add eviction policy:
redis:
  config:
    maxmemory: "2gb"
    maxmemory-policy: "allkeys-lru"
Slow Response Times
Profile application:
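# Run the entrypoint under cProfile inside a server pod
# (this starts a separate process; it does not attach to the running server)
kubectl exec -it deployment/aurora-oss-server -n aurora -- \
  python -m cProfile -s cumtime main_compute.py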
Next Steps
Production Best Practices: Security and reliability for production
Monitoring: Set up comprehensive monitoring
Performance Tuning: Optimize Aurora performance
Troubleshooting: Common scaling issues