Production Deployment Best Practices

Guidelines for deploying Aurora in production with a focus on security, reliability, and scalability.

Security

Secrets Management

Critical Security Requirements:
  1. Never commit secrets to version control
  2. Use strong, randomly generated passwords
  3. Rotate credentials regularly
  4. Use managed secrets services when available

Generate Strong Secrets

# Generate a random secret: 32 random bytes, base64-encoded (44 characters)
openssl rand -base64 32

# Generate for all required secrets:
# - POSTGRES_PASSWORD
# - FLASK_SECRET_KEY
# - AUTH_SECRET
# - SEARXNG_SECRET
# - VAULT_TOKEN (from vault init)
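A minimal sketch for writing the generated values straight into a Docker Compose .env file with owner-only permissions, assuming the four variable names listed above. VAULT_TOKEN is left out because it comes from vault init. The function names and the /dev/urandom fallback (for hosts without openssl) are illustrative additions, not part of Aurora's tooling.

```shell
#!/bin/sh
# Sketch: write the generated secrets to a .env file readable only by its owner.
# VAULT_TOKEN is omitted because it comes from `vault init`.

# 32 random bytes, base64-encoded (44 characters).
# Falls back to /dev/urandom + base64 if openssl is not installed.
rand_secret() {
  openssl rand -base64 32 2>/dev/null || head -c 32 /dev/urandom | base64
}

generate_env_secrets() {
  env_file=${1:-.env}
  # Never overwrite an existing file: that would silently rotate every secret
  if [ -e "$env_file" ]; then
    echo "$env_file already exists; refusing to overwrite" >&2
    return 1
  fi
  # Create the file in a subshell with umask 077 so it is born with mode 600
  (
    umask 077
    for name in POSTGRES_PASSWORD FLASK_SECRET_KEY AUTH_SECRET SEARXNG_SECRET; do
      printf '%s=%s\n' "$name" "$(rand_secret)"
    done > "$env_file"
  )
}

# Usage: generate_env_secrets .env
```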

Kubernetes Secrets

For Kubernetes deployments, consider one of the following tools.
External Secrets Operator (syncs values from a cloud secrets manager into Kubernetes Secrets):
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: aurora-secrets
spec:
  secretStoreRef:
    name: aws-secrets-manager
  target:
    name: aurora-app-secrets
  data:
    - secretKey: FLASK_SECRET_KEY
      remoteRef:
        key: aurora/flask-secret
Sealed Secrets (encrypts Secrets so they are safe to commit):
# Encrypt secrets for git; commit only the sealed file, never secret.yaml
kubeseal --format yaml < secret.yaml > sealed-secret.yaml
git add sealed-secret.yaml

Docker Compose Secrets

For Docker Compose, use a .env file with restricted permissions:
chmod 600 .env
chown root:root .env  # Or service account user
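File permissions can loosen later, for example when the file is copied or restored from a backup. A hypothetical preflight check that a deploy script could run before starting the stack; it treats any group or other permission bit as a failure and tries GNU stat syntax first, then BSD/macOS syntax.

```shell
#!/bin/sh
# Hypothetical preflight check: fail if an env file grants any group/other access.
check_env_perms() {
  f=${1:-.env}
  [ -f "$f" ] || { echo "$f not found" >&2; return 1; }
  # GNU stat first, then BSD/macOS stat; both print octal permission bits
  mode=$(stat -c %a "$f" 2>/dev/null || stat -f %Lp "$f")
  case "$mode" in
    *00) return 0 ;;   # last two digits are 0: no group or other permissions
    *) echo "$f has mode $mode; run: chmod 600 $f" >&2; return 1 ;;
  esac
}

# Usage before starting the stack:
#   check_env_perms .env && docker compose up -d
```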
Or use Docker secrets:
secrets:
  postgres_password:
    file: ./secrets/postgres_password.txt

services:
  postgres:
    secrets:
      - postgres_password
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/postgres_password
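The file referenced by the secrets entry must exist before docker compose up. A sketch that creates it with owner-only permissions, assuming the ./secrets directory layout shown above; make_secret_file is a hypothetical helper. It refuses to overwrite an existing file because the official postgres image applies the password only when it initializes an empty data directory, so a regenerated file would no longer match the database.

```shell
#!/bin/sh
# Sketch: create a Docker secret file such as secrets/postgres_password.txt.
# The directory and the file are restricted to the current user.
make_secret_file() {
  name=$1
  dir=${2:-./secrets}
  target="$dir/$name.txt"
  if [ -e "$target" ]; then
    echo "$target already exists; refusing to overwrite" >&2
    return 1
  fi
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  (
    umask 077
    # 32 random bytes, base64-encoded; $(...) strips the trailing newline
    value=$(openssl rand -base64 32 2>/dev/null || head -c 32 /dev/urandom | base64)
    printf '%s' "$value" > "$target"
  )
}

# Usage: make_secret_file postgres_password   # creates ./secrets/postgres_password.txt
```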

Vault Configuration

Auto-Unseal with Cloud KMS

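For production, configure Vault to auto-unseal with a cloud KMS key so that restarted Vault pods do not wait for an operator to enter unseal keys.
AWS KMS: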
vault:
  seal:
    type: "awskms"
    awskms:
      region: "us-east-1"
      kms_key_id: "alias/aurora-vault-unseal"
GCP Cloud KMS:
vault:
  seal:
    type: "gcpckms"
    gcpckms:
      project: "your-project-id"
      region: "us-central1"
      key_ring: "vault-keyring"
      crypto_key: "vault-unseal-key"

Vault High Availability

For a highly available Vault, run three replicas with integrated Raft storage:
replicaCounts:
  vault: 3

vault:
  ha:
    enabled: true
    raft:
      enabled: true

Network Security

Kubernetes NetworkPolicies

Restrict pod-to-pod communication:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: aurora-server-policy
  namespace: aurora
spec:
  podSelector:
    matchLabels:
      app: aurora-server
  policyTypes:
    - Ingress
    - Egress
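  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: aurora-frontend
        # Assumption: the ingress controller that serves api.aurora.example.com
        # runs in the "ingress-nginx" namespace; adjust to your controller
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 5080
  egress:
    # Allow DNS (Kubernetes sets kubernetes.io/metadata.name on every namespace)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Allow database
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    # Add similar egress rules for Redis, Vault, and other services the server calls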

Pod Isolation for Untrusted Code

Enable pod isolation so terminal commands run in separate, sandboxed pods:
config:
  ENABLE_POD_ISOLATION: "true"
  TERMINAL_NAMESPACE: "untrusted"
  TERMINAL_RUNTIME_CLASS: "gvisor"  # Sandbox runtime; must match a RuntimeClass installed in the cluster
The chart creates NetworkPolicies that:
  • Block terminal pods from accessing cluster services (Vault, DB, etc.)
  • Allow internet access for cloud API calls
  • Isolate untrusted workloads

TLS/HTTPS Configuration

Ingress TLS with cert-manager

ingress:
  enabled: true
  tls:
    enabled: true
    certManager:
      enabled: true
      issuer: "letsencrypt-prod"
      email: "admin@example.com"
  
  hosts:
    frontend: "aurora.example.com"
    api: "api.aurora.example.com"
    ws: "ws.aurora.example.com"
Install cert-manager:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml

# Create ClusterIssuer
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
EOF

Internal TLS (Service Mesh)

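For encrypted internal traffic, use a service mesh.
Istio:
istioctl install --set profile=default
kubectl label namespace aurora istio-injection=enabled
Linkerd (since 2.12, the CRDs are installed in a separate step):
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
kubectl annotate namespace aurora linkerd.io/inject=enabled
Either mesh injects its proxy only into pods created after the namespace is labeled or annotated, so restart existing workloads:
kubectl rollout restart deployment -n aurora
kubectl rollout restart statefulset -n aurora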

Access Control

Kubernetes RBAC

Limit who can access Aurora resources:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: aurora-admin
  namespace: aurora
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["*"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: aurora-admin-binding
  namespace: aurora
subjects:
  - kind: User
    name: admin@example.com
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: aurora-admin
  apiGroup: rbac.authorization.k8s.io

Rate Limiting

Enable API rate limiting:
config:
  RATE_LIMITING_ENABLED: "true"
  RATE_LIMIT_HEADERS_ENABLED: "true"

secrets:
  app:
    RATE_LIMIT_BYPASS_TOKEN: "<secure-token-for-automation>"

Reliability

High Availability

Replica Configuration

replicaCounts:
  # Stateless, horizontally scalable services (3+ replicas for critical services)
  server: 3
  celeryWorker: 5
  chatbot: 2
  frontend: 2
  
  # Single instance (requires additional config for HA)
  celeryBeat: 1  # DO NOT scale (causes duplicate tasks)
  postgres: 1    # Use managed DB (RDS, Cloud SQL) for HA
  redis: 1       # Use managed Redis (ElastiCache) for HA
  vault: 1       # Configure Raft storage for HA

Pod Disruption Budgets

Limit how many server pods voluntary disruptions (node drains, cluster upgrades) can evict at the same time:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: aurora-server-pdb
  namespace: aurora
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: aurora-server

Health Checks

Configure liveness and readiness probes so Kubernetes restarts unresponsive containers and routes traffic only to pods that are ready:
# Kubernetes
livenessProbe:
  httpGet:
    path: /health
    port: 5080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 5080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2

Resource Management

Resource Requests and Limits

Set resource requests and limits; treat the values below as starting points and adjust them based on observed usage:
resources:
  server:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  
  celeryWorker:
    requests:
      cpu: "200m"
      memory: "2Gi"
    limits:
      cpu: "1000m"
      memory: "8Gi"
  
  postgres:
    requests:
      cpu: "1000m"
      memory: "2Gi"
    limits:
      cpu: "4000m"
      memory: "8Gi"

Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aurora-server-hpa
  namespace: aurora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-server
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
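To see what these targets mean in practice: for each metric, the HPA controller computes desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), where utilization is measured against the containers' resource requests, then takes the largest result across metrics and clamps it to minReplicas/maxReplicas. The sketch below reproduces only that core formula with integer arithmetic; the real controller also applies a 10% tolerance and stabilization windows, which are omitted here. hpa_desired is a hypothetical helper.

```shell
#!/bin/sh
# Sketch of the HPA core formula for one Utilization metric (no tolerance,
# no stabilization window), clamped to minReplicas=3 / maxReplicas=10 by default.
# Usage: hpa_desired CURRENT_REPLICAS CURRENT_UTIL_PERCENT TARGET_PERCENT [MIN] [MAX]
hpa_desired() {
  cur=$1 util=$2 target=$3 min=${4:-3} max=${5:-10}
  # Integer ceiling of cur * util / target
  want=$(( (cur * util + target - 1) / target ))
  if [ "$want" -lt "$min" ]; then want=$min; fi
  if [ "$want" -gt "$max" ]; then want=$max; fi
  echo "$want"
}

# With both metrics from the HPA above, the larger proposal wins:
#   cpu=$(hpa_desired 3 140 70)   # CPU at 140% of requests vs. 70% target -> 6
#   mem=$(hpa_desired 3 120 80)   # memory at 120% vs. 80% target -> 5
#   => the HPA would scale to 6 replicas
```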

Backup and Recovery

PostgreSQL Backups

Automated daily backups with a CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: aurora
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
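              # postgres:15-alpine provides pg_dump but not the AWS CLI; use an image
              # that provides both, with AWS credentials (for example via IRSA)
              image: postgres:15-alpine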
              env:
                - name: PGHOST
                  value: aurora-oss-postgres
                - name: PGUSER
                  value: aurora
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: aurora-db-secret
                      key: POSTGRES_PASSWORD
              command:
                - /bin/sh
                - -c
                - |
                  # set -e: if pg_dump fails, skip the upload and fail the Job
                  set -eu
                  FILE=/backup/aurora_$(date +%Y%m%d_%H%M%S).dump
                  pg_dump -Fc aurora_db > "$FILE"
                  aws s3 cp "$FILE" s3://aurora-backups/postgres/
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              emptyDir: {}
          restartPolicy: OnFailure
Managed Database Backups: Use cloud provider automated backups:
  • AWS RDS: Automated snapshots, point-in-time recovery
  • GCP Cloud SQL: Automated backups, replicas
  • Azure Database: Geo-redundant backups

Volume Snapshots

# Create VolumeSnapshot
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snapshot-$(date +%Y%m%d)
  namespace: aurora
spec:
  volumeSnapshotClassName: csi-snapclass  # must match a VolumeSnapshotClass in your cluster
  source:
    persistentVolumeClaimName: data-aurora-oss-postgres-0
EOF
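The snapshot name above contains only the date, so running the command again on the same day reuses the same object name and kubectl apply does not create a new snapshot. That falls short of the hourly snapshots suggested in the disaster recovery plan. A sketch of a hypothetical helper that adds the UTC hour and minute to the name (lowercase digits and hyphens keep it a valid Kubernetes name); the SNAP_TS override exists only to make the output reproducible.

```shell
#!/bin/sh
# Hypothetical helper: print a VolumeSnapshot manifest with an hour-level
# timestamp so hourly runs produce distinct objects.
# SNAP_TS can override the timestamp (useful for testing).
snapshot_manifest() {
  pvc=$1
  ns=${2:-aurora}
  class=${3:-csi-snapclass}
  ts=${SNAP_TS:-$(date -u +%Y%m%d-%H%M)}
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snapshot-$ts
  namespace: $ns
spec:
  volumeSnapshotClassName: $class
  source:
    persistentVolumeClaimName: $pvc
EOF
}

# Usage (for example from an hourly CronJob):
#   snapshot_manifest data-aurora-oss-postgres-0 | kubectl apply -f -
```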

Disaster Recovery Plan

  1. Regular backups: Daily PostgreSQL dumps, hourly volume snapshots
  2. Multi-region replication: Replicate backups to separate region
  3. Test restores: Monthly restore tests to staging environment
  4. Documentation: Maintain runbook for recovery procedures
  5. Monitoring: Alert on backup failures
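For the monthly restore tests, the timestamped names written by the backup CronJob (aurora_YYYYmmdd_HHMMSS.dump) sort lexicographically in chronological order, so the newest dump can be picked without parsing dates. A minimal sketch, assuming the dumps have been copied from S3 into a local directory; latest_dump is a hypothetical helper, while the aws s3 sync and pg_restore commands in the usage comment are standard.

```shell
#!/bin/sh
# Hypothetical helper: print the newest aurora_*.dump in a directory.
# Relies on the YYYYmmdd_HHMMSS naming, which sorts chronologically.
latest_dump() {
  dir=${1:-.}
  newest=$(ls "$dir"/aurora_*.dump 2>/dev/null | sort | tail -n 1)
  [ -n "$newest" ] || { echo "no dumps in $dir" >&2; return 1; }
  echo "$newest"
}

# Usage against a staging database (custom-format dumps need pg_restore):
#   aws s3 sync s3://aurora-backups/postgres/ ./backups/
#   pg_restore --clean --if-exists --no-owner -d aurora_db "$(latest_dump ./backups)"
```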

Monitoring and Observability

Prometheus Metrics

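Aurora's telemetry is configured through standard OpenTelemetry exporter settings. Prometheus does not accept OTLP at http://prometheus:9090 as a base endpoint (its optional OTLP receiver lives under /api/v1/otlp), so point the exporter at an OpenTelemetry Collector that exposes or remote-writes metrics to Prometheus:
config:
  OTEL_SERVICE_NAME: "aurora-production"
  # Assumption: a Collector Service named otel-collector; 4318 is the
  # standard OTLP/HTTP port (4317 for gRPC)
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4318"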

Logging

Centralized logging with ELK or Loki. The example below uses Fluent Bit to ship Aurora container logs to Elasticsearch:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: aurora
data:
  fluent-bit.conf: |
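    [INPUT]
        Name              tail
        Path              /var/log/containers/aurora-*.log
        # Handles both Docker JSON logs and containerd/CRI-O (CRI) logs
        multiline.parser  docker, cri
        Tag               aurora.*

    [OUTPUT]
        Name              es
        Match             aurora.*
        Host              elasticsearch.logging.svc.cluster.local
        Port              9200
        Index             aurora
        # Elasticsearch 8.x rejects mapping types, so do not send one
        Suppress_Type_Name On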

Alerting

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: aurora-alerts
  namespace: aurora
spec:
  groups:
    - name: aurora
      interval: 30s
      rules:
        - alert: AuroraPodDown
          expr: up{job="aurora-server"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Aurora server pod is down"
        
        - alert: HighMemoryUsage
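          # Working set excludes reclaimable page cache; containers without a limit are skipped
          expr: container_memory_working_set_bytes{pod=~"aurora-.*", container!=""} / (container_spec_memory_limit_bytes{pod=~"aurora-.*", container!=""} > 0) > 0.9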
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is using > 90% memory"

Operations

Deployment Strategy

Rolling Updates

apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

Blue-Green Deployment

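# Deploy the new version as a second release in the same namespace
# (an Ingress can only route to Services in its own namespace).
# Assumes both releases share an external database and Redis.
helm install aurora-v2 ./deploy/helm/aurora \
  --namespace aurora \
  -f values.generated.yaml \
  --set ingress.enabled=false

# Switch API traffic via ingress. A JSON patch changes only this backend;
# patching spec.rules wholesale would drop the frontend and ws rules.
# Assumes the API host is rule index 1; check the order with:
#   kubectl get ingress aurora-oss -n aurora -o jsonpath='{.spec.rules[*].host}'
kubectl patch ingress aurora-oss -n aurora --type=json \
  -p '[{"op":"replace","path":"/spec/rules/1/http/paths/0/backend/service/name","value":"aurora-v2-server"}]'

# Cleanup old version. Caution: the Ingress belongs to the aurora-oss release
# and is deleted with it, so move ingress ownership to aurora-v2 first.
helm uninstall aurora-oss -n aurora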

Maintenance Windows

Database Migrations

# Run migrations before deployment
kubectl exec -it deployment/aurora-oss-server -n aurora -- \
  python -m flask db upgrade

# Verify schema version
kubectl exec -it statefulset/aurora-oss-postgres -n aurora -- \
  psql -U aurora -d aurora_db -c "SELECT version_num FROM alembic_version;"

Scaling Down for Maintenance

# Scale to 0
kubectl scale deployment aurora-oss-server --replicas=0 -n aurora

# Perform maintenance
# ...

# Scale back up
kubectl scale deployment aurora-oss-server --replicas=3 -n aurora
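The scale-up step above hard-codes 3 replicas, which may not match the count an HPA had reached. A sketch of hypothetical helpers that save spec.replicas before scaling to zero and restore that exact value; the STATE_DIR location is an assumption. While a Deployment is at zero replicas, an HPA targeting it stops scaling, and autoscaling resumes once the count is raised again.

```shell
#!/bin/sh
# Hypothetical helpers: remember a Deployment's replica count before scaling
# it to zero, then restore exactly that count after maintenance.
STATE_DIR=${STATE_DIR:-/tmp/aurora-maintenance}

scale_down() {
  deploy=$1 ns=${2:-aurora}
  f="$STATE_DIR/$ns.$deploy.replicas"
  # Refuse a second scale-down, which would overwrite the saved count with 0
  if [ -e "$f" ]; then
    echo "$deploy already scaled down (state in $f)" >&2
    return 1
  fi
  mkdir -p "$STATE_DIR" || return 1
  # Save the current spec.replicas (may differ from 3 if an HPA scaled it)
  if ! kubectl get deployment "$deploy" -n "$ns" -o jsonpath='{.spec.replicas}' > "$f"; then
    rm -f "$f"
    return 1
  fi
  kubectl scale deployment "$deploy" --replicas=0 -n "$ns"
}

scale_up() {
  deploy=$1 ns=${2:-aurora}
  f="$STATE_DIR/$ns.$deploy.replicas"
  [ -s "$f" ] || { echo "no saved replica count for $deploy" >&2; return 1; }
  kubectl scale deployment "$deploy" --replicas="$(cat "$f")" -n "$ns" && rm -f "$f"
}

# Usage:
#   scale_down aurora-oss-server aurora
#   # ...perform maintenance...
#   scale_up aurora-oss-server aurora
```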

Cost Optimization

Use Managed Services

Replace in-cluster stateful services with managed alternatives:
  • Database: RDS, Cloud SQL, Azure Database (automated backups, HA)
  • Redis: ElastiCache, Memorystore, Azure Cache (managed persistence)
  • Object Storage: S3, GCS, Azure Blob (eliminate SeaweedFS)
  • Secrets: AWS Secrets Manager, GCP Secret Manager, Azure Key Vault

Resource Right-Sizing

Monitor actual usage and adjust:
# Check resource usage (requires metrics-server)
kubectl top pods -n aurora
kubectl top nodes

# Use VPA recommendations (requires the Vertical Pod Autoscaler and a VPA object per workload)
kubectl get vpa -n aurora

Node Autoscaling

Enable the Cluster Autoscaler (or your cloud provider's managed node autoscaling) to add nodes when pods are pending for lack of capacity and to remove underutilized nodes.

Checklist

Before going to production:
  • All secrets generated with openssl rand -base64 32
  • Vault configured with auto-unseal (cloud KMS)
  • TLS/HTTPS enabled with valid certificates
  • External object storage configured (S3, GCS, etc.)
  • Database backups configured and tested
  • Monitoring and alerting set up
  • Resource requests and limits configured
  • Replica counts set for HA (3+ for critical services)
  • NetworkPolicies applied
  • Pod isolation enabled (ENABLE_POD_ISOLATION=true)
  • Disaster recovery plan documented
  • Runbooks created for common operations
  • Rate limiting enabled
  • RBAC configured for team access
  • Log aggregation configured
  • Load testing performed

Next Steps

  • Scaling Guide: Scale Aurora for growing workloads
  • Monitoring Setup: Set up comprehensive monitoring
  • Backup & Recovery: Implement backup strategies
  • Troubleshooting: Common issues and solutions
