Production Deployment Best Practices

Guidelines for deploying Aurora to production securely, reliably, and at scale.

Security

Secrets Management

Critical Security Requirements:

Never commit secrets to version control
Use strong, randomly generated passwords
Rotate credentials regularly
Use managed secrets services when available

Generate Strong Secrets

# Generate random secrets (32-byte base64)
openssl rand -base64 32

# Generate for all required secrets:
# - POSTGRES_PASSWORD
# - FLASK_SECRET_KEY
# - AUTH_SECRET
# - SEARXNG_SECRET
# - VAULT_TOKEN (from vault init)
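
The list above can be turned into a small helper that writes every generated secret to a file only its owner can read. This is a minimal sketch, not part of Aurora: `gen_secret` and `write_secrets` are hypothetical names, and the output path is whatever you pass in. `VAULT_TOKEN` is left out because it comes from `vault init` rather than a random generator.

```shell
#!/bin/sh
# Sketch: generate the secrets listed above into an env file with mode 0600.
# gen_secret and write_secrets are illustrative helpers, not Aurora tooling.

# Prefer openssl (as in the command above); fall back to /dev/urandom.
gen_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -base64 32
  else
    head -c 32 /dev/urandom | base64
  fi
}

# write_secrets FILE: refuses to overwrite an existing file.
write_secrets() {
  env_file="$1"
  if [ -e "$env_file" ]; then
    echo "refusing to overwrite $env_file" >&2
    return 1
  fi
  (
    umask 077   # the new file is created readable by its owner only
    for name in POSTGRES_PASSWORD FLASK_SECRET_KEY AUTH_SECRET SEARXNG_SECRET; do
      printf '%s=%s\n' "$name" "$(gen_secret)"
    done > "$env_file"
  )
}
```

For example, `write_secrets ./.env` creates the file once; add `VAULT_TOKEN` by hand after initializing Vault.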

Kubernetes Secrets

For Kubernetes deployments, consider one of the following options.

External Secrets Operator:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: aurora-secrets
spec:
  secretStoreRef:
    name: aws-secrets-manager
  target:
    name: aurora-app-secrets
  data:
    - secretKey: FLASK_SECRET_KEY
      remoteRef:
        key: aurora/flask-secret

Sealed Secrets:

# Encrypt secrets for git
kubeseal --format yaml < secret.yaml > sealed-secret.yaml
git add sealed-secret.yaml

Docker Compose Secrets

For Docker Compose, use a .env file with restricted permissions:

chmod 600 .env
chown root:root .env  # Or service account user

Or use Docker secrets:

secrets:
  postgres_password:
    file: ./secrets/postgres_password.txt

services:
  postgres:
    secrets:
      - postgres_password
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/postgres_password
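
The official postgres image reads `POSTGRES_PASSWORD_FILE` itself. Whether Aurora's own images honor `_FILE` variables is not stated here, so the sketch below shows how an entrypoint wrapper could load a secret from a mounted file into the matching environment variable. `file_env` is a hypothetical helper modeled on the pattern used by official Docker images.

```shell
#!/bin/sh
# file_env VAR: if VAR_FILE is set, load VAR from that file and export it.
# Fails when both VAR and VAR_FILE are set, to avoid ambiguous configuration.
file_env() {
  var="$1"
  file_var="${var}_FILE"
  eval "current=\${$var:-}"
  eval "path=\${$file_var:-}"
  if [ -n "$current" ] && [ -n "$path" ]; then
    echo "error: both $var and $file_var are set" >&2
    return 1
  fi
  if [ -n "$path" ]; then
    current="$(cat "$path")" || return 1
  fi
  export "$var=$current"
  unset "$file_var"
}
```

An entrypoint could call `file_env FLASK_SECRET_KEY` before starting the service, with `FLASK_SECRET_KEY_FILE` pointing at a path under `/run/secrets/`.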

Vault Configuration

Auto-Unseal with Cloud KMS

For production, configure Vault auto-unseal with a cloud KMS.

AWS KMS:

vault:
  seal:
    type: "awskms"
    awskms:
      region: "us-east-1"
      kms_key_id: "alias/aurora-vault-unseal"

GCP Cloud KMS:

vault:
  seal:
    type: "gcpckms"
    gcpckms:
      project: "your-project-id"
      region: "us-central1"
      key_ring: "vault-keyring"
      crypto_key: "vault-unseal-key"

Vault High Availability

For a highly available Vault cluster, run three replicas with integrated Raft storage:

replicaCounts:
  vault: 3

vault:
  ha:
    enabled: true
    raft:
      enabled: true

Network Security

Kubernetes NetworkPolicies

Restrict pod-to-pod communication:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: aurora-server-policy
  namespace: aurora
spec:
  podSelector:
    matchLabels:
      app: aurora-server
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: aurora-frontend
      ports:
        - protocol: TCP
          port: 5080
  egress:
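    # Allow DNS. kubernetes.io/metadata.name is set automatically on every
    # namespace (Kubernetes 1.21+); a bare "name" label is not.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53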
    # Allow database
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432

Pod Isolation for Untrusted Code

Enable pod isolation for terminal commands:

config:
  ENABLE_POD_ISOLATION: "true"
  TERMINAL_NAMESPACE: "untrusted"
  TERMINAL_RUNTIME_CLASS: "gvisor"  # Sandbox runtime

The chart creates NetworkPolicies that:

Block terminal pods from accessing cluster services (Vault, DB, etc.)
Allow internet access for cloud API calls
Isolate untrusted workloads

TLS/HTTPS Configuration

Ingress TLS with cert-manager

ingress:
  enabled: true
  tls:
    enabled: true
    certManager:
      enabled: true
      issuer: "letsencrypt-prod"
      email: "admin@example.com"
  
  hosts:
    frontend: "aurora.example.com"
    api: "api.aurora.example.com"
    ws: "ws.aurora.example.com"

Install cert-manager:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml

# Create ClusterIssuer
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
EOF

Internal TLS (Service Mesh)

For encrypted internal traffic, use a service mesh.

Istio:

istioctl install --set profile=default
kubectl label namespace aurora istio-injection=enabled

Linkerd:

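# Linkerd 2.12 and later install the CRDs in a separate first step
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
kubectl annotate namespace aurora linkerd.io/inject=enabled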

Access Control

Kubernetes RBAC

Limit who can access Aurora resources:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: aurora-admin
  namespace: aurora
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["*"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: aurora-admin-binding
  namespace: aurora
subjects:
  - kind: User
    name: admin@example.com
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: aurora-admin
  apiGroup: rbac.authorization.k8s.io

Rate Limiting

Enable API rate limiting:

config:
  RATE_LIMITING_ENABLED: "true"
  RATE_LIMIT_HEADERS_ENABLED: "true"

secrets:
  app:
    RATE_LIMIT_BYPASS_TOKEN: "<secure-token-for-automation>"

Reliability

High Availability

Replica Configuration

replicaCounts:
  # Scalable services (2+ replicas for HA; 3+ for the API server)
  server: 3
  celeryWorker: 5
  chatbot: 2
  frontend: 2
  
  # Single instance (requires additional config for HA)
  celeryBeat: 1  # DO NOT scale (causes duplicate tasks)
  postgres: 1    # Use managed DB (RDS, Cloud SQL) for HA
  redis: 1       # Use managed Redis (ElastiCache) for HA
  vault: 1       # Configure Raft storage for HA

Pod Disruption Budgets

Keep a minimum number of pods running during voluntary disruptions such as node drains:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: aurora-server-pdb
  namespace: aurora
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: aurora-server

Health Checks

Configure liveness and readiness probes so Kubernetes restarts hung containers and routes traffic only to ready pods:

# Kubernetes
livenessProbe:
  httpGet:
    path: /health
    port: 5080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 5080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2
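
Before relying on the probes, it can help to hit the same endpoints by hand. The sketch below retries a URL a fixed number of times, loosely mirroring `failureThreshold` and `timeoutSeconds`. `check_endpoint` and `CHECK_INTERVAL` are hypothetical names; the paths and port are the ones from the probe configuration above.

```shell
#!/bin/sh
# check_endpoint URL [ATTEMPTS]: return 0 as soon as one request succeeds
# (HTTP status below 400), or 1 after ATTEMPTS consecutive failures.
check_endpoint() {
  url="$1"
  attempts="${2:-3}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; --max-time plays the role of timeoutSeconds
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "${CHECK_INTERVAL:-5}"
  done
  return 1
}
```

With `kubectl port-forward deployment/aurora-oss-server 5080:5080 -n aurora` running in another terminal, `check_endpoint http://localhost:5080/health` and `check_endpoint http://localhost:5080/ready 2` exercise both probes.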

Resource Management

Resource Requests and Limits

Set requests and limits for each service, then tune them against observed usage:

resources:
  server:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  
  celeryWorker:
    requests:
      cpu: "200m"
      memory: "2Gi"
    limits:
      cpu: "1000m"
      memory: "8Gi"
  
  postgres:
    requests:
      cpu: "1000m"
      memory: "2Gi"
    limits:
      cpu: "4000m"
      memory: "8Gi"

Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aurora-server-hpa
  namespace: aurora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aurora-oss-server
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
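
To see what the autoscaler would do with these targets, the documented Kubernetes formula `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)` can be worked through by hand. The sketch below is a simplified calculator, with `desired_replicas` as a hypothetical helper name. It ignores the controller's tolerance band and stabilization window, and it handles one metric at a time, whereas the real controller takes the largest result across all configured metrics.

```shell
#!/bin/sh
# desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
# Integer form of ceil(current * util / target), clamped to [MIN, MAX].
desired_replicas() {
  current="$1"; util="$2"; target="$3"; min="$4"; max="$5"
  desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

# 3 replicas at 140% CPU against the 70% target above
desired_replicas 3 140 70 3 10   # prints 6
```

At 350% utilization the raw result would be 15, which `maxReplicas: 10` caps at 10; at 20% utilization with 4 replicas the raw result of 2 is raised to `minReplicas: 3`.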

Backup and Recovery

PostgreSQL Backups

Automated backups with CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: aurora
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: postgres:15-alpine
              env:
                - name: PGHOST
                  value: aurora-oss-postgres
                - name: PGUSER
                  value: aurora
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: aurora-db-secret
                      key: POSTGRES_PASSWORD
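              command:
                - /bin/sh
                - -c
                - |
                  # Assumes the image provides both pg_dump and the AWS CLI;
                  # the stock postgres:15-alpine image does not include aws.
                  set -eu  # stop before uploading if the dump fails
                  file="/backup/aurora_$(date +%Y%m%d_%H%M%S).dump"
                  pg_dump -Fc -f "$file" aurora_db
                  aws s3 cp "$file" s3://aurora-backups/postgres/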
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              emptyDir: {}
          restartPolicy: OnFailure

Managed Database Backups

If you run a managed database, use the cloud provider's automated backups:

AWS RDS: Automated snapshots, point-in-time recovery
GCP Cloud SQL: Automated backups, replicas
Azure Database: Geo-redundant backups

Volume Snapshots

# Create VolumeSnapshot
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snapshot-$(date +%Y%m%d)
  namespace: aurora
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-aurora-oss-postgres-0
EOF

Disaster Recovery Plan

Regular backups: Daily PostgreSQL dumps, hourly volume snapshots
Multi-region replication: Replicate backups to separate region
Test restores: Monthly restore tests to staging environment
Documentation: Maintain runbook for recovery procedures
Monitoring: Alert on backup failures
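
Between full restore tests, a cheap sanity check is to confirm that each dump is a readable archive. The sketch below is an assumption-laden helper (`verify_dump` is a hypothetical name): it checks that the file is non-empty and that `pg_restore --list` can read its table of contents. It does not replace restoring into a staging database.

```shell
#!/bin/sh
# verify_dump FILE: quick integrity check for a custom-format pg_dump archive.
verify_dump() {
  dump="$1"
  if [ ! -s "$dump" ]; then
    echo "empty or missing dump: $dump" >&2
    return 1
  fi
  # --list only reads the archive's table of contents; no database is touched
  if pg_restore --list "$dump" >/dev/null 2>&1; then
    echo "ok: $dump"
  else
    echo "unreadable dump: $dump" >&2
    return 1
  fi
}
```

A backup job could run `verify_dump "$file"` right after `pg_dump` and alert when it returns non-zero.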

Monitoring and Observability

Prometheus Metrics

Enable Prometheus monitoring:

config:
  OTEL_SERVICE_NAME: "aurora-production"
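  # OTLP is normally sent to an OpenTelemetry Collector (4317 gRPC, 4318 HTTP).
  # Prometheus accepts OTLP on 9090 only if its OTLP receiver is enabled.
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://prometheus:9090"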

Logging

Centralize logs with ELK or Loki. The example below uses Fluent Bit to ship container logs to Elasticsearch:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: aurora
data:
  fluent-bit.conf: |
    [INPUT]
        Name              tail
        Path              /var/log/containers/aurora-*.log
        Parser            docker
        Tag               aurora.*
    
    [OUTPUT]
        Name              es
        Match             aurora.*
        Host              elasticsearch.logging.svc.cluster.local
        Port              9200
        Index             aurora
        Type              _doc

Alerting

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: aurora-alerts
  namespace: aurora
spec:
  groups:
    - name: aurora
      interval: 30s
      rules:
        - alert: AuroraPodDown
          expr: up{job="aurora-server"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Aurora server pod is down"
        
        - alert: HighMemoryUsage
          expr: container_memory_usage_bytes{pod=~"aurora-.*"} / container_spec_memory_limit_bytes > 0.9
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is using > 90% memory"

Operations

Deployment Strategy

Rolling Updates

apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

Blue-Green Deployment

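# Deploy the new version under a second release name. An Ingress can only
# reference Services in its own namespace, so install into aurora
# (assumes the chart prefixes resource names with the release name).
helm install aurora-v2 ./deploy/helm/aurora \
  --namespace aurora \
  -f values.generated.yaml

# Switch traffic via ingress
kubectl patch ingress aurora-oss -n aurora -p '{"spec":{"rules":[{"host":"api.aurora.example.com","http":{"paths":[{"path":"/","pathType":"Prefix","backend":{"service":{"name":"aurora-v2-server","port":{"number":5080}}}}]}}]}}'

# Clean up the old version once the new one is verified. Uninstalling
# aurora-oss also deletes the Ingress that release owns, so confirm the
# new release's Ingress is serving traffic first.
helm uninstall aurora-oss -n aurora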

Maintenance Windows

Database Migrations

# Run migrations before deployment
kubectl exec -it deployment/aurora-oss-server -n aurora -- \
  python -m flask db upgrade

# Verify schema version
kubectl exec -it statefulset/aurora-oss-postgres -n aurora -- \
  psql -U aurora -d aurora_db -c "SELECT version_num FROM alembic_version;"

Scaling Down for Maintenance

# Scale to 0
kubectl scale deployment aurora-oss-server --replicas=0 -n aurora

# Perform maintenance
# ...

# Scale back up
kubectl scale deployment aurora-oss-server --replicas=3 -n aurora
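
Hard-coding `--replicas=3` can undo whatever the autoscaler had chosen before the maintenance window. One way around that, sketched below, is to record the current replica count before scaling to zero and restore it afterwards. `maintenance_scale_down`, `maintenance_scale_up`, and the state file path are hypothetical; only standard `kubectl get` and `kubectl scale` calls are used.

```shell
#!/bin/sh
# maintenance_scale_down DEPLOYMENT NAMESPACE STATE_FILE
# Saves the current replica count, then scales the deployment to 0.
maintenance_scale_down() {
  deploy="$1"; ns="$2"; state_file="$3"
  replicas="$(kubectl get deployment "$deploy" -n "$ns" \
    -o jsonpath='{.spec.replicas}')" || return 1
  case "$replicas" in
    ''|*[!0-9]*) echo "unexpected replica count: $replicas" >&2; return 1 ;;
  esac
  printf '%s\n' "$replicas" > "$state_file"
  kubectl scale deployment "$deploy" --replicas=0 -n "$ns"
}

# maintenance_scale_up DEPLOYMENT NAMESPACE STATE_FILE
# Restores the replica count saved by maintenance_scale_down.
maintenance_scale_up() {
  deploy="$1"; ns="$2"; state_file="$3"
  replicas="$(cat "$state_file")" || return 1
  kubectl scale deployment "$deploy" --replicas="$replicas" -n "$ns"
}
```

For example, run `maintenance_scale_down aurora-oss-server aurora /tmp/aurora-server.replicas`, perform the maintenance, then call `maintenance_scale_up` with the same arguments.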

Cost Optimization

Use Managed Services

Replace in-cluster stateful services with managed alternatives:

Database: RDS, Cloud SQL, Azure Database (automated backups, HA)
Redis: ElastiCache, Memorystore, Azure Cache (managed persistence)
Object Storage: S3, GCS, Azure Blob (eliminate SeaweedFS)
Secrets: AWS Secrets Manager, GCP Secret Manager, Azure Key Vault

Resource Right-Sizing

Monitor actual usage and adjust:

# Check resource usage
kubectl top pods -n aurora
kubectl top nodes

# Use VPA recommendations
kubectl get vpa -n aurora
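
To spot right-sizing candidates quickly, the `kubectl top pods` output can be filtered for pods above a memory threshold. This sketch assumes the usual three-column layout (NAME, CPU, MEMORY) with memory reported in Mi; `pods_over_mem` is a hypothetical helper name.

```shell
#!/bin/sh
# pods_over_mem LIMIT_MI: read `kubectl top pods` output on stdin and print
# the name and memory of every pod using more than LIMIT_MI mebibytes.
pods_over_mem() {
  limit_mi="$1"
  awk -v limit="$limit_mi" '
    NR > 1 && $3 ~ /Mi$/ {          # skip the header row
      mem = $3
      sub(/Mi$/, "", mem)           # strip the unit suffix
      if (mem + 0 > limit) print $1, $3
    }'
}
```

For example, `kubectl top pods -n aurora | pods_over_mem 1024` lists pods currently using more than 1 GiB, which can then be compared against their configured requests.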

Node Autoscaling

# Cluster Autoscaler (cloud providers)
# Scales nodes based on pending pods

Checklist

Before going to production:

Next Steps

Scaling Guide: Scale Aurora for growing workloads
Monitoring Setup: Set up comprehensive monitoring
Backup & Recovery: Implement backup strategies
Troubleshooting: Common issues and solutions
