Overview
In a GitOps setup, Git is the source of truth. Most cluster state can be restored by re-applying the Git repository. However, some components require explicit backup:
- Sealed Secrets private key: Required to decrypt SealedSecret resources
- etcd data: Kubernetes cluster state (optional, for rapid recovery)
- Persistent volumes: Application data (application-specific)
GitOps Backup Strategy
Git Repository as Backup
All cluster configuration is stored in Git:
What’s backed up in Git:
- Flux configuration (cluster/kimawesome/)
- HelmRelease definitions (overlays/base/*/helm-release.yaml)
- Kustomizations and overlays
- Kubernetes manifests (Deployments, Services, etc.)
- SealedSecret resources (encrypted)
What’s NOT backed up in Git:
- Sealed Secrets private key (stored in cluster)
- Kubernetes Secrets (generated by applications)
- etcd cluster state
- Persistent volume data
Backup Git Repository
Clone repository
git clone ssh://[email protected]/kim-ae/kimbernetes-k8s-flux kimbernetes-backup
cd kimbernetes-backup
Create archive
tar czf kimbernetes-$(date +%Y%m%d).tar.gz kimbernetes-backup/
Store securely
- Upload to S3/cloud storage (example below)
- Store in a separate physical location
- Encrypt if storing on untrusted storage
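For example, the archive could be copied to object storage; the bucket below is only a placeholder:
# Upload the Git archive to S3 (example bucket)
aws s3 cp kimbernetes-$(date +%Y%m%d).tar.gz \
  s3://my-backups/kimbernetes/git/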
GitHub already provides repository backups, but a separate backup protects against account compromise or accidental deletion.
Backing Up Sealed Secrets
Why Backup?
The sealed-secrets controller generates a private key on first installation. This key is required to decrypt all SealedSecret resources; if it is lost, the secrets encrypted in Git cannot be recovered.
Backup the Private Key
Export the private key
kubectl get secret -n sealed-secrets \
-l sealedsecrets.bitnami.com/sealed-secrets-key=active \
-o yaml > sealed-secrets-private-key.yaml
Encrypt the backup
# Encrypt with GPG
gpg --symmetric --cipher-algo AES256 sealed-secrets-private-key.yaml
# Or use age
age -e -o sealed-secrets-private-key.yaml.age sealed-secrets-private-key.yaml
Store securely
- Store in password manager (1Password, Bitwarden)
- Upload to encrypted cloud storage
- Store offline in secure location
# Upload to S3 (example)
aws s3 cp sealed-secrets-private-key.yaml.gpg \
s3://my-backups/kimbernetes/sealed-secrets-key-$(date +%Y%m%d).yaml.gpg
Delete unencrypted copy
shred -u sealed-secrets-private-key.yaml
The sealed-secrets private key grants access to ALL encrypted secrets in your cluster. Treat it like a root password.
Restore Sealed Secrets Key
If you need to restore the private key to a new cluster:
Decrypt backup
gpg -d sealed-secrets-private-key.yaml.gpg > sealed-secrets-private-key.yaml
Install sealed-secrets controller
Wait for Flux to install sealed-secrets, or trigger the reconciliation manually:
flux reconcile helmrelease sealed-secrets -n sealed-secrets
kubectl -n sealed-secrets wait --for=condition=ready pod -l app.kubernetes.io/name=sealed-secrets
Delete auto-generated key
kubectl delete secret -n sealed-secrets \
-l sealedsecrets.bitnami.com/sealed-secrets-key=active
Restore the key
kubectl apply -f sealed-secrets-private-key.yaml
Restart sealed-secrets controller
kubectl -n sealed-secrets rollout restart deployment sealed-secrets-controller
kubectl -n sealed-secrets wait --for=condition=ready pod -l app.kubernetes.io/name=sealed-secrets
Verify decryption
# Check that SealedSecrets are being decrypted
kubectl get sealedsecrets -A
kubectl get secrets -A | grep sealed
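If Secrets are missing, the controller logs usually explain why; the exact error text varies by version, so this grep is only a starting point:
# Inspect controller logs for decryption problems
kubectl -n sealed-secrets logs deployment/sealed-secrets-controller | grep -iE 'error|decrypt'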
Backing Up etcd
Why Backup etcd?
etcd stores all Kubernetes cluster state. While GitOps can recreate most resources, an etcd backup enables:
- Rapid cluster recovery
- Restoration of runtime state (not in Git)
- Recovery from catastrophic failures
Backup etcd (kubeadm cluster)
SSH to control plane node
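For example (user and address are environment-specific; 192.168.0.101 is the control plane address used elsewhere in this guide):
# SSH to the control plane node
ssh [email protected]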
Run etcdctl snapshot
sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /var/backups/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db
Verify snapshot
sudo ETCDCTL_API=3 etcdctl \
--write-out=table \
snapshot status /var/backups/etcd-snapshot-*.db
Automate etcd Backups
Create a CronJob to backup etcd regularly:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
            # Tolerate the control-plane taint so the Job can schedule there
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
          containers:
            - name: etcd-backup
              image: registry.k8s.io/etcd:3.5.17-0
              command:
                - /bin/sh
                - -c
                - |
                  ETCDCTL_API=3 etcdctl \
                    --endpoints=https://127.0.0.1:2379 \
                    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                    --cert=/etc/kubernetes/pki/etcd/server.crt \
                    --key=/etc/kubernetes/pki/etcd/server.key \
                    snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).db
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup
                  mountPath: /backup
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
            - name: backup
              hostPath:
                path: /var/backups/etcd
          restartPolicy: OnFailure
Combine this with an off-node copy of the snapshot directory (for example to S3, as sketched below) so snapshots survive loss of the control plane node; a tool like Velero can complement this with API-level resource backups.
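A minimal sketch, assuming the AWS CLI and credentials are available on the node (bucket name is a placeholder):
# Copy local etcd snapshots to S3 (run on the control plane node, e.g. via cron)
aws s3 sync /var/backups/etcd s3://my-backups/kimbernetes/etcd/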
Restore etcd
Restoring etcd will overwrite ALL cluster state. Only use as a last resort.
Stop Kubernetes components
sudo systemctl stop kubelet
sudo mv /etc/kubernetes/manifests /etc/kubernetes/manifests.bak
Restore etcd snapshot
sudo ETCDCTL_API=3 etcdctl snapshot restore /var/backups/etcd-snapshot.db \
--data-dir=/var/lib/etcd-restore
Replace etcd data directory
sudo mv /var/lib/etcd /var/lib/etcd.old
sudo mv /var/lib/etcd-restore /var/lib/etcd
Restart Kubernetes
sudo mv /etc/kubernetes/manifests.bak /etc/kubernetes/manifests
sudo systemctl start kubelet
Verify cluster
kubectl get nodes
kubectl get pods -A
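It can also help to confirm etcd itself is healthy after the restore (run on the control plane node):
# Check etcd health directly
sudo ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health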
Disaster Recovery: Rebuild Cluster from Git
Complete Cluster Loss
If the entire cluster is lost, rebuild from Git:
Recreate Kubernetes cluster
Follow the cluster creation steps in overlays/kimawesome/README.md:
# Configure kubeadm
sudo kubeadm init --skip-phases=addon/kube-proxy \
--apiserver-advertise-address=192.168.0.101 \
--pod-network-cidr="10.1.0.0/16" \
--upload-certs
# Install Cilium
cilium install --set kubeProxyReplacement=true \
--set k8sServiceHost=192.168.0.101 \
--set k8sServicePort=6443 \
--set nodePort.enabled=true \
--set gatewayAPI.enabled=true
# Wait for Cilium
cilium status --wait
Bootstrap Flux
export GITHUB_TOKEN="<your-token>"
flux bootstrap github \
--owner=kim-ae \
--repository=kimbernetes-k8s-flux \
--private=false \
--personal=true \
--path=cluster/kimawesome \
--components-extra='image-reflector-controller,image-automation-controller'
Restore sealed-secrets key
Restore the key before Flux reconciles the SealedSecret resources:
# Wait for sealed-secrets controller
kubectl -n sealed-secrets wait --for=condition=ready pod \
-l app.kubernetes.io/name=sealed-secrets --timeout=300s
# Delete auto-generated key
kubectl delete secret -n sealed-secrets \
-l sealedsecrets.bitnami.com/sealed-secrets-key=active
# Restore backed-up key
kubectl apply -f sealed-secrets-private-key.yaml
# Restart controller
kubectl -n sealed-secrets rollout restart deployment sealed-secrets-controller
Monitor Flux reconciliation
flux get all
watch kubectl get pods -A
Flux will automatically deploy all applications from Git.
Verify applications
kubectl get helmreleases -A
kubectl get certificates -A
kubectl get gateways -A
Recovery time depends on the number of HelmReleases. Expect 10-30 minutes for full reconciliation.
Backing Up Persistent Volumes
Identify PVs
kubectl get pv
kubectl get pvc -A
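For local volumes, a rough way to see where each PV lives on disk (the PATH column assumes spec.local.path; hostPath volumes use spec.hostPath.path instead):
# List PVs with their node-local paths
kubectl get pv -o custom-columns='NAME:.metadata.name,CAPACITY:.spec.capacity.storage,CLAIM:.spec.claimRef.name,PATH:.spec.local.path'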
Backup with Velero
Velero is the standard tool for Kubernetes backups:
# Install Velero
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.10.0 \
--bucket kimbernetes-backups \
--backup-location-config region=us-east-1 \
--snapshot-location-config region=us-east-1
# Backup entire namespace
velero backup create myapp-backup --include-namespaces myapp
# Backup specific resources
velero backup create certs-backup \
--include-resources certificates,secrets \
--include-namespaces cert-manager
# Schedule automatic backups
velero schedule create daily-backup --schedule="0 2 * * *"
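Restoring from one of these backups uses the same CLI, for example:
# Restore from a Velero backup and watch its progress
velero restore create --from-backup myapp-backup
velero restore get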
Manual PV Backup
For local volumes:
# Find PV host path
kubectl get pv <pv-name> -o yaml | grep path
# SSH to node and backup
ssh kim@node-01
sudo tar czf /var/backups/pv-backup-$(date +%Y%m%d).tar.gz /var/lib/kubelet/volumes
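Restoring is the reverse, ideally after scaling down the pods that use the volume (paths mirror the backup above; the archive stores paths relative to /):
# Restore the archive on the node
sudo tar xzf /var/backups/pv-backup-<date>.tar.gz -C /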
Backup Checklist
- Git repository: clone and archive regularly, with an offsite copy
- Sealed Secrets private key: export, encrypt, and store offline
- etcd: daily snapshots via the CronJob above, copied off the control plane node
- Persistent volumes: Velero schedules or manual archives, per application
Testing Recovery
Test your backups regularly:
# Create test cluster (minikube)
minikube start --kubernetes-version=v1.33.0
# Bootstrap Flux
flux bootstrap github --owner=kim-ae --repository=kimbernetes-k8s-flux \
  --personal=true --path=cluster/minikube
# Restore sealed-secrets key
kubectl apply -f sealed-secrets-private-key.yaml
# Verify applications deploy
flux get helmreleases -A
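Assuming the test cluster reconciles cleanly, a quick spot-check and teardown might look like:
# Confirm SealedSecrets were decrypted with the restored key
kubectl get sealedsecrets -A
kubectl get secrets -A | grep sealed
# Tear down the test cluster when done
minikube delete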
Best Practices
- Test backups: Regularly test restoration procedures
- Automate: Use CronJobs or external tools for automatic backups
- Offsite storage: Store backups in a different physical location or cloud
- Encrypt: Encrypt sensitive backups (sealed-secrets keys, etcd)
- Version backups: Keep multiple backup versions (daily, weekly, monthly)
- Document: Keep disaster recovery runbook up to date
- Monitor: Alert on backup failures (a quick manual check is shown below)
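For the last point, a simple manual check of the etcd backup CronJob defined earlier might look like this; real alerting belongs in your monitoring stack:
# Verify the backup CronJob is scheduled and recent Jobs succeeded
kubectl -n kube-system get cronjob etcd-backup
kubectl -n kube-system get jobs --sort-by=.metadata.creationTimestamp | grep etcd-backup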
Next Steps