Overview
This guide covers common issues you may encounter when operating the Kimbernetes cluster and how to resolve them.
Diagnostic Commands
Check Flux Status
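A minimal sketch of a status check, assuming the flux and kubectl CLIs are installed and Flux runs in the default flux-system namespace. The function name flux_status is hypothetical.

```shell
# Sketch only: assumes Flux is installed in the flux-system namespace.
flux_status() {
  flux check                        # verify prerequisites and controller versions
  flux get all --all-namespaces     # readiness of every source, Kustomization and HelmRelease
  kubectl get pods -n flux-system   # state of the controller pods
}
```

Call `flux_status` from a shell that has access to the cluster's kubeconfig.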
View Logs
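One possible way to pull logs, assuming the controllers run in flux-system. The function names and the default controller name are illustrative, not taken from this cluster's configuration.

```shell
# Sketch only: function names are hypothetical.
flux_error_logs() {
  # Error-level entries from all Flux controllers, across namespaces
  flux logs --all-namespaces --level=error
}

controller_logs() {
  # $1: controller deployment name (defaults to kustomize-controller)
  kubectl -n flux-system logs "deploy/${1:-kustomize-controller}" --tail=100
}
```

For example, `controller_logs helm-controller` shows the last 100 lines from the Helm controller.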
Inspect Resources
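A short sketch for inspecting a Flux resource and recent events. It assumes the Kustomization lives in flux-system; the default name flux-system is a guess at the bootstrap default.

```shell
# Sketch only: the default Kustomization name is an assumption.
inspect_kustomization() {
  # $1: Kustomization name
  kubectl -n flux-system describe kustomization "${1:-flux-system}"
  # Recent events, oldest first, to see what the controllers reported
  kubectl -n flux-system get events --sort-by=.lastTimestamp
}
```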
Common Issues
Kustomization Reconciliation Failed
Symptoms:
- Kustomization shows False status
- Changes in Git are not applied to the cluster
- Error: “kustomize build failed”
Common Causes:
- Invalid YAML syntax: Fix syntax errors in your YAML files and commit.
- Missing resource files: Ensure all referenced files exist.
- Namespace doesn’t exist: Create the namespace first or add it to your kustomization.
Resolution:
- Fix the issue in your Git repository
- Commit and push changes
- Force reconciliation
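The force-reconciliation step could look like the sketch below. The function name is hypothetical, and flux-system as the default Kustomization name is an assumption; substitute the name of the failing Kustomization.

```shell
# Sketch only: pass the name of the failing Kustomization as $1.
force_kustomization_sync() {
  # --with-source refreshes the Git source before applying
  flux reconcile kustomization "${1:-flux-system}" --with-source
}
```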
HelmRelease Failed to Install
Symptoms:
- HelmRelease status shows False
- Error: “installation failed” or “upgrade failed”
- Application not running
Common Causes:
- Chart version not found: Check available versions, then update to a valid version in helm-release.yaml.
- HelmRepository not ready: Reconcile the repository.
- Invalid Helm values: Fix invalid values in the HelmRelease spec.
- Resource conflicts: Check for duplicate resources, then delete conflicting resources or adjust the HelmRelease.
Resolution:
- Fix the issue in the HelmRelease definition
- Commit and push
- Force reconciliation
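A sketch of refreshing the chart repository and retrying the release. All three arguments are placeholders, and the sketch assumes the HelmRepository is defined in flux-system, which may not match this cluster.

```shell
# Sketch only: names and the HelmRepository namespace are assumptions.
retry_helmrelease() {
  # $1: HelmRelease name, $2: its namespace, $3: HelmRepository name
  flux reconcile source helm "$3" -n flux-system   # refresh the chart index first
  flux reconcile helmrelease "$1" -n "$2"          # then retry the install or upgrade
  flux get helmreleases -n "$2"                    # confirm the Ready condition
}
```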
Flux Controllers Not Running
Symptoms:
- No reconciliation happening
- Flux pods in CrashLoopBackOff
- Error: “connection refused” to Flux API
Common Causes:
- Resource limits: If pods are being OOMKilled, increase memory limits in cluster/kimawesome/flux-system/gotk-components.yaml.
- Network policy blocking: Check network policies and ensure the allow-egress policy exists (already configured in Flux v2.7.5).
- Image pull issues: Check image registry connectivity.
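The checks for these causes can be sketched as follows, assuming the controllers run in flux-system. The function name is hypothetical.

```shell
# Sketch only: assumes Flux is installed in the flux-system namespace.
check_flux_controllers() {
  kubectl -n flux-system get pods
  # Last termination reason per pod; OOMKilled points at memory limits
  kubectl -n flux-system get pods \
    -o custom-columns='POD:.metadata.name,LAST_EXIT:.status.containerStatuses[*].lastState.terminated.reason'
  # The allow-egress policy should appear in this list
  kubectl -n flux-system get networkpolicy
}
```

Image pull failures show up as ImagePullBackOff or ErrImagePull in the STATUS column of the first command.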
Git Repository Authentication Failed
Symptoms:
- Error: “authentication required” or “permission denied”
- GitRepository shows False status
- No reconciliation from Git
Resolution:
- Check that the secret exists
- Verify the SSH key
- Test Git access manually
- Regenerate the deploy key and add the public key to the GitHub repository deploy keys
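A hedged sketch of the secret check and key regeneration. It assumes the deploy key is stored in a secret named flux-system in the flux-system namespace (the usual bootstrap default, not confirmed for this cluster); the function names and the repository URL argument are placeholders.

```shell
# Sketch only: the secret name flux-system is an assumption.
check_git_auth() {
  kubectl -n flux-system get secret flux-system
  flux get sources git --all-namespaces   # shows the last fetch error, if any
}

regenerate_deploy_key() {
  # $1: SSH URL of the repository, e.g. ssh://git@github.com/<org>/<repo>
  kubectl -n flux-system delete secret flux-system
  # Generates a new key pair and prints the public key to register as a deploy key
  flux create secret git flux-system --url="$1"
}
```

Reconciliation from Git stays broken between deleting the secret and registering the new public key, so run `regenerate_deploy_key` only when the existing key is known to be invalid.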
Certificate Issues
Symptoms:
- Ingress shows certificate errors
- Error: “certificate not ready”
- cert-manager pods failing
Common Causes:
- DNS not propagated: Wait for DNS to propagate.
- HTTP01 challenge failed: Check that the ingress is accessible.
- Rate limit hit: Let’s Encrypt rate limits reached. Wait or use staging.
Resolution:
- Delete the failed certificate
- Delete the certificate request
- Let cert-manager retry automatically
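The delete-and-retry steps might be scripted like this. The function name and all three arguments are placeholders for the affected resources.

```shell
# Sketch only: pass the real namespace and resource names.
retry_certificate() {
  # $1: namespace, $2: Certificate name, $3: CertificateRequest name
  kubectl -n "$1" describe certificate "$2"   # read the failure reason first
  kubectl -n "$1" delete certificate "$2"
  kubectl -n "$1" delete certificaterequest "$3" --ignore-not-found
}
```

If the Certificate is defined in Git, Flux recreates it on the next reconciliation and cert-manager then issues a fresh request.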
Network Connectivity Issues
Symptoms:
- Pods cannot reach external services
- DNS resolution failing
- Inter-pod communication broken
Resolution:
- Restart Cilium
- Check IP forwarding
- Verify CoreDNS
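These three steps can be sketched as below. The sketch assumes a default Cilium install (a DaemonSet named cilium in kube-system) and the standard k8s-app=kube-dns label on CoreDNS pods; neither is confirmed for this cluster, and the function names are hypothetical.

```shell
# Sketch only: resource names follow upstream defaults.
restart_cilium() {
  kubectl -n kube-system rollout restart daemonset/cilium
  kubectl -n kube-system rollout status daemonset/cilium   # wait for agents to come back
}

check_ip_forwarding() {
  # Run on the node itself; a value of 1 means forwarding is enabled
  sysctl net.ipv4.ip_forward
}

check_coredns() {
  kubectl -n kube-system get pods -l k8s-app=kube-dns
}
```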
Sealed Secrets Decryption Failed
Symptoms:
- SealedSecret exists but Secret not created
- Error: “no key could decrypt secret”
Common Causes:
- Sealed secret encrypted with wrong key: Re-encrypt with the current cluster key.
- Sealed secrets controller not ready
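A sketch of re-encrypting a secret and checking the controller. It assumes kubeseal's defaults, a controller named sealed-secrets-controller in kube-system, which may differ if the controller was installed under another name. The function names and file paths are placeholders.

```shell
# Sketch only: controller name and namespace are kubeseal's defaults.
reseal_secret() {
  # $1: plain Secret manifest, $2: output path for the SealedSecret
  kubeseal --format yaml < "$1" > "$2"   # encrypts with the cluster's current key
}

check_sealed_secrets_controller() {
  kubectl -n kube-system get deployment sealed-secrets-controller
}
```

For example, `reseal_secret secret.yaml sealed-secret.yaml` writes a fresh SealedSecret that can be committed in place of the old one.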
Resource Debugging
Pod Failing to Start
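A common first pass for a pod that will not start, shown as a sketch; the function name and both arguments are placeholders.

```shell
# Sketch only: pass the real namespace and pod name.
debug_pod() {
  # $1: namespace, $2: pod name
  kubectl -n "$1" describe pod "$2"      # events: scheduling, image pulls, probe failures
  kubectl -n "$1" logs "$2" --previous   # output of the last crashed container, if any
}
```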
Service Not Accessible
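One way to check a service, as a sketch; the function name and arguments are placeholders.

```shell
# Sketch only: pass the real namespace and service name.
debug_service() {
  # $1: namespace, $2: service name
  kubectl -n "$1" get service "$2" -o wide   # type, ports and selector
  # An empty endpoints list usually means the selector matches no ready pods
  kubectl -n "$1" get endpoints "$2"
}
```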
Ingress Not Working
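A sketch for inspecting an ingress; the function name and arguments are placeholders.

```shell
# Sketch only: pass the real namespace and ingress name.
debug_ingress() {
  # $1: namespace, $2: ingress name
  kubectl -n "$1" describe ingress "$2"   # rules, backends and recent events
  kubectl get ingressclass                # confirm the referenced class exists
}
```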
Performance Issues
High Memory Usage
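To find the heaviest consumers, something like the following can help. It assumes metrics-server (or another metrics API provider) is running in the cluster; the function name is hypothetical.

```shell
# Sketch only: kubectl top needs a metrics API provider such as metrics-server.
memory_hogs() {
  kubectl top pods --all-namespaces --sort-by=memory
  kubectl top nodes
}
```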
Slow Reconciliation
Emergency Procedures
Rollback a Change
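Because Git is the source of truth, one way to roll back is to revert the offending commit and let Flux apply the result. This is a sketch: the function name is hypothetical, the commit SHA is a placeholder, and flux-system as the Kustomization name is an assumption.

```shell
# Sketch only: run inside a clone of the cluster repository.
rollback_commit() {
  # $1: SHA of the commit to undo
  git revert --no-edit "$1"
  git push
  flux reconcile kustomization flux-system --with-source   # apply without waiting for the interval
}
```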
Bypass Flux Temporarily
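Suspending a Kustomization stops Flux from reverting manual changes until it is resumed. A minimal sketch, with hypothetical function names and the Kustomization name passed as a placeholder argument:

```shell
# Sketch only: $1 is the name of the Kustomization to pause.
pause_flux() {
  flux suspend kustomization "$1"
}

resume_flux() {
  flux resume kustomization "$1"   # also triggers a reconciliation
}
```

Remember to resume afterwards; any manual change that is not also committed to Git is overwritten on the next reconciliation.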
Getting Help
- Check Flux documentation
- View Flux GitHub issues
- Enable debug logging
Next Steps
- Learn about Managing Resources
- Understand Upgrade procedures
- Set up Backup and restore