Prometheus is deployed as part of the kube-prometheus-stack, providing comprehensive metrics collection, storage, and alerting for the Kimbernetes cluster. The stack includes Prometheus server, Alertmanager, kube-state-metrics, and node-exporter.

HelmRelease Configuration

overlays/base/prometheus/helmrelease.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: prometheus
spec:
  chart:
    spec:
      chart: kube-prometheus-stack
      sourceRef:
        kind: HelmRepository
        name: prometheus
      version: "=79.5.0"
  interval: 24h
  releaseName: prometheus
  targetNamespace: observability
  install:
    crds: Create
  upgrade:
    crds: CreateReplace
The HelmRepository source:
overlays/base/prometheus/helmrepository.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: prometheus
spec:
  interval: 24h
  url: https://prometheus-community.github.io/helm-charts

Prometheus Server Configuration

The Prometheus server is configured for high availability and persistence:
prometheus:
  prometheusSpec:
    tolerations:
    - effect: NoSchedule
      operator: Exists
    externalLabels:
      cluster: ${CLUSTER}
    enableRemoteWriteReceiver: false
    podAntiAffinity: hard
    replicas: 2
    retention: 2d
    retentionSize: 25GiB
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi
          storageClassName: local-storage
Setting serviceMonitorSelectorNilUsesHelmValues and podMonitorSelectorNilUsesHelmValues to false lets Prometheus discover all ServiceMonitors and PodMonitors in the cluster, regardless of their labels; with the chart defaults, only monitors carrying the Helm release label would be selected.
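
If discovery should instead be restricted, an explicit selector can be set. A sketch, assuming the monitors are labelled with the Helm release name:
prometheus:
  prometheusSpec:
    # Only discover monitors carrying this label (assumed value;
    # an alternative to the nil-uses-helm-values settings above)
    serviceMonitorSelector:
      matchLabels:
        release: prometheus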

Key Features

  • High Availability: 2 replicas with hard pod anti-affinity
  • Retention: 2 days or 25GiB per replica
  • Storage: 50Gi persistent volume per replica
  • Cluster Label: External label for multi-cluster setups
  • Tolerations: Runs on all nodes including control plane

ServiceMonitor Usage

ServiceMonitors define how Prometheus scrapes metrics from Kubernetes services:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  namespace: default
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
A minimal ServiceMonitor can omit the path, which defaults to /metrics:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: default
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
    interval: 30s
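
A ServiceMonitor's selector matches labels on a Service, and port refers to a named port on that Service. A matching Service for the first example might look like this (name and port number are illustrative):
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
  labels:
    app: my-app            # matched by the ServiceMonitor selector
spec:
  selector:
    app: my-app
  ports:
  - name: metrics          # referenced as "port: metrics" in the ServiceMonitor
    port: 8080
    targetPort: 8080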

PodMonitor Usage

PodMonitors scrape metrics directly from pods, useful for DaemonSets or pods without services:
overlays/kimawesome/infrastructure/observability/monitors-infrastructure/metallb-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: metallb-monitor
  namespace: observability
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: metallb
  namespaceSelector:
    any: true
  podMetricsEndpoints:
  - port: "monitoring"
    interval: 30s
    path: /metrics
overlays/kimawesome/infrastructure/observability/monitors-infrastructure/kgateway-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kgateway-monitor
  namespace: observability
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: gateway
  namespaceSelector:
    matchNames:
    - gateway-system
  podMetricsEndpoints:
  - port: metrics
    interval: 30s
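
Once a monitor is applied, scrape health can be checked with the up metric in the Prometheus UI (the namespace below is an assumption for the MetalLB pods):
# 1 = scrape succeeded, 0 = scrape failing
up{namespace="metallb-system"}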

Flux Metrics Integration

The stack's kube-state-metrics instance is configured with customResourceState metrics for Flux resources:
- groupVersionKind:
    group: kustomize.toolkit.fluxcd.io
    version: v1
    kind: Kustomization
  metricNamePrefix: gotk
  metrics:
    - name: "resource_info"
      help: "The current state of a Flux Kustomization resource."
      each:
        type: Info
        info:
          labelsFromPath:
            name: [ metadata, name ]
      labelsFromPath:
        exported_namespace: [ metadata, namespace ]
        ready: [ status, conditions, "[type=Ready]", status ]
        suspended: [ spec, suspend ]
        revision: [ status, lastAppliedRevision ]
        source_name: [ spec, sourceRef, name ]
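
The resulting metric can then be queried in Prometheus, for example to list Kustomizations whose Ready condition is False:
# Flux Kustomizations that are currently not ready
gotk_resource_info{customresource_kind="Kustomization", ready="False"}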

Example PromQL Queries

# CPU usage by pod
rate(container_cpu_usage_seconds_total{
  namespace="default",
  pod!=""
}[5m])

# Memory usage by namespace
sum by (namespace) (
  container_memory_working_set_bytes{container!=""}
)

# Disk I/O rate
rate(container_fs_reads_bytes_total[5m])

Alertmanager Configuration

Alertmanager is configured with a single replica and persistent storage:
overlays/base/prometheus/helmrelease.yaml
alertmanager:
  podDisruptionBudget:
    enabled: false
    maxUnavailable: 1
    minAvailable: ""
  alertmanagerSpec:
    logFormat: json
    replicas: 1
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: managed-csi-zrs
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi
Alertmanager is configured with 1 replica. Multi-replica setups require additional configuration for proper alert deduplication.
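
If high availability is required later, a multi-replica setup would look roughly like this (a sketch; verify alert deduplication across the gossip cluster for your chart version):
alertmanager:
  podDisruptionBudget:
    enabled: true
    minAvailable: 1
  alertmanagerSpec:
    replicas: 3   # an odd replica count is recommended for the gossip cluster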

Creating Alerts

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-alerts
  namespace: observability
spec:
  groups:
  - name: example
    interval: 30s
    rules:
    - alert: HighPodMemory
      expr: |
        sum by (namespace, pod) (
          container_memory_working_set_bytes{container!=""}
        ) > 1e9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} memory usage high"
        description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is using {{ $value | humanize }}B of memory."
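
Once the rule is loaded, firing instances can be inspected through Prometheus's built-in ALERTS metric:
# Currently firing instances of the alert
ALERTS{alertname="HighPodMemory", alertstate="firing"}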

Node Exporter

Node exporter is enabled to collect host-level metrics:
nodeExporter:
  enabled: true
Node exporter provides:
  • CPU, memory, and disk metrics
  • Network interface statistics
  • Filesystem usage
  • Hardware sensor data
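
For example, per-node CPU utilization can be derived from the node-exporter counters:
# Fraction of CPU time spent non-idle, averaged per node
1 - avg by (instance) (
  rate(node_cpu_seconds_total{mode="idle"}[5m])
)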

Accessing Prometheus UI

1. Port forward the Prometheus service:
   kubectl port-forward -n observability svc/prometheus-kube-prometheus-prometheus 9090:9090
2. Open your browser and navigate to http://localhost:9090
3. Use the query interface to explore metrics and test PromQL expressions

Troubleshooting

Check Prometheus configuration:
# View active ServiceMonitors
kubectl port-forward -n observability svc/prometheus-kube-prometheus-prometheus 9090:9090
# Navigate to Status → Service Discovery
Verify ServiceMonitor exists:
kubectl get servicemonitor -A
kubectl describe servicemonitor my-monitor -n default
Check if the service endpoint is reachable:
# Get service endpoints
kubectl get endpoints my-app -n default

# Test metrics endpoint
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl http://my-app.default.svc:8080/metrics
Check PVC status:
kubectl get pvc -n observability | grep prometheus
kubectl describe pvc prometheus-prometheus-kube-prometheus-prometheus-0
View disk usage:
kubectl exec -n observability prometheus-prometheus-kube-prometheus-prometheus-0 -- \
  df -h /prometheus

Next Steps

  • Visualize in Grafana: Create dashboards for Prometheus metrics
  • Configure Alloy: Send metrics to remote endpoints
  • Query Logs: Correlate metrics with logs
  • Overview: Return to the observability architecture
