Apache Druid can be deployed on Kubernetes using Docker containers and the druid-operator for simplified cluster management.

Docker Images

Official Druid Docker images are available on Docker Hub and can be pulled directly.

Pull Druid Image

docker pull apache/druid:28.0.0
Always pull a specific version tag. Find all available versions on Docker Hub: apache/druid.

Druid Operator

The druid-operator provides Kubernetes-native management of Druid clusters.

Features

Declarative Management

Define cluster state with Kubernetes CRDs

Automatic Scaling

Scale Druid components independently

Rolling Updates

Zero-downtime upgrades and configuration changes

High Availability

Built-in HA for all Druid components

Installation

1. Install the Operator

kubectl create namespace druid-operator
kubectl apply -f https://raw.githubusercontent.com/datainfrahq/druid-operator/master/config/manager/manager.yaml

2. Verify Installation

kubectl get pods -n druid-operator

3. Create Druid Cluster

Apply a Druid cluster manifest:
kubectl apply -f druid-cluster.yaml
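
Once the manifest is applied, the operator reconciles it into workloads and pods. A couple of commands to watch progress (the resource names here are illustrative, derived from the example manifest below):

```shell
# Watch the pods the operator creates for the cluster.
kubectl get pods -n druid -w

# Inspect the status the operator records on the Druid custom resource.
kubectl describe druid druid-cluster -n druid
```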

Example Cluster Configuration

apiVersion: druid.apache.org/v1alpha1
kind: Druid
metadata:
  name: druid-cluster
  namespace: druid
spec:
  image: apache/druid:28.0.0
  startScript: /druid.sh
  
  # Metadata storage
  metadataStore:
    type: postgresql
    host: postgres.druid.svc.cluster.local
    port: 5432
    database: druid
    
  # Deep storage
  deepStorage:
    type: s3
    bucket: my-druid-bucket
    baseKey: druid/segments
  
  # ZooKeeper
  zookeeper:
    zkHosts: zk-cs.druid.svc.cluster.local:2181
    
  # Common runtime properties
  commonRuntimeProperties: |
    druid.extensions.loadList=["druid-kafka-indexing-service", "druid-s3-extensions"]
    druid.startup.logging.logProperties=true
    
  # Node specifications
  nodes:
    coordinators:
      nodeType: coordinator
      druid.port: 8081
      replicas: 2
      resources:
        requests:
          memory: "4Gi"
          cpu: "2"
        limits:
          memory: "4Gi"
          cpu: "2"
      runtime.properties: |
        druid.coordinator.startDelay=PT30S
        druid.coordinator.period=PT30S
        
    brokers:
      nodeType: broker
      druid.port: 8082
      replicas: 3
      resources:
        requests:
          memory: "8Gi"
          cpu: "4"
        limits:
          memory: "8Gi"
          cpu: "4"
      runtime.properties: |
        druid.broker.http.numConnections=10
        druid.server.http.numThreads=40
        
    historicals:
      nodeType: historical
      druid.port: 8083
      replicas: 3
      resources:
        requests:
          memory: "16Gi"
          cpu: "8"
        limits:
          memory: "16Gi"
          cpu: "8"
      runtime.properties: |
        druid.processing.numThreads=7
        druid.processing.buffer.sizeBytes=536870912
        druid.segmentCache.locations=[{"path":"/druid/data/segments","maxSize":"100g"}]
      volumeMounts:
        - mountPath: /druid/data
          name: data-volume
      volumes:
        - name: data-volume
          persistentVolumeClaim:
            claimName: historical-data
            
    middleManagers:
      nodeType: middleManager
      druid.port: 8091
      replicas: 2
      resources:
        requests:
          memory: "8Gi"
          cpu: "4"
        limits:
          memory: "8Gi"
          cpu: "4"
      runtime.properties: |
        druid.worker.capacity=4
        druid.indexer.runner.javaOpts=-Xms2g -Xmx2g
      volumeMounts:
        - mountPath: /druid/data
          name: data-volume
      volumes:
        - name: data-volume
          persistentVolumeClaim:
            claimName: middlemanager-data
            
    routers:
      nodeType: router
      druid.port: 8888
      replicas: 2
      resources:
        requests:
          memory: "1Gi"
          cpu: "1"
        limits:
          memory: "1Gi"
          cpu: "1"
      runtime.properties: |
        druid.router.http.numConnections=50
        druid.router.http.numMaxThreads=100
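The historical settings above interact: Druid allocates its processing buffers from direct memory, and the usual sizing rule is (druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes. A quick sanity check for the example values, assuming the default numMergeBuffers of max(2, numThreads / 4):

```shell
# Direct-memory sizing check for the historical config above.
# Rule of thumb: (numThreads + numMergeBuffers + 1) * buffer.sizeBytes
THREADS=7                  # druid.processing.numThreads
MERGE_BUFFERS=2            # default: max(2, numThreads / 4)
BUFFER_BYTES=536870912     # druid.processing.buffer.sizeBytes (512 MiB)
DIRECT_BYTES=$(( (THREADS + MERGE_BUFFERS + 1) * BUFFER_BYTES ))
echo "MaxDirectMemorySize >= $(( DIRECT_BYTES / 1073741824 ))g"
# prints: MaxDirectMemorySize >= 5g
```

If -XX:MaxDirectMemorySize in the JVM options falls below this figure, historicals fail at startup, so size the container memory to hold heap plus this direct-memory budget.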

ZooKeeper-less Deployment

Druid can run on Kubernetes without ZooKeeper by loading the druid-kubernetes-extensions extension, which uses the Kubernetes API server for service discovery and leader election.

Enable Kubernetes Extensions

1. Load Extension

druid.extensions.loadList=["druid-kubernetes-extensions", ...]

2. Configure Discovery

# Use Kubernetes for service discovery
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http

# Kubernetes-specific settings; also disable the ZooKeeper client
druid.zk.service.enabled=false
druid.discovery.type=k8s
druid.discovery.k8s.clusterIdentifier=druid-cluster

3. Leader Election

# Use Kubernetes for leader election (no ZooKeeper needed)
druid.leader.election.type=k8s
druid.leader.election.k8s.namespace=druid
druid.leader.election.k8s.lockResourceName=druid-leader-election
ZooKeeper-less mode requires Kubernetes 1.19+ and uses Kubernetes ConfigMaps for coordination.
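
Because the extension talks to the Kubernetes API directly, the service account the Druid pods run as needs permission on the resources it uses. A minimal sketch; the exact resources and verbs may vary by extension version, and the names here are illustrative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: druid-k8s-discovery
  namespace: druid
rules:
  - apiGroups: [""]
    resources: ["pods", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: druid-k8s-discovery
  namespace: druid
subjects:
  - kind: ServiceAccount
    name: default          # use the service account your Druid pods run as
    namespace: druid
roleRef:
  kind: Role
  name: druid-k8s-discovery
  apiGroup: rbac.authorization.k8s.io
```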

Service Exposure

Internal Services

Create Kubernetes Services for inter-pod communication:
services.yaml
apiVersion: v1
kind: Service
metadata:
  name: druid-broker
  namespace: druid
spec:
  type: ClusterIP
  ports:
    - port: 8082
      targetPort: 8082
      name: broker
  selector:
    nodeType: broker
---
apiVersion: v1
kind: Service
metadata:
  name: druid-router
  namespace: druid
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8888
      name: router
  selector:
    nodeType: router

External Access

spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8888
Automatically provisions a cloud load balancer (AWS ELB, GCP LB, etc.)
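
If you would rather share an existing ingress controller than provision a dedicated load balancer per cluster, an Ingress in front of the druid-router Service works as well. A sketch assuming the NGINX ingress controller and the druid-router Service defined above (the hostname is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: druid-router
  namespace: druid
spec:
  ingressClassName: nginx
  rules:
    - host: druid.example.com    # replace with your hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: druid-router
                port:
                  number: 80
```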

Resource Management

Resource Requests and Limits

Set appropriate requests and limits to ensure Kubernetes schedules pods efficiently and prevents resource contention.
resources:
  requests:
    memory: "8Gi"    # Guaranteed memory
    cpu: "4"         # Guaranteed CPU
  limits:
    memory: "8Gi"    # Maximum memory (should match requests for production)
    cpu: "4"         # Maximum CPU
For production, set memory limits equal to requests so pods receive the Guaranteed QoS class; this reduces the risk of eviction under node memory pressure and keeps performance predictable.
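
Keep in mind the container limit must cover more than the JVM heap: direct memory and JVM/OS overhead live inside it too. A rough partition of the 8Gi broker limit above; the split is a hedged starting point, not an official formula:

```shell
# Partition an 8 GiB container limit between heap, direct memory, and overhead.
LIMIT_GI=8
HEAP_GI=$(( LIMIT_GI / 2 ))        # e.g. -Xmx4g
DIRECT_GI=$(( LIMIT_GI * 3 / 8 ))  # e.g. -XX:MaxDirectMemorySize=3g
OVERHEAD_GI=$(( LIMIT_GI - HEAP_GI - DIRECT_GI ))
echo "heap=${HEAP_GI}g direct=${DIRECT_GI}g overhead=${OVERHEAD_GI}g"
# prints: heap=4g direct=3g overhead=1g
```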

Quality of Service Classes

Kubernetes derives a pod's QoS class from its requests and limits:
  • Guaranteed: every container sets requests equal to limits (recommended for Druid)
  • Burstable: requests set, but lower than limits
  • BestEffort: no requests or limits set (avoid for Druid)
Guaranteed pods are evicted last under node memory pressure, which is why setting limits equal to requests gives the most predictable behavior.

Monitoring and Observability

Prometheus Integration

1. Enable Prometheus Emitter

druid.extensions.loadList=["prometheus-emitter", ...]
druid.emitter=prometheus
druid.emitter.prometheus.strategy=exporter
druid.emitter.prometheus.port=8000

2. Create ServiceMonitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: druid-metrics
  namespace: druid
spec:
  selector:
    matchLabels:
      app: druid
  endpoints:
    - port: metrics
      interval: 30s
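
The ServiceMonitor above scrapes a port named metrics, so a Service must expose the emitter port under that name. A sketch matching the emitter configuration in step 1 (port 8000); the app: druid label is an assumption and must match the labels on your Druid pods:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: druid-metrics
  namespace: druid
  labels:
    app: druid
spec:
  ports:
    - port: 8000
      targetPort: 8000
      name: metrics
  selector:
    app: druid
```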

Health Checks

livenessProbe:
  httpGet:
    path: /status/health
    port: 8082
  initialDelaySeconds: 60
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  
readinessProbe:
  httpGet:
    path: /status/health
    port: 8082
  initialDelaySeconds: 30
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3

Best Practices

Ensure dependencies (ZooKeeper, metadata store) are ready:
initContainers:
  - name: wait-for-postgres
    image: busybox:1.35
    command:
      - sh
      - -c
      - |
        until nc -z postgres.druid.svc.cluster.local 5432; do
          echo "Waiting for PostgreSQL..."
          sleep 2
        done
Prevent too many pods from being disrupted simultaneously:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: druid-broker-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      nodeType: broker
Spread replicas across different nodes:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: nodeType
              operator: In
              values:
                - broker
        topologyKey: kubernetes.io/hostname
Use appropriate storage classes for different workloads:
  • Historicals: Fast SSD (e.g., gp3 on AWS, pd-ssd on GCP)
  • MiddleManagers: Fast SSD
  • Metadata: Persistent SSD with backups
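For example, a gp3-backed StorageClass for historical and MiddleManager volumes might look like the sketch below; the parameters assume the AWS EBS CSI driver and are illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: druid-fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
```
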
Use Horizontal Pod Autoscaler for dynamic scaling. Note that the druid-operator typically manages node groups as StatefulSets, so point scaleTargetRef at the workload the operator actually created:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: druid-broker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet          # match the workload kind the operator created
    name: druid-broker         # match the generated workload name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Troubleshooting

View Logs

kubectl logs -f <pod-name> -n druid

Describe Pod

kubectl describe pod <pod-name> -n druid

Shell into Pod

kubectl exec -it <pod-name> -n druid -- /bin/bash

Check Events

kubectl get events -n druid --sort-by='.lastTimestamp'

Additional Resources

Druid Operator

Official Kubernetes operator repository

Docker Hub

Official Druid Docker images

Kubernetes Extensions

ZooKeeper-less deployment guide

Helm Charts

Community Helm charts
