Apache Druid can be deployed on Kubernetes using Docker containers and the druid-operator for simplified cluster management.

Docker Images

Official Druid Docker images are available on Docker Hub and can be pulled directly.

Pull Druid Image

docker pull apache/druid:28.0.0
Always pull a specific version tag. Find all available versions on Docker Hub: apache/druid.

Druid Operator

The druid-operator provides Kubernetes-native management of Druid clusters.

Features

Declarative Management

Define cluster state with Kubernetes CRDs

Automatic Scaling

Scale Druid components independently

Rolling Updates

Zero-downtime upgrades and configuration changes

High Availability

Built-in HA for all Druid components

Installation

1. Install the Operator

kubectl create namespace druid-operator
kubectl apply -f https://raw.githubusercontent.com/datainfrahq/druid-operator/master/config/manager/manager.yaml

2. Verify Installation

kubectl get pods -n druid-operator

3. Create Druid Cluster

Apply a Druid cluster manifest:
kubectl apply -f druid-cluster.yaml
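
Once the manifest is applied, the operator reconciles it into workloads and pods. A couple of commands to watch progress (the resource names here are illustrative, derived from the example manifest below):

```shell
# Watch the pods the operator creates for the cluster.
kubectl get pods -n druid -w

# Inspect the status the operator records on the Druid custom resource.
kubectl describe druid druid-cluster -n druid
```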

Example Cluster Configuration

apiVersion: druid.apache.org/v1alpha1
kind: Druid
metadata:
  name: druid-cluster
  namespace: druid
spec:
  image: apache/druid:28.0.0
  startScript: /druid.sh
  
  # Metadata storage
  metadataStore:
    type: postgresql
    host: postgres.druid.svc.cluster.local
    port: 5432
    database: druid
    
  # Deep storage
  deepStorage:
    type: s3
    bucket: my-druid-bucket
    baseKey: druid/segments
  
  # ZooKeeper
  zookeeper:
    zkHosts: zk-cs.druid.svc.cluster.local:2181
    
  # Common runtime properties
  commonRuntimeProperties: |
    druid.extensions.loadList=["druid-kafka-indexing-service", "druid-s3-extensions"]
    druid.startup.logging.logProperties=true
    
  # Node specifications
  nodes:
    coordinators:
      nodeType: coordinator
      druid.port: 8081
      replicas: 2
      resources:
        requests:
          memory: "4Gi"
          cpu: "2"
        limits:
          memory: "4Gi"
          cpu: "2"
      runtime.properties: |
        druid.coordinator.startDelay=PT30S
        druid.coordinator.period=PT30S
        
    brokers:
      nodeType: broker
      druid.port: 8082
      replicas: 3
      resources:
        requests:
          memory: "8Gi"
          cpu: "4"
        limits:
          memory: "8Gi"
          cpu: "4"
      runtime.properties: |
        druid.broker.http.numConnections=10
        druid.server.http.numThreads=40
        
    historicals:
      nodeType: historical
      druid.port: 8083
      replicas: 3
      resources:
        requests:
          memory: "16Gi"
          cpu: "8"
        limits:
          memory: "16Gi"
          cpu: "8"
      runtime.properties: |
        druid.processing.numThreads=7
        druid.processing.buffer.sizeBytes=536870912
        druid.segmentCache.locations=[{"path":"/druid/data/segments","maxSize":"100g"}]
      volumeMounts:
        - mountPath: /druid/data
          name: data-volume
      volumes:
        - name: data-volume
          persistentVolumeClaim:
            claimName: historical-data
            
    middleManagers:
      nodeType: middleManager
      druid.port: 8091
      replicas: 2
      resources:
        requests:
          memory: "8Gi"
          cpu: "4"
        limits:
          memory: "8Gi"
          cpu: "4"
      runtime.properties: |
        druid.worker.capacity=4
        druid.indexer.runner.javaOpts=-Xms2g -Xmx2g
      volumeMounts:
        - mountPath: /druid/data
          name: data-volume
      volumes:
        - name: data-volume
          persistentVolumeClaim:
            claimName: middlemanager-data
            
    routers:
      nodeType: router
      druid.port: 8888
      replicas: 2
      resources:
        requests:
          memory: "1Gi"
          cpu: "1"
        limits:
          memory: "1Gi"
          cpu: "1"
      runtime.properties: |
        druid.router.http.numConnections=50
        druid.router.http.numMaxThreads=100
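The historical settings above interact: Druid allocates its processing buffers from direct memory, and the usual sizing rule is (druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes. A quick sanity check for the example values, assuming the default numMergeBuffers of max(2, numThreads / 4):

```shell
# Direct-memory sizing check for the historical config above.
# Rule of thumb: (numThreads + numMergeBuffers + 1) * buffer.sizeBytes
THREADS=7                  # druid.processing.numThreads
MERGE_BUFFERS=2            # default: max(2, numThreads / 4)
BUFFER_BYTES=536870912     # druid.processing.buffer.sizeBytes (512 MiB)
DIRECT_BYTES=$(( (THREADS + MERGE_BUFFERS + 1) * BUFFER_BYTES ))
echo "MaxDirectMemorySize >= $(( DIRECT_BYTES / 1073741824 ))g"
# prints: MaxDirectMemorySize >= 5g
```

If -XX:MaxDirectMemorySize in the JVM options falls below this figure, historicals fail at startup, so size the container memory to hold heap plus this direct-memory budget.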

ZooKeeper-less Deployment

Druid can run on Kubernetes without ZooKeeper by loading the druid-kubernetes-extensions extension, which uses the Kubernetes API server for service discovery and leader election.

Enable Kubernetes Extensions

1. Load Extension

druid.extensions.loadList=["druid-kubernetes-extensions", ...]

2. Configure Discovery

# Use Kubernetes for service discovery
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http

# Kubernetes-specific settings; also disable the ZooKeeper client
druid.zk.service.enabled=false
druid.discovery.type=k8s
druid.discovery.k8s.clusterIdentifier=druid-cluster

3. Leader Election

# Use Kubernetes for leader election (no ZooKeeper needed)
druid.leader.election.type=k8s
druid.leader.election.k8s.namespace=druid
druid.leader.election.k8s.lockResourceName=druid-leader-election
ZooKeeper-less mode requires Kubernetes 1.19+ and uses Kubernetes ConfigMaps for coordination.
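
Because the extension talks to the Kubernetes API directly, the service account the Druid pods run as needs permission on the resources it uses. A minimal sketch; the exact resources and verbs may vary by extension version, and the names here are illustrative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: druid-k8s-discovery
  namespace: druid
rules:
  - apiGroups: [""]
    resources: ["pods", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: druid-k8s-discovery
  namespace: druid
subjects:
  - kind: ServiceAccount
    name: default          # use the service account your Druid pods run as
    namespace: druid
roleRef:
  kind: Role
  name: druid-k8s-discovery
  apiGroup: rbac.authorization.k8s.io
```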

Service Exposure

Internal Services

Create Kubernetes Services for inter-pod communication:
services.yaml
apiVersion: v1
kind: Service
metadata:
  name: druid-broker
  namespace: druid
spec:
  type: ClusterIP
  ports:
    - port: 8082
      targetPort: 8082
      name: broker
  selector:
    nodeType: broker
---
apiVersion: v1
kind: Service
metadata:
  name: druid-router
  namespace: druid
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8888
      name: router
  selector:
    nodeType: router

External Access

spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8888
Automatically provisions a cloud load balancer (AWS ELB, GCP LB, etc.)
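
If you would rather share an existing ingress controller than provision a dedicated load balancer per cluster, an Ingress in front of the druid-router Service works as well. A sketch assuming the NGINX ingress controller and the druid-router Service defined above (the hostname is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: druid-router
  namespace: druid
spec:
  ingressClassName: nginx
  rules:
    - host: druid.example.com    # replace with your hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: druid-router
                port:
                  number: 80
```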

Resource Management

Resource Requests and Limits

Set appropriate requests and limits to ensure Kubernetes schedules pods efficiently and prevents resource contention.
resources:
  requests:
    memory: "8Gi"    # Guaranteed memory
    cpu: "4"         # Guaranteed CPU
  limits:
    memory: "8Gi"    # Maximum memory (should match requests for production)
    cpu: "4"         # Maximum CPU
For production, set memory limits equal to requests so pods receive the Guaranteed QoS class; this reduces the risk of eviction under node memory pressure and keeps performance predictable.
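
Keep in mind the container limit must cover more than the JVM heap: direct memory and JVM/OS overhead live inside it too. A rough partition of the 8Gi broker limit above; the split is a hedged starting point, not an official formula:

```shell
# Partition an 8 GiB container limit between heap, direct memory, and overhead.
LIMIT_GI=8
HEAP_GI=$(( LIMIT_GI / 2 ))        # e.g. -Xmx4g
DIRECT_GI=$(( LIMIT_GI * 3 / 8 ))  # e.g. -XX:MaxDirectMemorySize=3g
OVERHEAD_GI=$(( LIMIT_GI - HEAP_GI - DIRECT_GI ))
echo "heap=${HEAP_GI}g direct=${DIRECT_GI}g overhead=${OVERHEAD_GI}g"
# prints: heap=4g direct=3g overhead=1g
```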

Quality of Service Classes

Kubernetes derives a pod's QoS class from its requests and limits:
  • Guaranteed: every container sets requests equal to limits (recommended for Druid)
  • Burstable: requests set, but lower than limits
  • BestEffort: no requests or limits set (avoid for Druid)
Guaranteed pods are evicted last under node memory pressure, which is why setting limits equal to requests gives the most predictable behavior.

Monitoring and Observability

Prometheus Integration

1. Enable Prometheus Emitter

druid.extensions.loadList=["prometheus-emitter", ...]
druid.emitter=prometheus
druid.emitter.prometheus.strategy=exporter
druid.emitter.prometheus.port=8000

2. Create ServiceMonitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: druid-metrics
  namespace: druid
spec:
  selector:
    matchLabels:
      app: druid
  endpoints:
    - port: metrics
      interval: 30s
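
The ServiceMonitor above scrapes a port named metrics, so a Service must expose the emitter port under that name. A sketch matching the emitter configuration in step 1 (port 8000); the app: druid label is an assumption and must match the labels on your Druid pods:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: druid-metrics
  namespace: druid
  labels:
    app: druid
spec:
  ports:
    - port: 8000
      targetPort: 8000
      name: metrics
  selector:
    app: druid
```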

Health Checks

livenessProbe:
  httpGet:
    path: /status/health
    port: 8082
  initialDelaySeconds: 60
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
  
readinessProbe:
  httpGet:
    path: /status/health
    port: 8082
  initialDelaySeconds: 30
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3

Best Practices

Ensure dependencies (ZooKeeper, metadata store) are ready:
initContainers:
  - name: wait-for-postgres
    image: busybox:1.35
    command:
      - sh
      - -c
      - |
        until nc -z postgres.druid.svc.cluster.local 5432; do
          echo "Waiting for PostgreSQL..."
          sleep 2
        done
Prevent too many pods from being disrupted simultaneously:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: druid-broker-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      nodeType: broker
Spread replicas across different nodes:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: nodeType
              operator: In
              values:
                - broker
        topologyKey: kubernetes.io/hostname
Use appropriate storage classes for different workloads:
  • Historicals: Fast SSD (e.g., gp3 on AWS, pd-ssd on GCP)
  • MiddleManagers: Fast SSD
  • Metadata: Persistent SSD with backups
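For example, a gp3-backed StorageClass for historical and MiddleManager volumes might look like the sketch below; the parameters assume the AWS EBS CSI driver and are illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: druid-fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
```
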
Use Horizontal Pod Autoscaler for dynamic scaling. Note that the druid-operator typically manages node groups as StatefulSets, so point scaleTargetRef at the workload the operator actually created:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: druid-broker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet          # match the workload kind the operator created
    name: druid-broker         # match the generated workload name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Troubleshooting

View Logs

kubectl logs -f <pod-name> -n druid

Describe Pod

kubectl describe pod <pod-name> -n druid

Shell into Pod

kubectl exec -it <pod-name> -n druid -- /bin/bash

Check Events

kubectl get events -n druid --sort-by='.lastTimestamp'

Additional Resources

Druid Operator

Official Kubernetes operator repository

Docker Hub

Official Druid Docker images

Kubernetes Extensions

ZooKeeper-less deployment guide

Helm Charts

Community Helm charts
