This guide provides configuration recommendations and best practices for tuning Apache Druid cluster performance. These are general guidelines and starting points that should be adjusted based on your specific workload.
For comprehensive configuration details, see the Configuration Reference. For specific tuning questions, consult the Druid community.

Historical Tuning

Historicals are the workhorses of query processing, serving data from segments stored on disk.

Heap Sizing

The primary contributors to Historical heap usage:
  • Query results: partial, unmerged query results from segments
  • Lookups: stored maps for lookup tables
General guideline: (0.5 GiB × number of CPU cores)
# For an 8-core server
-Xms4g
-Xmx4g
For heaps larger than 24GB, use a modern GC like Shenandoah or ZGC to avoid long GC pauses.
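As a quick sketch, the 0.5 GiB-per-core rule of thumb can be expressed as a small helper (the function name `historical_heap_gib` is illustrative, not part of Druid):

```python
def historical_heap_gib(num_cores, gib_per_core=0.5):
    """Rule-of-thumb Historical heap size: 0.5 GiB per CPU core."""
    return num_cores * gib_per_core

# An 8-core server gets a 4 GiB heap, matching -Xms4g / -Xmx4g above
print(historical_heap_gib(8))
```

Treat the result as a starting point and adjust for lookup sizes and query concurrency on your cluster.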

Processing Threads and Buffers

druid.processing.numThreads (integer, default: number of cores - 1)
Number of processing threads for query execution.
  • Too low: CPU underutilization
  • Too high: CPU contention
  • Recommended: number of cores - 1

druid.processing.buffer.sizeBytes (bytes, default: 500MiB)
Size of off-heap buffers for processing. Larger buffers allow more data to be processed in a single pass. Recommended: 500MiB - 1GiB.

druid.processing.numMergeBuffers (integer)
Buffers for merging GroupBy query results. Recommended: numThreads ÷ 4 (a 1:4 ratio).
# Example for 16-core Historical
druid.processing.numThreads=15
druid.processing.buffer.sizeBytes=536870912  # 512MiB
druid.processing.numMergeBuffers=4

Direct Memory Sizing

Calculate required direct memory:
(numThreads + numMergeBuffers + 1) × buffer.sizeBytes
# Configuration
druid.processing.numThreads=15
druid.processing.numMergeBuffers=4
druid.processing.buffer.sizeBytes=536870912  # 512MiB

# Calculation
(15 + 4 + 1) × 512MiB = 10,240 MiB = 10 GiB

# Set JVM flag
-XX:MaxDirectMemorySize=10g
The +1 accounts for segment decompression buffers.
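The same calculation can be written as a helper to sanity-check your own settings (the function name `max_direct_memory_bytes` is illustrative):

```python
def max_direct_memory_bytes(num_threads, num_merge_buffers, buffer_size_bytes):
    """Direct memory needed: (numThreads + numMergeBuffers + 1) × buffer.sizeBytes.
    The +1 accounts for segment decompression buffers."""
    return (num_threads + num_merge_buffers + 1) * buffer_size_bytes

MiB = 1024 ** 2
needed = max_direct_memory_bytes(15, 4, 512 * MiB)
print(needed // MiB)  # 10240 MiB, so set -XX:MaxDirectMemorySize=10g
```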

Connection Pool

druid.server.http.numThreads (integer)
HTTP server threads for handling requests. Formula: slightly higher than the sum of druid.broker.http.numConnections across all Brokers.
Recommended starting point: 60 threads (50 queries + 10 non-query requests).
druid.server.http.numThreads=60

Segment Cache

Critical: Do not allocate more segment data than available system memory. Free memory is used for OS page cache.
# Configure segment cache locations
druid.segmentCache.locations=[{"path":"/mnt/druid/segment-cache","maxSize":"500g"}]

# Optional: Override total size calculation
druid.server.maxSize=500g
For optimal performance, keep the ratio of free system memory to total segment cache size healthy: aim for at least 20-30% of segment data to fit in the OS page cache.
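A simplified way to estimate page-cache coverage is to treat memory not claimed by the JVM (heap plus direct memory) as available for caching; the function below is a sketch under that assumption and ignores memory used by other processes:

```python
def page_cache_coverage(total_mem_gib, heap_gib, direct_gib, segment_cache_gib):
    """Approximate fraction of on-disk segment data that fits in the OS page
    cache, assuming all non-JVM memory is available for caching."""
    free_for_cache = total_mem_gib - heap_gib - direct_gib
    return free_for_cache / segment_cache_gib

# Hypothetical 256 GiB server: 8 GiB heap, 10 GiB direct memory, 500 GiB cache
coverage = page_cache_coverage(256, 8, 10, 500)
print(f"{coverage:.0%}")  # well above the 20-30% target
```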

Storage Recommendations

Use SSDs for Historical storage to minimize segment load times and improve query performance.

Broker Tuning

Brokers merge results from Historicals and route queries.

Heap Sizing

Broker heap usage scales with:

1. Partial query results: results from Historicals and Tasks before merging
2. Segment timeline: location information for all available segments
3. Segment metadata: schemas and metadata for all segments
Recommendations:
  • Small/Medium clusters (~15 servers): 4-8 GiB
  • Large clusters (~100 servers): 30-60 GiB
# Medium cluster
-Xms6g
-Xmx6g

# Large cluster  
-Xms40g
-Xmx40g

Processing Configuration

Brokers typically don’t need processing threads (merge happens on-heap), but do need merge buffers:
druid.processing.buffer.sizeBytes=536870912  # 512MiB
druid.processing.numMergeBuffers=4  # Same as Historical or higher
Direct memory: (numMergeBuffers + 1) × buffer.sizeBytes
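With four 512MiB merge buffers, the formula works out as follows (rounding up to a whole number of GiB for the JVM flag):

```
# (numMergeBuffers + 1) × buffer.sizeBytes = (4 + 1) × 512MiB = 2,560 MiB
-XX:MaxDirectMemorySize=3g
```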

Connection Pool

druid.broker.http.numConnections (integer)
Outgoing connections per Historical/Task. This limits the number of concurrent queries the Broker can process.

druid.server.http.numThreads (integer)
Incoming HTTP request threads. Should be slightly higher than numConnections.
# Example for 3 Brokers, 10 connections each = 30 total to each Historical
druid.broker.http.numConnections=10
druid.server.http.numThreads=12

Backpressure

druid.broker.http.maxQueuedBytes (bytes)
Maximum buffer size for queued unread data from Historicals/Tasks. Formula: ≈ 2 MiB × number of Historicals.
# For 20 Historicals
druid.broker.http.maxQueuedBytes=41943040  # 40MiB
  • Too small: Frequent stalls, inefficient queries
  • Too large: High memory pressure on Broker

Broker Count

General guideline: 1 Broker per 15 Historicals (minimum 2 for HA)

MiddleManager Tuning

MiddleManagers launch Task processes for ingestion.

MiddleManager Configuration

# MiddleManager itself is lightweight
-Xms128m
-Xmx128m
druid.worker.capacity (integer)
Maximum number of concurrent Tasks this MiddleManager can run.
druid.worker.capacity=4

Task Configuration

# Basic heap for tasks
druid.indexer.runner.javaOpts=-Xms1g -Xmx1g

# With 2GB lookups
druid.indexer.runner.javaOpts=-Xms5g -Xmx5g
Direct memory formula: (numThreads + numMergeBuffers + 1) × buffer.sizeBytes
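As a worked example with hypothetical task-level values (2 processing threads, 2 merge buffers, 100MiB buffers; check your actual task tuning configuration), the formula gives (2 + 2 + 1) × 100MiB = 500MiB:

```
# Hypothetical sizing: (2 + 2 + 1) × 100MiB = 500MiB of direct memory per task
druid.indexer.runner.javaOpts=-Xms1g -Xmx1g -XX:MaxDirectMemorySize=500m
```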

Total MiddleManager Memory

Total = MM heap + (capacity × single task memory)

Example (assuming ~2 GB per task, covering heap and direct memory):
128 MB + (4 × 2 GB) = 8.128 GB
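The capacity-planning arithmetic above can be sketched as a helper (the function name `middlemanager_total_gb` is illustrative):

```python
def middlemanager_total_gb(mm_heap_gb, capacity, per_task_gb):
    """Total MiddleManager footprint = MM heap + (capacity × single task memory)."""
    return mm_heap_gb + capacity * per_task_gb

# 128 MB MM heap, 4 task slots, ~2 GB per task (heap + direct memory)
print(middlemanager_total_gb(0.128, 4, 2))  # 8.128 GB
```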
Use SSDs for MiddleManager storage for faster segment building and merging.

Coordinator Tuning

Heap Sizing

Coordinator heap scales with:
  • Number of servers
  • Number of segments
  • Number of tasks
Recommendation: Same size as Broker heap, or slightly smaller
-Xms6g
-Xmx6g

Dynamic Configuration

percentOfSegmentsToConsiderPerMove (integer, default: 100)
Percentage of segments to consider when looking for segments to move.
  • 100: Consider all segments (default, slower)
  • 66: Consider top 66% (faster, recommended starting point)
  • 25: Consider top 25% (much faster, more aggressive)
{
  "percentOfSegmentsToConsiderPerMove": 66,
  "maxSegmentsToMove": 5
}
Lower this value if Coordinator cycles take too long on clusters with hundreds of thousands of segments.

Overlord Tuning

Overlord heap scales primarily with the number of running tasks. Recommendation: 25-50% of Coordinator heap
# For Coordinator with 6GB
-Xms2g
-Xmx2g

Router Tuning

Router has minimal resource requirements.
-Xms256m
-Xmx256m

JVM Tuning

Garbage Collection

-XX:+UseG1GC
-XX:+ExitOnOutOfMemoryError
-XX:MaxGCPauseMillis=200
Shenandoah and ZGC are recommended for heaps > 24GB to minimize pause times.

Essential JVM Flags

# Timezone and encoding
-Duser.timezone=UTC
-Dfile.encoding=UTF-8

# Temp directory (use fast, non-volatile storage, NOT tmpfs or NFS)
-Djava.io.tmpdir=/mnt/fast-disk/druid-tmp

# Logging
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dorg.jboss.logging.provider=slf4j
-Dlog4j.shutdownHookEnabled=true

# GC logging
-Xloggc:/var/log/druid/historical.gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=50
-XX:GCLogFileSize=10m

# Error handling
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/druid/historical.hprof

# Direct memory (calculate based on processing buffers)
-XX:MaxDirectMemorySize=10g

System Configuration

Linux Limits

Edit /etc/security/limits.conf:
druid soft nofile 65536
druid hard nofile 65536

Swap Space

Disable swap for Historical, MiddleManager, and Indexer processes. Memory-mapped segments with swap lead to poor performance.
# Disable swap for the druid user's processes
sudo swapoff -a  # Temporary

# Permanent: comment out swap in /etc/fstab

Storage

Use SSDs

Essential for Historicals and MiddleManagers

JBOD vs RAID

JBOD can provide better throughput than software RAID

Additional Optimizations

1. Mount /tmp on tmpfs: avoids GC pauses caused by the JVM writing statistics to disk. See The Four Month Bug.

2. Separate GC logs from data: write GC and Druid logs to different disks than data on disk-intensive processes.

3. Disable Transparent Huge Pages:
echo never > /sys/kernel/mm/transparent_hugepage/enabled

4. Consider disabling biased locking:
-XX:-UseBiasedLocking

Monitoring and Validation

Key metrics to monitor after tuning:
  • Query latency (query/time)
  • GC pause time (jvm/gc/time)
  • Thread pool utilization (jetty/threadPool/utilizationRate)
  • Segment scan queue (segment/scan/pending)
  • Ingestion lag (Kafka/Kinesis specific)
Always test configuration changes in a staging environment before applying to production.
