This guide provides configuration recommendations and best practices for tuning Apache Druid cluster performance. These are general guidelines and starting points that should be adjusted based on your specific workload.
For comprehensive configuration details, see the Configuration Reference. For specific tuning questions, consult the Druid community.

Historical Tuning

Historicals are the workhorses of query processing, serving data from segments stored on disk.

Heap Sizing

The primary contributors to Historical heap usage:
  • Query results: partial, unmerged query results from segments
  • Lookups: stored maps for lookup tables
General guideline: (0.5 GiB × number of CPU cores)
# For an 8-core server
-Xms4g
-Xmx4g
For heaps larger than 24GB, use a modern GC like Shenandoah or ZGC to avoid long GC pauses.
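As a quick sketch, the 0.5 GiB-per-core rule of thumb can be expressed as a small helper (the function name `historical_heap_gib` is illustrative, not part of Druid):

```python
def historical_heap_gib(num_cores, gib_per_core=0.5):
    """Rule-of-thumb Historical heap size: 0.5 GiB per CPU core."""
    return num_cores * gib_per_core

# An 8-core server gets a 4 GiB heap, matching -Xms4g / -Xmx4g above
print(historical_heap_gib(8))
```

Treat the result as a starting point and adjust for lookup sizes and query concurrency on your cluster.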

Processing Threads and Buffers

druid.processing.numThreads (integer, default: number of cores - 1)
Number of processing threads for query execution.
  • Too low: CPU underutilization
  • Too high: CPU contention
  • Recommended: number of cores - 1

druid.processing.buffer.sizeBytes (bytes, default: 500MiB)
Size of off-heap buffers for processing. Larger buffers allow more data to be processed in a single pass. Recommended: 500MiB - 1GiB.

druid.processing.numMergeBuffers (integer)
Buffers for merging GroupBy query results. Recommended: numThreads ÷ 4 (a 1:4 ratio).
# Example for 16-core Historical
druid.processing.numThreads=15
druid.processing.buffer.sizeBytes=536870912  # 512MiB
druid.processing.numMergeBuffers=4

Direct Memory Sizing

Calculate required direct memory:
(numThreads + numMergeBuffers + 1) × buffer.sizeBytes
# Configuration
druid.processing.numThreads=15
druid.processing.numMergeBuffers=4
druid.processing.buffer.sizeBytes=536870912  # 512MiB

# Calculation
(15 + 4 + 1) × 512MiB = 10,240 MiB = 10 GiB

# Set JVM flag
-XX:MaxDirectMemorySize=10g
The +1 accounts for segment decompression buffers.
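The same calculation can be written as a helper to sanity-check your own settings (the function name `max_direct_memory_bytes` is illustrative):

```python
def max_direct_memory_bytes(num_threads, num_merge_buffers, buffer_size_bytes):
    """Direct memory needed: (numThreads + numMergeBuffers + 1) × buffer.sizeBytes.
    The +1 accounts for segment decompression buffers."""
    return (num_threads + num_merge_buffers + 1) * buffer_size_bytes

MiB = 1024 ** 2
needed = max_direct_memory_bytes(15, 4, 512 * MiB)
print(needed // MiB)  # 10240 MiB, so set -XX:MaxDirectMemorySize=10g
```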

Connection Pool

druid.server.http.numThreads (integer)
HTTP server threads for handling requests. Formula: slightly higher than the sum of druid.broker.http.numConnections across all Brokers.
Recommended starting point: 60 threads (50 queries + 10 non-query requests).
druid.server.http.numThreads=60

Segment Cache

Critical: Do not allocate more segment data than available system memory. Free memory is used for OS page cache.
# Configure segment cache locations
druid.segmentCache.locations=[{"path":"/mnt/druid/segment-cache","maxSize":"500g"}]

# Optional: Override total size calculation
druid.server.maxSize=500g
For optimal performance, keep the ratio of free system memory to total segment cache size healthy: aim for at least 20-30% of segment data to fit in the OS page cache.
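A simplified way to estimate page-cache coverage is to treat memory not claimed by the JVM (heap plus direct memory) as available for caching; the function below is a sketch under that assumption and ignores memory used by other processes:

```python
def page_cache_coverage(total_mem_gib, heap_gib, direct_gib, segment_cache_gib):
    """Approximate fraction of on-disk segment data that fits in the OS page
    cache, assuming all non-JVM memory is available for caching."""
    free_for_cache = total_mem_gib - heap_gib - direct_gib
    return free_for_cache / segment_cache_gib

# Hypothetical 256 GiB server: 8 GiB heap, 10 GiB direct memory, 500 GiB cache
coverage = page_cache_coverage(256, 8, 10, 500)
print(f"{coverage:.0%}")  # well above the 20-30% target
```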

Storage Recommendations

Use SSDs for Historical storage to minimize segment load times and improve query performance.

Broker Tuning

Brokers merge results from Historicals and route queries.

Heap Sizing

Broker heap usage scales with:

1. Partial query results: results from Historicals and Tasks before merging
2. Segment timeline: location information for all available segments
3. Segment metadata: schemas and metadata for all segments
Recommendations:
  • Small/Medium clusters (~15 servers): 4-8 GiB
  • Large clusters (~100 servers): 30-60 GiB
# Medium cluster
-Xms6g
-Xmx6g

# Large cluster  
-Xms40g
-Xmx40g

Processing Configuration

Brokers typically don’t need processing threads (merge happens on-heap), but do need merge buffers:
druid.processing.buffer.sizeBytes=536870912  # 512MiB
druid.processing.numMergeBuffers=4  # Same as Historical or higher
Direct memory: (numMergeBuffers + 1) × buffer.sizeBytes
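With four 512MiB merge buffers, the formula works out as follows (rounding up to a whole number of GiB for the JVM flag):

```
# (numMergeBuffers + 1) × buffer.sizeBytes = (4 + 1) × 512MiB = 2,560 MiB
-XX:MaxDirectMemorySize=3g
```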

Connection Pool

druid.broker.http.numConnections (integer)
Outgoing connections per Historical/Task. This limits the number of concurrent queries the Broker can process.

druid.server.http.numThreads (integer)
Incoming HTTP request threads. Should be slightly higher than numConnections.
# Example for 3 Brokers, 10 connections each = 30 total to each Historical
druid.broker.http.numConnections=10
druid.server.http.numThreads=12

Backpressure

druid.broker.http.maxQueuedBytes (bytes)
Maximum buffer size for queued unread data from Historicals/Tasks. Formula: ≈ 2 MiB × number of Historicals.
# For 20 Historicals
druid.broker.http.maxQueuedBytes=41943040  # 40MiB
  • Too small: Frequent stalls, inefficient queries
  • Too large: High memory pressure on Broker

Broker Count

General guideline: 1 Broker per 15 Historicals (minimum 2 for HA)

MiddleManager Tuning

MiddleManagers launch Task processes for ingestion.

MiddleManager Configuration

# MiddleManager itself is lightweight
-Xms128m
-Xmx128m
druid.worker.capacity (integer)
Maximum number of concurrent Tasks this MiddleManager can run.
druid.worker.capacity=4

Task Configuration

# Basic heap for tasks
druid.indexer.runner.javaOpts=-Xms1g -Xmx1g

# With 2GB lookups
druid.indexer.runner.javaOpts=-Xms5g -Xmx5g
Direct memory formula: (numThreads + numMergeBuffers + 1) × buffer.sizeBytes
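As a worked example with hypothetical task-level values (2 processing threads, 2 merge buffers, 100MiB buffers; check your actual task tuning configuration), the formula gives (2 + 2 + 1) × 100MiB = 500MiB:

```
# Hypothetical sizing: (2 + 2 + 1) × 100MiB = 500MiB of direct memory per task
druid.indexer.runner.javaOpts=-Xms1g -Xmx1g -XX:MaxDirectMemorySize=500m
```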

Total MiddleManager Memory

Total = MM heap + (capacity × single task memory)

Example (assuming ~2 GB per task, covering heap and direct memory):
128 MB + (4 × 2 GB) = 8.128 GB
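The capacity-planning arithmetic above can be sketched as a helper (the function name `middlemanager_total_gb` is illustrative):

```python
def middlemanager_total_gb(mm_heap_gb, capacity, per_task_gb):
    """Total MiddleManager footprint = MM heap + (capacity × single task memory)."""
    return mm_heap_gb + capacity * per_task_gb

# 128 MB MM heap, 4 task slots, ~2 GB per task (heap + direct memory)
print(middlemanager_total_gb(0.128, 4, 2))  # 8.128 GB
```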
Use SSDs for MiddleManager storage for faster segment building and merging.

Coordinator Tuning

Heap Sizing

Coordinator heap scales with:
  • Number of servers
  • Number of segments
  • Number of tasks
Recommendation: Same size as Broker heap, or slightly smaller
-Xms6g
-Xmx6g

Dynamic Configuration

percentOfSegmentsToConsiderPerMove (integer, default: 100)
Percentage of segments to consider when looking for segments to move.
  • 100: Consider all segments (default, slower)
  • 66: Consider top 66% (faster, recommended starting point)
  • 25: Consider top 25% (much faster, more aggressive)
{
  "percentOfSegmentsToConsiderPerMove": 66,
  "maxSegmentsToMove": 5
}
Lower this value if Coordinator cycles take too long on clusters with hundreds of thousands of segments.

Overlord Tuning

Overlord heap scales primarily with the number of running tasks. Recommendation: 25-50% of Coordinator heap
# For Coordinator with 6GB
-Xms2g
-Xmx2g

Router Tuning

Router has minimal resource requirements.
-Xms256m
-Xmx256m

JVM Tuning

Garbage Collection

-XX:+UseG1GC
-XX:+ExitOnOutOfMemoryError
-XX:MaxGCPauseMillis=200
Shenandoah and ZGC are recommended for heaps > 24GB to minimize pause times.

Essential JVM Flags

# Timezone and encoding
-Duser.timezone=UTC
-Dfile.encoding=UTF-8

# Temp directory (use fast, non-volatile storage, NOT tmpfs or NFS)
-Djava.io.tmpdir=/mnt/fast-disk/druid-tmp

# Logging
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dorg.jboss.logging.provider=slf4j
-Dlog4j.shutdownHookEnabled=true

# GC logging
-Xloggc:/var/log/druid/historical.gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=50
-XX:GCLogFileSize=10m

# Error handling
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/druid/historical.hprof

# Direct memory (calculate based on processing buffers)
-XX:MaxDirectMemorySize=10g

System Configuration

Linux Limits

Edit /etc/security/limits.conf:
druid soft nofile 65536
druid hard nofile 65536

Swap Space

Disable swap for Historical, MiddleManager, and Indexer processes. Memory-mapped segments with swap lead to poor performance.
# Disable swap for the druid user's processes
sudo swapoff -a  # Temporary

# Permanent: comment out swap in /etc/fstab

Storage

Use SSDs

Essential for Historicals and MiddleManagers

JBOD vs RAID

JBOD can provide better throughput than software RAID

Additional Optimizations

1. Mount /tmp on tmpfs: avoids GC pauses caused by the JVM writing statistics to disk. See The Four Month Bug.

2. Separate GC logs from data: write GC and Druid logs to different disks than data on disk-intensive processes.

3. Disable Transparent Huge Pages:
echo never > /sys/kernel/mm/transparent_hugepage/enabled

4. Consider disabling biased locking:
-XX:-UseBiasedLocking

Monitoring and Validation

Key metrics to monitor after tuning:
  • Query latency (query/time)
  • GC pause time (jvm/gc/time)
  • Thread pool utilization (jetty/threadPool/utilizationRate)
  • Segment scan queue (segment/scan/pending)
  • Ingestion lag (Kafka/Kinesis specific)
Always test configuration changes in a staging environment before applying to production.
