Key Differences from Middle Manager + Peon
Execution Model
- Indexer: Tasks run as threads in a single JVM
- Middle Manager: Tasks run in separate JVM processes (Peons)
Resource Sharing
- Indexer: Better resource sharing across tasks (query processing, HTTP threads, memory)
- Middle Manager: Complete isolation between tasks
Configuration
- Indexer: Easier to configure and deploy
- Middle Manager: More configuration parameters for Peon processes
Overhead
- Indexer: Lower per-task overhead (no JVM startup, shared resources)
- Middle Manager: Higher overhead but better isolation
When to Use the Indexer
The Indexer is designed to be easier to configure and deploy than the Middle Manager + Peon system, and to better enable resource sharing across tasks. Consider the Middle Manager + Peon system instead when strong isolation between tasks is required.

Good Use Cases
- Streaming ingestion workloads with many concurrent tasks
- Resource-constrained environments where JVM overhead matters
- Simplified deployments where ease of configuration is important
- Workloads with predictable resource usage across tasks
Configuration
For Apache Druid Indexer service configuration, see Running the Indexer.
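As a hedged illustration only, an Indexer runtime.properties might include entries like the following. The values are example choices, not recommendations, and property names and defaults should be verified against the Running the Indexer documentation:

```properties
# Number of task slots; the global heap limit is divided evenly across these.
druid.worker.capacity=4

# Size of each HTTP thread pool (chat handler and non-chat handler).
druid.server.http.numThreads=10

# Optional override of the global ingestion heap limit
# (by default, 1/6th of the available JVM heap).
druid.worker.globalIngestionHeapLimitBytes=800000000
```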
HTTP Endpoints
The Indexer service shares the same HTTP endpoints as the Middle Manager.

Task Resource Sharing
The following resources are shared across all tasks running inside the Indexer service.

Query Resources
The query processing threads and buffers are shared across all tasks. The Indexer serves queries from a single endpoint shared by all tasks.
This shared query infrastructure can improve overall efficiency but means that query load from one task can affect others.
Server HTTP Threads
The Indexer maintains two equally sized pools of HTTP threads:

- Chat handler thread pool: used exclusively for task control messages between the Overlord and the Indexer
- Non-chat handler thread pool: used for all other HTTP requests, including queries
The druid.server.http.numThreads property configures the size of each pool. For example, if set to 10, there will be:
- 10 chat handler threads
- 10 non-chat handler threads
- 2 additional threads for lookup handling (if lookups are used)
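The pool sizing described above can be illustrated with a small arithmetic sketch (this is a model of the rule stated in the text, not Druid code):

```python
def indexer_http_threads(num_threads: int, lookups_enabled: bool = False) -> dict:
    """Sketch of Indexer HTTP thread counts for a given pool size.

    Mirrors the rule described above: two equally sized pools
    (chat handler and non-chat handler), plus 2 extra threads
    when lookups are used.
    """
    counts = {
        "chat_handler": num_threads,
        "non_chat_handler": num_threads,
        "lookup": 2 if lookups_enabled else 0,
    }
    counts["total"] = sum(counts.values())
    return counts

# Example from the text: a pool size of 10 with lookups enabled.
print(indexer_http_threads(10, lookups_enabled=True))
# → {'chat_handler': 10, 'non_chat_handler': 10, 'lookup': 2, 'total': 22}
```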
Memory Sharing
The Indexer imposes a global heap limit across all tasks running in the Indexer, which defaults to 1/6th of the available JVM heap. This global limit is evenly divided across the number of task slots configured by druid.worker.capacity.

Understanding Peak Memory Usage
The peak usage for rows held in heap memory relates to the interaction between maxBytesInMemory and maxPendingPersists:

- Ingestion fills the buffer: when the amount of row data held in-heap by a task reaches the limit specified by maxBytesInMemory, the task persists the in-heap row data.
- Concurrent ingestion continues: after the persist has started, the task can again ingest up to maxBytesInMemory bytes of row data while the persist is running.

The remaining portion of the heap is reserved for:
- Query processing
- Segment persist/merge operations
- Miscellaneous heap usage
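The heap accounting above can be sketched numerically. This is an illustrative model, not Druid's implementation: the 1/6th default and the even division by druid.worker.capacity come from the text, and the peak-row estimate of roughly two fills of maxBytesInMemory follows from the persist behavior described above (ignoring any additional pending persists beyond the first):

```python
def per_task_heap_limit(jvm_heap_bytes: int, worker_capacity: int) -> int:
    """Global ingestion heap limit (default: 1/6th of the JVM heap),
    divided evenly across task slots, as described above."""
    global_limit = jvm_heap_bytes // 6
    return global_limit // worker_capacity

def peak_row_heap_estimate(max_bytes_in_memory: int) -> int:
    """Rough per-task peak for in-heap row data: one full buffer being
    persisted plus one buffer refilling while the persist runs."""
    return 2 * max_bytes_in_memory

# Example: a 12 GiB heap with 4 task slots.
heap = 12 * 1024**3
slot_limit = per_task_heap_limit(heap, worker_capacity=4)
print(slot_limit)                        # 512 MiB per task slot
print(peak_row_heap_estimate(slot_limit))  # worst case if maxBytesInMemory == slot limit
```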
Concurrent Segment Persist/Merge Limits
To help reduce peak memory usage, the Indexer limits the number of concurrent segment persist/merge operations across all running tasks. The default limit is (druid.worker.capacity / 2), rounded down.

Current Limitations
Future Enhancements
In later releases, per-task memory usage will be dynamically managed. See GitHub issue #7900 for details on future enhancements to the Indexer. Planned improvements include:
- Dynamic per-task memory allocation
- Removal of fixed merge limits
- Support for separate task logs
- Better resource isolation options
Architecture Comparison
Migration Considerations
Performance Considerations
Pros
- Lower per-task overhead
- Better resource utilization
- Faster task startup (no JVM initialization)
- Easier to configure
- Better for many small tasks
Cons
- Less isolation between tasks
- Single point of failure (all tasks in one JVM)
- Harder to debug individual tasks
- Memory limits are uniform
- Still experimental