The Indexer is an optional and experimental feature. If you’re primarily performing batch ingestion, we recommend you use either the MiddleManager and Peon task execution system or MiddleManager-less ingestion using Kubernetes. If you’re primarily doing streaming ingestion, you may want to try either MiddleManager-less ingestion using Kubernetes or the Indexer service.
The Apache Druid Indexer service is an alternative to the Middle Manager + Peon task execution system. Instead of forking a separate JVM process per task, the Indexer runs tasks as separate threads within a single JVM process.

Key Differences from Middle Manager + Peon

Execution Model

  • Indexer: Tasks run as threads in a single JVM
  • Middle Manager: Tasks run in separate JVM processes (Peons)

Resource Sharing

  • Indexer: Better resource sharing across tasks (query processing, HTTP threads, memory)
  • Middle Manager: Complete isolation between tasks

Configuration

  • Indexer: Easier to configure and deploy
  • Middle Manager: More configuration parameters for Peon processes

Overhead

  • Indexer: Lower per-task overhead (no JVM startup, shared resources)
  • Middle Manager: Higher overhead but better isolation

When to Use the Indexer

The Indexer is designed to be easier to configure and deploy than the Middle Manager + Peon system and to better enable resource sharing across tasks. It is a good fit for:
  • Streaming ingestion workloads with many concurrent tasks
  • Resource-constrained environments where per-task JVM overhead matters
  • Simplified deployments where ease of configuration is important
  • Workloads with predictable resource usage across tasks

Configuration

For Apache Druid Indexer service configuration, see the Indexer section of the Druid configuration reference.

Running the Indexer

Start the Indexer service with:

```
org.apache.druid.cli.Main server indexer
```

HTTP Endpoints

The Indexer service exposes the same HTTP endpoints as the Middle Manager.

Task Resource Sharing

The following resources are shared across all tasks running inside the Indexer service:

Query Resources

Query processing threads and buffers are shared across all tasks, and the Indexer serves queries for all tasks from a single endpoint. If query caching is enabled, the query cache is also shared across all tasks.
This shared query infrastructure can improve overall efficiency, but it means that query load from one task can affect others.

Server HTTP Threads

The Indexer maintains two equally sized pools of HTTP threads:
  1. Chat handler thread pool: used exclusively for task control messages between the Overlord and the Indexer
  2. General HTTP thread pool: handles all other HTTP requests (queries, status, etc.)

druid.server.http.numThreads (integer) configures the size of each pool. For example, if set to 10, there will be:
  • 10 chat handler threads
  • 10 non-chat handler threads
  • 2 additional threads for lookup handling (if lookups are used)
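As a sketch, the thread pool sizing above might be set in the Indexer's runtime.properties like this (the value is illustrative, not a recommendation):

```properties
# 10 chat handler threads + 10 general HTTP threads,
# plus 2 lookup-handling threads if lookups are configured.
druid.server.http.numThreads=10
```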

Memory Sharing

The Indexer uses a global heap limit across all tasks, which is then divided among individual tasks.
druid.worker.globalIngestionHeapLimitBytes (bytes) imposes a global heap limit across all tasks running in the Indexer. Default: 1/6th of the available JVM heap. This global limit is evenly divided across the number of task slots configured by druid.worker.capacity.
Important memory behavior: to apply the per-task heap limit, the Indexer overrides maxBytesInMemory in task tuning configurations:
  • Ignores the default value
  • Ignores any user-configured value
  • Also overrides maxRowsInMemory to an essentially unlimited value (the Indexer does not support row limits)
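The division of the global limit across task slots can be sketched as follows; the function name and values are illustrative, not Druid's actual internals:

```python
def per_task_heap_limit(global_limit_bytes: int, worker_capacity: int) -> int:
    """Sketch: the global ingestion heap limit is divided evenly across
    the task slots configured by druid.worker.capacity."""
    return global_limit_bytes // worker_capacity

# Example: a 6 GiB global limit with 4 task slots gives each task
# a 1.5 GiB limit, which overrides maxBytesInMemory in its tuning config.
print(per_task_heap_limit(6 * 1024**3, 4))  # 1610612736
```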

Understanding Peak Memory Usage

The peak usage for rows held in heap memory depends on the interaction between maxBytesInMemory and maxPendingPersists:
  1. Ingestion fills the buffer: when the amount of row data held in-heap by a task reaches the limit specified by maxBytesInMemory, the task persists the in-heap row data.
  2. Concurrent ingestion continues: after the persist has started, the task can again ingest up to maxBytesInMemory bytes worth of row data while the persist is running.
  3. Peak memory calculation: the peak in-heap usage for row data can therefore reach approximately maxBytesInMemory * (2 + maxPendingPersists).
The default value of maxPendingPersists is 0, which allows one persist to run concurrently with ingestion work.
The remaining portion of the heap is reserved for:
  • Query processing
  • Segment persist/merge operations
  • Miscellaneous heap usage
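The peak formula above can be checked with a small sketch (function name and sample values are illustrative):

```python
def peak_row_heap_bytes(max_bytes_in_memory: int,
                        max_pending_persists: int) -> int:
    """Approximate peak in-heap row data per task: one buffer actively
    filling, plus (1 + maxPendingPersists) buffers being persisted."""
    return max_bytes_in_memory * (2 + max_pending_persists)

# With the default maxPendingPersists = 0, peak is about 2x maxBytesInMemory:
print(peak_row_heap_bytes(100_000_000, 0))  # 200000000
```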

Concurrent Segment Persist/Merge Limits

To help reduce peak memory usage, the Indexer imposes a limit on the number of concurrent segment persist/merge operations across all running tasks.
druid.worker.numConcurrentMerges (integer) limits the number of concurrent persist/merge operations. Default: druid.worker.capacity / 2, rounded down.
This limit helps prevent memory spikes when multiple tasks attempt to persist or merge segments simultaneously.
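The default described above can be expressed as a one-line sketch (the function name is illustrative):

```python
def default_num_concurrent_merges(worker_capacity: int) -> int:
    """Sketch of the documented default for druid.worker.numConcurrentMerges:
    half the task-slot capacity, rounded down."""
    return worker_capacity // 2

# An Indexer with 5 task slots defaults to 2 concurrent persist/merge ops:
print(default_num_concurrent_merges(5))  # 2
```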

Current Limitations

Known Limitations
  • Separate task logs are not currently supported when using the Indexer; all task log messages are instead written to the Indexer service log. This can make debugging individual tasks more challenging than with the Middle Manager + Peon approach.
  • The Indexer currently imposes an identical memory limit on each task, which can be inefficient for tasks with varying memory requirements. In later releases, the per-task memory limit will be removed and only the global limit will apply.
  • The limit on concurrent merges applies globally across all tasks. In later releases, this limit will be removed or made more dynamic.

Future Enhancements

In later releases, per-task memory usage will be dynamically managed. See GitHub issue #7900 for details on future enhancements to the Indexer. Planned improvements include:
  • Dynamic per-task memory allocation
  • Removal of fixed merge limits
  • Support for separate task logs
  • Better resource isolation options

Architecture Comparison

Migration Considerations

If migrating from Middle Manager + Peon to Indexer:
  1. Test thoroughly: The Indexer is experimental; test with your workload
  2. Adjust memory settings: Configure druid.worker.globalIngestionHeapLimitBytes appropriately
  3. Update monitoring: Task logs will be in the Indexer service log
  4. Review task isolation: Ensure your tasks can safely share resources
  5. Plan rollback: Keep Middle Manager configuration ready if needed
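As a sketch, a minimal Indexer runtime.properties for migration testing might combine the settings discussed above (all values are illustrative, not recommendations):

```properties
druid.service=druid/indexer

# 4 task slots, sharing one JVM
druid.worker.capacity=4
# ~6 GiB global ingestion heap limit, split evenly across the 4 slots
druid.worker.globalIngestionHeapLimitBytes=6442450944
# At most 2 concurrent persist/merge operations (the default: capacity / 2)
druid.worker.numConcurrentMerges=2

# 10 chat handler threads + 10 general HTTP threads
druid.server.http.numThreads=10
```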

Performance Considerations

Pros

  • Lower per-task overhead
  • Better resource utilization
  • Faster task startup (no JVM initialization)
  • Easier to configure
  • Better for many small tasks

Cons

  • Less isolation between tasks
  • Single point of failure (all tasks in one JVM)
  • Harder to debug individual tasks
  • Memory limits are uniform
  • Still experimental
