The Indexer is an optional and experimental feature. If you’re primarily performing batch ingestion, we recommend you use either the MiddleManager and Peon task execution system or MiddleManager-less ingestion using Kubernetes. If you’re primarily doing streaming ingestion, you may want to try either MiddleManager-less ingestion using Kubernetes or the Indexer service.
The Apache Druid Indexer service is an alternative to the Middle Manager + Peon task execution system. Instead of forking a separate JVM process per task, the Indexer runs tasks as separate threads within a single JVM process.

Key Differences from Middle Manager + Peon

Execution Model

  • Indexer: Tasks run as threads in a single JVM
  • Middle Manager: Tasks run in separate JVM processes (Peons)

Resource Sharing

  • Indexer: Better resource sharing across tasks (query processing, HTTP threads, memory)
  • Middle Manager: Complete isolation between tasks

Configuration

  • Indexer: Easier to configure and deploy
  • Middle Manager: More configuration parameters for Peon processes

Overhead

  • Indexer: Lower per-task overhead (no JVM startup, shared resources)
  • Middle Manager: Higher overhead but better isolation

When to Use the Indexer

The Indexer is designed to be easier to configure and deploy than the Middle Manager + Peon system and to better enable resource sharing across tasks. It is a good fit for:
  • Streaming ingestion workloads with many concurrent tasks
  • Resource-constrained environments where per-task JVM overhead matters
  • Simplified deployments where ease of configuration is important
  • Workloads with predictable resource usage across tasks

Configuration

For Apache Druid Indexer service configuration, see the Indexer section of the Druid configuration reference.

Running the Indexer

Start the Indexer service with:

```
org.apache.druid.cli.Main server indexer
```

HTTP Endpoints

The Indexer service exposes the same HTTP endpoints as the Middle Manager.

Task Resource Sharing

The following resources are shared across all tasks running inside the Indexer service:

Query Resources

Query processing threads and buffers are shared across all tasks, and the Indexer serves queries for all tasks from a single endpoint. If query caching is enabled, the query cache is also shared across all tasks.
This shared query infrastructure can improve overall efficiency, but it means that query load from one task can affect others.

Server HTTP Threads

The Indexer maintains two equally sized pools of HTTP threads:
  1. Chat handler thread pool: used exclusively for task control messages between the Overlord and the Indexer
  2. General HTTP thread pool: handles all other HTTP requests (queries, status, etc.)

druid.server.http.numThreads (integer) configures the size of each pool. For example, if set to 10, there will be:
  • 10 chat handler threads
  • 10 non-chat handler threads
  • 2 additional threads for lookup handling (if lookups are used)
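As a sketch, the thread pool sizing above might be set in the Indexer's runtime.properties like this (the value is illustrative, not a recommendation):

```properties
# 10 chat handler threads + 10 general HTTP threads,
# plus 2 lookup-handling threads if lookups are configured.
druid.server.http.numThreads=10
```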

Memory Sharing

The Indexer uses a global heap limit across all tasks, which is then divided among individual tasks.
druid.worker.globalIngestionHeapLimitBytes (bytes) imposes a global heap limit across all tasks running in the Indexer. Default: 1/6th of the available JVM heap. This global limit is evenly divided across the number of task slots configured by druid.worker.capacity.
Important memory behavior: to apply the per-task heap limit, the Indexer overrides maxBytesInMemory in task tuning configurations:
  • Ignores the default value
  • Ignores any user-configured value
  • Also overrides maxRowsInMemory to an essentially unlimited value (the Indexer does not support row limits)
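The division of the global limit across task slots can be sketched as follows; the function name and values are illustrative, not Druid's actual internals:

```python
def per_task_heap_limit(global_limit_bytes: int, worker_capacity: int) -> int:
    """Sketch: the global ingestion heap limit is divided evenly across
    the task slots configured by druid.worker.capacity."""
    return global_limit_bytes // worker_capacity

# Example: a 6 GiB global limit with 4 task slots gives each task
# a 1.5 GiB limit, which overrides maxBytesInMemory in its tuning config.
print(per_task_heap_limit(6 * 1024**3, 4))  # 1610612736
```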

Understanding Peak Memory Usage

The peak usage for rows held in heap memory depends on the interaction between maxBytesInMemory and maxPendingPersists:
  1. Ingestion fills the buffer: when the amount of row data held in-heap by a task reaches the limit specified by maxBytesInMemory, the task persists the in-heap row data.
  2. Concurrent ingestion continues: after the persist has started, the task can again ingest up to maxBytesInMemory bytes worth of row data while the persist is running.
  3. Peak memory calculation: the peak in-heap usage for row data can therefore reach approximately maxBytesInMemory * (2 + maxPendingPersists).
The default value of maxPendingPersists is 0, which allows one persist to run concurrently with ingestion work.
The remaining portion of the heap is reserved for:
  • Query processing
  • Segment persist/merge operations
  • Miscellaneous heap usage
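The peak formula above can be checked with a small sketch (function name and sample values are illustrative):

```python
def peak_row_heap_bytes(max_bytes_in_memory: int,
                        max_pending_persists: int) -> int:
    """Approximate peak in-heap row data per task: one buffer actively
    filling, plus (1 + maxPendingPersists) buffers being persisted."""
    return max_bytes_in_memory * (2 + max_pending_persists)

# With the default maxPendingPersists = 0, peak is about 2x maxBytesInMemory:
print(peak_row_heap_bytes(100_000_000, 0))  # 200000000
```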

Concurrent Segment Persist/Merge Limits

To help reduce peak memory usage, the Indexer imposes a limit on the number of concurrent segment persist/merge operations across all running tasks.
druid.worker.numConcurrentMerges (integer) limits the number of concurrent persist/merge operations. Default: druid.worker.capacity / 2, rounded down.
This limit helps prevent memory spikes when multiple tasks attempt to persist or merge segments simultaneously.
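The default described above can be expressed as a one-line sketch (the function name is illustrative):

```python
def default_num_concurrent_merges(worker_capacity: int) -> int:
    """Sketch of the documented default for druid.worker.numConcurrentMerges:
    half the task-slot capacity, rounded down."""
    return worker_capacity // 2

# An Indexer with 5 task slots defaults to 2 concurrent persist/merge ops:
print(default_num_concurrent_merges(5))  # 2
```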

Current Limitations

Known Limitations
  • Separate task logs are not currently supported when using the Indexer; all task log messages are instead written to the Indexer service log. This can make debugging individual tasks more challenging than with the Middle Manager + Peon approach.
  • The Indexer currently imposes an identical memory limit on each task, which can be inefficient for tasks with varying memory requirements. In later releases, the per-task memory limit will be removed and only the global limit will apply.
  • The limit on concurrent merges applies globally across all tasks. In later releases, this limit will be removed or made more dynamic.

Future Enhancements

In later releases, per-task memory usage will be dynamically managed. See GitHub issue #7900 for details on future enhancements to the Indexer. Planned improvements include:
  • Dynamic per-task memory allocation
  • Removal of fixed merge limits
  • Support for separate task logs
  • Better resource isolation options

Architecture Comparison

Migration Considerations

If migrating from Middle Manager + Peon to Indexer:
  1. Test thoroughly: The Indexer is experimental; test with your workload
  2. Adjust memory settings: Configure druid.worker.globalIngestionHeapLimitBytes appropriately
  3. Update monitoring: Task logs will be in the Indexer service log
  4. Review task isolation: Ensure your tasks can safely share resources
  5. Plan rollback: Keep Middle Manager configuration ready if needed
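As a sketch, a minimal Indexer runtime.properties for migration testing might combine the settings discussed above (all values are illustrative, not recommendations):

```properties
druid.service=druid/indexer

# 4 task slots, sharing one JVM
druid.worker.capacity=4
# ~6 GiB global ingestion heap limit, split evenly across the 4 slots
druid.worker.globalIngestionHeapLimitBytes=6442450944
# At most 2 concurrent persist/merge operations (the default: capacity / 2)
druid.worker.numConcurrentMerges=2

# 10 chat handler threads + 10 general HTTP threads
druid.server.http.numThreads=10
```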

Performance Considerations

Pros

  • Lower per-task overhead
  • Better resource utilization
  • Faster task startup (no JVM initialization)
  • Easier to configure
  • Better for many small tasks

Cons

  • Less isolation between tasks
  • Single point of failure (all tasks in one JVM)
  • Harder to debug individual tasks
  • Memory limits are uniform
  • Still experimental
