The Broker service routes queries in a distributed cluster setup. It interprets the metadata published to ZooKeeper about segment distribution across services and routes queries accordingly. Additionally, the Broker service consolidates result sets from individual services.

Key Responsibilities

  • Query routing: determines which Historical and real-time services to forward queries to, based on segment distribution.
  • Result consolidation: merges results from multiple services into a single response.
  • Cache management: manages per-segment result caching with an LRU invalidation strategy.
  • Metadata interpretation: reads segment distribution information from ZooKeeper to build query plans.

Configuration

For Broker service configuration options, see the Apache Druid configuration reference.

Running the Broker

org.apache.druid.cli.Main server broker
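The Broker is configured through its runtime.properties file. A minimal sketch of common Broker properties follows; the values shown are illustrative, not recommended defaults, so consult the configuration reference for your deployment:

```properties
# Service name and port (8082 is the conventional Broker port)
druid.service=druid/broker
druid.plaintextPort=8082

# HTTP connection pool from the Broker to Historical/real-time services
druid.broker.http.numConnections=20

# Per-segment result cache (local caffeine cache, illustrative 512 MiB)
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=536870912
```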

Forwarding Queries

Most Druid queries contain an interval object that indicates a span of time for which data is requested. Similarly, Druid partitions segments to contain data for some interval of time and distributes the segments across a cluster.
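For example, a native Druid timeseries query names its datasource and interval explicitly; the Broker uses this interval to decide which segments, and therefore which services, the query must touch. The datasource and values below are illustrative:

```json
{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "granularity": "day",
  "intervals": ["2024-01-01/2024-01-08"],
  "aggregations": [{ "type": "count", "name": "rows" }]
}
```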

How Query Forwarding Works

  1. Build a view from ZooKeeper: the Broker service first builds a view of the cluster from information in ZooKeeper, which maintains information about Historical and streaming ingestion Peon services and the segments they serve.
  2. Build a segment timeline: for every datasource in ZooKeeper, the Broker builds a timeline of segments and the services that serve them.
  3. Perform a timeline lookup: when a query arrives for a specific datasource and interval, the Broker looks up the timeline for that datasource over the query interval and retrieves the services that contain data for the query.
  4. Forward the query: the Broker then forwards the query to the selected services.

Example Scenario

Consider a simple datasource with seven segments where each segment contains data for a given day of the week. Any query issued to the datasource for more than one day of data will hit more than one segment. These segments will likely be distributed across multiple services, and hence, the query will likely hit multiple services.
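The scenario above can be sketched in Python. This is a simplified model of the timeline lookup, not Druid's implementation; the segment and service names are hypothetical:

```python
from datetime import date, timedelta

# One segment per day of the week, distributed across two hypothetical services.
segments = [
    {"id": f"day-{i}",
     "start": date(2024, 1, 1) + timedelta(days=i),
     "end": date(2024, 1, 2) + timedelta(days=i),
     "service": "historical-1" if i % 2 == 0 else "historical-2"}
    for i in range(7)
]

def lookup(query_start, query_end):
    """Return the segments whose half-open interval overlaps the query interval."""
    return [s for s in segments
            if s["start"] < query_end and query_start < s["end"]]

# A three-day query hits three segments, spread across both services.
hits = lookup(date(2024, 1, 2), date(2024, 1, 5))
print([s["id"] for s in hits])               # ['day-1', 'day-2', 'day-3']
print(sorted({s["service"] for s in hits}))  # ['historical-1', 'historical-2']
```

The overlap test (`start < query_end and query_start < end`) is the check the Broker conceptually performs when matching a query interval against segment intervals in the timeline.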

Caching

Broker services employ a cache with a least-recently-used (LRU) invalidation strategy. The Broker cache stores per-segment results.

Cache Architecture

The cache can be local to each Broker service, providing fast access with no network overhead, or backed by an external distributed cache shared across Broker services.

Cache Behavior

Each time a Broker service receives a query:
  1. Map the query to segments: the Broker maps the query to the set of segments that need to be queried.
  2. Check the cache for existing results: results for a subset of these segments may already exist in the cache and can be pulled from it directly.
  3. Forward uncached segment queries: for any segment results that are not in the cache, the Broker forwards the query to the Historical services.
  4. Store new results in the cache: once the Historical services return their results, the Broker stores those results in the cache for future use.

Real-time segments are never cached, so requests for real-time data are always forwarded to real-time services. Real-time data is perpetually changing, and caching its results would be unreliable.
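The four steps above can be sketched as follows. This is a minimal model in Python, assuming a local LRU cache keyed by segment id; the class and function names are hypothetical and not part of Druid:

```python
from collections import OrderedDict

class PerSegmentCache:
    """Tiny LRU cache keyed by segment id (a sketch, not Druid's cache)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, segment_id):
        if segment_id not in self.entries:
            return None
        self.entries.move_to_end(segment_id)   # mark as recently used
        return self.entries[segment_id]

    def put(self, segment_id, result):
        self.entries[segment_id] = result
        self.entries.move_to_end(segment_id)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used

def broker_query(segment_ids, cache, query_historical):
    """Steps 1-4: check cache, fetch misses, populate cache, merge results."""
    results, misses = {}, []
    for seg in segment_ids:                    # step 2: check cache
        cached = cache.get(seg)
        if cached is not None:
            results[seg] = cached
        else:
            misses.append(seg)
    for seg in misses:                         # step 3: forward uncached
        result = query_historical(seg)
        cache.put(seg, result)                 # step 4: store for future use
        results[seg] = result
    return results                             # merged per-segment results

# Usage: on the second query only the new segment reaches the Historical.
calls = []
def fake_historical(segment_id):
    calls.append(segment_id)
    return f"rows({segment_id})"

cache = PerSegmentCache(capacity=16)
broker_query(["seg-mon", "seg-tue"], cache, fake_historical)
broker_query(["seg-mon", "seg-tue", "seg-wed"], cache, fake_historical)
print(calls)   # each segment was fetched from the Historical only once
```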

HTTP Endpoints

For a list of API endpoints supported by the Broker, see the Apache Druid API reference.

Architecture Integration

The Broker service integrates with other Druid components:

With ZooKeeper

  • Maintains connection to ZooKeeper cluster for current cluster information
  • Reads segment distribution metadata
  • Monitors Historical and Peon service availability

With Historical Services

  • Forwards queries to Historical services that hold relevant segments
  • Receives and consolidates query results
  • Does not manage segment loading/unloading (that’s the Coordinator’s job)

With Real-time Services

  • Forwards queries for real-time data to streaming ingestion tasks
  • Never caches real-time results due to data mutability

Performance Considerations

To optimize Broker performance:
  1. Scale horizontally: Add more Broker instances to handle increased query load
  2. Configure cache appropriately: Size local or distributed cache based on your query patterns
  3. Monitor query distribution: Ensure queries are evenly distributed across Historical services
  4. Tune connection pools: Adjust the number of connections to Historical services based on cluster size
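Items 2 and 4 correspond to Broker runtime properties such as the following; the values are illustrative, not tuned recommendations:

```properties
# Item 4: connection pool from the Broker to Historical/real-time services
druid.broker.http.numConnections=50

# HTTP server threads available for client queries
druid.server.http.numThreads=60

# Item 2: local result cache type and size (1 GiB here)
druid.cache.type=caffeine
druid.cache.sizeInBytes=1073741824
```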
