Architecture - Apache Druid

Druid has a distributed architecture that is designed to be cloud-friendly and easy to operate. You can configure and scale services independently for maximum flexibility over cluster operations. This design includes enhanced fault tolerance: an outage of one component does not immediately affect other components.

Architecture overview

The following diagram shows the services that make up the Druid architecture, their typical arrangement across servers, and how queries and data flow through this architecture.

Druid services

Druid has several types of services, each responsible for specific aspects of data processing and query execution:

Coordinator

Manages data availability on the cluster

Overlord

Controls the assignment of data ingestion workloads

Broker

Handles queries from external clients

Router

Routes requests to Brokers, Coordinators, and Overlords

Historical

Stores queryable data

Middle Manager

Ingests data through Peon processes

Indexer

Alternative to Middle Manager + Peon task execution

Server types

For ease of deployment, we recommend organizing Druid services into three server types: Master, Query, and Data.

Master server

A Master server manages data ingestion and availability. It is responsible for starting new ingestion jobs and coordinating availability of data on the Data server. Master servers divide operations between Coordinator and Overlord services.

Coordinator service
Overlord service

The Coordinator service watches over the Historical services on the Data servers. It is responsible for:

Segment assignment: Assigning segments to specific servers
Load balancing: Ensuring segments are well-balanced across Historicals
Replication: Managing segment replication across multiple nodes
Segment lifecycle: Loading new segments and dropping outdated ones

The Coordinator runs its duties periodically. On each run, it assesses the current state of the cluster before deciding on the appropriate actions to take. It maintains connections to ZooKeeper for current cluster information and to the metadata database containing information about used segments and loading rules.

How segment assignment works

Before any unassigned segments are serviced by Historical services, the Historical services for each tier are first sorted in terms of capacity, with least capacity servers having the highest priority. Unassigned segments are always assigned to the services with least capacity to maintain a level of balance between services.The Coordinator does not directly communicate with a Historical service when assigning it a new segment. Instead, the Coordinator creates temporary information about the new segment under the load queue path of the Historical service in ZooKeeper. Once this request is seen, the Historical service loads the segment and begins servicing it.

Balancing strategies

The Coordinator employs different strategies to balance segments across Historicals:

cost (default): Picks the server with the minimum “cost” of placing a segment. The cost is a function of the data interval of the segment and the data intervals of all segments already present on the candidate server. This strategy tries to avoid placing segments with adjacent or overlapping data intervals on the same server.
diskNormalized: A derivative of the cost strategy that weights the cost with the disk usage ratio of the server
random: Distributes segments randomly across servers (experimental)

All strategies prioritize moving segments from the Historical with the least available disk space.

Automatic compaction

The Coordinator manages the automatic compaction system. Each run, the Coordinator compacts segments by merging small segments or splitting large ones. This is useful when the size of your segments is not optimized, which may degrade query performance.

You can run the Coordinator and Overlord services as a single combined service by setting the druid.coordinator.asOverlord.enabled property.

Query server

A Query server provides the endpoints that users and client applications interact with, routing queries to Data servers or other Query servers (and optionally proxied Master server requests). Query servers divide operations between Broker and Router services.

Broker service
Router service

The Broker service receives queries from external clients and forwards those queries to Data servers. When Brokers receive results from those subqueries, they merge those results and return them to the caller.

Query routing

To determine which services to forward queries to, the Broker service first builds a view of the world from information in ZooKeeper. ZooKeeper maintains information about Historical and streaming ingestion Peon services and the segments they are serving.For every datasource in ZooKeeper, the Broker service builds a timeline of segments and the services that serve them. When queries are received for a specific datasource and interval, the Broker service performs a lookup into the timeline associated with the query datasource for the query interval and retrieves the services that contain data for the query.

Query received

Broker receives a query for a specific datasource and time interval

Segment lookup

Broker maps the query to a set of segments covering the requested interval

Cache check

Broker checks if segment results exist in the cache

Forward to Historicals

For uncached segments, Broker forwards the query to Historical services

Merge results

Broker merges results from cache and Historical services

Return to client

Broker returns the final merged result to the caller

Caching

Broker services employ a cache with an LRU cache invalidation strategy. The Broker cache stores per-segment results. The cache can be local to each Broker service or shared across multiple services using an external distributed cache such as memcached.

Real-time segments are never cached. Requests for real-time data will always be forwarded to real-time services because real-time data is perpetually changing and caching the results would be unreliable.

Typically, you query Brokers rather than querying Historical or Middle Manager services on Data servers directly.

Data server

A Data server executes ingestion jobs and stores queryable data. Data servers divide operations between Historical and Middle Manager services.

Historical service
Middle Manager service
Indexer service (optional)

The Historical service handles storage and querying on historical data, including any streaming data that has been in the system long enough to be committed. Historical services download segments from deep storage and respond to queries about these segments. They don’t accept writes.

Loading and serving segments

Each Historical service copies or pulls segment files from deep storage to local disk in an area called the segment cache. The Coordinator controls the assignment of segments to Historicals and the balance of segments between Historicals.Historical services do not communicate directly with each other, nor do they communicate directly with the Coordinator. Instead, the Coordinator creates ephemeral entries in ZooKeeper in a load queue path. Each Historical service maintains a connection to ZooKeeper, watching those paths for segment information.

Detect new segment

Historical detects a new entry in the ZooKeeper load queue

Check cache

Historical checks its own segment cache for the segment

Retrieve metadata

If not cached, Historical retrieves metadata from ZooKeeper about the segment, including where it’s located in deep storage

Download segment

Historical pulls down and processes the segment from deep storage

Announce availability

Historical advertises the segment as available for queries via ZooKeeper

Memory-mapped caching

The segment cache uses memory mapping. The cache consumes memory from the underlying operating system so Historicals can hold parts of segment files in memory to increase query performance at the data level.At query time:

If the required part of a segment file is available in the memory mapped cache (“page cache”), the Historical reads it directly from memory
If it is not in the memory-mapped cache, the Historical reads that part of the segment from disk

This memory-mapped segment cache is in addition to other query-level caches. To make data available for querying as soon as possible, Historical services search the local segment cache upon startup and advertise the segments found there.

Service colocation

Colocating Druid services by server type generally results in better utilization of hardware resources for most clusters. For very large scale clusters, it can be desirable to split the Druid services such that they run on individual servers to avoid resource contention.

Coordinators and Overlords

The workload on the Coordinator service tends to increase with the number of segments in the cluster. The Overlord’s workload also increases based on the number of segments in the cluster, but to a lesser degree than the Coordinator.

In clusters with very high segment counts, it can make sense to separate the Coordinator and Overlord services to provide more resources for the Coordinator’s segment balancing workload.

Historicals and Middle Managers

With higher levels of ingestion or query load, it can make sense to deploy the Historical and Middle Manager services on separate hosts to avoid CPU and memory contention. The Historical service also benefits from having free memory for memory mapped segments, which can be another reason to deploy the Historical and Middle Manager services separately.

External dependencies

In addition to its built-in service types, Druid also has three external dependencies. These are intended to be able to leverage existing infrastructure, where present.

Deep storage

Druid uses deep storage to store any data that has been ingested into the system. Deep storage is shared file storage accessible by every Druid server. In a clustered deployment, this is typically a distributed object store like S3 or HDFS, or a network mounted filesystem. In a single-server deployment, this is typically local disk.

Purpose
Performance considerations
Sizing

Druid uses deep storage for the following purposes:

Data persistence: To store all the data you ingest. Segments that get loaded onto Historical services for low latency queries are also kept in deep storage for backup purposes.
Query from deep storage: Segments that are only in deep storage can be used for queries from deep storage
Data transfer: As a way to transfer data in the background between Druid services

Deep storage is an important part of Druid’s elastic, fault-tolerant design. Druid bootstraps from deep storage even if every single data server is lost and re-provisioned.

Metadata storage

The metadata storage holds various shared system metadata such as segment usage information and task information. In a clustered deployment, this is typically a traditional RDBMS like PostgreSQL or MySQL. In a single-server deployment, it is typically a locally-stored Apache Derby database.

ZooKeeper

ZooKeeper is used for internal service discovery, coordination, and leader election. It maintains critical information about:

Historical and Peon services
Segment distribution and availability
Service health and status
Coordination between services

Learn more

See the following topics for more information:

Storage components

Learn about data storage in Druid

Segments

Learn about segment files

Query processing

High-level overview of how Druid processes queries

Web console

Learn about the Druid web console UI

Getting Started

Design & Architecture

Data Ingestion

Querying

Data Management

Operations

Configuration

​Architecture overview

​Druid services

Coordinator

Overlord

Broker

Router

Historical

Middle Manager

Indexer

​Server types

​Master server

​How segment assignment works

​Balancing strategies

​Automatic compaction

​Task management features

​Query server

​Query routing

​Caching

​Data server

​Loading and serving segments

​Memory-mapped caching

​Peon service

​Service colocation

​Coordinators and Overlords

​Historicals and Middle Managers

​External dependencies

​Deep storage

​Metadata storage

​ZooKeeper

​Learn more

Storage components

Segments

Query processing

Web console

Build docs developers (and LLMs) love

Architecture overview

Druid services

Server types

Master server

How segment assignment works

Balancing strategies

Automatic compaction

Task management features

Query server

Query routing

Caching

Data server

Loading and serving segments

Memory-mapped caching

Peon service

Service colocation

Coordinators and Overlords

Historicals and Middle Managers

External dependencies

Deep storage

Metadata storage

ZooKeeper

Learn more