- A Master server to host the Coordinator and Overlord processes
- Two scalable, fault-tolerant Data servers running Historical and Middle Manager processes
- A query server, hosting the Druid Broker and Router processes
In production, we recommend deploying multiple Master servers and multiple Query servers in a fault-tolerant configuration based on your specific fault-tolerance needs, but you can get started quickly with one Master and one Query server and add more servers later.
Select hardware
Fresh Deployment
If you do not have an existing Druid cluster, and wish to start running Druid in a clustered deployment, this guide provides an example clustered deployment with pre-made configurations.Master server
The Coordinator and Overlord processes are responsible for handling the metadata and coordination needs of your cluster. They can be colocated together on the same server. In this example, we will be deploying the equivalent of one AWS m5.2xlarge instance. This hardware offers:- 8 vCPUs
- 32 GiB RAM
conf/druid/cluster/master.
Data server
Historicals and Middle Managers can be colocated on the same server to handle the actual data in your cluster. These servers benefit greatly from CPU, RAM, and SSDs. In this example, we will be deploying the equivalent of two AWS i3.4xlarge instances. This hardware offers:- 16 vCPUs
- 122 GiB RAM
- 2 * 1.9TB SSD storage
conf/druid/cluster/data.
Query server
Druid Brokers accept queries and farm them out to the rest of the cluster. They also optionally maintain an in-memory query cache. These servers benefit greatly from CPU and RAM. In this example, we will be deploying the equivalent of one AWS m5.2xlarge instance. This hardware offers:- 8 vCPUs
- 32 GiB RAM
conf/druid/cluster/query.
Other Hardware Sizes
The example cluster above is chosen as a single example out of many possible ways to size a Druid cluster. You can choose smaller/larger hardware or less/more servers for your specific needs and constraints. If your use case has complex scaling requirements, you can also choose to not co-locate Druid processes (e.g., standalone Historical servers).Migrating from a single-server deployment
If you have an existing single-server deployment, such as the ones from the single-server deployment examples, and you wish to migrate to a clustered deployment of similar scale, the following section contains guidelines for choosing equivalent hardware using the Master/Data/Query server organization.Master server
The main considerations for the Master server are available CPUs and RAM for the Coordinator and Overlord heaps. Sum up the allocated heap sizes for your Coordinator and Overlord from the single-server deployment, and choose Master server hardware with enough RAM for the combined heaps, with some extra RAM for other processes on the machine. For CPU cores, you can choose hardware with approximately 1/4th of the cores of the single-server deployment.Data server
When choosing Data server hardware for the cluster, the main considerations are available CPUs and RAM, and using SSD storage if feasible. In a clustered deployment, having multiple Data servers is a good idea for fault-tolerance purposes. When choosing the Data server hardware, you can choose a split factorN, divide the original CPU/RAM of the single-server deployment by N, and deploy N Data servers of reduced size in the new cluster.
Instructions for adjusting the Historical/Middle Manager configs for the split are described in a later section in this guide.
Query server
The main considerations for the Query server are available CPUs and RAM for the Broker heap + direct memory, and Router heap. Sum up the allocated memory sizes for your Broker and Router from the single-server deployment, and choose Query server hardware with enough RAM to cover the Broker/Router, with some extra RAM for other processes on the machine. For CPU cores, you can choose hardware with approximately 1/4th of the cores of the single-server deployment. The basic cluster tuning guide has information on how to calculate Broker/Router memory usage.Select OS
We recommend running your favorite Linux distribution. You will also need:- Java 17
- Python 3
If needed, you can specify where to find Java using the environment variables
DRUID_JAVA_HOME or JAVA_HOME. For more details run the bin/verify-java script.Download the distribution
Download Druid
First, download and unpack the release archive. It’s best to do this on a single machine at first, since you will be editing the configurations and then copying the modified distribution out to all of your servers.Download the release.Extract Druid by running the following commands in your terminal:
Review the package contents
In the package, you should find:
LICENSEandNOTICEfilesbin/*- scripts related to the single-machine quickstartconf/druid/cluster/*- template configurations for a clustered setupextensions/*- core Druid extensionshadoop-dependencies/*- Druid Hadoop dependencieslib/*- libraries and dependencies for core Druidquickstart/*- files related to the single-machine quickstart
conf/druid/cluster/ in order to get things running.Migrating from Single-Server Deployments
In the following sections we will be editing the configs underconf/druid/cluster.If you have an existing single-server deployment, please copy your existing configs to conf/druid/cluster to preserve any config changes you have made.Configure metadata storage and deep storage
Metadata storage
Inconf/druid/cluster/_common/common.runtime.properties, replace “metadata.storage.*” with the address of the machine that you will use as your metadata store:
druid.metadata.storage.connector.connectURIdruid.metadata.storage.connector.host
Deep storage
Druid relies on a distributed filesystem or large object (blob) store for data storage. The most commonly used deep storage implementations are S3 (popular for those on AWS) and HDFS (popular if you already have a Hadoop deployment).S3
Inconf/druid/cluster/_common/common.runtime.properties:
- Add “druid-s3-extensions” to
druid.extensions.loadList. - Comment out the configurations for local storage under “Deep Storage” and “Indexing service logs”.
- Uncomment and configure appropriate values in the “For S3” sections of “Deep Storage” and “Indexing service logs”.
HDFS
Inconf/druid/cluster/_common/common.runtime.properties:
- Add “druid-hdfs-storage” to
druid.extensions.loadList. - Comment out the configurations for local storage under “Deep Storage” and “Indexing service logs”.
- Uncomment and configure appropriate values in the “For HDFS” sections of “Deep Storage” and “Indexing service logs”.
- Place your Hadoop configuration XMLs (core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml) on the classpath of your Druid processes. You can do this by copying them into
conf/druid/cluster/_common/.
Configure for connecting to Hadoop (optional)
If you will be loading data from a Hadoop cluster, then at this point you should configure Druid to be aware of your cluster:-
Update
druid.indexer.task.hadoopWorkingPathinconf/druid/cluster/middleManager/runtime.propertiesto a path on HDFS that you’d like to use for temporary files required during the indexing process.druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexingis a common choice. -
Place your Hadoop configuration XMLs (core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml) on the classpath of your Druid processes. You can do this by copying them into
conf/druid/cluster/_common/core-site.xml,conf/druid/cluster/_common/hdfs-site.xml, and so on.
You don’t need to use HDFS deep storage in order to load data from Hadoop. For example, if your cluster is running on Amazon Web Services, we recommend using S3 for deep storage even if you are loading data using Hadoop or Elastic MapReduce.
Configure Zookeeper connection
In a production cluster, we recommend using a dedicated ZK cluster in a quorum, deployed separately from the Druid servers. Inconf/druid/cluster/_common/common.runtime.properties, set druid.zk.service.host to a connection string containing a comma separated list of host:port pairs, each corresponding to a ZooKeeper server in your ZK quorum. (e.g. “127.0.0.1:4545” or “127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002”)
Configuration Tuning
Migrating from a Single-Server Deployment
Master
If you are using an example configuration from single-server deployment examples, these examples combine the Coordinator and Overlord processes into one combined process. The example configs underconf/druid/cluster/master/coordinator-overlord also combine the Coordinator and Overlord processes.
You can copy your existing coordinator-overlord configs from the single-server deployment to conf/druid/cluster/master/coordinator-overlord.
Data
Suppose we are migrating from a single-server deployment that had 32 CPU and 256GiB RAM. In the old deployment, the following configurations for Historicals and Middle Managers were applied: Historical (Single-server):druid.processing.numThreads: Set to(num_cores - 1)based on the new hardwaredruid.processing.numMergeBuffers: Divide the old value from the single-server deployment by the split factordruid.processing.buffer.sizeBytes: Keep this unchanged
druid.worker.capacity: Divide the old value from the single-server deployment by the split factordruid.indexer.fork.property.druid.processing.numMergeBuffers: Keep this unchangeddruid.indexer.fork.property.druid.processing.buffer.sizeBytes: Keep this unchangeddruid.indexer.fork.property.druid.processing.numThreads: Keep this unchanged
Query
You can copy your existing Broker and Router configs to the directories underconf/druid/cluster/query, no modifications are needed, as long as the new hardware is sized accordingly.
Fresh deployment
If you are using the example cluster described above:- 1 Master server (m5.2xlarge)
- 2 Data servers (i3.4xlarge)
- 1 Query server (m5.2xlarge)
conf/druid/cluster have already been sized for this hardware and you do not need to make further modifications for general use cases.
If you have chosen different hardware, the basic cluster tuning guide can help you size your configurations.
Open ports (if using a firewall)
If you’re using a firewall or some other system that only allows traffic on specific ports, allow inbound connections on the following:Master Server
- 1527 (Derby metadata store; not needed if you are using a separate metadata store like MySQL or PostgreSQL)
- 2181 (ZooKeeper; not needed if you are using a separate ZooKeeper cluster)
- 8081 (Coordinator)
- 8090 (Overlord)
Data Server
- 8083 (Historical)
- 8091, 8100–8199 (Druid Middle Manager; you may need higher than port 8199 if you have a very high
druid.worker.capacity)
Query Server
- 8082 (Broker)
- 8088 (Router, if used)
In production, we recommend deploying ZooKeeper and your metadata store on their own dedicated hardware, rather than on the Master server.
Start Master Server
Copy Druid to the Master server
Copy the Druid distribution and your edited configurations to your Master server.If you have been editing the configurations on your local machine, you can use
rsync to copy them:Start the Master server
No Zookeeper on Master:From the distribution root, run the following command to start the Master server:With Zookeeper on Master:If you plan to run ZK on Master servers, first update
conf/zoo.cfg to reflect how you plan to run ZK. Then, you can start the Master server processes together with ZK using:In production, we also recommend running a ZooKeeper cluster on its own dedicated hardware.
Start Data Server
Copy Druid to the Data servers
Copy the Druid distribution and your edited configurations to your Data servers.
Start the Data servers
From the distribution root, run the following command to start the Data server:You can add more Data servers as needed.
For clusters with complex resource allocation needs, you can break apart Historicals and Middle Managers and scale the components individually. This also allows you take advantage of Druid’s built-in Middle Manager autoscaling facility.
Start Query Server
Copy Druid to the Query servers
Copy the Druid distribution and your edited configurations to your Query servers.