Skip to main content
Apache Druid stores its data and indexes in segment files partitioned by time. Druid creates a segment for each segment interval that contains data. If an interval is empty (containing no rows), no segment exists for that time interval. Druid may create multiple segments for the same interval if you ingest data for that period via different ingestion jobs. Compaction is the Druid process that attempts to combine these segments into a single segment per interval for optimal performance.

Optimal Segment Size

For Druid to operate well under heavy query load, it is important for the segment file size to be within the recommended range of 300-700 MB.
If your segment files are larger than this range, consider either changing the granularity of the segment time interval or partitioning your data and/or adjusting the targetRowsPerSegment in your partitionsSpec. A good starting point for this parameter is 5 million rows.
The time interval is configurable in the segmentGranularity parameter of the granularitySpec.

Segment File Structure

Segment files are columnar: the data for each column is laid out in separate data structures. By storing each column separately, Druid decreases query latency by scanning only those columns actually needed for a query.

Column Types

There are three basic column types:

Timestamp

Stores the time dimension of your data

Dimensions

Support filtering and group-by operations

Metrics

Numeric values for aggregation
Timestamp and metrics type columns are arrays of integer or floating point values compressed with LZ4. Once a query identifies which rows to select, it decompresses them, pulls out the relevant rows, and applies the desired aggregation operator. If a query doesn’t require a column, Druid skips over that column’s data.

Dimension Column Data Structures

Dimension columns support filter and group-by operations, so each dimension requires three data structures:
1

Dictionary

Maps values (always treated as strings) to integer IDs, allowing compact representation of the list and bitmap values
2

List

The column’s values, encoded using the dictionary. Required for GroupBy and TopN queries
3

Bitmap

One bitmap for each distinct value in the column, indicating which rows contain that value. Also known as inverted indexes

Example: Page Column Data Structures

1: Dictionary
   {
    "Justin Bieber": 0,
    "Ke$ha":         1
   }

2: List of column data
   [0,
   0,
   1,
   1]

3: Bitmaps
   value="Justin Bieber": [1,1,0,0]
   value="Ke$ha":         [0,0,1,1]
The bitmap is different from the dictionary and list: the dictionary and list grow linearly with the size of the data, but the size of the bitmap section is the product of data size and column cardinality. There is one bitmap per separate column value.
For each row in the list of column data, there is only a single bitmap that has a non-zero entry. This means that high cardinality columns have extremely sparse, and therefore highly compressible, bitmaps. Druid exploits this using Roaring bitmap compression.

Handling Null Values

String columns always store the null value if present in any row as id 0, the first position in the value dictionary and an associated entry in the bitmap value indexes. Numeric columns also store a null value bitmap index to indicate the null valued rows, which is used to null check aggregations and for filter matching null values.

Segments with Different Schemas

Druid segments for the same datasource may have different schemas. If a string column (dimension) exists in one segment but not another, queries that involve both segments still work:
  • Default mode: Queries for the segment without the dimension behave as if the dimension contains only blank values
  • SQL-compatible mode: Queries for the segment without the dimension behave as if the dimension contains only null values
Similarly, if one segment has a numeric column (metric) but another does not, queries on the segment without the metric generally operate as expected.

Multi-Value Columns

A multi-value column allows a single row to contain multiple strings for a column. You can think of it as an array of strings.
1: Dictionary
   {
    "Justin Bieber": 0,
    "Ke$ha":         1
   }

2: List of column data
   [0,
   [0,1],  // Row value in a multi-value column can contain an array of values
   1,
   1]

3: Bitmaps
   value="Justin Bieber": [1,1,0,0]
   value="Ke$ha":         [0,1,1,1]
                            ^
                            |
   Multi-value column contains multiple non-zero entries
If a row has more than one value for a column, its entry in the list is an array of values. Additionally, a row with n values in the list has n non-zero valued entries in bitmaps.

Compression

Druid uses LZ4 by default to compress blocks of values for string, long, float, and double columns. Druid uses Roaring to compress bitmaps for string columns and numeric null values.
We recommend that you use these defaults unless you’ve experimented with your data and query patterns suggest that non-default options will perform better in your specific case.
Druid also supports Concise bitmap compression. For string column bitmaps, the differences between using Roaring and Concise are most pronounced for high cardinality columns. In this case, Roaring is substantially faster on filters that match many values.

Segment Identification

Segment identifiers typically contain:
  • Segment datasource
  • Interval start time (ISO 8601 format)
  • Interval end time (ISO 8601 format)
  • Version information
  • Partition number (if sharded beyond time range)
Format: datasource_intervalStart_intervalEnd_version_partitionNum

Segment ID Examples

Multiple segments for the same interval:
foo_2015-01-01/2015-01-02_v1_0
foo_2015-01-01/2015-01-02_v1_1
foo_2015-01-01/2015-01-02_v1_2
Reindexed data with a new version:
foo_2015-01-01/2015-01-02_v2_0
foo_2015-01-01/2015-01-02_v2_1
foo_2015-01-01/2015-01-02_v2_2

Sharding

Multiple segments can exist for a single time interval and datasource. These segments form a block for an interval.
Depending on the type of shardSpec used to shard the data, Druid queries may only complete if a block is complete. For example, if a block consists of three segments, all three segments must load before a query for that interval can complete.Exception: Linear shard specs do not enforce “completeness” so queries can complete even if shards are not completely loaded.

Segment Components

A segment contains several files:
4 bytes representing the current segment version as an integer. For example, for v9 segments the version is 0x0, 0x0, 0x0, 0x9.
A file containing metadata (filenames and offsets) about the contents of the other smoosh files.
Smoosh (.smoosh) files contain concatenated binary data. This file consolidation reduces the number of file descriptors that must be open when accessing data. The files are 2 GB or less in size to remain within the limit of a memory-mapped ByteBuffer in Java.Smoosh files contain:
  • Individual files for each column in the data, including one for the __time column
  • An index.drd file that contains additional segment metadata
In the codebase, segments have an internal format version. The current segment format version is v9.

Implications of Updating Segments

Druid uses versioning to manage updates to create a form of multi-version concurrency control (MVCC).
Updates that span multiple segment intervals are only atomic within each interval. They are not atomic across the entire update.
When segments are updated:
  1. v2 segments are loaded into the cluster as soon as they are built
  2. They replace v1 segments for the period of time the segments overlap
  3. Before v2 segments are completely loaded, the cluster may contain a mixture of v1 and v2 segments
  4. Queries may hit a mixture of v1 and v2 segments during the transition
Example transition state:
foo_2015-01-01/2015-01-02_v1_0
foo_2015-01-02/2015-01-03_v2_1  // Updated
foo_2015-01-03/2015-01-04_v1_2

Build docs developers (and LLMs) love