Druid stores data in datasources, which are similar to tables in a traditional relational database management system (RDBMS). Druid’s data model shares similarities with both relational and timeseries data models.

Primary Timestamp

Druid schemas must always include a primary timestamp. The primary timestamp is fundamental to how Druid operates:

- Partitioning & sorting: Druid uses the primary timestamp to partition and sort your data.
- Query performance: Enables rapid identification and retrieval of data within time ranges.
- Data management: Powers time-based operations like dropping chunks and retention rules.
- Storage column: Always stored in the __time column in your datasource.

Configuration

Druid parses the primary timestamp based on the timestampSpec configuration at ingestion time.
{
  "timestampSpec": {
    "column": "timestamp",
    "format": "iso"
  }
}
Regardless of the source field for the primary timestamp, Druid always stores the timestamp in the __time column.

Additional Timestamp Operations

You can control other important timestamp-based operations in the granularitySpec:
- segmentGranularity (string): Controls how data is partitioned into time chunks (e.g., HOUR, DAY, WEEK, MONTH).
- queryGranularity (string): Controls the timestamp resolution in query results (e.g., NONE, SECOND, MINUTE).
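For example, a granularitySpec that creates daily segments and truncates query-time timestamps to the minute (the same settings used in the full example later on this page) looks like:

```json
{
  "granularitySpec": {
    "segmentGranularity": "DAY",
    "queryGranularity": "MINUTE",
    "rollup": true
  }
}
```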
If you have more than one timestamp column, you can store the others as secondary timestamps.
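One common approach, shown here as an illustrative sketch (the processed_at field name is hypothetical), is to ingest a secondary timestamp as a long-typed dimension holding a millisecond epoch value:

```json
{
  "dimensionsSpec": {
    "dimensions": [
      {
        "type": "long",
        "name": "processed_at"
      }
    ]
  }
}
```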

Dimensions

Dimensions are columns that Druid stores “as-is”. You can use dimensions for any purpose at query time.

Dimension Capabilities

Filter your data based on dimension values:
SELECT * FROM datasource WHERE country = 'USA'
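Dimensions can also be used for grouping and aggregation at query time. A sketch, reusing the same hypothetical datasource and country column as above:

```sql
SELECT country, COUNT(*) AS row_count
FROM datasource
GROUP BY country
```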

Dimension Types

Druid supports several dimension types:

String: The default and most common dimension type.
{
  "dimensionsSpec": {
    "dimensions": [
      "country",
      "city",
      "user_id"
    ]
  }
}
Long: For integer values.
{
  "dimensionsSpec": {
    "dimensions": [
      {
        "type": "long",
        "name": "user_age"
      }
    ]
  }
}
Double: For double-precision floating-point values.
{
  "dimensionsSpec": {
    "dimensions": [
      {
        "type": "double",
        "name": "latitude"
      }
    ]
  }
}
Float: For single-precision floating-point values.
{
  "dimensionsSpec": {
    "dimensions": [
      {
        "type": "float",
        "name": "price"
      }
    ]
  }
}
Multi-value string: Arrays of strings for one-to-many relationships.
{
  "dimensionsSpec": {
    "dimensions": [
      {
        "type": "string",
        "name": "tags",
        "multiValueHandling": "ARRAY"
      }
    ]
  }
}

Rollup and Dimensions

If you disable rollup, Druid treats the set of dimensions like a set of columns to ingest. The dimensions behave exactly as you would expect from any database that does not support a rollup feature.
At ingestion time, you configure dimensions in the dimensionsSpec:
{
  "dimensionsSpec": {
    "useSchemaDiscovery": true,
    "dimensionExclusions": [
      "timestamp",
      "value"
    ]
  }
}

Metrics

Metrics are columns that Druid stores in an aggregated form. Metrics are most useful when you enable rollup.

When to Use Metrics

Rollup

Collapse dimensions while aggregating metric values to reduce row count

Performance

Pre-compute aggregations at ingestion time for faster queries

How Metrics Work with Rollup

When you enable rollup, Druid combines multiple rows with the same timestamp and dimension values:
1. Group rows: Rows with identical timestamp and dimension values are grouped together.
2. Aggregate metrics: Metric aggregation functions are applied to each group.
3. Store a single row: The grouped data is stored as a single row with aggregated metrics.

Common Metric Types

count: Counts the number of rows:
{
  "type": "count",
  "name": "count"
}
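Sum-style aggregators are also common. For example, the longSum aggregator (used again in the metricsSpec below) sums a long-typed input column:

```json
{
  "type": "longSum",
  "name": "total_bytes",
  "fieldName": "bytes"
}
```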

Example: NetFlow Data with Rollup

Consider three raw rows that share the same timestamp and dimension values (srcIP and dstIP):
[
  {"timestamp": "2024-01-01T00:00:00Z", "srcIP": "192.168.1.1", "dstIP": "10.0.0.1", "packets": 100, "bytes": 5000},
  {"timestamp": "2024-01-01T00:00:00Z", "srcIP": "192.168.1.1", "dstIP": "10.0.0.1", "packets": 150, "bytes": 7500},
  {"timestamp": "2024-01-01T00:00:00Z", "srcIP": "192.168.1.1", "dstIP": "10.0.0.1", "packets": 200, "bytes": 10000}
]
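With a metricsSpec defining count, total_packets (longSum of packets), and total_bytes (longSum of bytes), as configured in the next section, rollup collapses these three rows into a single stored row (100 + 150 + 200 = 450 packets; 5000 + 7500 + 10000 = 22500 bytes):

```json
{
  "timestamp": "2024-01-01T00:00:00Z",
  "srcIP": "192.168.1.1",
  "dstIP": "10.0.0.1",
  "count": 3,
  "total_packets": 450,
  "total_bytes": 22500
}
```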

Metrics Configuration

At ingestion time, you configure metrics in the metricsSpec:
{
  "metricsSpec": [
    {
      "type": "count",
      "name": "count"
    },
    {
      "type": "longSum",
      "name": "total_packets",
      "fieldName": "packets"
    },
    {
      "type": "longSum",
      "name": "total_bytes",
      "fieldName": "bytes"
    }
  ]
}
Metrics can only be aggregated at query time (not filtered or grouped by). If you need to filter or group by a column, define it as a dimension instead.
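At query time, you can aggregate these metric columns further. A sketch, assuming the NetFlow data above was ingested into a datasource named netflow (the datasource name is an assumption):

```sql
SELECT srcIP, SUM(total_packets) AS packets, SUM(total_bytes) AS bytes
FROM netflow
GROUP BY srcIP
```

Note that srcIP, a dimension, appears in GROUP BY, while total_packets and total_bytes, both metrics, appear only inside aggregation functions.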

Benefits of Pre-Aggregation

Even without rollup, pre-aggregating metrics at ingestion time can provide benefits:
- Some aggregators, especially approximate ones like HyperLogLog and Theta sketches, compute faster at query time if they are partially computed at ingestion time.
- Approximate aggregators often use less storage than the raw data they represent.
- You can still compute additional aggregations at query time on pre-aggregated data.

Schema Configuration Example

{
  "dataSchema": {
    "dataSource": "web_traffic",
    "timestampSpec": {
      "column": "timestamp",
      "format": "iso"
    },
    "dimensionsSpec": {
      "dimensions": [
        "page",
        "country",
        "device_type",
        {
          "type": "long",
          "name": "user_id"
        }
      ]
    },
    "metricsSpec": [
      {
        "type": "count",
        "name": "count"
      },
      {
        "type": "longSum",
        "name": "page_views",
        "fieldName": "views"
      },
      {
        "type": "doubleSum",
        "name": "session_duration",
        "fieldName": "duration"
      },
      {
        "type": "hyperUnique",
        "name": "unique_users",
        "fieldName": "user_id"
      }
    ],
    "granularitySpec": {
      "segmentGranularity": "DAY",
      "queryGranularity": "MINUTE",
      "rollup": true
    }
  }
}
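A query against this schema might group by a dimension, sum the pre-aggregated metrics, and take an approximate distinct count over the hyperUnique column. A sketch (APPROX_COUNT_DISTINCT is the usual Druid SQL function for this; verify the behavior against your Druid version):

```sql
SELECT
  country,
  SUM(page_views) AS total_views,
  APPROX_COUNT_DISTINCT(unique_users) AS users
FROM web_traffic
WHERE __time >= TIMESTAMP '2024-01-01 00:00:00'
GROUP BY country
```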

Next Steps

- Schema Design: Learn best practices for designing your schema
- Data Rollup: Understand rollup in detail
- Partitioning: Learn about partitioning strategies
- Ingestion Spec: Complete ingestion specification reference
