Data retention rules allow you to configure Apache Druid to conform to your data retention policies. Retention rules specify which data to retain and which data to drop from the cluster.

Overview

Retention rules control the lifecycle of segments in your Druid cluster:
  • Load rules: Define which segments to keep on Historical servers and how many replicas to maintain
  • Drop rules: Mark segments as unused based on time periods or intervals
  • Broadcast rules: Load segments onto Broker nodes (for testing only)
Retention rules are persistent and stored in Druid’s metadata store. They remain in effect until you change them.

Rule Types

You can specify data retention in three ways:
  • Forever: All data in the segment
  • Period: Segment data specified as an offset from the present time
  • Interval: A fixed time range
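The three specifications above differ in how they resolve to a concrete time range at evaluation time. The following is a hedged sketch (a simulation, not Druid code; the `resolve` function and its spec shape are illustrative) of that distinction: a period moves with the present, an interval never does.

```python
from datetime import datetime, timedelta, timezone

def resolve(spec, now):
    """Return the (start, end) range a retention spec covers at time `now`.

    Illustrative only: Druid expresses periods and intervals as ISO 8601
    strings (e.g. "P30D", "2023-01-01/2023-12-31"), not Python objects.
    """
    if spec["type"] == "forever":
        # No time bound in either direction.
        return (datetime.min.replace(tzinfo=timezone.utc),
                datetime.max.replace(tzinfo=timezone.utc))
    if spec["type"] == "period":
        # Offset from the present: the window slides as time advances.
        return (now - spec["period"], now)
    if spec["type"] == "interval":
        # Fixed range: never moves.
        return (spec["start"], spec["end"])
    raise ValueError(f"unknown spec type: {spec['type']}")

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
start, end = resolve({"type": "period", "period": timedelta(days=30)}, now)
print(start.date(), end.date())  # 2024-05-02 2024-06-01
```

Re-running the same period spec a day later yields a window shifted one day forward, which is why period rules suit rolling retention while interval rules suit one-off fixes.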

Setting Retention Rules

Using the Web Console

1. Navigate to Datasources: Click Datasources in the top-level navigation of the web console.
2. Edit retention rules: Click the datasource name, then select Actions > Edit retention rules.
3. Create a new rule: Click +New rule and select a rule type.
4. Configure rule properties: Set properties for the rule based on the rule type you selected.
5. Save and apply: Click Next, enter a description, and click Save to apply the rule.

Using the Coordinator API

Set default rules for all datasources:
curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/rules/_default' \
--header 'Content-Type: application/json' \
--data-raw '[{
  "type": "loadForever",
  "tieredReplicants": {
    "_default_tier": 2
  }
}]'
Set rules for a specific datasource:
curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/rules/wikipedia' \
--header 'Content-Type: application/json' \
--data-raw '[{
  "type": "loadByPeriod",
  "period": "P30D",
  "tieredReplicants": {
    "hot": 2,
    "_default_tier": 1
  }
},
{
  "type": "dropBeforeByPeriod",
  "period": "P90D"
}]'
You must pass the entire array of rules with each API request. Each POST request overwrites existing rules for the specified datasource.
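Because each POST replaces the full rule list, adding a rule safely means fetching the current rules first (a GET against the same endpoint returns them), inserting the new rule, and POSTing the combined array. A hedged sketch of the payload construction (the `prepend_rule` helper is illustrative, not a Druid API; the fetch and POST steps are omitted):

```python
import json

def prepend_rule(existing_rules, new_rule):
    """Return the full rule array to POST, with the new rule evaluated first."""
    return [new_rule] + existing_rules

# Rules previously fetched from /druid/coordinator/v1/rules/<datasource>
current = [{"type": "loadForever", "tieredReplicants": {"_default_tier": 2}}]

new = {"type": "dropBeforeByPeriod", "period": "P90D"}

# This full array becomes the --data-raw body of the POST request.
payload = json.dumps(prepend_rule(current, new))
```

POSTing only the new rule by itself would silently discard the existing `loadForever` rule.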

Rule Structure and Order

Rule order is critical. The Coordinator:
  1. Reads rules in the order they appear
  2. Cycles through all used segments
  3. Matches each segment with the first applicable rule
Each segment matches at most one rule; once a rule applies, later rules are never evaluated for that segment.
Example rule evaluation order:
Rule 1: Load last 7 days to hot tier
Rule 2: Load last 30 days to default tier  
Rule 3: Drop everything else
In the web console, use the up and down arrows to reorder rules.
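The first-match behavior described above can be simulated to sanity-check a rule chain before applying it. A hedged sketch (not Druid code; the rule tuples and `evaluate` function are illustrative, and each segment is treated as a point in time for simplicity):

```python
from datetime import datetime, timedelta, timezone

NOW = datetime(2024, 6, 1, tzinfo=timezone.utc)

# The three-rule example from above, as (action, lookback period, tier).
RULES = [
    ("load", timedelta(days=7), "hot"),       # Rule 1: last 7 days -> hot tier
    ("load", timedelta(days=30), "default"),  # Rule 2: last 30 days -> default tier
    ("drop", None, None),                     # Rule 3: drop everything else
]

def evaluate(segment_end):
    """Walk the rules in order; the first rule that covers the segment wins."""
    for action, period, tier in RULES:
        if period is None or segment_end >= NOW - period:
            return (action, tier)
    return (None, None)

print(evaluate(NOW - timedelta(days=2)))    # ('load', 'hot')
print(evaluate(NOW - timedelta(days=20)))   # ('load', 'default')
print(evaluate(NOW - timedelta(days=90)))   # ('drop', None)
```

Note that a 2-day-old segment also satisfies Rule 2, but never reaches it: order, not specificity, decides.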

Load Rules

Load rules define how Druid assigns segments to Historical process tiers and set replica counts.

Forever Load Rule

Assigns all datasource segments to specified tiers:
{
  "type": "loadForever",
  "tieredReplicants": {
    "hot": 1,
    "_default_tier": 1
  }
}
Properties:
  • tieredReplicants: Map of tier names to number of replicas (0 or positive integer)
  • useDefaultTierForNull: Determines default value if tieredReplicants is null (default: true)

Period Load Rule

Assigns segment data in a specific period to a tier:
{
  "type": "loadByPeriod",
  "period": "P1M",
  "includeFuture": true,
  "tieredReplicants": {
    "hot": 1,
    "_default_tier": 1
  }
}
Properties:
  • period: ISO 8601 period from past to present (or future if includeFuture is true)
  • includeFuture: Match segments that start after the rule interval starts (default: true)
  • tieredReplicants: Map of tier names to replica counts

Interval Load Rule

Assigns a specific time range to a tier:
{
  "type": "loadByInterval",
  "interval": "2023-01-01/2023-12-31",
  "tieredReplicants": {
    "hot": 1,
    "_default_tier": 1
  }
}
Properties:
  • interval: ISO 8601 time range encoded as a string
  • tieredReplicants: Map of tier names to replica counts

Query from Deep Storage

Configure segments to be queryable from deep storage without loading to Historicals:
{
  "type": "loadByPeriod",
  "period": "P90D",
  "tieredReplicants": {},
  "useDefaultTierForNull": false
}
Setting tieredReplicants to an empty object and useDefaultTierForNull to false allows queries from deep storage without Historical tier loading.

Drop Rules

Drop rules mark segments as unused, removing them from the cluster. Data remains in deep storage unless you run a kill task.
If you use a load rule to retain only recent data, you must also define a drop rule. Otherwise, Druid retains older data according to the default loadForever rule.

Forever Drop Rule

Drops all segment data from the cluster:
{
  "type": "dropForever"
}
This is usually the last rule in a set, dropping any segments not matched by an earlier rule.

Period Drop Rule

Drops segments within a specific period (drops recent data):
{
  "type": "dropByPeriod",
  "period": "P7D",
  "includeFuture": true
}
Properties:
  • period: ISO 8601 period from past to present/future
  • includeFuture: Match segments starting after the rule interval (default: true)

Period Drop Before Rule

Drops segments before a specific period (drops old data):
{
  "type": "dropBeforeByPeriod",
  "period": "P90D"
}
Properties:
  • period: ISO 8601 period
The rule combination dropBeforeByPeriod + loadForever is equivalent to loadByPeriod(includeFuture = true) + dropForever.
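This equivalence follows from first-match evaluation, and can be checked with a small simulation (a hedged sketch, not Druid code; each segment is treated as a point in time, and the lambdas stand in for rule matching):

```python
from datetime import datetime, timedelta, timezone

NOW = datetime(2024, 6, 1, tzinfo=timezone.utc)
P90 = timedelta(days=90)

def first_match(rules, seg_end):
    """Return the action of the first rule that matches, as the Coordinator would."""
    for rule in rules:
        action = rule(seg_end)
        if action is not None:
            return action
    return None

# Chain A: dropBeforeByPeriod(P90D), then loadForever
chain_a = [
    lambda t: "drop" if t < NOW - P90 else None,
    lambda t: "load",
]

# Chain B: loadByPeriod(P90D, includeFuture=true), then dropForever
chain_b = [
    lambda t: "load" if t >= NOW - P90 else None,
    lambda t: "drop",
]

for days in (1, 89, 90, 91, 365):
    t = NOW - timedelta(days=days)
    assert first_match(chain_a, t) == first_match(chain_b, t)
```

Both chains load the last 90 days and drop the rest; they differ only in which rule does the explicit matching.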

Interval Drop Rule

Drops segments in a specific time range:
{
  "type": "dropByInterval",
  "interval": "2020-01-01/2021-01-01"
}
Properties:
  • interval: ISO 8601 time range

Broadcast Rules

Broadcast rules load segments onto all Brokers in the cluster. Use only in test environments, not production.
Requires druid.segmentCache.locations configured on both Brokers and Historicals.

Forever Broadcast Rule

{
  "type": "broadcastForever"
}

Period Broadcast Rule

{
  "type": "broadcastByPeriod",
  "period": "P1M",
  "includeFuture": true
}

Interval Broadcast Rule

{
  "type": "broadcastByInterval",
  "interval": "2023-01-01/2024-01-01"
}

Common Retention Patterns

Hot-Warm-Cold Architecture

Keep recent data hot, older data warm, archive oldest:
[
  {
    "type": "loadByPeriod",
    "period": "P7D",
    "tieredReplicants": {"hot": 2}
  },
  {
    "type": "loadByPeriod",
    "period": "P30D",
    "tieredReplicants": {"warm": 1}
  },
  {
    "type": "loadByPeriod",
    "period": "P90D",
    "tieredReplicants": {},
    "useDefaultTierForNull": false
  },
  {
    "type": "dropForever"
  }
]
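Under first-match evaluation, the chain above partitions segments by age. A hedged walk-through (a simulation, not Druid code; the `place` function and its labels are illustrative):

```python
def place(age_days):
    """Return where a segment of the given age lands under the chain above."""
    if age_days <= 7:
        return "hot (2 replicas)"
    if age_days <= 30:
        return "warm (1 replica)"
    if age_days <= 90:
        return "deep storage only"   # empty tieredReplicants, no Historical copy
    return "dropped"                 # dropForever catches the rest

print(place(3))    # hot (2 replicas)
print(place(20))   # warm (1 replica)
print(place(60))   # deep storage only
print(place(400))  # dropped
```

Reversing the rule order would break this: with `dropForever` first, every segment would match it and nothing would ever load.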

Retain Last N Days

Keep only the last 30 days of data:
[
  {
    "type": "loadByPeriod",
    "period": "P30D",
    "tieredReplicants": {"_default_tier": 2}
  },
  {
    "type": "dropForever"
  }
]

High Availability for Recent Data

More replicas for recent data:
[
  {
    "type": "loadByPeriod",
    "period": "P7D",
    "tieredReplicants": {"_default_tier": 3}
  },
  {
    "type": "loadByPeriod",
    "period": "P90D",
    "tieredReplicants": {"_default_tier": 2}
  },
  {
    "type": "dropForever"
  }
]

Managing Dropped Data

Permanently Delete Data

Dropped segments remain in deep storage. To permanently delete them:
  1. Mark segments as unused via drop rules or manual action.
  2. Submit a kill task to delete them from deep storage, or enable auto-kill on the Coordinator to delete unused segments automatically.
See Data Deletion for details.
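A kill task spec submitted to the Overlord task endpoint (`POST /druid/indexer/v1/task`) looks like the following sketch; the datasource name and interval here are placeholders:

```json
{
  "type": "kill",
  "dataSource": "wikipedia",
  "interval": "2020-01-01/2021-01-01"
}
```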

Reload Dropped Data

1. Update the retention period: Change the retention period (e.g., from 30 days to 60 days).
2. Mark segments as used: Use the web console or API to mark all segments for the datasource as “used”.
3. Let the Coordinator reload the data: The Coordinator reruns the rules and loads missing segments automatically.

Viewing Retention Rules

Retrieve all rules:
curl 'http://localhost:8081/druid/coordinator/v1/rules'
Retrieve rules for a specific datasource:
curl 'http://localhost:8081/druid/coordinator/v1/rules/wikipedia?full=true'
View audit history:
curl 'http://localhost:8081/druid/coordinator/v1/rules/history?interval=2024-01-01/2024-02-01'

Best Practices

Set Default Rules

Configure default rules to prevent unlimited data retention across all datasources.

Use Period Rules

Prefer period-based rules over interval-based for dynamic retention that adapts as time progresses.

Test Rule Order

Verify rule order carefully: segments match only the first applicable rule.

Enable Auto-Kill

Configure auto-kill to automatically clean up unused segments from deep storage.
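Auto-kill is controlled by Coordinator runtime properties such as the following (illustrative values; check your Druid version's Coordinator configuration reference for defaults):

```properties
druid.coordinator.kill.on=true
druid.coordinator.kill.period=P1D
druid.coordinator.kill.durationToRetain=P90D
druid.coordinator.kill.maxSegments=100
```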

Learn More

Retention Tutorial

Step-by-step guide to configuring retention rules

Data Deletion

Permanently delete data with kill tasks

Retention Rules API

Complete API reference for managing rules

Mixed Workloads

Configure tiering for different workload types
