Apache Druid stores data in immutable segments partitioned by time chunk. This guide covers the different methods for deleting data from your Druid cluster.

Deletion Overview

Druid supports two levels of deletion:
  • Soft delete (mark unused): Data unavailable for queries but remains in deep storage
  • Hard delete (kill): Permanently removes data from deep storage and metadata store
Hard deletes using kill tasks are permanent and cannot be undone unless you have backups.

Delete by Time Range

Deleting data by time range happens in two steps; the first is a fast, metadata-only operation:
1. Mark Segments as Unused

Segments are marked “unused” via drop rules or manual API calls. This is a soft delete: data becomes unavailable for queries but remains in deep storage.

2. Permanently Delete (Optional)

Use a kill task to permanently delete segment files from deep storage and remove metadata records.

Manual Time Range Deletion

Use the Coordinator API to mark segments as unused:
curl -X POST 'http://localhost:8081/druid/coordinator/v1/datasources/wikipedia/markUnused' \
--header 'Content-Type: application/json' \
--data-raw '{
  "interval": "2023-01-01/2023-02-01"
}'
This is a soft delete. Segments remain in deep storage until you run a kill task.
See the Legacy Metadata API documentation for details.
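The same call can be scripted. A minimal Python sketch using only the standard library (the Coordinator URL and datasource name are placeholders for your environment):

```python
import json
from urllib import request

def mark_unused_payload(interval: str) -> bytes:
    """Build the JSON body for the Coordinator markUnused endpoint."""
    return json.dumps({"interval": interval}).encode("utf-8")

def mark_unused(coordinator: str, datasource: str, interval: str) -> int:
    """POST to /druid/coordinator/v1/datasources/{datasource}/markUnused.

    Requires a running Coordinator; this is a soft delete, so segments
    stay in deep storage until a kill task removes them.
    """
    url = f"{coordinator}/druid/coordinator/v1/datasources/{datasource}/markUnused"
    req = request.Request(
        url,
        data=mark_unused_payload(interval),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status
```

For example, `mark_unused("http://localhost:8081", "wikipedia", "2023-01-01/2023-02-01")` mirrors the curl command above.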

Automatic Deletion with Drop Rules

Use retention rules to automatically mark segments as unused based on time:
[
  {
    "type": "loadByPeriod",
    "period": "P30D",
    "tieredReplicants": {
      "_default_tier": 2
    }
  },
  {
    "type": "dropForever"
  }
]
This configuration:
  1. Loads segments from the last 30 days
  2. Drops all other segments
Apply via API:
curl -X POST 'http://localhost:8081/druid/coordinator/v1/rules/wikipedia' \
--header 'Content-Type: application/json' \
--data-raw '[{
  "type": "dropBeforeByPeriod",
  "period": "P90D"
}]'
Dropped segments remain in deep storage. Enable auto-kill or use kill tasks for permanent deletion.
See Retention Rules for all rule types.
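Rule sets like the one above can also be built programmatically before posting them to the rules endpoint. A sketch (the period and replicant count are illustrative defaults):

```python
def retention_rules(load_period: str, replicants: int = 2) -> list:
    """Load recent data, drop everything older.

    The load rule must come before the drop rule: the Coordinator
    applies the first rule that matches each segment.
    """
    return [
        {
            "type": "loadByPeriod",
            "period": load_period,
            "tieredReplicants": {"_default_tier": replicants},
        },
        {"type": "dropForever"},
    ]
```

`retention_rules("P30D")` reproduces the 30-day example above; serialize it with `json.dumps` and POST it to `/druid/coordinator/v1/rules/{datasource}`.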

Delete Specific Records

Druid doesn’t support deleting individual records directly. Instead, use reindexing with a filter to exclude unwanted data.

Native Batch Reindex with Filter

Filter out records during reindex:
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "wikipedia",
      "transformSpec": {
        "filter": {
          "type": "not",
          "field": {
            "type": "selector",
            "dimension": "userName",
            "value": "bot"
          }
        }
      },
      "granularitySpec": {
        "segmentGranularity": "day",
        "intervals": ["2023-01-01/2023-02-01"]
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "druid",
        "dataSource": "wikipedia",
        "interval": "2023-01-01/2023-02-01"
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxRowsPerSegment": 5000000
    }
  }
}
Key settings:
  • transformSpec.filter with type: "not" excludes matching records
  • inputSource.type: "druid" reads from existing datasource
  • appendToExisting: false replaces existing segments
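When the same exclusion pattern is needed for several dimensions or value lists, the `transformSpec` filter can be assembled with a small helper. A sketch that emits the same "not"-wrapped filters shown in this guide:

```python
def exclude_filter(dimension: str, values: list) -> dict:
    """Build a "not" filter that drops rows matching the given values.

    Uses a selector filter for a single value and an "in" filter for
    several, matching the native filter shapes used for reindexing.
    """
    if len(values) == 1:
        inner = {"type": "selector", "dimension": dimension, "value": values[0]}
    else:
        inner = {"type": "in", "dimension": dimension, "values": values}
    return {"type": "not", "field": inner}
```

`exclude_filter("userName", ["bot"])` yields the filter from the spec above; the result plugs directly into `dataSchema.transformSpec.filter`.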

SQL REPLACE with Filter

Use SQL to exclude specific records:
REPLACE INTO wikipedia
OVERWRITE WHERE __time >= TIMESTAMP '2023-01-01' AND __time < TIMESTAMP '2023-02-01'
SELECT
  __time,
  channel,
  "user",
  added,
  deleted
FROM wikipedia
WHERE __time >= TIMESTAMP '2023-01-01'
  AND __time < TIMESTAMP '2023-02-01'
  AND userName != 'bot'
PARTITIONED BY DAY

Delete Multiple Values

Exclude multiple values using NOT IN or complex filters:
{
  "transformSpec": {
    "filter": {
      "type": "not",
      "field": {
        "type": "in",
        "dimension": "country",
        "values": ["XX", "UNKNOWN", "TEST"]
      }
    }
  }
}
Or in SQL:
WHERE country NOT IN ('XX', 'UNKNOWN', 'TEST')

Delete by Condition

Remove records matching complex conditions:
{
  "transformSpec": {
    "filter": {
      "type": "and",
      "fields": [
        {
          "type": "not",
          "field": {"type": "selector", "dimension": "is_deleted", "value": "true"}
        },
        {
          "type": "bound",
          "dimension": "value",
          "lower": "0",
          "lowerStrict": false
        }
      ]
    }
  }
}
Reindexed data marked as unused still remains in deep storage. Run a kill task for permanent deletion.
See Transform Spec for filter options.

Delete Entire Datasource

To delete all data in a datasource:
1. Mark All Segments Unused

Use the Coordinator API or web console to mark all segments as unused.

2. Kill Segments (Optional)

Submit a kill task to permanently delete from deep storage.

3. Remove Datasource Metadata (Optional)

The datasource metadata is removed automatically after all segments are killed.

Via Web Console

  1. Navigate to Datasources
  2. Click the datasource name
  3. Select Actions > Mark all segments as unused
  4. Optionally, submit a kill task for permanent deletion

Via API

curl -X POST 'http://localhost:8081/druid/coordinator/v1/datasources/wikipedia/markUnused' \
--header 'Content-Type: application/json' \
--data-raw '{
  "interval": "1000-01-01/3000-01-01"
}'

Permanent Deletion with Kill Tasks

Kill tasks permanently delete unused segments from deep storage and the metadata store.

Kill Task Syntax

{
  "type": "kill",
  "id": "kill_wikipedia_2023",
  "dataSource": "wikipedia",
  "interval": "2023-01-01/2024-01-01",
  "versions": null,
  "batchSize": 100,
  "limit": null,
  "maxUsedStatusLastUpdatedTime": null
}

Kill Task Parameters

Parameter                    | Default          | Description
-----------------------------|------------------|--------------------------------------------------------
type                         | -                | Must be "kill"
dataSource                   | -                | Datasource name
interval                     | -                | Time range of segments to kill
versions                     | null (all)       | Specific segment versions to delete
batchSize                    | 100              | Segments deleted per batch to avoid blocking the Overlord
limit                        | null (no limit)  | Maximum number of segments to delete
maxUsedStatusLastUpdatedTime | null (no cutoff) | Only kill segments marked unused before this timestamp
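As a sketch, a helper that builds a kill task spec with the documented defaults filled in (the defaults mirror the table above):

```python
def kill_task(datasource: str, interval: str, batch_size: int = 100,
              limit=None, versions=None,
              max_used_status_last_updated_time=None) -> dict:
    """Build a kill task spec.

    None values correspond to the documented defaults: all versions,
    no segment limit, and no unused-since cutoff.
    """
    return {
        "type": "kill",
        "dataSource": datasource,
        "interval": interval,
        "versions": versions,
        "batchSize": batch_size,
        "limit": limit,
        "maxUsedStatusLastUpdatedTime": max_used_status_last_updated_time,
    }
```

`kill_task("wikipedia", "2023-01-01/2024-01-01")` reproduces the spec shown above; POST it as JSON to `/druid/indexer/v1/task`.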

Submit Kill Task

Via API:
curl -X POST 'http://localhost:8081/druid/indexer/v1/task' \
--header 'Content-Type: application/json' \
--data-raw '{
  "type": "kill",
  "dataSource": "wikipedia",
  "interval": "2023-01-01/2023-02-01",
  "batchSize": 200
}'
Via web console:
  1. Navigate to Tasks
  2. Click Submit task
  3. Paste kill task JSON
  4. Click Submit
Kill tasks permanently remove all information about affected segments from the metadata store and deep storage. This operation cannot be undone.

Kill Specific Versions

Delete only certain segment versions:
{
  "type": "kill",
  "dataSource": "wikipedia",
  "interval": "2023-01-01/2023-02-01",
  "versions": ["2023-01-15T10:00:00.000Z", "2023-01-20T15:30:00.000Z"]
}

Auto-Kill Unused Segments

Automate permanent deletion of unused segments.

Auto-Kill on Coordinator

Enable auto-kill in Coordinator runtime properties:
druid.coordinator.kill.on=true
druid.coordinator.kill.period=P1D
druid.coordinator.kill.durationToRetain=P7D
druid.coordinator.kill.maxSegments=1000
Configuration:
  • kill.on: Enable auto-kill (default: false)
  • kill.period: How often to run kill tasks
  • kill.durationToRetain: Keep unused segments this long before killing
  • kill.maxSegments: Maximum segments to kill per invocation
The Coordinator periodically:
  1. Identifies unused segments older than durationToRetain
  2. Submits kill tasks for eligible intervals
  3. Processes up to maxSegments per run
See Data Management Configuration for all options.
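The durationToRetain cutoff is easy to reason about with a quick calculation. A sketch that handles only day-granularity ISO 8601 periods like P7D (full ISO 8601 period parsing is out of scope here):

```python
import re
from datetime import datetime, timedelta, timezone

def parse_day_period(period: str) -> timedelta:
    """Parse a day-only ISO 8601 period such as 'P7D'."""
    m = re.fullmatch(r"P(\d+)D", period)
    if not m:
        raise ValueError(f"expected a period of the form PnD, got {period!r}")
    return timedelta(days=int(m.group(1)))

def kill_cutoff(duration_to_retain: str, now=None) -> datetime:
    """Segments marked unused before this instant are eligible for auto-kill."""
    now = now or datetime.now(timezone.utc)
    return now - parse_day_period(duration_to_retain)
```

With `durationToRetain=P7D`, a segment marked unused on January 3 becomes eligible for auto-kill on January 10.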

Auto-Kill on Overlord (Experimental)

This is an experimental feature. Do not use if auto-kill is enabled on the Coordinator.
Requires segment metadata caching enabled on Overlord:
druid.manager.segments.useIncrementalCache=always
druid.indexer.task.killUnusedSegments.on=true
druid.indexer.task.killUnusedSegments.durationToRetain=P7D
Benefits of Overlord auto-kill:
  • No REST API overhead between tasks and Overlord
  • Kills segments as soon as they become eligible
  • Runs on Overlord, doesn’t consume task slots
  • Faster execution (no task process launch overhead)
  • Skips locked intervals to avoid blocking
  • Handles large numbers of unused segments efficiently
See Auto-Kill Configuration for details.

Deletion Best Practices

Test First

Test deletion logic on a small time range before applying to production data.

Use Batch Size

Set appropriate batchSize in kill tasks to avoid blocking Overlord operations.

Retention Buffer

Set durationToRetain to allow time to recover from accidental soft deletes.

Monitor Kill Tasks

Watch kill task metrics and logs to ensure deletions complete successfully.

Common Deletion Patterns

Delete Old Data Periodically

Combine retention rules with auto-kill:
# Retention rule (via API)
{
  "type": "dropBeforeByPeriod",
  "period": "P90D"
}

# Auto-kill configuration  
druid.coordinator.kill.on=true
druid.coordinator.kill.durationToRetain=P7D
This:
  1. Drops segments older than 90 days
  2. Permanently deletes them after 7 days
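Under these settings, a segment's end-to-end lifetime can be computed directly. A sketch using illustrative dates (the 90- and 7-day figures come from the configuration above):

```python
from datetime import date, timedelta

def deletion_timeline(data_date: date, drop_days: int = 90, retain_days: int = 7):
    """With dropBeforeByPeriod P90D and durationToRetain P7D, a segment is
    marked unused roughly drop_days after its data's time, and becomes
    eligible for auto-kill retain_days after that."""
    dropped = data_date + timedelta(days=drop_days)
    killed = dropped + timedelta(days=retain_days)
    return dropped, killed
```

For data dated 2023-01-01, this gives a drop date of 2023-04-01 and kill eligibility on 2023-04-08, leaving a 7-day window to recover from an accidental drop.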

Delete Test Data

Remove test data by filter:
REPLACE INTO events
OVERWRITE ALL
SELECT *
FROM events  
WHERE environment != 'test'
PARTITIONED BY DAY
Then kill the old segments:
{
  "type": "kill",
  "dataSource": "events",
  "interval": "1000-01-01/3000-01-01"
}

Gradual Datasource Deletion

Delete a large datasource in batches:
# Month 1
curl -X POST 'http://localhost:8081/druid/indexer/v1/task' \
--data '{"type":"kill","dataSource":"logs","interval":"2023-01-01/2023-02-01","batchSize":500}'

# Month 2  
curl -X POST 'http://localhost:8081/druid/indexer/v1/task' \
--data '{"type":"kill","dataSource":"logs","interval":"2023-02-01/2023-03-01","batchSize":500}'

# Continue for each month...
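The month-by-month loop above can be generated rather than written by hand. A sketch that emits one kill task spec per calendar month:

```python
def monthly_kill_tasks(datasource: str, year: int, batch_size: int = 500) -> list:
    """Build one kill task spec per calendar month of the given year."""
    tasks = []
    for month in range(1, 13):
        start = f"{year}-{month:02d}-01"
        # December's interval ends at January 1 of the following year.
        end = f"{year + 1}-01-01" if month == 12 else f"{year}-{month + 1:02d}-01"
        tasks.append({
            "type": "kill",
            "dataSource": datasource,
            "interval": f"{start}/{end}",
            "batchSize": batch_size,
        })
    return tasks
```

Submit each spec to `/druid/indexer/v1/task`, waiting for one task to finish before submitting the next to limit load on the cluster.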

Delete PII After Retention Period

Reindex to remove sensitive fields:
REPLACE INTO user_events
OVERWRITE WHERE __time < CURRENT_TIMESTAMP - INTERVAL '1' YEAR  
SELECT
  __time,
  user_id,
  event_type,
  -- Omit PII fields: email, ip_address, phone
  country,
  device_type
FROM user_events
WHERE __time < CURRENT_TIMESTAMP - INTERVAL '1' YEAR
PARTITIONED BY DAY

Troubleshooting

Segments Not Deleting

Check:
  1. Are segments marked “unused”? Query metadata
  2. Do kill tasks have correct interval?
  3. Is maxUsedStatusLastUpdatedTime filtering segments?
  4. Check Overlord logs for kill task errors

Kill Tasks Timing Out

Reduce batchSize to process fewer segments per batch:
{
  "batchSize": 50
}

Accidental Deletion

If segments marked unused by mistake:
  1. Quickly mark them as “used” via API before kill task runs
  2. Restore from deep storage backups if already killed
  3. Re-ingest from source data if available
Always have backups of critical data before running kill tasks.
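Recovery step 1 can be scripted. A sketch that builds (but does not send) the restore request, assuming the Coordinator's markUsed endpoint is symmetric with markUnused; verify the exact path against your Druid version:

```python
import json
from urllib import request

def mark_used_request(coordinator: str, datasource: str, interval: str) -> request.Request:
    """Build a request to restore unused segments in a time range.

    Endpoint path is assumed to mirror markUnused; send it with
    urllib.request.urlopen against a running Coordinator.
    """
    url = f"{coordinator}/druid/coordinator/v1/datasources/{datasource}/markUsed"
    return request.Request(
        url,
        data=json.dumps({"interval": interval}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

This only works before a kill task runs; once segments are killed, they can only come back from backups or re-ingestion.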

Learn More

Deletion Tutorial

Step-by-step guide to deleting data

Retention Rules

Configure automatic data retention

Data Updates

Reindex data to modify or filter records

Metadata API

API reference for segment management
