Query performance in Apache Druid depends on optimally sized segments. Compaction is a strategy to optimize segment size by reading existing segments for a time interval and combining them into a new “compacted” set of segments.

What is Compaction?

Compaction tasks read an existing set of segments for a given time interval and combine the data into a new set of segments. This process:
  • Combines many small segments into fewer, larger ones, reducing per-segment overhead
  • Splits oversized segments into more optimally sized ones
  • Improves query performance through reduced processing and memory overhead
  • Does not modify underlying data by default (unless configured)
Compaction is a special type of ingestion task that reads from a Druid datasource and writes back to the same datasource.
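As a concrete illustration, a minimal manual compaction task spec might look like the following sketch (the datasource name and interval are placeholders):

```json
{
  "type": "compact",
  "dataSource": "wikipedia",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2020-01-01/2020-02-01"
    }
  }
}
```

Submitting this task rewrites the segments of the datasource that fall within the given interval and atomically publishes the compacted result back to the same datasource.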

When to Use Compaction

Suboptimal Segment Sizes

Consider compaction when segments are not optimally sized:
  • Streaming ingestion: Out-of-order data arrival can create many small segments that should be combined. Solution: use automatic compaction to periodically combine recent segments into optimally sized ones.
  • Batch appends: Using appendToExisting with native batch ingestion can create suboptimal segments over time. Solution: compact the affected time intervals to reorganize the data.
  • High parallelism: index_parallel tasks can create many small segments. Solution: compact to merge segments toward a target of roughly 5 million rows each.
  • Oversized segments: Segments that are too large increase memory pressure and slow query performance. Solution: recompact with an appropriate maxRowsPerSegment configuration.
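To find candidate intervals, you can inspect segment sizes through the sys.segments system table. A sketch of a Druid SQL query, submitted as a JSON body to the SQL endpoint (the datasource name and the 5 million row threshold are placeholders):

```json
{
  "query": "SELECT \"segment_id\", \"num_rows\", \"size\" FROM sys.segments WHERE \"datasource\" = 'wikipedia' AND \"num_rows\" < 5000000 ORDER BY \"num_rows\" ASC"
}
```

Segments returned by this query are candidates for consolidation.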

Data Optimization

Compaction can also modify data to improve performance:
  • Adjust granularity: Change segment or query granularity for older data
  • Reorder dimensions: Optimize sorting to reduce segment size
  • Remove columns: Drop unused dimensions during compaction
  • Apply aggregations: Implement rollup metrics for historical data
  • Change partitioning: Move from dynamic to hash/range partitioning for perfect rollup
Compaction doesn’t improve performance in all cases. If you rewrite data with each ingestion task, you don’t need compaction. See Segment Optimization for guidance.

Compaction Methods

Automatic Compaction

Automatic compaction should be your first choice for most use cases. It works in three steps:
  1. Coordinator identifies segments: The Coordinator uses its segment search policy to identify segments that need compaction, working from newest to oldest.
  2. Compaction tasks created: When the Coordinator finds uncompacted or differently compacted segments, it submits compaction tasks for those time intervals.
  3. Tasks execute automatically: Compaction tasks run on a schedule without manual intervention.
See Automatic Compaction for configuration details.
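As a sketch, auto-compaction is enabled per datasource by posting a compaction config to the Coordinator (the /druid/coordinator/v1/config/compaction endpoint); the values below are illustrative:

```json
{
  "dataSource": "wikipedia",
  "skipOffsetFromLatest": "P1D",
  "tuningConfig": {
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000
    }
  }
}
```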

Manual Compaction

Use manual compaction when you need more control:
  • Faster completion: Run multiple concurrent tasks for more intervals when automatic compaction is too slow
  • Specific time ranges: Force compaction for particular intervals
  • Out-of-order compaction: Compact intervals in non-chronological order
  • One-time operations: Reindex with specific configurations
See Manual Compaction for task configuration.

How Compaction Handles Data

Data Consistency

During compaction:
  • Druid overwrites original segments with compacted segments
  • Time intervals are locked to ensure data consistency
  • By default, underlying data is not modified
  • Atomic updates ensure seamless query transitions
You can configure dropExisting: true in ioConfig to replace all existing segments fully contained by the interval. This is a beta feature.
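For example, a manual compaction ioConfig with dropExisting enabled might look like this sketch (the interval is a placeholder):

```json
{
  "type": "compact",
  "inputSpec": {
    "type": "interval",
    "interval": "2020-01-01/2020-02-01"
  },
  "dropExisting": true
}
```

With dropExisting: true, existing segments fully contained in the interval are dropped even where the compacted output does not cover them.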

Conflict Resolution

If an ingestion task needs to write to a segment locked for compaction:
  • By default: ingestion supersedes compaction, and the compaction task fails
  • For manual compaction: adjust input spec interval to avoid conflicts
  • For automatic compaction: set skipOffsetFromLatest to reduce conflict chances
  • Alternative: set compaction priority higher than ingestion (advanced)
See Avoid Conflicts for strategies.

Granularity Handling

Segment Granularity

By default, Druid retains the original segment granularity:
  • Different granularities, no overlap: Druid creates a separate compaction task for each granularity
  • Different granularities with overlap: Druid uses the closest granularity level for the overlapping interval
Example: Compacting a DAY segment (2020-01-01 to 2020-01-02) with a MONTH segment (2020-01-01 to 2020-02-01) results in a MONTH granularity segment (2020-01-01 to 2020-02-01).
You can override segment granularity in granularitySpec, but this changes how data is partitioned by time.

Query Granularity

By default, Druid retains query granularity:
  • Different query granularities: Druid chooses the finest granularity
  • Example: Combining DAY and MINUTE granularity results in MINUTE granularity
In Apache Druid 0.21.0 and prior, compacted segments used default granularity of NONE regardless of original query granularity.
Important: When changing to a coarser query granularity (e.g., MONTH to YEAR), running a kill task to remove overshadowed segments causes permanent loss of finer granularity data.

Dimension and Schema Handling

Dimension Merging

Druid supports different schemas across segments:
  • Compacted segments include all dimensions from input segments
  • Recent segments’ dimension order and types take precedence
  • Custom dimensionsSpec can control ordering and types
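As an illustrative sketch, a custom dimensionsSpec in a compaction task can pin both the ordering and the types of the output dimensions (the dimension names here are placeholders):

```json
{
  "dimensionsSpec": {
    "dimensions": [
      { "type": "string", "name": "country" },
      { "type": "long", "name": "user_id" }
    ]
  }
}
```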

Rollup Behavior

Druid only rolls up output segments when rollup is set for ALL input segments.
  1. Check current rollup status: Use Segment Metadata Queries with analysisTypes to verify whether segments are rolled up.
  2. Configure rollup in compaction: Set rollup: true in granularitySpec only if all input segments support rollup.
  3. Verify output: Confirm that compacted segments have the expected rollup configuration.
See Roll-up for more details.
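The rollup status check above can be expressed as a segment metadata query; a sketch (datasource and interval are placeholders):

```json
{
  "queryType": "segmentMetadata",
  "dataSource": "wikipedia",
  "intervals": ["2020-01-01/2020-02-01"],
  "analysisTypes": ["rollup"]
}
```

The response reports, per segment, whether its data is rolled up.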

Example Configurations

Basic Segment Consolidation

Combine small segments without changing data:
{
  "dataSource": "wikipedia",
  "granularitySpec": {
    "segmentGranularity": "DAY"
  }
}

Change Query Granularity for Old Data

Reduce storage for older data by changing granularity:
{
  "dataSource": "metrics_data",
  "granularitySpec": {
    "segmentGranularity": "DAY",
    "queryGranularity": "HOUR"
  },
  "skipOffsetFromLatest": "P30D"
}

Apply Range Partitioning

Optimize for read-time performance:
{
  "dataSource": "events",
  "tuningConfig": {
    "partitionsSpec": {
      "type": "range",
      "partitionDimensions": ["country", "city", "device"],
      "targetRowsPerSegment": 5000000
    }
  }
}

Remove Unused Dimensions

Drop columns no longer needed:
{
  "dataSource": "clickstream",
  "dimensionsSpec": {
    "dimensionExclusions": ["deprecated_field", "temp_column"]
  },
  "granularitySpec": {
    "segmentGranularity": "DAY",
    "queryGranularity": "MINUTE",
    "rollup": false
  }
}

Best Practices

Enable Auto-Compaction

Set up automatic compaction for all datasources to maintain optimal segment sizes continuously.

Configure Skip Offset

Use skipOffsetFromLatest to avoid compacting recently ingested data that may receive late arrivals.

Specify Full Schema

Set granularitySpec, dimensionsSpec, and metricsSpec to non-null values to optimize performance.
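As an illustrative sketch of a fully specified schema in a compaction config (all names and values are placeholders):

```json
{
  "dataSource": "wikipedia",
  "granularitySpec": {
    "segmentGranularity": "DAY",
    "queryGranularity": "HOUR",
    "rollup": true
  },
  "dimensionsSpec": {
    "dimensions": ["country", "city", "device"]
  },
  "metricsSpec": [
    { "type": "count", "name": "count" },
    { "type": "longSum", "name": "events", "fieldName": "events" }
  ]
}
```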

Monitor Segment Sizes

Target 5 million rows per segment for optimal query performance.

Learn More

Automatic Compaction

Configure and manage auto-compaction

Manual Compaction

Submit one-time compaction tasks

Segment Optimization

Guidelines for optimal segment sizing

Coordinator Process

How the Coordinator plans compaction
