What is Compaction?
Compaction tasks read an existing set of segments for a given time interval and combine the data into a new set of segments. This process:
- Creates fewer, larger segments (reducing per-segment overhead)
- Creates more optimally sized segments from oversized ones
- Improves query performance through reduced processing and memory overhead
- Does not modify the underlying data by default (unless configured)
Compaction is a special type of ingestion task that reads from a Druid datasource and writes back to the same datasource.
When to Use Compaction
Suboptimal Segment Sizes
Consider compaction when segments are not optimally sized:

Streaming Ingestion Creates Small Segments
With streaming ingestion, out-of-order data arrival can create many small segments that should be combined. Solution: Use automatic compaction to periodically combine recent segments into optimal sizes.
Append Operations Fragment Data
Using `appendToExisting` with native batch ingestion can create suboptimal segments over time. Solution: Compact the affected time intervals to reorganize the data.

Parallel Indexing Creates Too Many Segments
`index_parallel` tasks can create many small segments due to high parallelism. Solution: Compact to merge segments toward a target size of roughly 5 million rows.

Misconfigured Ingestion Creates Oversized Segments
Segments that are too large increase memory pressure and slow query performance. Solution: Recompact with a proper `maxRowsPerSegment` configuration.

Data Optimization
Compaction can also modify data to improve performance:
- Adjust granularity: Change segment or query granularity for older data
- Reorder dimensions: Optimize sorting to reduce segment size
- Remove columns: Drop unused dimensions during compaction
- Apply aggregations: Implement rollup metrics for historical data
- Change partitioning: Move from dynamic to hash/range partitioning for perfect rollup
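Several of these optimizations map to fields in a compaction task spec. The following is a sketch only: the field names follow Druid's compaction task format, while the datasource, interval, dimensions, and metrics are hypothetical values for illustration.

```json
{
  "type": "compact",
  "dataSource": "events",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2023-01-01/2023-02-01"
    }
  },
  "granularitySpec": {
    "segmentGranularity": "MONTH",
    "queryGranularity": "HOUR",
    "rollup": true
  },
  "dimensionsSpec": {
    "dimensions": ["country", "channel"]
  },
  "metricsSpec": [
    { "type": "count", "name": "count" },
    { "type": "longSum", "name": "added", "fieldName": "added" }
  ]
}
```

Here `granularitySpec` coarsens the granularity for older data, `dimensionsSpec` keeps only the listed columns (dropping the rest), and `metricsSpec` applies rollup aggregations during reindexing.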
Compaction Methods
Automatic Compaction (Recommended)
Automatic compaction should be your first choice for most use cases.

Coordinator Identifies Segments
The Coordinator uses its segment search policy to identify segments that need compaction, starting from newest to oldest.
Compaction Tasks Created
When uncompacted or differently-compacted segments are found, the Coordinator submits compaction tasks for those time intervals.
Manual Compaction
Use manual compaction when you need more control:
- Faster completion: Run multiple concurrent tasks over more intervals when automatic compaction is too slow
- Specific time ranges: Force compaction for particular intervals
- Out-of-order compaction: Compact intervals in non-chronological order
- One-time operations: Reindex with specific configurations
How Compaction Handles Data
Data Consistency
During compaction:
- Druid overwrites original segments with compacted segments
- Time intervals are locked to ensure data consistency
- By default, underlying data is not modified
- Atomic updates ensure seamless query transitions
You can configure `dropExisting: true` in the `ioConfig` to replace all existing segments fully contained by the interval. This is a beta feature.

Conflict Resolution
If an ingestion task needs to write to a segment locked for compaction:
- By default: ingestion supersedes compaction, and the compaction task fails
- For manual compaction: adjust the input spec interval to avoid conflicts
- For automatic compaction: set `skipOffsetFromLatest` to reduce the chance of conflicts
- Alternative: set compaction priority higher than ingestion (advanced)
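For automatic compaction, the offset is part of the per-datasource compaction config. A minimal sketch, with a hypothetical datasource name and an illustrative ISO-8601 period:

```json
{
  "dataSource": "events",
  "skipOffsetFromLatest": "P1D"
}
```

With this setting, the Coordinator skips the most recent day of data when searching for segments to compact, leaving that window to ongoing ingestion.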
Granularity Handling
Segment Granularity
By default, Druid retains the original segment granularity:
- Same granularity, no overlap: Separate tasks for each granularity
- Different granularity with overlap: Druid uses the closest granularity level
Query Granularity
By default, Druid retains query granularity:
- Different query granularities: Druid chooses the finest granularity
- Example: Combining DAY and MINUTE granularity results in MINUTE granularity
In Apache Druid 0.21.0 and earlier, compacted segments used a default granularity of NONE regardless of the original query granularity.
Dimension and Schema Handling
Dimension Merging
Druid supports different schemas across segments:
- Compacted segments include all dimensions from the input segments
- More recent segments' dimension order and types take precedence
- A custom `dimensionsSpec` can control ordering and types
Rollup Behavior
Druid only rolls up output segments when `rollup` is set for ALL input segments.
Check Current Rollup Status
Use segment metadata queries with `analysisTypes` to verify whether segments are rolled up.

Configure Rollup in Compaction
Set `rollup: true` in the `granularitySpec` only if all input segments support rollup.

Example Configurations
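As a reference point for the variations below, a minimal manual compaction task spec has roughly this shape. The datasource and interval are hypothetical; fields left unspecified (granularity, dimensions, metrics) retain their existing values.

```json
{
  "type": "compact",
  "dataSource": "events",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2023-01-01/2023-02-01"
    }
  },
  "tuningConfig": {
    "type": "index_parallel",
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000
    }
  }
}
```

The `partitionsSpec` here targets the recommended segment size of about 5 million rows; swapping in a `hashed` or `range` partitions spec changes the partitioning scheme instead.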
Basic Segment Consolidation
Combine small segments without changing data.

Change Query Granularity for Old Data
Reduce storage for older data by changing the granularity.

Apply Range Partitioning
Optimize for read-time performance.

Remove Unused Dimensions
Drop columns that are no longer needed.

Best Practices
Enable Auto-Compaction
Set up automatic compaction for all datasources to maintain optimal segment sizes continuously.
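Auto-compaction is configured per datasource, for example through the web console or the Coordinator's compaction config API. A sketch with illustrative values (the datasource name and settings are hypothetical):

```json
{
  "dataSource": "events",
  "skipOffsetFromLatest": "P1D",
  "granularitySpec": {
    "segmentGranularity": "DAY"
  },
  "tuningConfig": {
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000
    }
  }
}
```

The individual fields shown here are discussed in the practices that follow.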
Configure Skip Offset
Use `skipOffsetFromLatest` to avoid compacting recently ingested data that may receive late arrivals.

Specify Full Schema
Set `granularitySpec`, `dimensionsSpec`, and `metricsSpec` to non-null values to optimize performance.

Monitor Segment Sizes
Target 5 million rows per segment for optimal query performance.
Learn More
Automatic Compaction
Configure and manage auto-compaction
Manual Compaction
Submit one-time compaction tasks
Segment Optimization
Guidelines for optimal segment sizing
Coordinator Process
How the Coordinator plans compaction