- Join operators - Available using join datasources in native queries or the JOIN operator in Druid SQL
- Query-time lookups - Simple key-to-value mappings preloaded on all servers, accessible with or without explicit joins
- Join datasource documentation for native query joins
- Multi-stage query join documentation for MSQ task joins
- Lookups documentation for query-time lookup details
When to Use Joins
For best performance, avoid joins at query time when possible by joining data before loading into Druid. However, joins are the best solution for certain use cases:Star and Snowflake Schemas
When you need to change dimension values after initial ingestion without reingesting:- Preloaded in memory on all servers
- Fast to access
- Can be updated without reingestion
Subquery Requirements
When your workload requires:- Joins on subquery results
- Filters applied to subqueries
- Complex multi-step query logic
Join Types
Join with Lookups
Lookups are the most common and performant join use case:Using JOIN
Using LOOKUP Function (Recommended)
TheLOOKUP function is faster because it can defer evaluation until after aggregation:
- Automatic query rewrites (reverse lookups, pull-through GROUP BY)
- Better performance in most cases
- Simpler syntax
Performance Considerations
Native Query Joins
Druid uses broadcast hash-join algorithm:- Memory requirement: All datasources except the leftmost “base” must fit in memory
- Execution: Right-side data is broadcast to all servers processing the left side
- Condition: Must be an equality (or expression that reduces to equality)
Performance Tips
-
Use LOOKUP function over JOIN for lookups
-
Match column types in join conditions to avoid implicit subqueries:
-
Place filters carefully - Druid doesn’t push predicates past joins:
- Avoid comma joins - Use explicit JOIN syntax
-
Check EXPLAIN PLAN to identify inefficient patterns:
- Look for subqueries under
leftorright - Look for
condition: "1"(cartesian product)
- Look for subqueries under
Join Limitations
- Join order is not automatically optimized
- RIGHT OUTER and FULL OUTER joins not fully implemented (results may be incorrect)
- Multi-value dimensions cannot be used in join conditions
- Preloaded dimension tables wider than lookups are not supported (use lookups instead)
Related Documentation
- Datasources - All datasource types
- Lookups - Query-time lookup configuration and usage
- SQL Query Translation - How SQL joins are translated