Apache Druid has two features for joining data:
  1. Join operators - Available using join datasources in native queries or the JOIN operator in Druid SQL
  2. Query-time lookups - Simple key-to-value mappings preloaded on all servers, accessible with or without explicit joins

When to Use Joins

For best performance, avoid joins at query time when possible by joining data before loading it into Druid. However, joins are the best solution for certain use cases:

Star and Snowflake Schemas

Suppose you need to change dimension values, such as country names, after initial ingestion without reingesting:
SELECT 
  countries.v AS country_name,
  SUM(sales.revenue) AS total_revenue
FROM sales
INNER JOIN lookup.countries ON sales.country_code = countries.k
GROUP BY countries.v
Solution: Use lookups for dimension tables. Lookups are:
  • Preloaded in memory on all servers
  • Fast to access
  • Updatable without reingestion

Subquery Requirements

When your workload requires:
  • Joins on subquery results
  • Filters applied to subqueries
  • Complex multi-step query logic
SELECT 
  top_products.product_name,
  sales.revenue
FROM sales
INNER JOIN (
  SELECT product_id, product_name 
  FROM products 
  WHERE category = 'Electronics'
  ORDER BY popularity DESC
  LIMIT 10
) top_products ON sales.product_id = top_products.product_id

Join Types

    Join with Lookups

    Joining against lookups is the most common join pattern and usually the best-performing:

    Using JOIN

    SELECT
      store_to_country.v AS country,
      SUM(sales.revenue) AS country_revenue
    FROM sales
    INNER JOIN lookup.store_to_country ON sales.store = store_to_country.k
    GROUP BY store_to_country.v
    
    Using the LOOKUP function is often faster because Druid can defer evaluating the lookup until after aggregation:
    SELECT
      LOOKUP(store, 'store_to_country') AS country,
      SUM(revenue) AS country_revenue
    FROM sales
    GROUP BY LOOKUP(store, 'store_to_country')
    
    Advantages:
    • Automatic query rewrites (reverse lookups, pull-through GROUP BY)
    • Better performance in most cases
    • Simpler syntax
    See the Lookups documentation for details on query rewrites.
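
    For example (a sketch reusing the store_to_country lookup from above), a filter on the LOOKUP output can be rewritten by Druid into a reverse lookup, that is, a direct filter on the matching store values:
    SELECT store, revenue
    FROM sales
    WHERE LOOKUP(store, 'store_to_country') = 'United States'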

    Performance Considerations

    Native Query Joins

    Druid uses a broadcast hash-join algorithm:
    1. Memory requirement: All datasources except the leftmost “base” datasource must fit in memory
    2. Execution: Right-side data is broadcast to all servers processing the left side
    3. Condition: Must be an equality (or expression that reduces to equality)
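
    As an illustration (a sketch with hypothetical orders and products datasources), in the following query the right-hand products side is materialized and broadcast to every server processing the base orders datasource, so it must fit in memory:
    SELECT orders.order_id, products.product_name
    FROM orders
    INNER JOIN products ON orders.product_id = products.product_id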

    Performance Tips

    1. Use LOOKUP function over JOIN for lookups
      -- Faster
      LOOKUP(column, 'lookup_name')
      
      -- Slower
      JOIN lookup.lookup_name ON ...
      
    2. Match column types in join conditions to avoid implicit subqueries:
      -- Good: types match
      FROM druid.d 
      JOIN lookup.l ON d.string_field = l.k
      
      -- Bad: requires conversion (lookup keys are always strings)
      FROM druid.d 
      JOIN lookup.l ON d.numeric_field = l.k
      
    3. Place filters carefully - Druid doesn’t push predicates past joins:
      -- Filter applied before join (efficient)
      FROM (SELECT * FROM table1 WHERE active = true) t1
      JOIN table2 t2 ON t1.id = t2.id
      
      -- Filter applied after join (less efficient)
      FROM table1 t1
      JOIN table2 t2 ON t1.id = t2.id
      WHERE t1.active = true
      
    4. Avoid comma joins - Use explicit JOIN syntax
    5. Check EXPLAIN PLAN to identify inefficient patterns:
      • Look for subqueries under left or right
      • Look for condition: "1" (cartesian product)
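
    To inspect the plan (a sketch reusing the earlier sales and store_to_country example), prefix the query with EXPLAIN PLAN FOR; the output shows the native query Druid will run, where you can check for unexpected subqueries or a cartesian join condition:
    EXPLAIN PLAN FOR
    SELECT
      store_to_country.v AS country,
      SUM(sales.revenue) AS country_revenue
    FROM sales
    INNER JOIN lookup.store_to_country ON sales.store = store_to_country.k
    GROUP BY store_to_country.v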

    Join Limitations

    • Join order is not automatically optimized
    • RIGHT OUTER and FULL OUTER joins are not fully implemented; results may be incorrect
    • Multi-value dimensions cannot be used in join conditions
    • Preloaded dimension tables wider than a simple key-to-value mapping are not supported; use lookups instead
