Skip to main content
Lookups in Apache Druid are key-value mappings that replace dimension values with new values, enabling join-like functionality. Think of lookups as dimension tables in a data warehouse. Key concepts:
  • Key: Dimension value to match (e.g., appid-12345)
  • Value: Replacement value (e.g., Super Mega Awesome App)

Lookup Characteristics

One-to-Many Mappings

Lookups support both:
  • One-to-one: Country code → Country name
  • Many-to-one: Multiple app IDs → Single account manager
When lookups are one-to-one (injective), Druid can apply additional query optimizations.

No Historical Data

Lookups always use current data regardless of query time range:
SELECT LOOKUP(app_id, 'app_to_manager') AS manager
FROM events
WHERE __time BETWEEN '2023-01-01' AND '2023-12-31'
This returns the current account manager for each app_id, not the historical manager during 2023.
For time-sensitive mappings, include the mapping data in your raw denormalized data during ingestion.

Storage

Lookups are:
  • Preloaded in memory on all servers
  • Very small lookups (dozens to hundreds of entries) can be passed inline in queries using “map” lookup type

Query Syntax

SELECT
  LOOKUP(store, 'store_to_country') AS country,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY LOOKUP(store, 'store_to_country')
With default value:
SELECT
  LOOKUP(store, 'store_to_country', 'Unknown') AS country,
  SUM(revenue)
FROM sales
GROUP BY 1
The third argument acts like COALESCE - returns ‘Unknown’ if lookup has no value for the key.

JOIN Syntax

SELECT
  store_to_country.v AS country,
  SUM(sales.revenue) AS country_revenue
FROM sales
INNER JOIN lookup.store_to_country ON sales.store = store_to_country.k
GROUP BY store_to_country.v
Lookup datasources have exactly two columns:
  • k - The key (always STRING)
  • v - The value (always STRING)
The LOOKUP function has automatic query rewrites that JOIN doesn’t support. Use LOOKUP function for better performance.

Native Queries

Use dimension specs or extraction functions:
{
  "type": "extraction",
  "dimension": "store",
  "outputName": "country",
  "extractionFn": {
    "type": "registeredLookup",
    "lookup": "store_to_country",
    "replaceMissingValueWith": "Unknown"
  }
}

Query Rewrites

Druid automatically optimizes queries using the LOOKUP function:

Reverse Lookup

Druid reverses lookups in WHERE clauses to filter on the original dimension: Original query:
SELECT
  LOOKUP(sku, 'sku_to_name') AS name,
  SUM(revenue)
FROM sales
WHERE LOOKUP(sku, 'sku_to_name') = 'WhizBang Sprocket'
GROUP BY LOOKUP(sku, 'sku_to_name')
Optimized (automatic):
SELECT
  LOOKUP(sku, 'sku_to_name') AS name,
  SUM(revenue)
FROM sales
WHERE sku = 'WB00013'  -- Druid looks up the key for 'WhizBang Sprocket'
GROUP BY LOOKUP(sku, 'sku_to_name')
This is faster because servers can use indexes on sku instead of applying the lookup to every row.

Reversible Patterns

Examples of filters that can be reversed:
SQLReversible?Notes
LOOKUP(...) = 'value'Yes
LOOKUP(...) IS NOT DISTINCT FROM 'value'YesNon-null literals only
LOOKUP(...) <> 'value'Only if injective
LOOKUP(...) IN ('a', 'b')Yes
LOOKUP(...) NOT IN ('a', 'b')Only if injective
LOOKUP(UPPER(col), ...) = 'value'YesUPPER remains
COALESCE(LOOKUP(...), 'N/A') = 'value'YesBut not for = 'N/A' unless injective
LOWER(LOOKUP(...)) = 'value'NoCan’t reverse non-COALESCE functions
MV_CONTAINS(LOOKUP(...), 'value')Yes
NOT MV_CONTAINS(LOOKUP(...), 'value')Only if injective
Configuration: Set sqlReverseLookup: false in query context to disable. Threshold limits:

Pull Up Through GROUP BY

Injective lookups can be pulled up past GROUP BY: Original query:
SELECT
  LOOKUP(sku, 'sku_to_name') AS name,
  SUM(revenue)
FROM sales
GROUP BY LOOKUP(sku, 'sku_to_name')
Optimized (automatic, if sku_to_name is injective):
SELECT
  LOOKUP(sku, 'sku_to_name') AS name,
  SUM(revenue)
FROM sales
GROUP BY sku  -- Group by original column
-- LOOKUP applied after grouping
This is faster because grouping happens on the original column, then lookup is applied once per group instead of once per row. Set sqlPullUpLookup: false in query context to disable.

Injective Lookups

Injective lookups are “one-to-one” mappings that satisfy:
  1. All values are unique - No two keys map to the same value
  2. Complete coverage - Every input the LOOKUP function encounters has a key
  3. NULL handling - Don’t need key for NULL (LOOKUP of NULL is always NULL)
Set injective: true in the lookup definition to enable all optimizations.
Druid does NOT verify injectivity. Setting injective: true for non-injective lookups can produce incorrect results.
Example injective lookup:
{
  "version": "1",
  "lookupExtractorFactory": {
    "type": "map",
    "map": {
      "US": "United States",
      "CA": "Canada",
      "MX": "Mexico"
    },
    "injective": true
  }
}

Configuration

Lookup Tiers

Lookups are organized into “tiers” - groups of services that receive a set of lookups:
  • Historicals typically use __default tier
  • Peons can use datasource-specific tiers
  • Lookup tiers are independent of Historical tiers

Server Configuration

Broker / Router / Historical / Peon:
PropertyDescriptionDefault
druid.lookup.lookupTierTier for this process__default
druid.lookup.lookupTierIsDatasourceUse datasource name as tier (for Peons)false
Coordinator:
PropertyDescriptionDefault
druid.manager.lookups.hostTimeoutTimeout per host (ms)2000
druid.manager.lookups.allHostTimeoutTotal timeout (ms)900000 (15 min)
druid.manager.lookups.periodPause between management cycles (ms)120000 (2 min)
druid.manager.lookups.threadPoolSizeConcurrent managed processes10

Snapshot Configuration

Save lookup configuration across restarts:
PropertyDescriptionDefault
druid.lookup.snapshotWorkingDirDirectory for lookup snapshotsnull (disabled)
druid.lookup.enableLookupSyncOnStartupFetch from Coordinator on startuptrue
druid.lookup.numLookupLoadingThreadsParallel loading threadsAvailable Processors / 2
druid.lookup.coordinatorFetchRetriesRetries to fetch lookup list3
druid.lookup.lookupStartRetriesRetries to start each lookup3
druid.lookup.coordinatorRetryDelayDelay between retries (ms)60000

Lookup API

Configure lookups:
POST http://<COORDINATOR>:<PORT>/druid/coordinator/v1/lookups/config/{tier}/{id}
Initialize (first time only):
curl -X POST http://coordinator:8081/druid/coordinator/v1/lookups/config \
  -H 'Content-Type: application/json' \
  -d '{}'
See Lookups API reference for all endpoints.

Introspection

Brokers provide lookup introspection endpoints: Get all key-value pairs:
curl http://broker:8082/druid/v1/lookups/introspect/nato-phonetic
{
  "A": "Alfa",
  "B": "Bravo",
  "C": "Charlie",
  "Z": "Zulu"
}
Get keys only:
curl http://broker:8082/druid/v1/lookups/introspect/nato-phonetic/keys
["A", "B", "C", "Z"]
Get values only:
curl http://broker:8082/druid/v1/lookups/introspect/nato-phonetic/values
["Alfa", "Bravo", "Charlie", "Zulu"]

Lookup Extensions

Build docs developers (and LLMs) love