Metric Structure
All Druid metrics share a common set of fields:
- The time the metric was created
- The name of the metric
- The service name that emitted the metric (e.g., “druid/broker”, “druid/historical”)
- The host name that emitted the metric
- The numeric value associated with the metric
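As a rough illustration of these common fields, the sketch below parses one JSON-encoded metric event into a small record. The JSON key names (`timestamp`, `metric`, `service`, `host`, `value`), the sample payload, and the `MetricEvent` and `parse_metric_event` names are assumptions made for this example; check them against the output of the emitter you actually use.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MetricEvent:
    """Hypothetical container for the fields every metric shares."""
    timestamp: datetime
    metric: str
    service: str
    host: str
    value: float


def parse_metric_event(raw: str) -> MetricEvent:
    """Parse one JSON-encoded metric event into its common fields."""
    event = json.loads(raw)
    # Key names are assumed here; adjust them to match your emitter's output.
    return MetricEvent(
        # Replace a trailing "Z" so older Python versions can parse the timestamp.
        timestamp=datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00")),
        metric=event["metric"],
        service=event["service"],
        host=event["host"],
        value=float(event["value"]),
    )


# Made-up sample payload for illustration only.
sample = (
    '{"timestamp": "2024-01-01T00:00:00.000Z", "metric": "query/time", '
    '"service": "druid/broker", "host": "localhost:8082", "value": 42}'
)
event = parse_metric_event(sample)
print(event.metric, event.service, event.value)
```

Any additional keys in the payload, such as metric-specific dimensions, are ignored by this sketch.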
Most metric values reset each emission period, as specified in druid.monitoring.emissionPeriod.
Query Metrics
Router Metrics
Milliseconds taken to complete a query.
Dimensions: dataSource, type, interval, hasFilters, duration, context, remoteAddress, id, statusCode
Normal value: < 1s
Broker Metrics
The Broker emits detailed metrics about query processing and result merging.
Core Broker Query Metrics
| Metric | Description | Normal Value |
|---|---|---|
| query/time | Milliseconds taken to complete a query | < 1s |
| query/bytes | Total bytes returned to client | Varies |
| query/node/time | Milliseconds to query individual Historical/Realtime processes | < 1s |
| query/node/bytes | Bytes returned from individual Historical/Realtime processes | Varies |
| query/node/ttfb | Time to first byte from Historical/Realtime processes | < 1s |
| query/count | Total number of queries | Varies |
| query/success/count | Number of successful queries | Varies |
| query/failed/count | Number of failed queries | Should be low |
| query/interrupted/count | Number of cancelled queries | Should be low |
| query/timeout/count | Number of timed out queries | Should be low |
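One way to turn the "should be low" guidance for the count metrics into a number is to compare unsuccessful queries against the total for an emission period. The sketch below assumes that query/count covers failed, interrupted, and timed out queries alike; the function name and the sample values are hypothetical.

```python
from typing import Mapping

# Count metrics that represent queries that did not succeed.
UNSUCCESSFUL = (
    "query/failed/count",
    "query/interrupted/count",
    "query/timeout/count",
)


def query_failure_ratio(counts: Mapping[str, float]) -> float:
    """Return the share of queries in one emission period that did not succeed."""
    total = counts.get("query/count", 0)
    if total <= 0:
        return 0.0  # no queries in this period, so nothing failed
    unsuccessful = sum(counts.get(name, 0) for name in UNSUCCESSFUL)
    return unsuccessful / total


# Made-up counts for a single emission period.
period = {
    "query/count": 200,
    "query/success/count": 190,
    "query/failed/count": 4,
    "query/interrupted/count": 2,
    "query/timeout/count": 4,
}
print(query_failure_ratio(period))
```

What ratio counts as "low" depends on the workload, so any alert threshold built on this value is a local decision.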
SQL Query Metrics
Cache Metrics
| Metric | Description |
|---|---|
| query/cache/delta/* | Cache metrics since the last emission |
| query/cache/total/* | Total cache metrics |
| */numEntries | Number of cache entries |
| */sizeBytes | Size in bytes of cache entries |
| */hits | Number of cache hits |
| */misses | Number of cache misses |
| */evictions | Number of cache evictions |
| */hitRate | Cache hit rate (normal: ~40%) |
| */errors | Number of cache errors (should be 0) |
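If only the hit and miss counters are collected, a hit rate can be derived from them. The sketch below assumes the rate is hits divided by total lookups (hits plus misses); the helper name and the sample counters are hypothetical.

```python
def cache_hit_rate(hits: float, misses: float) -> float:
    """Return hits / (hits + misses), or 0.0 when there were no lookups."""
    lookups = hits + misses
    if lookups <= 0:
        return 0.0
    return hits / lookups


# Made-up delta counters for one emission period.
delta = {"query/cache/delta/hits": 40, "query/cache/delta/misses": 60}
rate = cache_hit_rate(delta["query/cache/delta/hits"], delta["query/cache/delta/misses"])
print(rate)
```

Computing the rate from the delta counters describes the most recent emission period, while the total counters describe the cache's lifetime.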
Historical Metrics
Ingestion Metrics
General Native Ingestion
- Core Metrics
- Kafka Ingestion
- Kinesis Ingestion
| Metric | Description | Dimensions |
|---|---|---|
| ingest/count | Count of ingestion jobs | dataSource, taskId, taskType, taskIngestionMode |
| ingest/segments/count | Final segments created | dataSource, taskId, taskType |
| ingest/tombstones/count | Tombstones created | dataSource, taskId, taskType |
| ingest/events/processed | Events processed per emission | dataSource, taskId, taskType |
| ingest/events/unparseable | Unparseable events rejected | dataSource, taskId, taskType |
Ingestion Performance Metrics
Healthy Ingestion Indicators:
- ingest/persists/backPressure is 0 or very low
- ingest/handoff/failed is 0
- ingest/events/unparseable is 0 or minimal
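The indicators above could be checked programmatically along these lines. This is a minimal sketch: the function name, the warning messages, and the default thresholds standing in for "very low" and "minimal" are all assumptions to tune per cluster.

```python
from typing import List, Mapping


def ingestion_warnings(
    metrics: Mapping[str, float],
    max_back_pressure: float = 0.0,  # assumed cutoff for "0 or very low"
    max_unparseable: float = 0.0,    # assumed cutoff for "0 or minimal"
) -> List[str]:
    """Return one warning per healthy-ingestion indicator that is violated."""
    warnings = []
    if metrics.get("ingest/persists/backPressure", 0) > max_back_pressure:
        warnings.append("persist back pressure is elevated")
    if metrics.get("ingest/handoff/failed", 0) > 0:
        warnings.append("segment handoffs are failing")
    if metrics.get("ingest/events/unparseable", 0) > max_unparseable:
        warnings.append("unparseable events are being rejected")
    return warnings


# Made-up metric snapshots for illustration.
healthy = {
    "ingest/persists/backPressure": 0,
    "ingest/handoff/failed": 0,
    "ingest/events/unparseable": 0,
}
degraded = {
    "ingest/persists/backPressure": 1500,
    "ingest/handoff/failed": 2,
    "ingest/events/unparseable": 0,
}
print(ingestion_warnings(healthy))
print(ingestion_warnings(degraded))
```

Missing metrics are treated as zero here, which keeps the check quiet for tasks that never emit a given metric.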
Coordination Metrics
Coordinator Metrics
- Number of segments assigned to be loaded in the cluster. Dimensions: dataSource, tier
- Number of segments moved in the cluster. Dimensions: dataSource, tier
- Number of segments dropped due to being over-replicated. Dimensions: dataSource, tier
- Number of segments marked as unused due to drop rules. Dimensions: dataSource
Indexing Service Metrics
Task Metrics
- task/run/time: Task execution time
- task/pending/time: Time waiting to start
- task/waiting/time: Time waiting for scheduling
- task/success/count: Successful tasks
- task/failed/count: Failed tasks
Slot Metrics
- taskSlot/total/count: Total task slots
- taskSlot/idle/count: Available slots
- taskSlot/used/count: Busy slots
- taskSlot/blacklisted/count: Blacklisted slots
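The slot metrics combine naturally into a utilization figure for capacity planning. The sketch below assumes utilization is simply busy slots divided by total slots; the function name and the sample values are hypothetical.

```python
from typing import Mapping


def slot_utilization(metrics: Mapping[str, float]) -> float:
    """Return the fraction of task slots currently in use."""
    total = metrics.get("taskSlot/total/count", 0)
    if total <= 0:
        return 0.0  # no slots reported, so utilization is undefined; report 0
    return metrics.get("taskSlot/used/count", 0) / total


# Made-up slot counts for illustration.
slots = {
    "taskSlot/total/count": 20,
    "taskSlot/idle/count": 5,
    "taskSlot/used/count": 15,
    "taskSlot/blacklisted/count": 0,
}
print(slot_utilization(slots))
```

A utilization that stays close to 1.0 while task/pending/time grows would suggest that tasks are queuing for slots.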
System Metrics
JVM Metrics
Monitor JVM health with these critical metrics:
Memory Metrics
Garbage Collection
Jetty Server Metrics
| Metric | Description | Normal Value |
|---|---|---|
| jetty/numOpenConnections | Open connections | Not much higher than thread count |
| jetty/threadPool/utilized | Threads in use | < jetty/threadPool/ready |
| jetty/threadPool/utilizationRate | Thread pool utilization | 0.0 - 1.0 |
| jetty/threadPool/idle | Idle threads | > 0 means spare capacity |
| jetty/threadPool/queueSize | Queued requests | Should be low |
Monitoring Configuration
Enable specific monitors by configuring druid.monitoring.monitors in common.runtime.properties:
- JvmMonitor: JVM memory, GC, and buffer pool metrics
- SysMonitor: CPU, disk, network, and memory system metrics
- QueryCountStatsMonitor: Query success, failure, and timeout counts
- CacheMonitor: Cache hit/miss rates and performance
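The property value is a JSON array of monitor class names. The sketch below renders that line from the short names listed above. The fully qualified package names in the mapping are assumptions to verify against your Druid version, and the helper name is hypothetical.

```python
import json
from typing import Iterable

# Assumed fully qualified class names; confirm them for your Druid version.
MONITOR_CLASSES = {
    "JvmMonitor": "org.apache.druid.java.util.metrics.JvmMonitor",
    "SysMonitor": "org.apache.druid.java.util.metrics.SysMonitor",
    "QueryCountStatsMonitor": "org.apache.druid.server.metrics.QueryCountStatsMonitor",
    "CacheMonitor": "org.apache.druid.client.cache.CacheMonitor",
}


def monitors_property(names: Iterable[str]) -> str:
    """Render the druid.monitoring.monitors line for the given short names."""
    classes = [MONITOR_CLASSES[name] for name in names]  # KeyError on unknown names
    return "druid.monitoring.monitors=" + json.dumps(classes)


line = monitors_property(["JvmMonitor", "CacheMonitor"])
print(line)
```

The printed line can be pasted into common.runtime.properties. Failing on unknown names is deliberate, so that a typo does not silently drop a monitor.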
Metrics Export
Druid can emit metrics to various monitoring systems:
- Prometheus
- Graphite
- HTTP