This guide uses the Apache Kafka term “offset” to refer to the identifier for records in a partition. For Amazon Kinesis, the equivalent is “sequence number”.
Supervisor Spec
Druid uses a JSON specification (supervisor spec) to define tasks for streaming ingestion or auto-compaction. The supervisor spec specifies how Druid should consume, process, and index data from an external stream.

Top-Level Properties
| Property | Description |
|---|---|
| `id` | The supervisor ID. This should be a unique ID. If unspecified, defaults to `spec.dataSchema.dataSource`. |
| `type` | The supervisor type. For streaming ingestion: `kafka`, `kinesis`, or `rabbit`. For automatic compaction: `autocompact`. |
| `spec` | The container object for the supervisor configuration. |
| `spec.dataSchema` | The schema for the indexing task to use during ingestion. See dataSchema. |
| `spec.ioConfig` | The I/O configuration object to define connection and I/O-related settings. |
| `spec.tuningConfig` | The tuning configuration object to define performance-related settings. |
| `context` | Extra configuration for both the supervisor and the tasks it spawns. |
| `suspended` | Puts the supervisor in a suspended state. |
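For reference, a minimal Kafka supervisor spec combining these properties might look like the following sketch; the datasource name, topic, and broker address are placeholder values:

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "example-datasource",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "useSchemaDiscovery": true },
      "granularitySpec": { "segmentGranularity": "hour" }
    },
    "ioConfig": {
      "topic": "example-topic",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "kafka-broker:9092" },
      "taskCount": 1,
      "replicas": 1,
      "taskDuration": "PT1H"
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```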
I/O Configuration
The following properties apply to both Kafka and Kinesis ingestion. For source-specific properties, see Kafka I/O configuration and Kinesis I/O configuration.

| Property | Description |
|---|---|
| `inputFormat` | The input format to define input data parsing. See input formats. |
| `autoScalerConfig` | Defines autoscaling behavior for ingestion tasks. See Task autoscaler. |
| `taskCount` | The maximum number of reading tasks in a replica set. Total tasks = `taskCount` × `replicas`. |
| `replicas` | The number of replica sets. Druid assigns task replicas to different workers for resiliency. |
| `taskDuration` | The length of time before tasks stop reading and begin publishing segments. |
| `startDelay` | The period to wait before the supervisor starts managing tasks. |
| `period` | How often the supervisor executes its management logic. Specifies the maximum time between iterations. |
| `completionTimeout` | How long to wait before declaring a publishing task as failed and terminating it. |
| `lateMessageRejectionStartDateTime` | Reject messages with timestamps earlier than this datetime. Helps prevent concurrency issues with late messages. |
| `lateMessageRejectionPeriod` | Reject messages with timestamps earlier than this period before the task was created. |
| `earlyMessageRejectionPeriod` | Reject messages with timestamps later than this period after the task reached its duration. |
| `stopTaskCount` | Limits the number of ingestion tasks that can cycle at any given time. See stopTaskCount. |

A map of server priorities to the number of replicas per priority enables query isolation for mixed workloads. Example: `{"1": 2, "0": 1}` creates 2 replicas with priority 1 and 1 replica with priority 0.

Task Autoscaler
Optionally configure autoscaling behavior for ingestion tasks using `autoScalerConfig`.

| Property | Description |
|---|---|
| `enableTaskAutoScaler` | Enables the autoscaler. If not specified, the autoscaler is disabled even when `autoScalerConfig` is set. |
| `taskCountMax` | Maximum number of ingestion tasks. Must be greater than or equal to `taskCountMin`. |
| `taskCountMin` | Minimum number of ingestion tasks. Overrides `taskCount` when the autoscaler is enabled. |
| `taskCountStart` | Optional number of ingestion tasks to start with. |
| `minTriggerScaleActionFrequencyMillis` | Minimum time interval between two scale actions. |
| `autoScalerStrategy` | The autoscaler algorithm. Druid supports only `lagBased` and `costBased` (experimental). |
| `stopTaskCountRatio` | Variable version of `ioConfig.stopTaskCount` with range (0.0, 1.0]. Makes the maximum number of stoppable tasks proportional to the number of running tasks. |

Lag-Based Autoscaler Strategy
For Kinesis, lag metrics are reported as time difference in milliseconds rather than message count.
| Property | Description |
|---|---|
| `lagCollectionIntervalMillis` | Time period during which Druid collects lag metric points. |
| `lagCollectionRangeMillis` | Total time window of lag collection. |
| `scaleOutThreshold` | Threshold for the scale-out action. |
| `triggerScaleOutFractionThreshold` | Enables scale out if this fraction of lag points is higher than `scaleOutThreshold`. |
| `scaleInThreshold` | Threshold for the scale-in action. |
| `triggerScaleInFractionThreshold` | Enables scale in if this fraction of lag points is lower than `scaleInThreshold`. |
| `scaleActionStartDelayMillis` | Delay after the supervisor starts before the first scale logic check. |
| `scaleActionPeriodMillis` | Frequency at which to check whether a scale action is triggered. |
| `scaleInStep` | Number of tasks to reduce when scaling in. |
| `scaleOutStep` | Number of tasks to add when scaling out. |
| `lagAggregate` | Aggregate function for the lag metric. Options: `MAX`, `SUM`, `AVERAGE`. |

Cost-Based Autoscaler Strategy
The cost-based strategy computes the required task count via a cost function based on ingestion lag and the poll-to-idle ratio. Task counts are selected from a bounded range derived from the current partitions-per-task ratio. Its tunable properties control:

- The frequency at which to check whether a scale action is triggered.
- The weight of the extracted lag value in the cost function.
- The weight of the extracted poll-idle value in the cost function.
- Whether a bounded partitions-per-task window is used when selecting task counts.
- The per-partition lag threshold for burst scale-up. Set to a value greater than 0 to enable, less than 0 to disable.
- The minimum duration between successful scale actions.
- Whether scaling down is limited to periods during task rollovers only.
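As a sketch, a lag-based `autoScalerConfig` inside the supervisor's `ioConfig` could combine the lag-based properties described above as follows; all numeric values are illustrative, not recommendations:

```json
"autoScalerConfig": {
  "enableTaskAutoScaler": true,
  "taskCountMax": 6,
  "taskCountMin": 2,
  "minTriggerScaleActionFrequencyMillis": 600000,
  "autoScalerStrategy": "lagBased",
  "lagCollectionIntervalMillis": 30000,
  "lagCollectionRangeMillis": 600000,
  "scaleOutThreshold": 6000000,
  "triggerScaleOutFractionThreshold": 0.3,
  "scaleInThreshold": 1000000,
  "triggerScaleInFractionThreshold": 0.9,
  "scaleActionStartDelayMillis": 300000,
  "scaleActionPeriodMillis": 60000,
  "scaleInStep": 1,
  "scaleOutStep": 2
}
```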
stopTaskCount
Before setting `stopTaskCount`, note:
Tuning Configuration
The following properties apply to both Kafka and Kinesis. For source-specific properties, see Kafka tuning configuration and Kinesis tuning configuration.

| Property | Description |
|---|---|
| `type` | The tuning type code: `kafka` or `kinesis`. |
| `maxRowsInMemory` | Number of rows to accumulate before persisting. Represents post-aggregation rows. |
| `maxBytesInMemory` | Number of bytes to accumulate in heap memory before persisting. Maximum heap usage = `maxBytesInMemory * (2 + maxPendingPersists)`. |
| `skipBytesInMemoryOverheadCheck` | Excludes overhead object bytes from the `maxBytesInMemory` check. |
| `maxRowsPerSegment` | Number of rows to store in a segment (post-aggregation). Handoff occurs when limits are reached. |
| `maxTotalRows` | Number of rows to aggregate across all segments (post-aggregation). |
| `intermediateHandoffPeriod` | Period that determines how often tasks hand off segments. |
| `intermediatePersistPeriod` | Period that determines the rate at which intermediate persists occur. |
| `maxPendingPersists` | Maximum number of persists that can be pending but not started. A value of 0 means one persist can run concurrently with ingestion. |
| `resetOffsetAutomatically` | Resets partitions when the offset is unavailable. If `false`, surfaces an exception that causes tasks to fail. |
| `workerThreads` | Number of threads the supervisor uses to handle requests/responses for worker tasks. |
| `chatRetries` | Number of HTTP request retries to indexing tasks before considering the tasks unresponsive. |
| `httpTimeout` | Period of time to wait for an HTTP response from an indexing task. |
| `shutdownTimeout` | Period of time to wait for the supervisor to attempt a graceful shutdown before exiting. |
| `offsetFetchPeriod` | How often the supervisor queries the streaming source and indexing tasks to fetch current offsets and calculate lag. Minimum value: `PT5S`. |

Start a Supervisor
Start a new supervisor by submitting a supervisor spec via:
- Web console: Use the data loader
- API: POST to the Supervisor API
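For example, a spec saved as a local file can be submitted to the Supervisor API with curl; the Router host and file name below are placeholders for your own deployment:

```bash
# Submit supervisor-spec.json to the Overlord through the Router
curl -X POST -H 'Content-Type: application/json' \
  -d @supervisor-spec.json \
  "http://ROUTER_HOST:8888/druid/indexer/v1/supervisor"
```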
Schema and Configuration Changes
To make schema or configuration changes:
- Submit a new supervisor spec
- Overlord initiates graceful shutdown of existing supervisor
- Running supervisor signals tasks to stop reading and begin publishing
- New supervisor is created with updated configuration
- Updated schema is applied while retaining existing publishing tasks
- New tasks start at previous task offsets
Status Report
The supervisor status report contains the state of supervisor tasks and an array of recent exceptions. View status in the web console:
- Navigate to Supervisors view
- Click the supervisor ID
- Click Status in the left navigation
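The same report is available from the Supervisor API; the host and supervisor ID below are placeholders:

```bash
# Fetch the status report for one supervisor
curl "http://ROUTER_HOST:8888/druid/indexer/v1/supervisor/example-supervisor/status"
```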
State Properties
- state: Generic state applicable to any supervisor type
- detailedState: Implementation-specific state with more insight
state values: PENDING, RUNNING, IDLE, SUSPENDED, STOPPING, UNHEALTHY_SUPERVISOR, UNHEALTHY_TASKS
Detailed State Mapping
| detailedState | state | Description |
|---|---|---|
| UNHEALTHY_SUPERVISOR | UNHEALTHY_SUPERVISOR | The supervisor encountered errors on previous iterations |
| UNHEALTHY_TASKS | UNHEALTHY_TASKS | The most recent tasks all failed |
| UNABLE_TO_CONNECT_TO_STREAM | UNHEALTHY_SUPERVISOR | Connectivity issues; the supervisor has never connected to the stream |
| LOST_CONTACT_WITH_STREAM | UNHEALTHY_SUPERVISOR | Connectivity issues after previously connecting to the stream |
| PENDING | PENDING | Initialized but not yet started |
| CONNECTING_TO_STREAM | RUNNING | Trying to connect to the stream |
| DISCOVERING_INITIAL_TASKS | RUNNING | Discovering already-running tasks |
| CREATING_TASKS | RUNNING | Creating tasks and discovering state |
| RUNNING | RUNNING | Tasks have started; waiting for `taskDuration` to elapse |
| IDLE | IDLE | No new data; existing data has been read |
| SUSPENDED | SUSPENDED | The supervisor is suspended |
| STOPPING | STOPPING | The supervisor is stopping |
For Kafka, consumer lag per partition may be reported as negative if the supervisor hasn’t received the latest offset response. Aggregate lag is always ≥ 0.
SUPERVISORS System Table
Query the `sys.supervisors` table to retrieve supervisor information:
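For example, the following Druid SQL query lists each supervisor with its state; it assumes the standard `sys.supervisors` columns:

```sql
-- Current state of every supervisor
SELECT supervisor_id, state, detailed_state, healthy
FROM sys.supervisors;
```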
Manage a Supervisor
Manage supervisors via the web console or Supervisor API. In the web console:
- Navigate to Supervisors view
- Click the ellipsis in the Actions column
- Select the desired action
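The actions described below also map to Supervisor API endpoints; for example, with a placeholder host and supervisor ID:

```bash
# Suspend, resume, or terminate a supervisor via the API
curl -X POST "http://ROUTER_HOST:8888/druid/indexer/v1/supervisor/example-supervisor/suspend"
curl -X POST "http://ROUTER_HOST:8888/druid/indexer/v1/supervisor/example-supervisor/resume"
curl -X POST "http://ROUTER_HOST:8888/druid/indexer/v1/supervisor/example-supervisor/terminate"
```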
Suspend
Pauses a running supervisor. The suspended supervisor continues to emit logs and metrics. Indexing tasks remain suspended until you resume the supervisor. API: Suspend a running supervisor

Set Offsets
Resets offsets for supervisor partitions. This:
- Clears stored offsets
- Instructs supervisor to resume from specified offsets
- Terminates and recreates active tasks for specified partitions
- Saves specified offsets in metadata store if none exist
Hard Reset
Clears supervisor metadata, causing data reading to resume from the earliest or latest position based on `useEarliestOffset`. Terminates and recreates active tasks.
Use to recover from a stopped state due to missing offsets.
API: Reset a supervisor
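For example, with a placeholder host and supervisor ID:

```bash
# Hard reset: clears stored offsets for this supervisor
curl -X POST "http://ROUTER_HOST:8888/druid/indexer/v1/supervisor/example-supervisor/reset"
```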
Terminate
Stops a supervisor and its indexing tasks, triggering segment publishing. Places a tombstone marker in the metadata store to prevent reloading on restart. The terminated supervisor still exists in the metadata store, and its history can be retrieved. API: Terminate a supervisor

Capacity Planning
Indexing tasks run on Middle Managers and are limited by available resources. Ensure sufficient worker capacity is configured via `druid.worker.capacity`.
Worker capacity is shared across all indexing task types: batch processing, streaming tasks, and merging tasks.
Each indexing task runs in two phases:
- Reading: for the period defined in `taskDuration`
- Publishing: until segments are generated, pushed to deep storage, and loaded by Historical services, or until `completionTimeout` elapses
Minimum Capacity Formula
Number of reading tasks = `replicas * taskCount` (exception: if `taskCount` exceeds the number of shards/partitions, Druid uses the number of shards/partitions instead).
For reading and publishing tasks to run concurrently, provision worker capacity of at least `2 * replicas * taskCount`. For example, with `replicas = 2` and `taskCount = 3`, reserve at least 12 task slots.
Multi-Supervisor Support
Druid supports any number of stream supervisors (Kafka, Kinesis, and so on) ingesting into the same datasource simultaneously. See Concurrent append and replace for more information.

Related Resources
Supervisor API
Manage and monitor supervisors via API
Kafka Ingestion
Apache Kafka streaming ingestion
Kinesis Ingestion
Amazon Kinesis streaming ingestion
Automatic Compaction
Automatic compaction with supervisors