You must be on Apache Kafka version 0.11.x or higher. For older versions, refer to the Apache Kafka upgrade guide.
Setup
To use the Kafka indexing service, load the druid-kafka-indexing-service extension on both the Overlord and Middle Manager.
See Loading extensions for more information.
Supervisor Spec Configuration
This section covers Kafka-specific configuration properties. For properties shared across all streaming methods, see Supervisor spec.
I/O Configuration
Kafka-specific ioConfig properties:
The Kafka topic to read from. Once set, this value cannot be updated. To ingest from multiple topics, use topicPattern instead.
Multiple Kafka topics to read from, passed as a regex pattern. See Ingest from multiple topics.
A map of properties to pass to the Kafka consumer. Must include bootstrap.servers at minimum.
The length of time to wait for the Kafka consumer to poll records, in milliseconds.
If a supervisor manages a datasource for the first time, this determines whether to retrieve the earliest or latest offsets in Kafka.
Defines how and when the Kafka supervisor can become idle. See Idle configuration.
Ingest from Multiple Topics
Migrating an existing supervisor from topic to topicPattern is not supported. You must suspend, reset offsets, and resubmit the supervisor.
To ingest data from multiple topics, set the topicPattern property with a regex pattern.
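As an illustration, the sketch below builds an ioConfig fragment that sets topicPattern and checks which topic names a pattern would select. The pattern and the topic names are hypothetical. The check assumes the pattern must match the whole topic name, and Python's `re.fullmatch` only approximates the Java regex matching used on the Kafka side.

```python
import json
import re

# Hypothetical pattern: any topic whose name starts with "metrics-".
TOPIC_PATTERN = "metrics-.*"

# ioConfig fragment; topicPattern is the only property taken from the text above.
io_config = {"topicPattern": TOPIC_PATTERN}


def matching_topics(pattern, topics):
    """Return the topic names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in topics if compiled.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical topic names, for illustration only.
    candidates = ["metrics-cpu", "metrics-disk", "clicks", "old-metrics-cpu"]
    print(json.dumps(io_config, indent=2))
    print(matching_topics(TOPIC_PATTERN, candidates))
```

Testing a pattern against the topic names in your cluster before submitting the spec is a cheap way to catch a regex that selects more topics than intended.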
Consumer Properties
Consumer properties control how a supervisor reads events from Kafka. You must include bootstrap.servers with a list of Kafka brokers.
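A minimal sketch of a consumer properties map follows. The broker addresses and topic name are hypothetical, and the `consumerProperties` key name is an assumption based on the section title rather than something spelled out above.

```python
import json

# Hypothetical broker addresses; replace with your own host:port pairs.
BROKERS = ["kafka01.example.com:9092", "kafka02.example.com:9092"]

consumer_properties = {
    # The Kafka consumer expects a comma-separated host:port list.
    "bootstrap.servers": ",".join(BROKERS),
}

io_config = {
    "topic": "example_topic",  # hypothetical topic name
    "consumerProperties": consumer_properties,  # assumed key name
}

if __name__ == "__main__":
    print(json.dumps(io_config, indent=2))
```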
SSL/SASL Configuration
To enable SSL connections with SASL authentication, set environment variables for the Druid user on Overlord and Peon machines.
Then reference these variables in the supervisor spec using the dynamic config provider.
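The two steps might look like the sketch below. The environment variable names, the broker address, and the SASL mechanism are hypothetical. The provider layout (a `druid.dynamic.config.provider` entry of type `environment` with a `variables` map) is an assumption to verify against your Druid version.

```python
import json

# Maps each Kafka consumer property to the environment variable that holds
# its value. The variable names are hypothetical.
ENV_VARIABLES = {
    "sasl.jaas.config": "KAFKA_JAAS_CONFIG",
    "ssl.key.password": "SSL_KEY_PASSWORD",
    "ssl.keystore.password": "SSL_KEYSTORE_PASSWORD",
    "ssl.truststore.password": "SSL_TRUSTSTORE_PASSWORD",
}

consumer_properties = {
    "bootstrap.servers": "kafka01.example.com:9093",  # hypothetical broker
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",  # hypothetical choice of mechanism
    # Assumed provider layout: secrets are resolved from the environment
    # of the Druid process instead of being stored in the spec.
    "druid.dynamic.config.provider": {
        "type": "environment",
        "variables": ENV_VARIABLES,
    },
}


def required_env_vars(variables):
    """Return the environment variable names that must be set for the Druid user."""
    return sorted(set(variables.values()))


if __name__ == "__main__":
    print(required_env_vars(ENV_VARIABLES))
    print(json.dumps(consumer_properties, indent=2))
```

Keeping only variable names in the spec means the secrets themselves never appear in the submitted supervisor JSON.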
Transaction Isolation
The isolation.level property determines how Druid reads transactional messages:
- read_uncommitted (default): Reads all messages, including aborted transactions
- read_committed: Reads only committed transactional messages
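A small sketch of setting the property explicitly, so the behavior does not depend on any default. The helper name `with_isolation_level` and the broker address are hypothetical.

```python
# The two values described above.
ALLOWED_LEVELS = ("read_uncommitted", "read_committed")


def with_isolation_level(consumer_properties, level):
    """Return a copy of the consumer properties with isolation.level set."""
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unsupported isolation.level: {level!r}")
    updated = dict(consumer_properties)  # leave the caller's map untouched
    updated["isolation.level"] = level
    return updated


if __name__ == "__main__":
    base = {"bootstrap.servers": "kafka01.example.com:9092"}  # hypothetical broker
    print(with_isolation_level(base, "read_committed"))
```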
Idle Configuration
When the supervisor enters idle state, no new tasks are launched after current tasks complete. This can reduce costs for topics with sporadic data.
If true, the supervisor becomes idle when there's no data on the input topic for some time.
The supervisor becomes idle if all data has been read and no new data is published for this many milliseconds.
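The sketch below shows what an idle configuration could look like. The property names `idleConfig`, `enabled`, and `inactiveAfterMillis` are assumptions, since the text above describes the two settings without naming them, and the ten-minute threshold is an arbitrary example value.

```python
import json
from datetime import timedelta


def idle_config(enabled, inactive_after):
    """Build an idle configuration from a flag and a timedelta threshold."""
    millis = int(inactive_after.total_seconds() * 1000)
    return {
        "enabled": enabled,             # assumed property name
        "inactiveAfterMillis": millis,  # assumed property name
    }


# Arbitrary example: go idle after 10 minutes without new data.
io_config = {"idleConfig": idle_config(True, timedelta(minutes=10))}

if __name__ == "__main__":
    print(json.dumps(io_config, indent=2))
```

Expressing the threshold as a `timedelta` and converting once avoids hand-counting zeros in a millisecond value.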
Kafka Input Format
The kafka input format lets you parse Kafka metadata fields in addition to the payload value contents.
Supported Metadata Fields
- Kafka timestamp: Event timestamp from Kafka
- Kafka topic: Topic name
- Kafka headers: Event headers
- Kafka key: Message key
Configuration
Define how to parse the payload value (e.g., { "type": "json" }).
Custom name for the Kafka timestamp in the Druid schema.
Custom name for the Kafka topic in the Druid schema. Useful when ingesting from multiple topics.
Encoding format for Kafka headers. Supported values: string, ISO-8859-1, US-ASCII, UTF-16, UTF-16BE, UTF-16LE.
Prefix for Kafka header columns to avoid conflicts with payload columns.
Input format to parse the key. Only the first value is used.
Name for the Kafka key column.
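Putting the settings above together, an input format definition might look like the following sketch. Every property name here (`valueFormat`, `timestampColumnName`, `topicColumnName`, `headerFormat`, `headerColumnPrefix`, `keyFormat`, `keyColumnName`) is an assumption, as are the column names and the shape of the header and key formats. Only the `{ "type": "json" }` value format is taken from the text.

```python
import json

# All property and column names below are assumed for illustration.
kafka_input_format = {
    "type": "kafka",
    # How to parse the payload value.
    "valueFormat": {"type": "json"},
    # Custom column names for the Kafka timestamp and topic.
    "timestampColumnName": "kafka.timestamp",
    "topicColumnName": "kafka.topic",
    # Header decoding, using one of the encodings listed above.
    "headerFormat": {"type": "string", "encoding": "ISO-8859-1"},
    # Prefix that keeps header columns from clashing with payload columns.
    "headerColumnPrefix": "kafka.header.",
    # How to parse the key; only the first value is used.
    "keyFormat": {"type": "tsv", "findColumnsFromHeader": False, "columns": ["x"]},
    "keyColumnName": "kafka.key",
}

if __name__ == "__main__":
    print(json.dumps(kafka_input_format, indent=2))
```

Using a distinct prefix such as `kafka.` for metadata columns makes them easy to tell apart from payload fields in queries.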
Tuning Configuration
Kafka-specific tuningConfig properties:
Number of threads to use to create and persist incremental segments on disk. Increase for datasources with hundreds or thousands of columns.
Deployment Notes
Druid assigns Kafka partitions to each Kafka indexing task. A task writes events from Kafka into a single segment until it reaches one of these limits:
- maxRowsPerSegment
- maxTotalRows
- intermediateHandoffPeriod
Incremental Hand-offs
Tasks perform incremental hand-offs, so segments become available as they're ready. When limits are reached:
- Task hands off all segments
- Creates a new set of segments
- Continues ingestion
Small Segments
Small segments may still be produced. For example:
- Task duration: 4 hours
- Segment granularity: HOUR
- Supervisor started: 9:10
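One way to read this example, assuming a new set of tasks starts when the task duration elapses: the rollover happens at 13:10, so events for the 13:00 to 14:00 hour can be split between the old and new tasks, which may produce small segments. The sketch below works through that arithmetic. The helper names and the calendar date are hypothetical.

```python
from datetime import datetime, timedelta


def hour_bucket(ts):
    """Return the [start, end) hour interval that contains ts."""
    start = ts.replace(minute=0, second=0, microsecond=0)
    return start, start + timedelta(hours=1)


def split_bucket(supervisor_start, task_duration):
    """Return the task rollover time and the hour interval it cuts through.

    The interval is None when the rollover lands exactly on an hour boundary,
    because no hourly interval is then shared between old and new tasks.
    """
    rollover = supervisor_start + task_duration
    bucket = hour_bucket(rollover)
    return rollover, (None if rollover == bucket[0] else bucket)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 10)  # arbitrary date, 9:10 start
    rollover, bucket = split_bucket(start, timedelta(hours=4))
    print("rollover:", rollover, "split interval:", bucket)
```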
Related Resources
- Supervisor API: Manage and monitor supervisors via API
- Supervisor Reference: Supervisor status and capacity planning
- Kafka Tutorial: Load data from Apache Kafka
- Data Formats: Supported input formats