## Supported Formats

- **JSON**: Most common format for ingesting data.
- **CSV**: Comma-separated values format.
- **TSV**: Tab-separated values format.
- **Parquet**: Columnar storage format.
- **ORC**: Optimized Row Columnar format.
- **Avro**: Binary serialization format.
## Input Format Configuration

### JSON Format

Configure the JSON `inputFormat` to load JSON data:

| Field | Description |
|-------|-------------|
| `type` | Set value to `json`. |
| `flattenSpec` | Specifies flattening configuration for nested JSON data. |
| `featureSpec` | JSON parser features supported by Jackson. Example: `{"ALLOW_SINGLE_QUOTES": true, "ALLOW_UNQUOTED_FIELD_NAMES": true}` |
| `assumeNewlineDelimited` | If true, enables more flexible parsing exception handling for newline-delimited JSON. |
| `useJsonNodeReader` | When ingesting multi-line JSON events, enables retention of valid JSON events encountered before a parsing exception. |
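Putting these fields together, a JSON `inputFormat` might look like the following sketch (the flattened `userId` field and the parser feature shown are illustrative assumptions, not values from this document):

```json
{
  "type": "json",
  "flattenSpec": {
    "useFieldDiscovery": true,
    "fields": [
      { "type": "path", "name": "userId", "expr": "$.user.id" }
    ]
  },
  "featureSpec": { "ALLOW_SINGLE_QUOTES": true }
}
```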
### CSV Format

Configure the CSV `inputFormat` to load CSV data:

| Field | Description |
|-------|-------------|
| `type` | Set value to `csv`. |
| `columns` | Specifies the columns of the data. Required if `findColumnsFromHeader` is false or missing. |
| `findColumnsFromHeader` | If true, extracts column names from the header row. |
| `skipHeaderRows` | Number of rows to skip at the beginning. |
| `listDelimiter` | Custom delimiter for multi-value dimensions. |
| `tryParseNumbers` | If true, attempts to parse numeric strings into long or double. |
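For example, a CSV `inputFormat` that declares its columns explicitly might look like this sketch (the column names are hypothetical):

```json
{
  "type": "csv",
  "columns": ["timestamp", "page", "delta"],
  "findColumnsFromHeader": false,
  "skipHeaderRows": 0,
  "tryParseNumbers": true
}
```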
### TSV Format

Configure the TSV `inputFormat` to load TSV data:

| Field | Description |
|-------|-------------|
| `type` | Set value to `tsv`. |
| `delimiter` | Custom delimiter for data values. |
| `columns` | Specifies the columns of the data. Required if `findColumnsFromHeader` is false or missing. |
| `findColumnsFromHeader` | If true, extracts column names from the header row. |
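A TSV `inputFormat` sketch that reads column names from the header row (the tab delimiter shown is an assumption):

```json
{
  "type": "tsv",
  "delimiter": "\t",
  "findColumnsFromHeader": true
}
```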
## Binary Formats

### Parquet Format

Load the `druid-parquet-extensions` extension to use the Parquet format.

| Field | Description |
|-------|-------------|
| `type` | Set value to `parquet`. |
| `flattenSpec` | Define a `flattenSpec` to extract nested values. Only `path` expressions are supported. |
| `binaryAsString` | Treats `bytes` Parquet columns as UTF-8 encoded strings. |
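A Parquet `inputFormat` sketch (the nested path is illustrative):

```json
{
  "type": "parquet",
  "flattenSpec": {
    "useFieldDiscovery": true,
    "fields": [
      { "type": "path", "name": "nested", "expr": "$.path.to.nested" }
    ]
  },
  "binaryAsString": false
}
```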
### ORC Format

Load the `druid-orc-extensions` extension to use the ORC format.

| Field | Description |
|-------|-------------|
| `type` | Set value to `orc`. |
| `flattenSpec` | Specifies flattening configuration for nested ORC data. Only `path` expressions are supported. |
| `binaryAsString` | Treats binary ORC columns as UTF-8 encoded strings. |
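An ORC `inputFormat` sketch (the nested path is illustrative):

```json
{
  "type": "orc",
  "flattenSpec": {
    "useFieldDiscovery": true,
    "fields": [
      { "type": "path", "name": "nested", "expr": "$.path.to.nested" }
    ]
  },
  "binaryAsString": false
}
```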
### Avro Stream Format

Load the `druid-avro-extensions` extension to use the Avro format.

| Field | Description |
|-------|-------------|
| `type` | Set value to `avro_stream`. |
| `avroBytesDecoder` | Specifies how to decode bytes to an Avro record. |
| `flattenSpec` | Define a `flattenSpec` to extract nested values. Only `path` expressions are supported. |
| `binaryAsString` | Treats `bytes` Avro columns as UTF-8 encoded strings. |
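An `avro_stream` sketch using an inline reader schema (the decoder type and the record schema are illustrative assumptions, not part of this document):

```json
{
  "type": "avro_stream",
  "avroBytesDecoder": {
    "type": "schema_inline",
    "schema": {
      "type": "record",
      "name": "WikiEdit",
      "fields": [
        { "name": "page", "type": "string" },
        { "name": "delta", "type": "int" }
      ]
    }
  },
  "binaryAsString": false
}
```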
Kafka Input Format
Thekafka input format lets you parse Kafka metadata fields in addition to the payload value contents.
Set value to
kafkaThe input format to parse the Kafka value payload
The name of the column for the Kafka timestamp
The name of the column for the Kafka topic
Specifies how to parse Kafka headers. Supports String types with various encodings (UTF-8, ISO-8859-1, etc.)
Prefix for all header columns
The input format to parse the Kafka key
The name of the column for the Kafka key
### Kafka Metadata Example

Given this Kafka message:

- Kafka timestamp: `1680795276351`
- Kafka topic: `wiki-edits`
- Kafka headers: `env=development`, `zone=z1`
- Kafka key: `wiki-edit`
- Kafka payload: `{"channel":"#sv.wikipedia","timestamp":"2016-06-27T00:00:11.080Z","page":"Salo Toraut","delta":31}`
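A `kafka` inputFormat that could surface the metadata above alongside the JSON payload might look like this sketch (the column names and header encoding shown are assumptions):

```json
{
  "type": "kafka",
  "valueFormat": { "type": "json" },
  "timestampColumnName": "kafka.timestamp",
  "topicColumnName": "kafka.topic",
  "headerFormat": { "type": "string", "encoding": "UTF-8" },
  "headerColumnPrefix": "kafka.header."
}
```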
Kinesis Input Format
Thekinesis input format lets you parse Kinesis metadata fields in addition to the payload value contents.
Set value to
kinesisThe input format to parse the Kinesis value payload
The name of the column for the Kinesis partition key
The name of the column for the Kinesis timestamp
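A `kinesis` inputFormat sketch wrapping a JSON payload (the column names are assumptions):

```json
{
  "type": "kinesis",
  "valueFormat": { "type": "json" },
  "partitionKeyColumnName": "kinesis.partitionKey",
  "timestampColumnName": "kinesis.timestamp"
}
```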
## FlattenSpec

You can use the `flattenSpec` object to flatten nested data as an alternative to nested columns.

| Field | Description |
|-------|-------------|
| `useFieldDiscovery` | If true, interprets all root-level fields as available for usage. |
| `fields` | Specifies the fields of interest and how they are accessed. |
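For example, a `flattenSpec` that keeps all root-level fields and additionally extracts one nested field might look like this (the path expression is illustrative):

```json
{
  "useFieldDiscovery": true,
  "fields": [
    { "type": "path", "name": "nested", "expr": "$.path.to.nested" }
  ]
}
```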
### Field Types

- **Root** (`root`): Refers to a field at the root level of the record. Only useful if `useFieldDiscovery` is false.
- **Path** (`path`): Refers to a field using JsonPath notation. Supported by most formats, including `avro`, `json`, `orc`, and `parquet`. Example: `{"type": "path", "name": "nested", "expr": "$.path.to.nested"}`
- **JQ** (`jq`): Refers to a field using jackson-jq notation. Only supported for the `json` format. Example: `{"type": "jq", "name": "first_food", "expr": ".thing.food[1]"}`
- **Tree** (`tree`): Refers to a nested field from the root level. More efficient than `path` or `jq` for simple hierarchical fetches. Only supported for `json`. Example: `{"type": "tree", "name": "foo_other_bar", "nodes": ["foo", "other", "bar"]}`

## Compression Formats
Druid supports the following compression formats:

- gzip: `.gz` files
- bzip2: `.bz2` files
- xz: `.xz` files
- zip: `.zip` files
- Snappy: `.sz` files
- ZSTD: `.zst` files

## Next Steps

- **Input Sources**: Configure where to read your data from.
- **Ingestion Spec**: Complete ingestion specification reference.
- **Schema Design**: Best practices for schema design.
- **Native Batch**: Learn about batch ingestion.