SQL-based Ingestion API

This page describes SQL-based batch ingestion using the multi-stage query (MSQ) task engine, introduced in Druid 24.0. For a programmatic interface, use these APIs. For a user-friendly interface, use the Query view in the web console.

The SQL-based ingestion API allows you to submit INSERT and REPLACE statements to the MSQ task engine for batch data ingestion.

Submit a query

Submits SQL queries to the MSQ task engine for execution.

curl --request POST \
  'http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/task' \
  --header 'Content-Type: application/json' \
  --data '{
  "query": "INSERT INTO wikipedia\nSELECT\n  TIME_PARSE(\"timestamp\") AS __time,\n  *\nFROM TABLE(\n  EXTERN(\n    '\'''{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\''',\n    '\'''{\"type\": \"json\"}'\''',\n    '\'''[{\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}]'\'''\n  )\n)\nPARTITIONED BY DAY"
}'

query

string

required

SQL query string. Supports INSERT, REPLACE, and SELECT (experimental) statements.

context

object

Query context parameters for the MSQ engine including:

maxNumTasks: Maximum number of parallel tasks
finalizeAggregations: Whether to finalize aggregations during ingestion
executionMode: Set to “ASYNC” for asynchronous execution

parameters

array

List of query parameters for parameterized queries.

taskId

string

Controller task ID for the submitted query.

state

string

Initial state of the query (typically “RUNNING”).

{
    "taskId": "query-431c4a18-9dde-4ec8-ab82-ec7fd17d5a4e",
    "state": "RUNNING"
}

Get task status

Retrieves the current status of a query task.

curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e/status"

taskId

string

required

The controller task ID returned when submitting the query.

statusCode

string

Current status: RUNNING, SUCCESS, or FAILED.

duration

integer

Task duration in milliseconds (-1 if still running).

location

object

Host and port information where the task is running.

{
    "task": "query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e",
    "status": {
        "id": "query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e",
        "type": "query_controller",
        "statusCode": "RUNNING",
        "duration": -1,
        "location": {
            "host": "localhost",
            "port": 8100
        },
        "dataSource": "wikipedia"
    }
}

Get task report

Retrieves the detailed task report including stages, statistics, and results.

curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e/reports"

taskId

string

required

The controller task ID.

multiStageQuery.payload.status

object

Query execution status including state, start time, duration, and worker information.

multiStageQuery.payload.stages

array

Array of query stage definitions and execution details.

multiStageQuery.payload.counters

object

Execution counters for rows processed, bytes transferred, and other metrics.

multiStageQuery.payload.status.segmentLoadStatus

object

Information about segment loading after ingestion completes.

Report fields

Key fields in the task report:

Field	Description
`status.status`	RUNNING, SUCCESS, or FAILED
`status.durationMs`	Milliseconds elapsed since start
`status.workers`	Information about worker tasks
`status.segmentLoadStatus.state`	Segment loading state: INIT, WAITING, SUCCESS, FAILED, TIMED_OUT
`stages[].stageNumber`	Stage identifier
`stages[].phase`	Stage phase: NEW, READING_INPUT, POST_READING, RESULTS_COMPLETE, FAILED
`stages[].workerCount`	Number of parallel workers for the stage
`counters`	Row and byte counters per stage and worker

Cancel a query task

Cancels a running query task.

curl --request POST \
  "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-655efe33-781a-4c50-ae84-c2911b42d63c/shutdown"

taskId

string

required

The task ID to cancel.

{
    "task": "query-655efe33-781a-4c50-ae84-c2911b42d63c"
}

Query types

The MSQ task engine supports:

INSERT statements

Insert data into a new or existing datasource:

INSERT INTO datasource
SELECT columns FROM source
PARTITIONED BY granularity

REPLACE statements

Replace data in a datasource for specific time ranges:

REPLACE INTO datasource OVERWRITE ALL
SELECT columns FROM source  
PARTITIONED BY granularity

SELECT statements (Experimental)

Plain SELECT queries are supported experimentally. Results are written to the task report:

SELECT * FROM datasource WHERE condition

SELECT query results appear in multiStageQuery.payload.results.results as an array of arrays. The format is subject to change.

Context parameters

Important MSQ context parameters:

Parameter	Type	Description
`maxNumTasks`	integer	Maximum number of worker tasks
`finalizeAggregations`	boolean	Finalize aggregations during ingestion
`rowsInMemory`	integer	Rows to buffer in memory
`segmentSortOrder`	string	Column sort order for segments
`maxParseExceptions`	integer	Maximum parse exceptions before failing
`executionMode`	string	Set to “ASYNC” for async execution

For a complete list, see the MSQ context parameters documentation.

Error codes

Common error codes in task reports:

Error Code	Description
`TooManyWarnings`	Exceeded warning threshold
`CannotParseExternalData`	Failed to parse input data
`InsertCannotAllocateSegment`	Cannot allocate segment for INSERT
`ColumnTypeNotSupported`	Unsupported column type
`QueryNotSupported`	Query type not supported by MSQ engine

For all error codes, see the MSQ error codes documentation.

HTTP APIs

SQL & JDBC

SQL-based Ingestion API

Submit a query

Get task status

Get task report

Report fields

Cancel a query task

Query types

INSERT statements

REPLACE statements

SELECT statements (Experimental)

Context parameters

Error codes

Build docs developers (and LLMs) love

HTTP APIs

SQL & JDBC

​Submit a query

​Get task status

​Get task report

​Report fields

​Cancel a query task

​Query types

​INSERT statements

​REPLACE statements

​SELECT statements (Experimental)

​Context parameters

​Error codes

Build docs developers (and LLMs) love

Submit a query

Get task status

Get task report

Report fields

Cancel a query task

Query types

INSERT statements

REPLACE statements

SELECT statements (Experimental)

Context parameters

Error codes