Skip to main content
This page describes SQL-based batch ingestion using the multi-stage query (MSQ) task engine, introduced in Druid 24.0. For a programmatic interface, use these APIs. For a user-friendly interface, use the Query view in the web console.
The SQL-based ingestion API allows you to submit INSERT and REPLACE statements to the MSQ task engine for batch data ingestion.

Submit a query

Submits SQL queries to the MSQ task engine for execution.
curl --request POST \
  'http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/task' \
  --header 'Content-Type: application/json' \
  --data '{
  "query": "INSERT INTO wikipedia\nSELECT\n  TIME_PARSE(\"timestamp\") AS __time,\n  *\nFROM TABLE(\n  EXTERN(\n    '\'''{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}'\''',\n    '\'''{\"type\": \"json\"}'\''',\n    '\'''[{\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}]'\'''\n  )\n)\nPARTITIONED BY DAY"
}'
query
string
required
SQL query string. Supports INSERT, REPLACE, and SELECT (experimental) statements.
context
object
Query context parameters for the MSQ engine including:
  • maxNumTasks: Maximum number of parallel tasks
  • finalizeAggregations: Whether to finalize aggregations during ingestion
  • executionMode: Set to “ASYNC” for asynchronous execution
parameters
array
List of query parameters for parameterized queries.
taskId
string
Controller task ID for the submitted query.
state
string
Initial state of the query (typically “RUNNING”).
{
    "taskId": "query-431c4a18-9dde-4ec8-ab82-ec7fd17d5a4e",
    "state": "RUNNING"
}

Get task status

Retrieves the current status of a query task.
curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e/status"
taskId
string
required
The controller task ID returned when submitting the query.
statusCode
string
Current status: RUNNING, SUCCESS, or FAILED.
duration
integer
Task duration in milliseconds (-1 if still running).
location
object
Host and port information where the task is running.
{
    "task": "query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e",
    "status": {
        "id": "query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e",
        "type": "query_controller",
        "statusCode": "RUNNING",
        "duration": -1,
        "location": {
            "host": "localhost",
            "port": 8100
        },
        "dataSource": "wikipedia"
    }
}

Get task report

Retrieves the detailed task report including stages, statistics, and results.
curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-3dc0c45d-34d7-4b15-86c9-cdb2d3ebfc4e/reports"
taskId
string
required
The controller task ID.
multiStageQuery.payload.status
object
Query execution status including state, start time, duration, and worker information.
multiStageQuery.payload.stages
array
Array of query stage definitions and execution details.
multiStageQuery.payload.counters
object
Execution counters for rows processed, bytes transferred, and other metrics.
multiStageQuery.payload.status.segmentLoadStatus
object
Information about segment loading after ingestion completes.

Report fields

Key fields in the task report:
FieldDescription
status.statusRUNNING, SUCCESS, or FAILED
status.durationMsMilliseconds elapsed since start
status.workersInformation about worker tasks
status.segmentLoadStatus.stateSegment loading state: INIT, WAITING, SUCCESS, FAILED, TIMED_OUT
stages[].stageNumberStage identifier
stages[].phaseStage phase: NEW, READING_INPUT, POST_READING, RESULTS_COMPLETE, FAILED
stages[].workerCountNumber of parallel workers for the stage
countersRow and byte counters per stage and worker

Cancel a query task

Cancels a running query task.
curl --request POST \
  "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/task/query-655efe33-781a-4c50-ae84-c2911b42d63c/shutdown"
taskId
string
required
The task ID to cancel.
{
    "task": "query-655efe33-781a-4c50-ae84-c2911b42d63c"
}

Query types

The MSQ task engine supports:

INSERT statements

Insert data into a new or existing datasource:
INSERT INTO datasource
SELECT columns FROM source
PARTITIONED BY granularity

REPLACE statements

Replace data in a datasource for specific time ranges:
REPLACE INTO datasource OVERWRITE ALL
SELECT columns FROM source  
PARTITIONED BY granularity

SELECT statements (Experimental)

Plain SELECT queries are supported experimentally. Results are written to the task report:
SELECT * FROM datasource WHERE condition
SELECT query results appear in multiStageQuery.payload.results.results as an array of arrays. The format is subject to change.

Context parameters

Important MSQ context parameters:
ParameterTypeDescription
maxNumTasksintegerMaximum number of worker tasks
finalizeAggregationsbooleanFinalize aggregations during ingestion
rowsInMemoryintegerRows to buffer in memory
segmentSortOrderstringColumn sort order for segments
maxParseExceptionsintegerMaximum parse exceptions before failing
executionModestringSet to “ASYNC” for async execution
For a complete list, see the MSQ context parameters documentation.

Error codes

Common error codes in task reports:
Error CodeDescription
TooManyWarningsExceeded warning threshold
CannotParseExternalDataFailed to parse input data
InsertCannotAllocateSegmentCannot allocate segment for INSERT
ColumnTypeNotSupportedUnsupported column type
QueryNotSupportedQuery type not supported by MSQ engine
For all error codes, see the MSQ error codes documentation.

Build docs developers (and LLMs) love