`src/QueryEngine.ts` is the heart of Claude Code at ~46K lines. It owns the query lifecycle and session state for a single conversation. One `QueryEngine` instance per conversation; each `submitMessage()` call starts a new turn within the same session. State (messages, file cache, usage totals) persists across turns.
## Core responsibilities

- **Streaming responses**: consumes the Anthropic streaming API and yields `SDKMessage` objects to the caller as they arrive.
- **Tool-call loops**: when the LLM emits a `tool_use` block, the engine executes the tool and feeds the result back for the next API call, repeating until the model stops requesting tools.
- **Thinking mode**: extended thinking with budget management, configurable via `ThinkingConfig` (budget tokens, whether to enable by default).
- **Retry logic**: automatic retries with backoff for transient API failures, categorized via `categorizeRetryableAPIError`.
- **Token counting**: tracks input/output tokens and cost per turn using `accumulateUsage` / `updateUsage` from `src/services/api/claude.ts`.
- **Context management**: manages conversation history and context windows. Integrates with the snip compaction module (`HISTORY_SNIP` feature flag) to bound memory in long headless sessions.
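The retry behavior can be pictured with a small sketch. This is not the engine's actual implementation: `categorizeRetryableAPIError` and the real backoff parameters live in the codebase, and every name and number below (`categorize`, `backoffMs`, the defaults) is assumed for illustration only.

```typescript
// Hypothetical sketch of retry-with-backoff; the real engine categorizes
// errors via categorizeRetryableAPIError and uses its own parameters.
type RetryCategory = "retryable" | "fatal";

// Assumed stand-in for the real error categorizer.
function categorize(err: { status?: number }): RetryCategory {
  // Rate limits and transient server errors are worth retrying.
  return err.status === 429 || (err.status ?? 0) >= 500 ? "retryable" : "fatal";
}

// Exponential backoff with a cap, in milliseconds (illustrative numbers).
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function withRetries<T>(
  call: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const e = err as { status?: number };
      if (attempt + 1 >= maxAttempts || categorize(e) === "fatal") throw err;
      await sleep(backoffMs(attempt)); // wait before the next attempt
    }
  }
}
```

Injecting `sleep` keeps a sketch like this testable without real delays.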
## Configuration

`QueryEngine` is configured at construction time via `QueryEngineConfig`:

One `QueryEngine` per conversation. Construct it once, then call `submitMessage()` for each user turn. State accumulates across turns until the instance is discarded.

## submitMessage() lifecycle
Each call to `submitMessage()` is an `AsyncGenerator` that yields `SDKMessage` objects:
1. **Setup**: clears per-turn state (`discoveredSkillNames`), sets the working directory, wraps `canUseTool` to track permission denials, resolves the model, and builds the system prompt via `fetchSystemPromptParts`.
2. **User input processing**: `processUserInput` normalizes the prompt, injects memory from memdir, and constructs the full message array, including conversation history.
3. **API call (streaming)**: calls the Anthropic API via `query()` in `src/query.ts`. Responses stream in as partial `ContentBlock` objects.
4. **Tool-call loop**: when the model emits a `tool_use` block, the engine enters the loop (see Tool invocation lifecycle below). The loop continues until the model produces a stop sequence with no pending tool calls.
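The turn loop in step 4 can be sketched as follows. All names here (`ModelTurn`, `runTool`, and so on) are hypothetical stand-ins kept synchronous for brevity; the real engine streams asynchronously and threads much more state through each call.

```typescript
// Hypothetical shapes standing in for the real streaming API types.
type ToolUse = { id: string; name: string; input: unknown };
type ModelTurn = { text: string; toolUses: ToolUse[] };
type ToolResult = { toolUseId: string; content: string };

// callModel and runTool are injected so the sketch stays self-contained;
// in the real engine these are the streaming API call and tool execution.
function runTurn(
  callModel: (results: ToolResult[]) => ModelTurn,
  runTool: (use: ToolUse) => ToolResult,
): string[] {
  const texts: string[] = [];
  let results: ToolResult[] = [];
  // Keep calling the model, feeding tool results back, until it stops
  // requesting tools.
  for (;;) {
    const turn = callModel(results);
    texts.push(turn.text);
    if (turn.toolUses.length === 0) return texts;
    results = turn.toolUses.map(runTool);
  }
}
```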
## Tool invocation lifecycle

Every tool call the model requests goes through a strict lifecycle before execution:

1. **Schema validation**: the tool's Zod input schema validates the JSON the model provided. Invalid inputs are returned as a `tool_result` error message without executing the tool.
2. **Permission check**: `canUseTool(tool, input, toolUseContext, assistantMessage, toolUseID, forceDecision)` is called. The `CanUseToolFn` checks against:
   - the current `PermissionMode` (`default`, `auto`, `bypassPermissions`, etc.)
   - `alwaysAllowRules` and `alwaysDenyRules` from `ToolPermissionContext`
   - whether the tool is `isConcurrencySafe()`
3. **Execution**: the tool's `call(input, context)` method runs. The `ToolUseContext` gives tools access to `getAppState`, `setAppState`, `abortController`, `readFileState`, `setToolJSX`, and `addNotification`.

## Permission denial tracking

`submitMessage` wraps the caller-provided `canUseTool` to accumulate denials:
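A minimal sketch of what such a wrapper might look like. The `PermissionResult` shape and the callback signature below are assumed for illustration; the real `CanUseToolFn` is async and receives the full context (tool object, `toolUseID`, etc.), but the wrapping idea is the same.

```typescript
// Hypothetical sketch of wrapping a permission callback to record denials.
type PermissionResult = { behavior: "allow" | "deny"; message?: string };
type CanUseTool = (toolName: string, input: unknown) => PermissionResult;

// Returns a callback with the same signature that also accumulates the
// names of denied tools into the provided array.
function trackDenials(inner: CanUseTool, denials: string[]): CanUseTool {
  return (toolName, input) => {
    const result = inner(toolName, input);
    if (result.behavior === "deny") denials.push(toolName);
    return result;
  };
}
```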
## Streaming response handling
The engine uses an async generator chain through `src/query.ts`. As text and tool-use blocks stream in:
- Text deltas are accumulated and yielded as partial `SDKMessage` objects when `includePartialMessages` is enabled.
- Tool-use blocks trigger the tool-call loop (see above) before the next API call.
- Thinking blocks are captured and managed separately when `thinkingConfig` is active.
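The delta-accumulation behavior can be sketched with a synchronous fold over a list of events. The event shapes here are deliberately simplified assumptions; the real stream carries richer `ContentBlock` events.

```typescript
// Assumed, simplified event shapes standing in for the real stream events.
type StreamEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_use"; name: string };

// Folds a stream of events into accumulated text plus pending tool calls,
// recording the partial text after every delta (as with
// includePartialMessages).
function foldStream(events: StreamEvent[]): { partials: string[]; tools: string[] } {
  const partials: string[] = [];
  const tools: string[] = [];
  let text = "";
  for (const ev of events) {
    if (ev.type === "text_delta") {
      text += ev.text;
      partials.push(text); // each partial would be yielded to the caller
    } else {
      tools.push(ev.name); // would trigger the tool-call loop
    }
  }
  return { partials, tools };
}
```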
## Token counting and cost tracking
Each turn's token usage flows through `src/services/api/claude.ts`:

`src/cost-tracker.ts` exposes:
| Export | Description |
|---|---|
| `getModelUsage()` | Per-model token breakdown |
| `getTotalAPIDuration()` | Cumulative API wall time |
| `getTotalCost()` | Estimated USD cost for the session |

These totals feed the `/cost` command and the status line.
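The per-model accumulation behind `getModelUsage()` can be sketched as follows. The `Usage` shape and field names are assumptions for illustration; the real type in `src/services/api/claude.ts` tracks more fields (cache reads, cost, and so on).

```typescript
// Assumed usage shape; the real Usage type has more fields.
type Usage = { inputTokens: number; outputTokens: number };

// Accumulates one turn's usage into a per-model running total, in the
// spirit of accumulateUsage / getModelUsage.
function accumulate(
  totals: Map<string, Usage>,
  model: string,
  turn: Usage,
): Map<string, Usage> {
  const prev = totals.get(model) ?? { inputTokens: 0, outputTokens: 0 };
  totals.set(model, {
    inputTokens: prev.inputTokens + turn.inputTokens,
    outputTokens: prev.outputTokens + turn.outputTokens,
  });
  return totals;
}
```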
## Context management and history
`mutableMessages` is the in-memory conversation history. It accumulates:
- User messages
- Assistant messages (with `thinking` blocks when relevant)
- `tool_use` blocks from the assistant
- `tool_result` blocks returned by tool execution
### Snip compaction (`HISTORY_SNIP`)
For long headless sessions (SDK mode), the `HISTORY_SNIP` feature flag enables a snip compaction module that truncates history to bound memory:

In interactive sessions, the snipped history is presented via `projectSnippedView`; `QueryEngine` truncates in-place for headless sessions where there is no UI to preserve.
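The snip module's actual policy is not shown in this document, but the core idea of bounding history can be illustrated with a simple truncation that keeps the most recent messages within a budget (a character budget here; the real module would reason in tokens):

```typescript
// Illustrative only: keeps the newest messages within a character budget,
// dropping the oldest first. Not the real snip-compaction policy.
type Message = { role: "user" | "assistant"; content: string };

function snipHistory(history: Message[], budgetChars: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk backwards so the newest messages survive.
  for (let i = history.length - 1; i >= 0; i--) {
    used += history[i].content.length;
    if (used > budgetChars) break;
    kept.unshift(history[i]);
  }
  return kept;
}
```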
## File state cache
`readFileCache` (`FileStateCache`) is a per-session LRU cache of file contents read during tool execution. Tools use it to avoid redundant disk reads within a turn.
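`FileStateCache`'s real interface is not shown here, but a minimal LRU of this kind can be sketched using a `Map`, which preserves insertion order (all names below are illustrative):

```typescript
// Minimal LRU sketch of a per-session file cache. A JavaScript Map
// iterates keys in insertion order, so the first key is the least
// recently used entry.
class LruFileCache {
  private entries = new Map<string, string>();
  constructor(private maxEntries: number) {}

  get(path: string): string | undefined {
    const content = this.entries.get(path);
    if (content === undefined) return undefined;
    // Re-insert to mark this path as most recently used.
    this.entries.delete(path);
    this.entries.set(path, content);
    return content;
  }

  set(path: string, content: string): void {
    this.entries.delete(path);
    this.entries.set(path, content);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first key in the Map).
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}
```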
## Thinking mode
Thinking mode is controlled by `ThinkingConfig`:
`shouldEnableThinkingByDefault()` determines whether thinking is on for a given model. When enabled, the API receives an extended thinking budget and returns thinking blocks that are threaded through the conversation history.
## buildTool pattern
Tools passed into `QueryEngine` follow the `Tool` interface defined in `src/Tool.ts`. A typical tool exposes:
`call()` is an `AsyncGenerator`. Tools can yield intermediate `ToolProgressData` for live UI updates (e.g., streaming bash output) before returning the final result.
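The progress-then-result shape can be sketched with a generator (synchronous here for brevity; the real `call()` is an `AsyncGenerator` and takes a `ToolUseContext`). All type and function names below are assumptions, not the real `src/Tool.ts` interface:

```typescript
// Assumed, simplified shapes; the real Tool interface is richer.
type ToolProgress = { kind: "progress"; message: string };
type ToolDone<T> = { kind: "result"; data: T };
type ToolYield<T> = ToolProgress | ToolDone<T>;

// A toy tool that reports progress per line before yielding a final count.
function* countLines(input: { text: string }): Generator<ToolYield<number>> {
  const lines = input.text.split("\n");
  for (const line of lines) {
    yield { kind: "progress", message: `read: ${line}` }; // live UI update
  }
  yield { kind: "result", data: lines.length }; // final tool_result payload
}
```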
## See also

- **Architecture Overview**: the full pipeline from CLI entrypoint to terminal UI.
- **State Management & UI**: how `AppState` and the React/Ink UI layer integrate with the Query Engine.