Query Engine

The Query Engine (src/QueryEngine.ts, ~46K lines) is the heart of Claude Code. It owns the complete lifecycle of a conversation: one QueryEngine instance per session, with each call to submitMessage() representing a new turn.

export class QueryEngine {
  private config: QueryEngineConfig
  private mutableMessages: Message[]
  private abortController: AbortController
  private totalUsage: NonNullableUsage
  private readFileState: FileStateCache

  constructor(config: QueryEngineConfig) { ... }

  async *submitMessage(
    prompt: string | ContentBlockParam[],
    options?: { uuid?: string; isMeta?: boolean },
  ): AsyncGenerator<SDKMessage, void, unknown> { ... }
}

State — message history, file cache, cumulative token usage — persists across turns within the same QueryEngine instance.
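As a rough illustration of the one-instance-per-session pattern, the sketch below drives an async generator across two turns and shows per-instance state surviving between them. SketchEngine and SketchMessage are hypothetical stand-ins, not the real QueryEngine or SDKMessage types; only the shape of submitMessage() is taken from the class above.

```typescript
// Hypothetical stand-ins: these are NOT the real SDKMessage or QueryEngine.
type SketchMessage =
  | { type: 'assistant'; text: string }
  | { type: 'result'; turnsSoFar: number }

class SketchEngine {
  // Per-instance state, playing the role of mutableMessages.
  private history: string[] = []

  async *submitMessage(
    prompt: string,
  ): AsyncGenerator<SketchMessage, void, unknown> {
    this.history.push(prompt)
    yield { type: 'assistant', text: `echo: ${prompt}` }
    yield { type: 'result', turnsSoFar: this.history.length }
  }
}

// Drain one turn, collecting every message the generator yields.
async function runTurn(
  engine: SketchEngine,
  prompt: string,
): Promise<SketchMessage[]> {
  const seen: SketchMessage[] = []
  for await (const message of engine.submitMessage(prompt)) {
    seen.push(message)
  }
  return seen
}
```

Calling runTurn() twice on the same SketchEngine reports turnsSoFar of 1 and then 2, because the history array lives on the instance rather than on the call.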

Configuration

QueryEngineConfig is the full set of dependencies injected at construction time:

export type QueryEngineConfig = {
  cwd: string
  tools: Tools
  commands: Command[]
  mcpClients: MCPServerConnection[]
  agents: AgentDefinition[]
  canUseTool: CanUseToolFn
  getAppState: () => AppState
  setAppState: (f: (prev: AppState) => AppState) => void
  initialMessages?: Message[]
  readFileCache: FileStateCache
  customSystemPrompt?: string
  appendSystemPrompt?: string
  userSpecifiedModel?: string
  fallbackModel?: string
  thinkingConfig?: ThinkingConfig
  maxTurns?: number
  maxBudgetUsd?: number
  verbose?: boolean
  abortController?: AbortController
}

Responsibilities

Streaming responses

submitMessage() is an AsyncGenerator that yields SDKMessage values as they stream from the Anthropic API. Callers iterate the generator to process each streamed chunk (partial text, tool-use blocks, and final stop reasons) as it arrives.

The system prompt is assembled at the start of each turn from:

The default system prompt (tool descriptions, context)
An optional customSystemPrompt that replaces the default
An optional appendSystemPrompt appended after the main prompt
Memory mechanics prompts when a memory path override is active
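The assembly rules listed above might be sketched as follows. The function name, the PromptParts type, and the relative order of the memory and append parts are assumptions made for illustration; the source only states that customSystemPrompt replaces the default and that appendSystemPrompt comes after the main prompt.

```typescript
// Hypothetical input shape; field names mirror QueryEngineConfig where possible.
type PromptParts = {
  defaultSystemPrompt: string[]
  customSystemPrompt?: string
  appendSystemPrompt?: string
  memoryMechanicsPrompt?: string // assumed to be present only when a memory path override is active
}

function assembleSystemPrompt(parts: PromptParts): string[] {
  // A custom prompt replaces the default rather than extending it.
  const main =
    parts.customSystemPrompt !== undefined
      ? [parts.customSystemPrompt]
      : parts.defaultSystemPrompt

  return [
    ...main,
    // Assumption: memory mechanics sit between the main prompt and the appended text.
    ...(parts.memoryMechanicsPrompt ? [parts.memoryMechanicsPrompt] : []),
    ...(parts.appendSystemPrompt ? [parts.appendSystemPrompt] : []),
  ]
}
```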

Tool-call loops

When the LLM requests one or more tools, the Query Engine executes them and feeds the results back as new messages, then calls the API again. This loop continues until the LLM emits a final end_turn stop reason or the turn limit is reached.

LLM response (with tool_use blocks)
      ↓
Execute each tool via canUseTool → tool.call()
      ↓
Append tool_result blocks to mutableMessages
      ↓
Call API again with updated messages
      ↓
Repeat until stop_reason === 'end_turn'

Permission denials during tool execution are tracked and reported to SDK callers as SDKPermissionDenial entries.
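The loop in the diagram can be approximated with injected dependencies, as in this minimal sketch. All type names, the LoopDeps shape, and the "Permission denied" result text are hypothetical; the real canUseTool and tool.call() signatures are not shown in this section and are likely richer.

```typescript
// Hypothetical, simplified shapes for illustration only.
type ToolUse = { id: string; name: string; input: unknown }
type ModelReply = { stopReason: 'end_turn' | 'tool_use'; toolUses: ToolUse[] }
type LoopMessage =
  | { role: 'assistant'; reply: ModelReply }
  | { role: 'user'; toolUseId: string; content: string; isError: boolean }

type LoopDeps = {
  callModel: (messages: LoopMessage[]) => Promise<ModelReply>
  canUseTool: (use: ToolUse) => Promise<boolean>
  runTool: (use: ToolUse) => Promise<string>
  maxTurns: number
}

async function runToolLoop(
  messages: LoopMessage[],
  deps: LoopDeps,
): Promise<{ turns: number; denials: ToolUse[] }> {
  const denials: ToolUse[] = []
  let turns = 0
  while (turns < deps.maxTurns) {
    turns++
    const reply = await deps.callModel(messages)
    messages.push({ role: 'assistant', reply })
    if (reply.stopReason === 'end_turn') break

    for (const use of reply.toolUses) {
      if (await deps.canUseTool(use)) {
        const content = await deps.runTool(use)
        messages.push({ role: 'user', toolUseId: use.id, content, isError: false })
      } else {
        // Record the denial and still answer the tool_use so the model can react.
        denials.push(use)
        messages.push({
          role: 'user',
          toolUseId: use.id,
          content: 'Permission denied',
          isError: true,
        })
      }
    }
  }
  return { turns, denials }
}
```

Passing the model, permission check, and tool runner as parameters keeps the sketch testable without network access; it does not imply the real engine is structured this way.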

Thinking mode

Extended thinking is configured via ThinkingConfig:

export type ThinkingConfig =
  | { type: 'disabled' }
  | { type: 'adaptive' }       // Enabled by default when conditions are met
  | { type: 'enabled'; budgetTokens: number }

The Query Engine resolves the effective config at the start of each turn. In adaptive mode it calls shouldEnableThinkingByDefault(), which evaluates model capability and environment signals. When thinking is active, its budget tokens are reserved from the context window.
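One way to picture the per-turn resolution is the sketch below. ResolvedThinking, resolveThinking(), remainingContext(), and the adaptiveBudgetTokens parameter are hypothetical; in particular, how a budget is chosen in adaptive mode and how the reservation is computed are not specified in this section, so simple subtraction is used as a stand-in.

```typescript
type ThinkingConfig =
  | { type: 'disabled' }
  | { type: 'adaptive' }
  | { type: 'enabled'; budgetTokens: number }

// Hypothetical resolved form used only by this sketch.
type ResolvedThinking =
  | { active: false }
  | { active: true; budgetTokens: number }

function resolveThinking(
  config: ThinkingConfig,
  shouldEnableByDefault: () => boolean, // stands in for shouldEnableThinkingByDefault()
  adaptiveBudgetTokens: number, // assumption: budget applied when adaptive mode turns thinking on
): ResolvedThinking {
  switch (config.type) {
    case 'disabled':
      return { active: false }
    case 'enabled':
      return { active: true, budgetTokens: config.budgetTokens }
    case 'adaptive':
      return shouldEnableByDefault()
        ? { active: true, budgetTokens: adaptiveBudgetTokens }
        : { active: false }
  }
}

// Assumption: "reserved from the context window" is modeled as plain subtraction.
function remainingContext(contextWindow: number, resolved: ResolvedThinking): number {
  return resolved.active ? contextWindow - resolved.budgetTokens : contextWindow
}
```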

Retry logic

Transient API failures are categorized by categorizeRetryableAPIError() from src/services/api/errors.ts. Retryable errors (rate limits, transient network errors) trigger automatic retries with exponential backoff. Non-retryable errors surface to the caller immediately.
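A generic retry wrapper with exponential backoff could look like the sketch below. It is not the engine's actual implementation: withRetry(), the ErrorCategory type, and the option names are hypothetical, the categorize callback merely stands in for categorizeRetryableAPIError() (whose real signature is not shown here), and jitter is omitted for brevity.

```typescript
type ErrorCategory = 'retryable' | 'fatal'

type RetryOptions = {
  maxAttempts: number
  baseDelayMs: number
  sleep: (ms: number) => Promise<void> // injected so callers control timing
}

async function withRetry<T>(
  operation: () => Promise<T>,
  categorize: (error: unknown) => ErrorCategory,
  options: RetryOptions,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation()
    } catch (error) {
      // Non-retryable errors, or an exhausted attempt budget, surface immediately.
      if (categorize(error) === 'fatal' || attempt >= options.maxAttempts) {
        throw error
      }
      // Delay doubles each attempt: base, 2x base, 4x base, ...
      await options.sleep(options.baseDelayMs * 2 ** (attempt - 1))
    }
  }
}
```

Injecting sleep keeps the sketch free of timers; a production caller would typically pass a setTimeout-based delay and add randomized jitter.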

Token counting and cost tracking

Each API response returns usage metadata (input tokens, output tokens, cache read/write tokens). The Query Engine accumulates these into totalUsage across all turns in the session via accumulateUsage() and updateUsage() from src/services/api/claude.ts.

Cumulative cost is available via:

import { getModelUsage, getTotalAPIDuration, getTotalCost } from './cost-tracker.js'

The /cost command reads these values and displays them to the user.
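The accumulation step is essentially a field-wise sum, which the following sketch illustrates. UsageTotals, addUsage(), and estimateCostUsd() are hypothetical names, not the real accumulateUsage() or cost-tracker functions, and the per-million-token rates are caller-supplied placeholders rather than actual pricing.

```typescript
// Hypothetical usage shape covering the four counters named in the text.
type UsageTotals = {
  inputTokens: number
  outputTokens: number
  cacheReadTokens: number
  cacheWriteTokens: number
}

const EMPTY_USAGE: UsageTotals = {
  inputTokens: 0,
  outputTokens: 0,
  cacheReadTokens: 0,
  cacheWriteTokens: 0,
}

// Return a new total instead of mutating, so earlier snapshots stay valid.
function addUsage(total: UsageTotals, delta: UsageTotals): UsageTotals {
  return {
    inputTokens: total.inputTokens + delta.inputTokens,
    outputTokens: total.outputTokens + delta.outputTokens,
    cacheReadTokens: total.cacheReadTokens + delta.cacheReadTokens,
    cacheWriteTokens: total.cacheWriteTokens + delta.cacheWriteTokens,
  }
}

// Rates are USD per million tokens and must be supplied by the caller.
function estimateCostUsd(usage: UsageTotals, usdPerMillion: UsageTotals): number {
  return (
    (usage.inputTokens * usdPerMillion.inputTokens +
      usage.outputTokens * usdPerMillion.outputTokens +
      usage.cacheReadTokens * usdPerMillion.cacheReadTokens +
      usage.cacheWriteTokens * usdPerMillion.cacheWriteTokens) /
    1_000_000
  )
}
```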

Context management

The Query Engine manages mutableMessages — the full conversation history for the session. Before each API call it:

Processes user input through processUserInput() (slash command detection, attachment handling)
Fetches the current system prompt parts via fetchSystemPromptParts()
Takes a file-state snapshot via fileHistoryMakeSnapshot() when file history is enabled
Trims or compacts history when context window limits approach (via the /compact command or automatic compaction)

Turn lifecycle

Process user input

processUserInput() inspects the raw prompt for slash commands, file attachments, and metadata. Slash commands that match a registered LocalCommand or LocalJSXCommand are executed immediately and the turn short-circuits. PromptCommand matches are converted to message content.
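The routing described above, reduced to its two outcomes, might look like this sketch. SketchCommand, InputResult, and routeInput() are hypothetical and far simpler than the real LocalCommand, LocalJSXCommand, and PromptCommand types; passing unknown slash commands through as ordinary text is an assumption.

```typescript
// Hypothetical command shapes: 'local' runs immediately, 'prompt' expands to message content.
type SketchCommand =
  | { kind: 'local'; name: string; run: (args: string) => string }
  | { kind: 'prompt'; name: string; expand: (args: string) => string }

type InputResult =
  | { shouldQuery: false; output: string } // turn short-circuits, no API call
  | { shouldQuery: true; content: string } // content is sent to the model

function routeInput(raw: string, commands: SketchCommand[]): InputResult {
  const trimmed = raw.trim()
  if (!trimmed.startsWith('/')) return { shouldQuery: true, content: raw }

  const spaceAt = trimmed.indexOf(' ')
  const name = spaceAt === -1 ? trimmed.slice(1) : trimmed.slice(1, spaceAt)
  const args = spaceAt === -1 ? '' : trimmed.slice(spaceAt + 1).trim()

  const command = commands.find(c => c.name === name)
  // Assumption: an unregistered slash command is treated as plain prompt text.
  if (!command) return { shouldQuery: true, content: raw }

  return command.kind === 'local'
    ? { shouldQuery: false, output: command.run(args) }
    : { shouldQuery: true, content: command.expand(args) }
}
```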

Build system prompt

fetchSystemPromptParts() assembles the system prompt from tool descriptions, memory contents, and context. The result is combined with any customSystemPrompt or appendSystemPrompt overrides.

Stream the API response

The assembled messages and system prompt are sent to the Anthropic API via query() (src/query.ts). The generator yields each SDKMessage as it streams in, allowing the terminal UI to render partial output in real time.

Execute tool calls

If the streamed response contains tool_use blocks, each tool is resolved from the registered Tools collection by name via toolMatchesName(). The canUseTool function is called to enforce the permission model before execution. Results are appended to mutableMessages.

Loop or finish

After tool results are appended, the Query Engine calls the API again. This continues until stop_reason === 'end_turn', the maxTurns limit is reached, the budget is exhausted, or the AbortController fires.
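The four exit conditions can be expressed as a single check run after each iteration, as in the sketch below. TurnState, StopCause, and stopCause() are hypothetical names, and the order in which the conditions are tested is an assumption; the source does not say which takes precedence when several hold at once.

```typescript
// Hypothetical snapshot of loop state; maxTurns and maxBudgetUsd mirror QueryEngineConfig.
type TurnState = {
  lastStopReason: string | null
  turnsUsed: number
  maxTurns?: number
  costUsd: number
  maxBudgetUsd?: number
  aborted: boolean // e.g. read from the AbortController's signal
}

type StopCause = 'aborted' | 'end_turn' | 'max_turns' | 'budget_exhausted'

// Returns the reason to stop, or null when the loop should call the API again.
function stopCause(state: TurnState): StopCause | null {
  if (state.aborted) return 'aborted'
  if (state.lastStopReason === 'end_turn') return 'end_turn'
  if (state.maxTurns !== undefined && state.turnsUsed >= state.maxTurns) {
    return 'max_turns'
  }
  if (state.maxBudgetUsd !== undefined && state.costUsd >= state.maxBudgetUsd) {
    return 'budget_exhausted'
  }
  return null
}
```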

Persist session

recordTranscript() and flushSessionStorage() write the updated conversation history to disk so it can be resumed with /resume.

Feature-gated extensions

The Query Engine conditionally imports modules based on build-time feature flags to keep the binary lean:

import { feature } from 'bun:bundle'

// Dead code elimination: conditional import for coordinator mode
const getCoordinatorUserContext = feature('COORDINATOR_MODE')
  ? require('./coordinator/coordinatorMode.js').getCoordinatorUserContext
  : () => ({})

// Dead code elimination: conditional import for snip compaction
const snipModule = feature('HISTORY_SNIP')
  ? require('./services/compact/snipCompact.js')
  : null

Modules imported behind a feature() guard are completely absent from binaries built without that flag — the conditional branch is eliminated at compile time, not at runtime.
