`src/QueryEngine.ts` is the heart of Claude Code at ~46K lines. It owns the query lifecycle and session state for a single conversation. One `QueryEngine` instance per conversation; each `submitMessage()` call starts a new turn within the same session. State (messages, file cache, usage totals) persists across turns.
## Core responsibilities

- **Streaming responses**: consumes the Anthropic streaming API and yields `SDKMessage` objects to the caller as they arrive.
- **Tool-call loops**: when the LLM emits a `tool_use` block, the engine executes the tool and feeds the result back for the next API call, repeating until the model stops requesting tools.
- **Thinking mode**: extended thinking with budget management, configurable via `ThinkingConfig` (budget tokens, whether to enable by default).
- **Retry logic**: automatic retries with backoff for transient API failures, categorized via `categorizeRetryableAPIError`.
- **Token counting**: tracks input/output tokens and cost per turn using `accumulateUsage` / `updateUsage` from `src/services/api/claude.ts`.
- **Context management**: manages conversation history and context windows. Integrates with the snip compaction module (`HISTORY_SNIP` feature flag) to bound memory in long headless sessions.
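The retry behavior can be pictured with a small sketch. This is not the engine's actual implementation: `categorizeRetryableAPIError` and the real backoff parameters live in the codebase, and every name and number below (`categorize`, `backoffMs`, the defaults) is assumed for illustration only.

```typescript
// Hypothetical sketch of retry-with-backoff; the real engine categorizes
// errors via categorizeRetryableAPIError and uses its own parameters.
type RetryCategory = "retryable" | "fatal";

// Assumed stand-in for the real error categorizer.
function categorize(err: { status?: number }): RetryCategory {
  // Rate limits and transient server errors are worth retrying.
  return err.status === 429 || (err.status ?? 0) >= 500 ? "retryable" : "fatal";
}

// Exponential backoff with a cap, in milliseconds (illustrative numbers).
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function withRetries<T>(
  call: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const e = err as { status?: number };
      if (attempt + 1 >= maxAttempts || categorize(e) === "fatal") throw err;
      await sleep(backoffMs(attempt)); // wait before the next attempt
    }
  }
}
```

Injecting `sleep` keeps a sketch like this testable without real delays.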
## Configuration

`QueryEngine` is configured at construction time via `QueryEngineConfig`:

One `QueryEngine` per conversation. Construct it once, then call `submitMessage()` for each user turn. State accumulates across turns until the instance is discarded.

## submitMessage() lifecycle
Each call to `submitMessage()` is an `AsyncGenerator` that yields `SDKMessage` objects:
1. **Setup**: clears per-turn state (`discoveredSkillNames`), sets the working directory, wraps `canUseTool` to track permission denials, resolves the model, and builds the system prompt via `fetchSystemPromptParts`.
2. **User input processing**: `processUserInput` normalizes the prompt, injects memory from memdir, and constructs the full message array, including conversation history.
3. **API call (streaming)**: calls the Anthropic API via `query()` in `src/query.ts`. Responses stream in as partial `ContentBlock` objects.
4. **Tool-call loop**: when the model emits a `tool_use` block, the engine enters the loop (see Tool invocation lifecycle below). The loop continues until the model produces a stop sequence with no pending tool calls.
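The turn loop in step 4 can be sketched as follows. All names here (`ModelTurn`, `runTool`, and so on) are hypothetical stand-ins kept synchronous for brevity; the real engine streams asynchronously and threads much more state through each call.

```typescript
// Hypothetical shapes standing in for the real streaming API types.
type ToolUse = { id: string; name: string; input: unknown };
type ModelTurn = { text: string; toolUses: ToolUse[] };
type ToolResult = { toolUseId: string; content: string };

// callModel and runTool are injected so the sketch stays self-contained;
// in the real engine these are the streaming API call and tool execution.
function runTurn(
  callModel: (results: ToolResult[]) => ModelTurn,
  runTool: (use: ToolUse) => ToolResult,
): string[] {
  const texts: string[] = [];
  let results: ToolResult[] = [];
  // Keep calling the model, feeding tool results back, until it stops
  // requesting tools.
  for (;;) {
    const turn = callModel(results);
    texts.push(turn.text);
    if (turn.toolUses.length === 0) return texts;
    results = turn.toolUses.map(runTool);
  }
}
```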
## Tool invocation lifecycle

Every tool call the model requests goes through a strict lifecycle before execution:

1. **Schema validation**: the tool's Zod input schema validates the JSON the model provided. Invalid inputs are returned as a `tool_result` error message without executing the tool.
2. **Permission check**: `canUseTool(tool, input, toolUseContext, assistantMessage, toolUseID, forceDecision)` is called. The `CanUseToolFn` checks against:
   - the current `PermissionMode` (`default`, `auto`, `bypassPermissions`, etc.)
   - `alwaysAllowRules` and `alwaysDenyRules` from `ToolPermissionContext`
   - whether the tool is `isConcurrencySafe()`
3. **Execution**: the tool's `call(input, context)` method runs. The `ToolUseContext` gives tools access to `getAppState`, `setAppState`, `abortController`, `readFileState`, `setToolJSX`, and `addNotification`.

## Permission denial tracking

`submitMessage` wraps the caller-provided `canUseTool` to accumulate denials:
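A minimal sketch of what such a wrapper might look like. The `PermissionResult` shape and the callback signature below are assumed for illustration; the real `CanUseToolFn` is async and receives the full context (tool object, `toolUseID`, etc.), but the wrapping idea is the same.

```typescript
// Hypothetical sketch of wrapping a permission callback to record denials.
type PermissionResult = { behavior: "allow" | "deny"; message?: string };
type CanUseTool = (toolName: string, input: unknown) => PermissionResult;

// Returns a callback with the same signature that also accumulates the
// names of denied tools into the provided array.
function trackDenials(inner: CanUseTool, denials: string[]): CanUseTool {
  return (toolName, input) => {
    const result = inner(toolName, input);
    if (result.behavior === "deny") denials.push(toolName);
    return result;
  };
}
```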
## Streaming response handling
The engine uses an async generator chain through `src/query.ts`. As text and tool-use blocks stream in:
- Text deltas are accumulated and yielded as partial `SDKMessage` objects when `includePartialMessages` is enabled.
- Tool-use blocks trigger the tool-call loop (see above) before the next API call.
- Thinking blocks are captured and managed separately when `thinkingConfig` is active.
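The delta-accumulation behavior can be sketched with a synchronous fold over a list of events. The event shapes here are deliberately simplified assumptions; the real stream carries richer `ContentBlock` events.

```typescript
// Assumed, simplified event shapes standing in for the real stream events.
type StreamEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_use"; name: string };

// Folds a stream of events into accumulated text plus pending tool calls,
// recording the partial text after every delta (as with
// includePartialMessages).
function foldStream(events: StreamEvent[]): { partials: string[]; tools: string[] } {
  const partials: string[] = [];
  const tools: string[] = [];
  let text = "";
  for (const ev of events) {
    if (ev.type === "text_delta") {
      text += ev.text;
      partials.push(text); // each partial would be yielded to the caller
    } else {
      tools.push(ev.name); // would trigger the tool-call loop
    }
  }
  return { partials, tools };
}
```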
## Token counting and cost tracking
Each turn's token usage flows through `src/services/api/claude.ts`:

`src/cost-tracker.ts` exposes:
| Export | Description |
|---|---|
| `getModelUsage()` | Per-model token breakdown |
| `getTotalAPIDuration()` | Cumulative API wall time |
| `getTotalCost()` | Estimated USD cost for the session |

These totals feed the `/cost` command and the status line.
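The per-model accumulation behind `getModelUsage()` can be sketched as follows. The `Usage` shape and field names are assumptions for illustration; the real type in `src/services/api/claude.ts` tracks more fields (cache reads, cost, and so on).

```typescript
// Assumed usage shape; the real Usage type has more fields.
type Usage = { inputTokens: number; outputTokens: number };

// Accumulates one turn's usage into a per-model running total, in the
// spirit of accumulateUsage / getModelUsage.
function accumulate(
  totals: Map<string, Usage>,
  model: string,
  turn: Usage,
): Map<string, Usage> {
  const prev = totals.get(model) ?? { inputTokens: 0, outputTokens: 0 };
  totals.set(model, {
    inputTokens: prev.inputTokens + turn.inputTokens,
    outputTokens: prev.outputTokens + turn.outputTokens,
  });
  return totals;
}
```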
## Context management and history
`mutableMessages` is the in-memory conversation history. It accumulates:
- User messages
- Assistant messages (with `thinking` blocks when relevant)
- `tool_use` blocks from the assistant
- `tool_result` blocks returned by tool execution
### Snip compaction (`HISTORY_SNIP`)
For long headless sessions (SDK mode), the `HISTORY_SNIP` feature flag enables a snip compaction module that truncates history to bound memory:

In interactive sessions, the snipped history is presented via `projectSnippedView`; `QueryEngine` truncates in-place for headless sessions where there is no UI to preserve.
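The snip module's actual policy is not shown in this document, but the core idea of bounding history can be illustrated with a simple truncation that keeps the most recent messages within a budget (a character budget here; the real module would reason in tokens):

```typescript
// Illustrative only: keeps the newest messages within a character budget,
// dropping the oldest first. Not the real snip-compaction policy.
type Message = { role: "user" | "assistant"; content: string };

function snipHistory(history: Message[], budgetChars: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk backwards so the newest messages survive.
  for (let i = history.length - 1; i >= 0; i--) {
    used += history[i].content.length;
    if (used > budgetChars) break;
    kept.unshift(history[i]);
  }
  return kept;
}
```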
## File state cache
`readFileCache` (`FileStateCache`) is a per-session LRU cache of file contents read during tool execution. Tools use it to avoid redundant disk reads within a turn.
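`FileStateCache`'s real interface is not shown here, but a minimal LRU of this kind can be sketched using a `Map`, which preserves insertion order (all names below are illustrative):

```typescript
// Minimal LRU sketch of a per-session file cache. A JavaScript Map
// iterates keys in insertion order, so the first key is the least
// recently used entry.
class LruFileCache {
  private entries = new Map<string, string>();
  constructor(private maxEntries: number) {}

  get(path: string): string | undefined {
    const content = this.entries.get(path);
    if (content === undefined) return undefined;
    // Re-insert to mark this path as most recently used.
    this.entries.delete(path);
    this.entries.set(path, content);
    return content;
  }

  set(path: string, content: string): void {
    this.entries.delete(path);
    this.entries.set(path, content);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first key in the Map).
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}
```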
## Thinking mode
Thinking mode is controlled by `ThinkingConfig`:
`shouldEnableThinkingByDefault()` determines whether thinking is on for a given model. When enabled, the API receives an extended thinking budget and returns thinking blocks that are threaded through the conversation history.
## buildTool pattern
Tools passed into `QueryEngine` follow the `Tool` interface defined in `src/Tool.ts`. A typical tool exposes:
`call()` is an `AsyncGenerator`. Tools can yield intermediate `ToolProgressData` for live UI updates (e.g., streaming bash output) before returning the final result.
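The progress-then-result shape can be sketched with a generator (synchronous here for brevity; the real `call()` is an `AsyncGenerator` and takes a `ToolUseContext`). All type and function names below are assumptions, not the real `src/Tool.ts` interface:

```typescript
// Assumed, simplified shapes; the real Tool interface is richer.
type ToolProgress = { kind: "progress"; message: string };
type ToolDone<T> = { kind: "result"; data: T };
type ToolYield<T> = ToolProgress | ToolDone<T>;

// A toy tool that reports progress per line before yielding a final count.
function* countLines(input: { text: string }): Generator<ToolYield<number>> {
  const lines = input.text.split("\n");
  for (const line of lines) {
    yield { kind: "progress", message: `read: ${line}` }; // live UI update
  }
  yield { kind: "result", data: lines.length }; // final tool_result payload
}
```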
## See also

- **Architecture Overview**: the full pipeline from CLI entrypoint to terminal UI.
- **State Management & UI**: how `AppState` and the React/Ink UI layer integrate with the Query Engine.