The Query Engine (src/QueryEngine.ts, ~46K lines) is the heart of Claude Code. It owns the complete lifecycle of a conversation: there is one QueryEngine instance per session, and each call to submitMessage() represents a new turn.
Configuration
QueryEngineConfig is the full set of dependencies injected at construction time.
Responsibilities
Streaming responses
submitMessage() is an AsyncGenerator that yields SDKMessage values as they stream from the Anthropic API. Callers iterate the generator to process each streamed chunk (partial text, tool-use blocks, and final stop reasons) as it arrives.
The system prompt is assembled at the start of each turn from:
- The default system prompt (tool descriptions, context)
- An optional customSystemPrompt that replaces the default
- An optional appendSystemPrompt appended after the main prompt
- Memory mechanics prompts when a memory path override is active
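The assembly rules in the list above can be sketched as a small pure function. This is a minimal illustration, not the actual implementation: only customSystemPrompt and appendSystemPrompt are names taken from the text, while SystemPromptInputs, assembleSystemPrompt, defaultPrompt, and memoryMechanicsPrompt are hypothetical. The relative order of the memory and append parts is also an assumption.

```typescript
// Hypothetical input shape; only customSystemPrompt and appendSystemPrompt
// are names that appear in the documentation.
interface SystemPromptInputs {
  defaultPrompt: string[];        // tool descriptions, context
  customSystemPrompt?: string;    // replaces the default when provided
  appendSystemPrompt?: string;    // appended after the main prompt
  memoryMechanicsPrompt?: string; // present only when a memory path override is active
}

function assembleSystemPrompt(inputs: SystemPromptInputs): string[] {
  // A custom prompt replaces the default entirely; it does not extend it.
  const main =
    inputs.customSystemPrompt !== undefined
      ? [inputs.customSystemPrompt]
      : inputs.defaultPrompt;

  const parts = [...main];
  // Assumed order: memory mechanics first, then the appended prompt.
  if (inputs.memoryMechanicsPrompt !== undefined) {
    parts.push(inputs.memoryMechanicsPrompt);
  }
  if (inputs.appendSystemPrompt !== undefined) {
    parts.push(inputs.appendSystemPrompt);
  }
  return parts;
}
```

Because the function is pure, running it again at the start of every turn is cheap, and a change to any input takes effect on the next turn.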
Tool-call loops
When the LLM requests one or more tools, the Query Engine executes them and feeds the results back as new messages, then calls the API again. This loop continues until the LLM emits a final end_turn stop reason or the turn limit is reached.
Permission denials during tool execution are tracked and reported to SDK callers as SDKPermissionDenial entries.
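The loop described above can be sketched as an async generator. This is a simplified illustration under stated assumptions, not the real code: every type here (ToolUse, AssistantMessage, ToolResult, LoopDeps) is a stand-in for the SDK types, callModel and runTool are hypothetical wrappers, and feeding a "Permission denied" result back to the model is an assumed way of handling a denial.

```typescript
// Simplified stand-ins for illustration; these are not the real SDK types.
type ToolUse = { id: string; name: string };
type AssistantMessage = {
  role: "assistant";
  toolUses: ToolUse[];
  stopReason: "end_turn" | "tool_use";
};
type ToolResult = { role: "user"; toolUseId: string; content: string };
type LoopMessage = AssistantMessage | ToolResult;

interface LoopDeps {
  callModel: (history: LoopMessage[]) => Promise<AssistantMessage>; // hypothetical API wrapper
  runTool: (use: ToolUse) => Promise<string>; // hypothetical tool executor
  canUseTool: (use: ToolUse) => boolean; // permission check before execution
  maxTurns: number;
}

async function* runToolLoop(
  history: LoopMessage[],
  deps: LoopDeps,
  denials: ToolUse[],
): AsyncGenerator<LoopMessage> {
  for (let turn = 0; turn < deps.maxTurns; turn++) {
    const reply = await deps.callModel(history);
    history.push(reply);
    yield reply;
    if (reply.stopReason === "end_turn") return;

    // Execute each requested tool and feed the result back as a new message.
    for (const use of reply.toolUses) {
      let content: string;
      if (deps.canUseTool(use)) {
        content = await deps.runTool(use);
      } else {
        denials.push(use); // tracked so the caller can report it
        content = "Permission denied";
      }
      const result: ToolResult = { role: "user", toolUseId: use.id, content };
      history.push(result);
      yield result;
    }
  }
}
```

A caller would consume this with `for await (const message of runToolLoop(history, deps, denials))`, handling each message as it is yielded, and inspect `denials` once the loop ends.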
Thinking mode
Extended thinking is configured via ThinkingConfig.
The Query Engine resolves the effective config at the start of each turn. The adaptive mode calls shouldEnableThinkingByDefault(), which evaluates model capability and environment signals. When thinking is active, budget tokens are reserved from the context window.
Retry logic
Transient API failures are categorized by categorizeRetryableAPIError() from src/services/api/errors.ts. Retryable errors (rate limits, transient network errors) trigger automatic retries with exponential backoff. Non-retryable errors surface to the caller immediately.
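As a rough sketch of this policy, the helper below retries an operation with exponential backoff and rethrows non-retryable errors at once. The names withRetry, ErrorCategory, maxRetries, and baseDelayMs are hypothetical, and the categorize callback merely stands in for categorizeRetryableAPIError(), whose real signature is not shown here.

```typescript
type ErrorCategory = "retryable" | "fatal";

interface RetryOptions {
  maxRetries: number;  // retries allowed after the first attempt
  baseDelayMs: number; // delay before the first retry
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

async function withRetry<T>(
  attempt: () => Promise<T>,
  categorize: (err: unknown) => ErrorCategory,
  options: RetryOptions,
): Promise<T> {
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let retry = 0; ; retry++) {
    try {
      return await attempt();
    } catch (err) {
      // Non-retryable errors, or an exhausted retry budget, surface immediately.
      if (categorize(err) === "fatal" || retry >= options.maxRetries) throw err;
      // Exponential backoff: base, 2x base, 4x base, ...
      await sleep(options.baseDelayMs * 2 ** retry);
    }
  }
}
```

Production retry loops usually add random jitter to the delay so that many clients do not retry in lockstep; the sketch omits it to keep the delays deterministic.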
Token counting and cost tracking
Each API response returns usage metadata (input tokens, output tokens, cache read/write tokens). The Query Engine accumulates these into totalUsage across all turns in the session via accumulateUsage() and updateUsage() from src/services/api/claude.ts.
Cumulative cost is available via:
The /cost command reads these values and displays them to the user.
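The accumulation idea can be illustrated with a small sketch. The Usage shape, addUsage, PricePerMillionTokens, and estimateCostUsd below are all hypothetical; they are not the signatures of accumulateUsage() or updateUsage(), and any prices passed in are placeholders rather than real rates.

```typescript
// Hypothetical usage record mirroring the four counters named in the text.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

const EMPTY_USAGE: Usage = {
  inputTokens: 0,
  outputTokens: 0,
  cacheReadTokens: 0,
  cacheWriteTokens: 0,
};

// Returns a new total rather than mutating, so earlier snapshots stay valid.
function addUsage(total: Usage, delta: Usage): Usage {
  return {
    inputTokens: total.inputTokens + delta.inputTokens,
    outputTokens: total.outputTokens + delta.outputTokens,
    cacheReadTokens: total.cacheReadTokens + delta.cacheReadTokens,
    cacheWriteTokens: total.cacheWriteTokens + delta.cacheWriteTokens,
  };
}

// Placeholder price table in USD per million tokens (illustrative only).
interface PricePerMillionTokens {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

function estimateCostUsd(usage: Usage, price: PricePerMillionTokens): number {
  const weighted =
    usage.inputTokens * price.input +
    usage.outputTokens * price.output +
    usage.cacheReadTokens * price.cacheRead +
    usage.cacheWriteTokens * price.cacheWrite;
  return weighted / 1e6;
}
```

A session would start from EMPTY_USAGE, fold each response's usage in with addUsage, and compute the cost from the running total when it needs to be displayed.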
Context management
The Query Engine manages mutableMessages, the full conversation history for the session. Before each API call it:
- Processes user input through processUserInput() (slash command detection, attachment handling)
- Fetches the current system prompt parts via fetchSystemPromptParts()
- Takes a file-state snapshot via fileHistoryMakeSnapshot() when file history is enabled
- Trims or compacts history when context window limits approach (via the /compact command or automatic compaction)
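To make the last item concrete, here is a deliberately simple trimming sketch that keeps the most recent messages that fit in a token budget. It is only a stand-in: compaction as described above may work differently (for example, by summarizing older turns), and HistoryMessage, estimateTokens, trimToBudget, and the four-characters-per-token estimate are all assumptions made for illustration.

```typescript
type HistoryMessage = { role: "user" | "assistant"; text: string };

// Crude estimate (about 4 characters per token); an assumption for illustration only.
function estimateTokens(message: HistoryMessage): number {
  return Math.ceil(message.text.length / 4);
}

function trimToBudget(
  messages: HistoryMessage[],
  maxTokens: number,
): HistoryMessage[] {
  const kept: HistoryMessage[] = [];
  let used = 0;
  // Walk from newest to oldest so the most recent context survives.
  for (const message of [...messages].reverse()) {
    const cost = estimateTokens(message);
    if (used + cost > maxTokens) break;
    kept.unshift(message); // restore chronological order
    used += cost;
  }
  return kept;
}
```

The function returns a new array and leaves its input untouched, so a caller can decide whether to replace the stored history or only send the trimmed view to the API.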
Turn lifecycle
1. Process user input
processUserInput() inspects the raw prompt for slash commands, file attachments, and metadata. Slash commands that match a registered LocalCommand or LocalJSXCommand are executed immediately and the turn short-circuits. PromptCommand matches are converted to message content.
2. Build system prompt
fetchSystemPromptParts() assembles the system prompt from tool descriptions, memory contents, and context. The result is combined with any customSystemPrompt or appendSystemPrompt overrides.
3. Stream the API response
The assembled messages and system prompt are sent to the Anthropic API via query() (src/query.ts). The generator yields each SDKMessage as it streams in, allowing the terminal UI to render partial output in real time.
4. Execute tool calls
If the streamed response contains tool_use blocks, each tool is resolved from the registered Tools collection by name via toolMatchesName(). The canUseTool function is called to enforce the permission model before execution. Results are appended to mutableMessages.
5. Loop or finish
After tool results are appended, the Query Engine calls the API again. This continues until stop_reason === 'end_turn', the maxTurns limit is reached, the budget is exhausted, or the AbortController fires.
Feature-gated extensions
The Query Engine conditionally imports modules based on build-time feature flags to keep the binary lean.
Modules imported behind a feature() guard are completely absent from binaries built without that flag; the conditional branch is eliminated at compile time, not at runtime.
See also
- Architecture overview — How the Query Engine fits into the full pipeline
- Tool System — How tools are defined and invoked
- Command System — How slash commands interact with the Query Engine