SDD has two concepts that often get conflated: a skill (what the AI knows how to do) and an agent (how it’s executed). Understanding the difference clarifies why the workflow is structured the way it is, why some phases feel “heavier” than others, and how context is managed across a long-running change. The split is intentional: it is the mechanism that keeps context clean, enables model selection, and makes prompt caching possible.
Definitions
Skill
A skill is a markdown instruction file with YAML frontmatter. It lives in skills/sdd-{name}/instructions.md and tells the AI how to perform one step of the workflow.
The model_hint field signals to orchestrators which model tier to use when spawning a subagent for this skill.
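As a rough illustration of how an orchestrator might read that hint, here is a minimal sketch. Only the path convention and the model_hint field come from the text above; the example file content, the `name` field, and the helper names `split_frontmatter` and `skill_path` are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical skill file content; only `model_hint` is named in the text above.
EXAMPLE_SKILL = """\
---
name: sdd-design
model_hint: opus
---
# Instructions

Analyze the change and produce a design document.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (frontmatter fields, markdown body).

    Handles only flat `key: value` pairs, which is enough for this sketch;
    a real loader would more likely use a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    try:
        end = lines[1:].index("---") + 1  # index of the closing delimiter
    except ValueError:
        return {}, text  # unterminated frontmatter: treat as plain markdown
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return fields, body


def skill_path(name: str) -> Path:
    """Path convention from the text: skills/sdd-{name}/instructions.md."""
    return Path("skills") / f"sdd-{name}" / "instructions.md"


fields, body = split_frontmatter(EXAMPLE_SKILL)
print(skill_path("design").as_posix())
print(fields["model_hint"])
```

The hint is advisory in this sketch: the parser just surfaces the value, and whatever spawns the subagent decides how to act on it.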
Agent
An agent is a running AI instance with its own conversation context. There are two kinds in SDD:

| Kind | Context | Model | Example |
|---|---|---|---|
| Orchestrator | Your main conversation | Whatever you picked in the client | The one you’re talking to right now |
| Subagent | Fresh, isolated | Chosen via model_hint | Spawned by /sdd-apply per task |
Inline vs subagent execution
Most SDD skills run inline: you invoke the slash command, and the AI loads the skill’s instructions and executes them in your current conversation. Four skills use a different pattern:

| Skill | Mode | Why |
|---|---|---|
| /sdd-design | Subagent | Design analysis is self-contained; isolating it keeps the main context clean |
| /sdd-apply | Orchestrator + one subagent per task | Each task implementation needs full file-reading context; running inline would bloat the main conversation |
| /sdd-verify | Subagent | Runs tests, linters, and smoke checks; produces a report, no interactive decisions needed |
| /sdd-discover | Parallel subagents | Domain detection fan-out: each subagent analyzes one domain simultaneously |

Everything else (propose, spec, tasks, archive, audit, steer, init, new, ff, continue, recall, docs) runs inline in your current conversation.
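The table and the inline list above amount to a lookup from skill name to execution mode. The sketch below encodes that lookup; the `Mode` enum and the `execution_mode` function are illustrative names, not part of SDD itself.

```python
from enum import Enum


class Mode(Enum):
    INLINE = "inline"
    SUBAGENT = "subagent"
    ORCHESTRATOR_PLUS_SUBAGENTS = "orchestrator + one subagent per task"
    PARALLEL_SUBAGENTS = "parallel subagents"


# The four skills that do not run inline, as listed in the table above.
NON_INLINE = {
    "design": Mode.SUBAGENT,
    "apply": Mode.ORCHESTRATOR_PLUS_SUBAGENTS,
    "verify": Mode.SUBAGENT,
    "discover": Mode.PARALLEL_SUBAGENTS,
}

# Skills the text lists as running inline in the current conversation.
INLINE_SKILLS = (
    "propose", "spec", "tasks", "archive", "audit", "steer",
    "init", "new", "ff", "continue", "recall", "docs",
)


def execution_mode(skill: str) -> Mode:
    """Return the execution mode for a skill such as '/sdd-apply' or 'apply'."""
    name = skill.removeprefix("/sdd-")  # requires Python 3.9+
    if name in NON_INLINE:
        return NON_INLINE[name]
    if name in INLINE_SKILLS:
        return Mode.INLINE
    raise KeyError(f"unknown skill: {skill}")


print(execution_mode("/sdd-apply").value)
print(execution_mode("spec").value)
```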
Why this split matters
Context hygiene
The main conversation is finite (around 200K tokens effective). If /sdd-apply ran inline and read every file for every task, the context would fill fast and quality would degrade. Spawning one subagent per task gives each task a fresh, focused context; the orchestrator only sees the returned summary, not all the file contents that went into producing it.
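A back-of-the-envelope comparison makes the effect concrete. The per-task and per-summary token counts below are invented for illustration; only the roughly 200K budget comes from the text.

```python
CONTEXT_BUDGET = 200_000  # approximate effective limit quoted above


def inline_usage(task_file_tokens: list[int]) -> int:
    """Inline execution: every file read for every task lands in the main context."""
    return sum(task_file_tokens)


def orchestrated_usage(task_file_tokens: list[int], summary_tokens: int = 500) -> int:
    """Subagent execution: the orchestrator keeps only one summary per task."""
    return summary_tokens * len(task_file_tokens)


# Hypothetical change: 8 tasks, each reading about 30K tokens of files.
tasks = [30_000] * 8

inline = inline_usage(tasks)              # 240_000
orchestrated = orchestrated_usage(tasks)  # 4_000
peak_subagent = max(tasks)                # each subagent starts fresh: 30_000

print(f"inline:        {inline:>7} tokens (over budget: {inline > CONTEXT_BUDGET})")
print(f"orchestrator:  {orchestrated:>7} tokens")
print(f"peak subagent: {peak_subagent:>7} tokens")
```

Under these assumed numbers the inline run would exceed the budget, while the orchestrator's context grows only by the size of the summaries.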
Model selection
The model_hint in each skill tells orchestrators (sdd-agent, sdd-ff, sdd-continue, sdd-apply) which tier to spawn subagents on:
- opus: judgment-heavy phases: propose, design
- sonnet: comprehension-heavy phases: explore, spec, apply (per-task subagents), verify, audit, steer, init, new, ff, discover, agent
- haiku: mechanical phases: tasks, archive, recall, docs, continue (dispatcher), apply (orchestrator)
/sdd-apply is a good example: the orchestrator runs on haiku (it just tracks task state and dispatches), while each per-task subagent runs on sonnet (it writes real code).
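The tier assignment can be written down as a small lookup, with apply as the one skill whose tier depends on its role. The role names `orchestrator` and `task_subagent` and the function `model_tier` are hypothetical labels chosen for this sketch.

```python
# Tier assignments as listed above.
OPUS = ("propose", "design")
SONNET = ("explore", "spec", "verify", "audit", "steer",
          "init", "new", "ff", "discover", "agent")
HAIKU = ("tasks", "archive", "recall", "docs", "continue")

# apply is split: a dispatcher on a cheap tier, workers on a stronger one.
# The role names are illustrative, not SDD identifiers.
APPLY_TIERS = {"orchestrator": "haiku", "task_subagent": "sonnet"}


def model_tier(skill: str, role: str = "orchestrator") -> str:
    """Return the model tier for a skill; `role` only matters for apply."""
    if skill == "apply":
        return APPLY_TIERS[role]
    for tier, skills in (("opus", OPUS), ("sonnet", SONNET), ("haiku", HAIKU)):
        if skill in skills:
            return tier
    raise KeyError(f"no model_hint known for skill: {skill}")


print(model_tier("design"))
print(model_tier("apply"))
print(model_tier("apply", role="task_subagent"))
```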
Prompt caching
Because subagents share a fixed prompt prefix (steering content loaded once by the orchestrator and passed inline), sequential subagents benefit from LLM prompt caching (5-minute TTL) across back-to-back task runs. See Token Optimization: Prompt Caching for details.
Mental model
The orchestrator is always the one talking to you. Subagents are short-lived workers whose output is a report, not a conversation.
Practical implications
Three consequences of this architecture matter in day-to-day use:
- Clearing context (/clear, new session) affects the orchestrator, not past subagents. Subagents don’t persist; they’re already gone by the time you see their summary. Clearing the orchestrator context is safe at any phase boundary.
- Interactive questions (proposals, design decisions, task review) must happen in the orchestrator. A subagent cannot ask you anything mid-run: it either completes its task or reports a blocker back to the orchestrator, which then surfaces it to you.
- /sdd-continue works from fresh sessions because it detects the current phase from artifacts on disk, not from conversation history. You can start a brand-new conversation at any point in the workflow and /sdd-continue will pick up exactly where you left off.

See Token Optimization: When to Clear Context for the recommended clearing schedule.
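To illustrate the idea of detecting the phase from disk rather than from conversation history, here is a minimal sketch. The artifact file names, the phase order, and the `next_phase` function are all assumptions made for the example; the actual artifacts /sdd-continue inspects are not specified here.

```python
import tempfile
from pathlib import Path

# Hypothetical (phase, artifact) pairs in an assumed workflow order.
PHASE_ARTIFACTS = [
    ("propose", "proposal.md"),
    ("spec", "spec.md"),
    ("design", "design.md"),
    ("tasks", "tasks.md"),
]


def next_phase(change_dir: Path) -> str:
    """Return the first phase whose artifact is missing from the change directory."""
    for phase, artifact in PHASE_ARTIFACTS:
        if not (change_dir / artifact).exists():
            return phase
    return "apply"  # all planning artifacts exist, so implementation comes next


with tempfile.TemporaryDirectory() as tmp:
    change = Path(tmp)
    (change / "proposal.md").write_text("# Proposal\n")
    (change / "spec.md").write_text("# Spec\n")
    # No conversation history is consulted: the files alone determine the phase.
    print(next_phase(change))
```

Because the function reads only the filesystem, it gives the same answer in a brand-new session as in the one that wrote the files.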