Skills vs Agents: How SDD Execution Works

SDD has two concepts that are often conflated: a skill (what the AI knows how to do) and an agent (the running AI instance that executes it). Understanding the difference explains why the workflow is structured the way it is, why some phases feel “heavier” than others, and how context is managed across a long-running change. The split is intentional: it is the mechanism that keeps context clean, enables per-phase model selection, and makes prompt caching possible.

Definitions

Skill

A skill is a markdown instruction file with YAML frontmatter. It lives in skills/sdd-{name}/instructions.md and tells the AI how to perform one step of the workflow.

```markdown
---
name: sdd-spec
description: SDD Spec - write behavior specs. Usage - /sdd-spec or /sdd-spec {domain}.
model_hint: sonnet
---

# SDD Spec
...
```

Skills are static content. They don’t run on their own — they are loaded and followed by an AI. The model_hint field signals to orchestrators which model tier to use when spawning a subagent for this skill.
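To make the file format concrete, here is a minimal Python sketch of how an orchestrator might read a skill file and pull out `model_hint`. It assumes the frontmatter contains only flat `key: value` lines (no nested YAML), and `parse_skill` and `Skill` are hypothetical names, not SDD's actual loader.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skill:
    name: str
    description: str
    model_hint: Optional[str]  # tier hint for orchestrators, e.g. "sonnet"
    body: str  # the markdown instructions that follow the frontmatter


def parse_skill(text: str) -> Skill:
    """Split a skill file into frontmatter fields and its instruction body.

    Hypothetical helper: it understands only flat `key: value` lines,
    not full YAML (no lists, nesting, or multi-line strings).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' line")
    # Find the closing '---' that ends the frontmatter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter is not terminated by '---'")
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return Skill(
        name=fields["name"],
        description=fields.get("description", ""),
        model_hint=fields.get("model_hint"),
        body="\n".join(lines[end + 1:]).strip(),
    )


SPEC_SKILL = """---
name: sdd-spec
description: SDD Spec - write behavior specs. Usage - /sdd-spec or /sdd-spec {domain}.
model_hint: sonnet
---

# SDD Spec
...
"""

skill = parse_skill(SPEC_SKILL)
print(skill.name, skill.model_hint)  # sdd-spec sonnet
```

Because the skill is just data, the same file can be followed inline or handed to a subagent; only the orchestrator decides which.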

Agent

An agent is a running AI instance with its own conversation context. There are two kinds in SDD:

| Kind | Context | Model | Example |
|---|---|---|---|
| Orchestrator | Your main conversation | Whatever you picked in the client | The one you’re talking to right now |
| Subagent | Fresh, isolated | Chosen via `model_hint` | Spawned by `/sdd-apply` per task |

A subagent starts with no memory of your conversation. The orchestrator hands it a self-contained prompt — instructions plus all the context it needs — the subagent runs, and it returns a summary. Its context is then discarded.
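The hand-off can be sketched in Python. The orchestrator builds a self-contained prompt, the subagent runs with a fresh message list, and only the summary flows back into the main conversation. `run_model` stands in for whatever LLM client is actually used; it and the `Orchestrator`/`Subagent` classes are hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Subagent:
    """A short-lived worker: fresh context in, summary out."""
    model: str
    run_model: Callable[[str, List[str]], str]  # hypothetical LLM call: (model, messages) -> reply

    def run(self, prompt: str) -> str:
        messages = [prompt]  # fresh, isolated context: nothing from the orchestrator's history
        summary = self.run_model(self.model, messages)
        # `messages` goes out of scope when run() returns: the subagent's context is discarded
        return summary


@dataclass
class Orchestrator:
    history: List[str] = field(default_factory=list)  # the main conversation

    def delegate(self, instructions: str, context: str, worker: Subagent) -> str:
        # The prompt must be self-contained: the subagent never sees self.history.
        prompt = f"{instructions}\n\n{context}"
        summary = worker.run(prompt)
        self.history.append(summary)  # only the summary enters the main context
        return summary


def fake_model(model: str, messages: List[str]) -> str:
    # Stand-in for a real LLM call; it just reports what it was given.
    return f"[{model}] done, saw {len(messages)} message(s)"


orch = Orchestrator(history=["user: please implement task 1"])
worker = Subagent(model="sonnet", run_model=fake_model)
print(orch.delegate("Implement task 1.", "<file contents...>", worker))
# [sonnet] done, saw 1 message(s)
```

The file contents passed to the subagent never appear in `orch.history`; that asymmetry is what the rest of this page relies on.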

Inline vs subagent execution

Most SDD skills run inline — you invoke the slash command, the AI loads the skill’s instructions, and executes them in your current conversation. Four skills use a different pattern:

| Skill | Mode | Why |
|---|---|---|
| `/sdd-design` | Subagent | Design analysis is self-contained; isolating it keeps the main context clean |
| `/sdd-apply` | Orchestrator + one subagent per task | Each task implementation needs full file-reading context; running inline would bloat the main conversation |
| `/sdd-verify` | Subagent | Runs tests, linters, and smoke checks, then produces a report; no interactive decisions needed |
| `/sdd-discover` | Parallel subagents | Domain detection fan-out: each subagent analyzes one domain, all running simultaneously |

Everything else — propose, spec, tasks, archive, audit, steer, init, new, ff, continue, recall, docs — runs inline in your current conversation.
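The table amounts to a small dispatch rule: four skills have a special execution mode, and everything else defaults to inline. A minimal Python sketch of that rule follows; the `ExecutionMode` enum and `execution_mode` function are illustrative names, not part of SDD.

```python
from enum import Enum


class ExecutionMode(Enum):
    INLINE = "inline"  # runs in the current conversation
    SUBAGENT = "subagent"  # one isolated subagent
    PER_TASK_SUBAGENTS = "orchestrator + one subagent per task"
    PARALLEL_SUBAGENTS = "parallel subagents"


# Only the exceptions from the table are listed; everything else is inline.
SPECIAL_MODES = {
    "sdd-design": ExecutionMode.SUBAGENT,
    "sdd-apply": ExecutionMode.PER_TASK_SUBAGENTS,
    "sdd-verify": ExecutionMode.SUBAGENT,
    "sdd-discover": ExecutionMode.PARALLEL_SUBAGENTS,
}


def execution_mode(skill: str) -> ExecutionMode:
    """Return how a slash command's skill is executed (leading '/' optional)."""
    return SPECIAL_MODES.get(skill.lstrip("/"), ExecutionMode.INLINE)


for cmd in ["/sdd-spec", "/sdd-apply", "/sdd-discover"]:
    print(cmd, "->", execution_mode(cmd).value)
# /sdd-spec -> inline
# /sdd-apply -> orchestrator + one subagent per task
# /sdd-discover -> parallel subagents
```

Keeping the exceptions in one explicit table makes the default obvious: a new skill runs inline unless someone deliberately opts it into subagent execution.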

Why this split matters

Context hygiene

The main conversation’s context is finite (roughly 200K tokens of effective context). If /sdd-apply ran inline and read every file for every task, that context would fill quickly and output quality would degrade. By spawning one subagent per task, each task gets a fresh, focused context, and the orchestrator sees only the returned summary, not the file contents that went into producing it.
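A back-of-the-envelope calculation shows the scale of the difference. The Python sketch below uses made-up per-task token costs (hypothetical numbers chosen for illustration, not measurements from SDD) and asks how many tasks fit in a ~200K-token main context when tasks run inline versus when only subagent summaries come back.

```python
CONTEXT_BUDGET = 200_000  # approximate effective context of the main conversation

# Hypothetical per-task token costs, chosen only to illustrate the ratio.
FILE_TOKENS_PER_TASK = 30_000  # files read + code written while implementing one task
SUMMARY_TOKENS_PER_TASK = 500  # the summary a subagent hands back


def tasks_that_fit(inline: bool) -> int:
    """How many tasks fit in the main context before it is full.

    Ignores everything else in the conversation (steering, chat, specs),
    so real numbers would be lower in both cases.
    """
    per_task = FILE_TOKENS_PER_TASK if inline else SUMMARY_TOKENS_PER_TASK
    return CONTEXT_BUDGET // per_task


print(tasks_that_fit(inline=True))   # 6
print(tasks_that_fit(inline=False))  # 400
```

Real costs vary widely by task; the point is the ratio between file contents and summaries, not the exact figures.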

Model selection

The model_hint in each skill tells orchestrators (sdd-agent, sdd-ff, sdd-continue, sdd-apply) which tier to spawn subagents on:

opus — judgment-heavy phases: propose, design
sonnet — comprehension-heavy phases: explore, spec, apply (per-task subagents), verify, audit, steer, init, new, ff, discover, agent
haiku — mechanical phases: tasks, archive, recall, docs, continue (dispatcher), apply (orchestrator)

The orchestrator itself may run on a different model than its subagents. /sdd-apply is a good example: the orchestrator runs on haiku (it just tracks task state and dispatches), while each per-task subagent runs on sonnet (it writes real code).
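The tier assignments above can be read as a lookup table. This Python sketch transcribes them, with /sdd-apply split into its orchestrator and per-task subagent roles; the `TIERS` layout and the `model_for` helper are illustrative assumptions, not SDD’s configuration format.

```python
from typing import Dict, Optional

# Tier -> phases, transcribed from the list above.
TIERS = {
    "opus": ["propose", "design"],
    "sonnet": ["explore", "spec", "verify", "audit", "steer", "init",
               "new", "ff", "discover", "agent"],
    "haiku": ["tasks", "archive", "recall", "docs", "continue"],
}

# Inverted for lookup: phase -> tier.
MODEL_HINTS: Dict[str, str] = {
    phase: tier for tier, phases in TIERS.items() for phase in phases
}

# /sdd-apply uses different tiers for its two roles.
APPLY_ROLES = {"orchestrator": "haiku", "subagent": "sonnet"}


def model_for(phase: str, role: Optional[str] = None) -> str:
    """Return the model tier for a phase; 'apply' also needs a role."""
    if phase == "apply":
        if role not in APPLY_ROLES:
            raise ValueError("apply needs role='orchestrator' or role='subagent'")
        return APPLY_ROLES[role]
    return MODEL_HINTS[phase]


print(model_for("design"))                # opus
print(model_for("apply", "orchestrator"))  # haiku
print(model_for("apply", "subagent"))      # sonnet
```

Treating /sdd-apply as two roles rather than one phase captures the design choice described above: a cheap dispatcher coordinating more capable workers.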

Prompt caching

Because subagents share a fixed prompt prefix (steering content that the orchestrator loads once and passes inline), sequential subagents benefit from LLM prompt caching across back-to-back task runs, as long as each run starts before the cached prefix expires (a 5-minute TTL). See Token Optimization: Prompt Caching for details.
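The effect of a shared prefix can be simulated in Python. The sketch assumes a cache keyed on the exact prefix text, a 5-minute TTL measured from the last use, and a prefix made of steering content plus the skill’s instructions. Real provider caches have their own rules (minimum prefix lengths, explicit cache markers, how the TTL is refreshed), so `PrefixCache` and `build_prompt` are hypothetical illustrations only.

```python
import hashlib
from typing import Dict, List, Tuple

TTL_SECONDS = 5 * 60  # 5-minute cache lifetime


class PrefixCache:
    """Toy model of provider-side prompt caching, keyed on the exact prefix text."""

    def __init__(self) -> None:
        self._last_used: Dict[str, float] = {}

    def lookup(self, prefix: str, now: float) -> bool:
        """Return True on a cache hit; either way, record the prefix as used at `now`."""
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        last = self._last_used.get(key)
        hit = last is not None and now - last <= TTL_SECONDS
        self._last_used[key] = now  # assumption: every use refreshes the TTL
        return hit


def build_prompt(steering: str, instructions: str, task: str) -> Tuple[str, str]:
    """Put the fixed parts first and the per-task part last, so prompts share a prefix."""
    prefix = f"{steering}\n\n{instructions}"
    return prefix, task


cache = PrefixCache()
steering = "Project conventions: ..."  # loaded once by the orchestrator
instructions = "Implement exactly one task and summarize the result."
# (seconds since the first run, task) for four sequential subagents
runs = [(0, "task 1"), (90, "task 2"), (200, "task 3"), (900, "task 4")]

results: List[str] = []
for t, task in runs:
    prefix, _task_suffix = build_prompt(steering, instructions, task)
    results.append("hit" if cache.lookup(prefix, now=t) else "miss")

print(results)  # ['miss', 'hit', 'hit', 'miss']
```

Task 4 misses because it starts more than five minutes after task 3, which is why caching pays off for back-to-back runs rather than for work spread across a long session.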

Mental model

The orchestrator is always the one talking to you. Subagents are short-lived workers whose output is a report, not a conversation.

Practical implications

Three consequences of this architecture matter in day-to-day use:

Clearing context (/clear, new session) affects the orchestrator, not past subagents. Subagents don’t persist; they’re already gone by the time you see their summary. Clearing the orchestrator context is safe at any phase boundary. See Token Optimization: When to Clear Context for the recommended clearing schedule.

Interactive questions (proposals, design decisions, task review) must happen in the orchestrator. A subagent cannot ask you anything mid-run: it either completes its task or reports a blocker back to the orchestrator, which then surfaces it to you.

/sdd-continue works from fresh sessions because it detects the current phase from artifacts on disk, not from conversation history. You can start a brand-new conversation at any point in the workflow and /sdd-continue will pick up exactly where you left off.
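Phase detection from disk can be sketched in Python. The artifact file names and phase order below are hypothetical placeholders, not the files SDD actually writes. The point is that the next step is derived entirely from what exists in the change directory, so no conversation history is needed.

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence, Tuple

# Hypothetical (phase, artifact) pairs in workflow order; real SDD file names may differ.
PHASES: Sequence[Tuple[str, str]] = (
    ("propose", "proposal.md"),
    ("spec", "spec.md"),
    ("design", "design.md"),
    ("tasks", "tasks.md"),
)


def next_phase(change_dir: Path) -> Optional[str]:
    """Return the first phase whose artifact is missing, or None if all exist.

    Only the filesystem is consulted; no conversation history is needed.
    """
    for phase, artifact in PHASES:
        if not (change_dir / artifact).exists():
            return phase
    return None


# Simulate a change where the proposal and spec have already been written.
with tempfile.TemporaryDirectory() as tmp:
    change = Path(tmp)
    (change / "proposal.md").write_text("# Proposal\n")
    (change / "spec.md").write_text("# Spec\n")
    result = next_phase(change)

print(result)  # design
```

Because the function takes only a directory, a brand-new session computes the same answer as the session that wrote the artifacts.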
