Token Optimization: Cost-Efficient SDD Workflows

SDD includes several strategies to minimize token usage and cost across the workflow. These strategies work together — model selection reduces cost per token, while context management reduces the number of tokens processed in the first place. Neither alone is sufficient; the combination is what makes a long, multi-phase workflow economically viable.

Model selection

Each skill declares the minimum model tier needed in its model_hint frontmatter field. Orchestrators (sdd-agent, sdd-ff, sdd-continue) pass this hint when spawning subagents, so the right model is used for each job automatically.

Hint	Use for	Skills
`opus`	Judgment-heavy phases: design decisions, solution analysis	`sdd-propose`, `sdd-design`
`sonnet`	Code comprehension: analysis, spec writing, implementation	`sdd-explore`, `sdd-spec`, `sdd-apply` (subagents), `sdd-verify`, `sdd-audit`, `sdd-steer`, `sdd-init`, `sdd-new`, `sdd-ff`, `sdd-discover`, `sdd-agent`
`haiku`	Mechanical phases: template-filling, search, dispatch	`sdd-tasks`, `sdd-archive`, `sdd-recall`, `sdd-docs`, `sdd-continue`, `sdd-apply` (orchestrator)

Using opus only for propose and design (the judgment phases) while running mechanical phases on haiku can reduce cost by 60–70% compared to running everything on a single high-tier model.

Context management

The artifact chain

Each SDD phase produces a file that captures all decisions made during that phase. Once the phase completes, the conversation context is redundant with the artifact — everything the AI discovered or decided is now on disk:

explore  → notes.md       (findings)
propose  → proposal.md    (scope decisions)
spec     → spec.md        (behavior)
design   → design.md      (architecture)
tasks    → tasks.md       (execution plan)
apply    → commits        (code)
verify   → PR             (result)

This chain is the basis for the context-clearing strategy.

When to clear context

Moment	Clear?	Reason
Between explore and propose	No	Coupled — exploration feeds proposal questions
After propose	Yes	`proposal.md` captures everything
After spec	Yes	`spec.md` captures everything
After design	Yes	`design.md` captures everything
After tasks	Yes (most important)	Apply is the longest phase — entering clean saves the most
During apply	No	Subagents already isolate context per task
After verify	Yes	PR created, everything captured

Why this matters

If context is 50K tokens after the propose + spec phases and 15 apply turns remain, that is 50K × 15 = 750K tokens of input carrying stale context that the subagents don’t need. Clearing after tasks and re-reading the artifacts (~5K tokens total) eliminates that cost.

Why /sdd-continue makes this natural

/sdd-continue detects the current phase from artifacts on disk, not from conversation history. You can clear context, start a new session, run /sdd-continue, and the workflow resumes exactly where it left off. Clearing becomes a zero-friction operation rather than a disruptive reset.

Selective steering loading

Skills that read openspec/steering/ load only the specialist files relevant to the current task, not every .md file in the directory. Selection is based on the files the task touches:

Specialists with applies_to: all in their manifest → always loaded
conventions-testing.md → only when the task touches test files
conventions-security.md → only when the task touches auth, API, or input-handling files
Other specialists → only when the file matches the specialist’s declared domain

With five or more specialists installed, this reduces steering context from roughly 8KB to roughly 3KB per subagent — a meaningful reduction when multiplied across many tasks.

Prompt caching

Orchestrator skills (sdd-apply, sdd-agent) read steering files once and pass the content inline to subagent prompts. This creates a fixed prefix that is identical across every task in an apply run. LLM prompt caches (5-minute TTL on Claude) hit on this prefix for every sequential subagent, so the steering content is only billed once. The same strategy applies to sdd-discover, which uses an identical prompt prefix across all parallel domain-analysis subagents.

Output style

All skills include a terse output directive. Status reports use tables and single-line bullets instead of prose paragraphs. This reduces response token count — which also has a cost — without losing information density.

English artifacts

All generated artifacts (proposal.md, spec.md, design.md, tasks.md, notes.md) are written in English regardless of the user’s preferred language. English uses approximately 30% fewer tokens than Romance languages (Spanish, French, Portuguese) for the same semantic content, so this is a consistent per-artifact saving across the entire workflow.

Get Started

Core Concepts

Specialists

Examples & Guides

Token Optimization: Cost-Efficient SDD Workflows

Model selection

Context management

The artifact chain

When to clear context

Why this matters

Why /sdd-continue makes this natural

Selective steering loading

Prompt caching

Output style

English artifacts

Build docs developers (and LLMs) love

Get Started

Core Concepts

Specialists

Examples & Guides

Documentation Index

​Model selection

​Context management

​The artifact chain

​When to clear context

​Why this matters

​Why /sdd-continue makes this natural

​Selective steering loading

​Prompt caching

​Output style

​English artifacts

Build docs developers (and LLMs) love

Model selection

Context management

The artifact chain

When to clear context

Why this matters

Why /sdd-continue makes this natural

Selective steering loading

Prompt caching

Output style

English artifacts