SDD includes several strategies to minimize token usage and cost across the workflow. These strategies work together — model selection reduces cost per token, while context management reduces the number of tokens processed in the first place. Neither alone is sufficient; the combination is what makes a long, multi-phase workflow economically viable.
Model selection
Each skill declares the minimum model tier it needs in its model_hint frontmatter field. Orchestrators (sdd-agent, sdd-ff, sdd-continue) pass this hint when spawning subagents, so each job automatically runs on the right model.
| Hint | Use for | Skills |
|---|---|---|
| opus | Judgment-heavy phases: design decisions, solution analysis | sdd-propose, sdd-design |
| sonnet | Code comprehension: analysis, spec writing, implementation | sdd-explore, sdd-spec, sdd-apply (subagents), sdd-verify, sdd-audit, sdd-steer, sdd-init, sdd-new, sdd-ff, sdd-discover, sdd-agent |
| haiku | Mechanical phases: template-filling, search, dispatch | sdd-tasks, sdd-archive, sdd-recall, sdd-docs, sdd-continue, sdd-apply (orchestrator) |
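As a minimal sketch, assuming each skill is a markdown file whose YAML frontmatter sits between `---` lines, an orchestrator could read model_hint like this before spawning a subagent. The helper names and the fallback to sonnet are hypothetical; the real orchestrators may resolve hints differently.

```python
# Hypothetical sketch: reading a skill's model_hint before spawning a subagent.
# Assumes frontmatter between "---" lines; only simple "key: value" lines are
# handled (this is not a full YAML parser).

KNOWN_TIERS = ("haiku", "sonnet", "opus")  # cheapest to most capable


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return top-level key/value pairs from a leading '---' frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def model_hint(skill_text: str, default: str = "sonnet") -> str:
    """Pick the subagent's model tier; unknown or missing hints use the (assumed) default."""
    hint = parse_frontmatter(skill_text).get("model_hint", default)
    return hint if hint in KNOWN_TIERS else default


skill = """---
name: sdd-tasks
model_hint: haiku
---
Break the design into tasks.
"""
print(model_hint(skill))  # haiku
```

Falling back to a mid tier rather than the cheapest one is a design choice: a skill with a missing hint then degrades in cost, not in quality.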
Context management
The artifact chain
Each SDD phase produces a file that captures all decisions made during that phase. Once the phase completes, the conversation context is redundant with the artifact: everything the AI discovered or decided is now on disk.
When to clear context
| Moment | Clear? | Reason |
|---|---|---|
| Between explore and propose | No | Coupled — exploration feeds proposal questions |
| After propose | Yes | proposal.md captures everything |
| After spec | Yes | spec.md captures everything |
| After design | Yes | design.md captures everything |
| After tasks | Yes (most important) | Apply is the longest phase — entering clean saves the most |
| During apply | No | Subagents already isolate context per task |
| After verify | Yes | PR created, everything captured |
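The table can be read as a policy plus one safety condition: clearing is only safe once the phase's artifact actually exists on disk. A minimal sketch, assuming one directory per change holding the artifacts; the function name and the directory layout are hypothetical.

```python
from pathlib import Path

# The table above as data: clear context once this phase has finished?
# Phases not listed (e.g. mid-apply) default to "no".
CLEAR_AFTER = {
    "explore": False,  # coupled: exploration feeds the proposal questions
    "propose": True,
    "spec": True,
    "design": True,
    "tasks": True,     # most important: apply is the longest phase
    "verify": True,    # PR created, everything captured
}

# Artifact each phase writes (hypothetical layout: one directory per change).
ARTIFACT = {
    "propose": "proposal.md",
    "spec": "spec.md",
    "design": "design.md",
    "tasks": "tasks.md",
}


def safe_to_clear(phase: str, change_dir: Path) -> bool:
    """Allow clearing only if the policy says so and the phase's artifact exists."""
    if not CLEAR_AFTER.get(phase, False):
        return False
    artifact = ARTIFACT.get(phase)
    if artifact is None:
        return True  # verify: no artifact named here, so rely on the policy alone
    return (change_dir / artifact).is_file()
```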
Why this matters
If the context holds 50K tokens after the propose and spec phases and 15 apply turns remain, every one of those turns re-sends that history as input: 50K × 15 = 750K input tokens of stale context that the subagents don't need. Clearing after tasks and re-reading the artifacts (~5K tokens total) shrinks the carried context to about 5K × 15 = 75K tokens, roughly a tenth of the original.
Why /sdd-continue makes this natural
/sdd-continue detects the current phase from artifacts on disk, not from conversation history. You can clear context, start a new session, run /sdd-continue, and the workflow resumes exactly where it left off. Clearing becomes a zero-friction operation rather than a disruptive reset.
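Detecting the phase from disk can be as simple as walking the artifact chain and stopping at the first missing file. A minimal sketch, not sdd-continue's actual logic: the artifact names come from this page, but the per-change directory, the omission of explore (no artifact is named for it here), and the markdown-checkbox convention in tasks.md are all assumptions.

```python
from pathlib import Path

# Artifact chain in workflow order (hypothetical detection rules).
CHAIN = [
    ("propose", "proposal.md"),
    ("spec", "spec.md"),
    ("design", "design.md"),
    ("tasks", "tasks.md"),
]


def detect_phase(change_dir: Path) -> str:
    """Return the phase to run next, using only files on disk, never chat history."""
    for phase, artifact in CHAIN:
        if not (change_dir / artifact).is_file():
            return phase  # first missing artifact = phase still to do
    tasks = (change_dir / "tasks.md").read_text(encoding="utf-8")
    # Assumed convention: pending tasks are markdown checkboxes "- [ ]".
    return "apply" if "- [ ]" in tasks else "verify"
```

Because the function takes only a directory, a fresh session with an empty context resolves to the same phase as the session that wrote the files.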
Selective steering loading
Skills that read openspec/steering/ load only the specialist files relevant to the current task, not every .md file in the directory. Selection is based on the files the task touches:
- Specialists with applies_to: all in their manifest → always loaded
- conventions-testing.md → only when the task touches test files
- conventions-security.md → only when the task touches auth, API, or input-handling files
- Other specialists → only when the file matches the specialist's declared domain
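The rules above amount to a filter over the specialist manifest. A minimal sketch, assuming the manifest maps each specialist file to either "all" or a list of glob patterns for its domain; this manifest shape, the patterns, and the names conventions-general.md and conventions-frontend.md are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical manifest: specialist file -> "all" or glob patterns for its domain.
MANIFEST = {
    "conventions-general.md": "all",
    "conventions-testing.md": ["*test*"],
    "conventions-security.md": ["*auth*", "*api*", "*input*"],
    "conventions-frontend.md": ["*.tsx", "*.css"],
}


def select_steering(manifest: dict, touched_files: list[str]) -> list[str]:
    """Return the specialist files to load for a task touching these paths."""
    selected = []
    for specialist, applies_to in manifest.items():
        if applies_to == "all":
            selected.append(specialist)  # always loaded
        elif any(
            fnmatchcase(path, pattern)
            for path in touched_files
            for pattern in applies_to
        ):
            selected.append(specialist)  # task touches this specialist's domain
    return selected


print(select_steering(MANIFEST, ["src/auth/login.py", "tests/test_login.py"]))
# ['conventions-general.md', 'conventions-testing.md', 'conventions-security.md']
```

Substring globs are crude ("*api*" also matches rapid.py); tighter, path-anchored patterns would load fewer files by mistake.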
Prompt caching
Orchestrator skills (sdd-apply, sdd-agent) read steering files once and pass the content inline to subagent prompts. This creates a fixed prefix that is identical across every task in an apply run. Prompt caches match on an exact prefix, so each sequential subagent hits the cache (on Claude the cache has a 5-minute TTL, refreshed on every hit). The steering content is paid for in full only when the cache is first written; later subagents read it at a much lower cache-read rate.
The same strategy applies to sdd-discover, which uses an identical prompt prefix across all parallel domain-analysis subagents.
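The cache only helps if the shared part really is a prefix, so prompt order matters. A minimal sketch with hypothetical helper names and placeholder steering text: stable content first and per-task text last keeps the whole steering block shared, while putting the task first leaves almost nothing in common. On the Anthropic API the end of the cacheable prefix is also marked explicitly with cache_control breakpoints; this sketch only illustrates the ordering.

```python
import os


def build_prompt(steering: str, task: str) -> str:
    """Cache-friendly order: stable steering first, per-task text last."""
    return f"{steering}\n\n## Task\n{task}\n"


def build_prompt_uncached(steering: str, task: str) -> str:
    """Anti-pattern: per-task text first, so the steering is never a shared prefix."""
    return f"## Task\n{task}\n\n{steering}\n"


steering = "# Conventions\nUse pytest. Validate all input.\n"  # read once by the orchestrator
tasks = ["Add login endpoint", "Write login tests", "Update README"]

good = [build_prompt(steering, t) for t in tasks]
bad = [build_prompt_uncached(steering, t) for t in tasks]

# Length of the prefix shared by every prompt in the run.
print(len(os.path.commonprefix(good)) >= len(steering))  # True
print(len(os.path.commonprefix(bad)) < len(steering))    # True
```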
Output style
All skills include a terse output directive. Status reports use tables and single-line bullets instead of prose paragraphs. This reduces the response token count without losing information density; output tokens are billed too, and on Claude models at a higher per-token rate than input.
English artifacts
All generated artifacts (proposal.md, spec.md, design.md, tasks.md, notes.md) are written in English regardless of the user’s preferred language. English uses approximately 30% fewer tokens than Romance languages (Spanish, French, Portuguese) for the same semantic content, so this is a consistent per-artifact saving across the entire workflow.