What spec.md must contain
A frozen spec.md must include at minimum:
- Original task statement: preserved verbatim or summarized accurately from the user request
- Acceptance criteria: explicit, labeled AC1, AC2, AC3, … Each criterion must be independently verifiable
- Constraints: technical or process constraints the implementation must respect
- Non-goals: what is explicitly out of scope for this task
- Repo guidance sources consulted during freeze
- A verification plan
- Assumptions resolved narrowly from the user request
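As a rough illustration of the labeling requirement, the sketch below pulls AC labels out of a spec and reports gaps in the numbering. The function names, the sample criteria, and the idea that labels should run consecutively from AC1 are assumptions made for this example, not rules stated by the workflow.

```python
import re

# Matches labels of the form AC1, AC2, ... as used in spec.md.
LABEL_RE = re.compile(r"\bAC(\d+)\b")


def criterion_labels(spec_text: str) -> list[str]:
    """Return the distinct AC labels in spec_text, in order of first appearance."""
    seen: list[str] = []
    for match in LABEL_RE.finditer(spec_text):
        label = match.group(0)
        if label not in seen:
            seen.append(label)
    return seen


def missing_labels(spec_text: str) -> list[str]:
    """Return labels skipped in the 1..max sequence (an assumed convention)."""
    numbers = {int(label[2:]) for label in criterion_labels(spec_text)}
    if not numbers:
        return []
    return [f"AC{n}" for n in range(1, max(numbers) + 1) if n not in numbers]


# Hypothetical spec fragment, for illustration only.
example = """
Acceptance criteria
- AC1: The CLI exits with code 0 on valid input.
- AC2: Invalid input produces a non-zero exit code.
- AC4: Help text lists every flag.
"""
print(criterion_labels(example))  # ['AC1', 'AC2', 'AC4']
print(missing_labels(example))    # ['AC3']
```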
Every criterion must carry a label (AC1, AC2, …). Downstream steps (evidence packing, verification, and fix guidance) all reference criteria by these labels. Unlabeled criteria cannot be tracked independently.
The no-production-code rule
During spec freeze, the freezer may:
- Read repo guidance files (AGENTS.md, CLAUDE.md, .claude/CLAUDE.md, .claude/rules/**/*.md)
- Read the minimum relevant code needed to understand the task
- Write or update .agent/tasks/<TASK_ID>/spec.md
The freezer may not:
- Change production code
- Write evidence.md, evidence.json, verdict.json, or problems.md
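One way to picture the write restriction is as a single-path allow-list. The sketch below is a guess at how a wrapper might enforce it; the function name and the task id T-42 are hypothetical, and paths are assumed to be given relative to the repository root.

```python
from pathlib import PurePosixPath


def freeze_write_allowed(path: str, task_id: str) -> bool:
    """Return True only if path is the spec.md of the given task."""
    # The only file the freezer may write, per the rule above.
    allowed = PurePosixPath(".agent/tasks") / task_id / "spec.md"
    return PurePosixPath(path) == allowed


print(freeze_write_allowed(".agent/tasks/T-42/spec.md", "T-42"))      # True
print(freeze_write_allowed(".agent/tasks/T-42/evidence.md", "T-42"))  # False
print(freeze_write_allowed("src/app.py", "T-42"))                     # False
```

The comparison is purely textual: it does not resolve ".." segments or symlinks, so a real guard would need to normalize paths first.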
Resolving assumptions narrowly
When the user request is ambiguous, assumptions must be resolved narrowly, choosing the interpretation that requires less implementation, not more. Each assumption must be listed explicitly in spec.md so the verifier can evaluate whether the implementation matched the stated scope.
Verification plan
The spec may include a concise verification plan: the commands, tools, or checks that a fresh verifier should run to independently confirm each criterion. This plan is advisory (the verifier makes its own judgment), but a well-written verification plan reduces the chance that a criterion is marked UNKNOWN due to ambiguous test coverage.
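A plan of this kind can be thought of as a mapping from criterion label to checks. The sketch below flags criteria that have no check listed, which is where an UNKNOWN verdict is most likely. The mapping shape, the function name, and the pytest commands and test paths are all hypothetical; the workflow does not prescribe a plan format.

```python
def uncovered_criteria(labels: list[str], plan: dict[str, list[str]]) -> list[str]:
    """Return the labels that have no check listed in the plan."""
    return [label for label in labels if not plan.get(label)]


# Hypothetical plan: criterion label -> commands a verifier could run.
plan = {
    "AC1": ["pytest tests/test_cli.py -k valid_input"],
    "AC2": ["pytest tests/test_cli.py -k invalid_input"],
    "AC3": [],
}
print(uncovered_criteria(["AC1", "AC2", "AC3", "AC4"], plan))  # ['AC3', 'AC4']
```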
Evidence packing requirements
After implementation, the builder packs evidence against each acceptance criterion. The evidence must:
- Judge each criterion independently with exactly one of PASS, FAIL, or UNKNOWN
- For every PASS, cite concrete proof:
  - File paths
  - Commands run
  - Exit codes
  - Output summaries
  - Artifact paths under raw/
- For every FAIL or UNKNOWN, explain the gap
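The per-criterion rules above, together with this section's rule that overall_status may be PASS only when every criterion is PASS, could be checked along the following lines. This is a sketch under assumptions: the entry fields id, status, proof, and gap are invented for illustration and are not a documented evidence.json schema, and treating an empty criteria list as not passable is a choice made here.

```python
ALLOWED_STATUSES = {"PASS", "FAIL", "UNKNOWN"}


def entry_problems(entry: dict) -> list[str]:
    """Return the rule violations for one per-criterion evidence entry."""
    problems = []
    status = entry.get("status")
    if status not in ALLOWED_STATUSES:
        problems.append(f"{entry.get('id')}: status must be PASS, FAIL, or UNKNOWN")
    elif status == "PASS" and not entry.get("proof"):
        problems.append(f"{entry.get('id')}: PASS requires concrete proof")
    elif status != "PASS" and not entry.get("gap"):
        problems.append(f"{entry.get('id')}: {status} requires an explanation of the gap")
    return problems


def may_claim_overall_pass(entries: list[dict]) -> bool:
    """True only when there is at least one entry and every entry is PASS."""
    return bool(entries) and all(e.get("status") == "PASS" for e in entries)


# Hypothetical entries; field names and contents are assumptions.
entries = [
    {"id": "AC1", "status": "PASS",
     "proof": ["pytest tests/test_cli.py exited 0", "raw/ac1-pytest.txt"]},
    {"id": "AC2", "status": "UNKNOWN", "gap": "No test exercises invalid input."},
]
for e in entries:
    print(entry_problems(e))            # [] for both entries
print(may_claim_overall_pass(entries))  # False
```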
The builder must not claim overall_status: PASS in evidence.json unless every acceptance criterion is PASS. A single FAIL or UNKNOWN criterion means the overall status is not PASS.
Fresh verification requirements
After evidence is packed, a fresh verifier (a new session or subagent that did not participate in implementation) performs an independent pass. The verifier writes up to two files:
- verdict.json: always written; contains per-criterion verdicts and overall_verdict
- problems.md: written only when overall_verdict is not PASS; contains per-criterion fix guidance
The verifier must not:
- Modify production code
- Backfill or patch the evidence bundle to make it look complete
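The two-file rule can be sketched as a small writer that always emits verdict.json and emits problems.md only for a non-PASS verdict. Everything except the file names and the overall_verdict key is an assumption here: the criteria key, the function signature, and the use of FAIL as a non-PASS overall value are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def write_verifier_outputs(task_dir: Path, verdicts: dict[str, str],
                           overall_verdict: str, problems_text: str = "") -> list[str]:
    """Write verdict.json always, and problems.md only for a non-PASS verdict."""
    task_dir.mkdir(parents=True, exist_ok=True)
    # "criteria" is a hypothetical key; "overall_verdict" comes from the text.
    payload = {"criteria": verdicts, "overall_verdict": overall_verdict}
    (task_dir / "verdict.json").write_text(json.dumps(payload, indent=2))
    written = ["verdict.json"]
    if overall_verdict != "PASS":
        (task_dir / "problems.md").write_text(problems_text)
        written.append("problems.md")
    return written


with tempfile.TemporaryDirectory() as tmp:
    passed = write_verifier_outputs(Path(tmp) / "pass", {"AC1": "PASS"}, "PASS")
    failed = write_verifier_outputs(Path(tmp) / "fail", {"AC1": "FAIL"}, "FAIL",
                                    "AC1: not proven")
print(passed)  # ['verdict.json']
print(failed)  # ['verdict.json', 'problems.md']
```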
problems.md requirements
For each non-PASS criterion, problems.md must include:
- Criterion id and text
- Status
- Why it is not proven
- Minimal reproduction steps
- Expected vs actual
- Affected files
- Smallest safe fix
- Corrective hint in 1–3 sentences
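To make the required fields concrete, the sketch below models one problems.md entry as a dataclass and renders it as plain text. The class, the field names, the output layout, and the sample problem are all hypothetical; only the set of fields comes from the list above.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    criterion_id: str
    criterion_text: str
    status: str
    why_not_proven: str
    reproduction: list[str]
    expected: str
    actual: str
    affected_files: list[str]
    smallest_safe_fix: str
    corrective_hint: str  # intended to stay within 1-3 sentences


def render_problem(p: Problem) -> str:
    """Render one entry with every required field, in the listed order."""
    lines = [
        f"{p.criterion_id}: {p.criterion_text}",
        f"- Status: {p.status}",
        f"- Why it is not proven: {p.why_not_proven}",
        "- Minimal reproduction steps:",
        *[f"  {i}. {step}" for i, step in enumerate(p.reproduction, start=1)],
        f"- Expected: {p.expected}",
        f"- Actual: {p.actual}",
        f"- Affected files: {', '.join(p.affected_files)}",
        f"- Smallest safe fix: {p.smallest_safe_fix}",
        f"- Corrective hint: {p.corrective_hint}",
    ]
    return "\n".join(lines)


# Hypothetical problem, for illustration only.
problem = Problem(
    criterion_id="AC2",
    criterion_text="Invalid input produces a non-zero exit code.",
    status="FAIL",
    why_not_proven="The CLI exits 0 when given an empty file.",
    reproduction=["Create an empty file.", "Run the CLI on it and inspect the exit code."],
    expected="Non-zero exit code",
    actual="Exit code 0",
    affected_files=["src/cli.py"],
    smallest_safe_fix="Return a non-zero code when parsing yields no records.",
    corrective_hint="Check for empty input before reporting success.",
)
text = render_problem(problem)
print(text)
```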
The fix loop
If the verdict is not PASS, a fresh fixer reads only spec.md, verdict.json, and problems.md. It:
- Reconfirms each listed problem in the codebase before editing
- Makes the smallest safe change set
- Avoids regressing already-passing criteria
- Regenerates evidence.md, evidence.json, and raw artifacts
- Does not write verdict.json or claim final sign-off
After each fix, a fresh verifier runs again; the loop repeats until overall_verdict is PASS.
The workflow is considered complete only when a fresh verifier (one that did not participate in the fix) returns overall_verdict: PASS. The fixer never self-certifies completion.
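The control flow of the fix loop can be sketched as an alternation of verification and fixing in which only the verifier's result ends the loop. The callables and the max_rounds cap are assumptions added for illustration; the text itself states no limit on the number of rounds.

```python
from typing import Callable


def run_fix_loop(verify: Callable[[], str], fix: Callable[[], None],
                 max_rounds: int = 3) -> bool:
    """Alternate fresh verification and fixing until the verifier returns PASS."""
    for _ in range(max_rounds):
        if verify() == "PASS":
            return True   # only the verifier's verdict completes the workflow
        fix()             # the fixer edits and repacks evidence, nothing more
    # One last fresh verification after the final fix.
    return verify() == "PASS"


# Simulated verifier that passes on its third run.
verdicts = iter(["FAIL", "FAIL", "PASS"])
fixes: list[str] = []
done = run_fix_loop(lambda: next(verdicts), lambda: fixes.append("fix"))
print(done, len(fixes))  # True 2
```

Note that fix() returns nothing the loop inspects: completion is decided solely by the next verify() call, which mirrors the rule that the fixer never self-certifies.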