Spec freeze is the step where the task is written down as a precise, unambiguous contract before any implementation begins. The frozen spec is the single source of truth that every subsequent step — build, evidence, verify, fix — is measured against.

What spec.md must contain

A frozen spec.md must include at minimum:
  • Original task statement: preserved verbatim or summarized accurately from the user request
  • Acceptance criteria: explicit and labeled AC1, AC2, AC3, …; each criterion must be independently verifiable
  • Constraints: technical or process constraints the implementation must respect
  • Non-goals: what is explicitly out of scope for this task
It may also include:
  • Repo guidance sources consulted during freeze
  • A verification plan
  • Assumptions resolved narrowly from the user request
Every criterion must carry a label (AC1, AC2, …). Downstream steps — evidence packing, verification, and fix guidance — all reference criteria by these labels. Unlabeled criteria cannot be tracked independently.

The no-production-code rule

The spec-freezer must not change production code. Freeze is a planning step only. Writing implementation code during freeze conflates what is promised with what is delivered and makes it impossible to separate the spec from the implementation during a later audit.
During spec freeze, the freezer may:
  • Read repo guidance files (AGENTS.md, CLAUDE.md, .claude/CLAUDE.md, .claude/rules/**/*.md)
  • Read the minimum relevant code needed to understand the task
  • Write or update .agent/tasks/<TASK_ID>/spec.md
During spec freeze, the freezer must not:
  • Change production code
  • Write evidence.md, evidence.json, verdict.json, or problems.md
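These write rules can be checked mechanically after the freeze step. The hypothetical `freeze_violations` helper below assumes the caller already has the repo-relative paths the step modified (for example, parsed from `git status --porcelain`; that parsing is not shown). It flags every path other than the task's spec.md and calls out evidence and verdict files separately. Reads are not tracked, since the freezer is allowed to read guidance files and relevant code.

```python
from pathlib import PurePosixPath

# Files the freezer must not write, per the spec-freeze rules.
FORBIDDEN_ARTIFACTS = {"evidence.md", "evidence.json", "verdict.json", "problems.md"}


def freeze_violations(task_id: str, changed_paths: list[str]) -> list[str]:
    """Return a message for each changed path the freeze step was not allowed to write."""
    allowed = PurePosixPath(".agent/tasks") / task_id / "spec.md"
    violations = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path == allowed:
            continue  # the only file freeze may write or update
        if path.name in FORBIDDEN_ARTIFACTS:
            violations.append(f"{raw}: evidence/verdict artifact written during freeze")
        else:
            violations.append(f"{raw}: change outside spec.md (possible production code)")
    return violations
```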

Resolving assumptions narrowly

When the user request is ambiguous, assumptions must be resolved narrowly — choosing the interpretation that requires less implementation, not more. Each assumption must be listed explicitly in spec.md so the verifier can evaluate whether the implementation matched the stated scope.

Verification plan

The spec may include a concise verification plan: the commands, tools, or checks that a fresh verifier should run to independently confirm each criterion. This plan is advisory — the verifier makes its own judgment — but a well-written verification plan reduces the chance that a criterion is marked UNKNOWN due to ambiguous test coverage.
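One way to make the plan rerunnable is to store it as data: a mapping from each criterion label to the commands that check it. The sketch below assumes that structure, which the workflow does not define. It runs each command with `subprocess.run` and records the exit code along with an output artifact under raw/. These are the same kinds of proof that evidence packing later cites.

```python
import subprocess
from pathlib import Path


def run_plan(plan: dict[str, list[list[str]]], raw_dir: Path) -> dict[str, list[dict]]:
    """Run each criterion's commands and record the command, exit code, and artifact path.

    `plan` maps a criterion label (e.g. "AC1") to a list of argv lists; this
    encoding of the verification plan is a hypothetical choice.
    """
    raw_dir.mkdir(parents=True, exist_ok=True)
    results: dict[str, list[dict]] = {}
    for label, commands in plan.items():
        results[label] = []
        for i, argv in enumerate(commands):
            proc = subprocess.run(argv, capture_output=True, text=True)
            # Keep the full output as an artifact so evidence can cite its path.
            artifact = raw_dir / f"{label}-{i}.log"
            artifact.write_text(proc.stdout + proc.stderr)
            results[label].append({
                "command": " ".join(argv),
                "exit_code": proc.returncode,
                "artifact": str(artifact),
            })
    return results
```

Keeping commands as argv lists rather than shell strings avoids quoting ambiguity when a fresh verifier reruns them.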

Evidence packing requirements

After implementation, the builder packs evidence against each acceptance criterion. The evidence must:
  • Judge each criterion independently with exactly one of PASS, FAIL, or UNKNOWN
  • For every PASS, cite concrete proof:
    • File paths
    • Commands run
    • Exit codes
    • Output summaries
    • Artifact paths under raw/
  • For every FAIL or UNKNOWN, explain the gap
The builder must not claim overall_status: PASS in evidence.json unless every acceptance criterion is PASS. A single FAIL or UNKNOWN criterion means the overall status is not PASS.
Evidence packing may run missing checks (re-executing tests, linters, or build commands), but it must not make further changes to production code.
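The per-criterion and overall-status rules above reduce to a small function. In this sketch the field names (`status`, `proof`, `gap`) are assumptions. Choosing FAIL when any criterion fails and UNKNOWN otherwise is also an assumption; the workflow only requires that the overall status not be PASS unless every criterion is PASS.

```python
VALID = {"PASS", "FAIL", "UNKNOWN"}


def build_evidence(criteria: dict[str, dict]) -> dict:
    """Validate per-criterion judgments and derive an evidence.json payload."""
    for label, entry in criteria.items():
        status = entry.get("status")
        if status not in VALID:
            raise ValueError(f"{label}: status must be one of {sorted(VALID)}")
        if status == "PASS" and not entry.get("proof"):
            raise ValueError(f"{label}: PASS requires concrete proof")
        if status != "PASS" and not entry.get("gap"):
            raise ValueError(f"{label}: {status} requires an explanation of the gap")

    statuses = [entry["status"] for entry in criteria.values()]
    if statuses and all(s == "PASS" for s in statuses):
        overall = "PASS"  # only when every criterion is PASS
    elif "FAIL" in statuses:
        overall = "FAIL"
    else:
        overall = "UNKNOWN"  # no FAIL, but something unproven (or no criteria at all)
    return {"criteria": criteria, "overall_status": overall}
```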

Fresh verification requirements

After evidence is packed, a fresh verifier — a new session or subagent that did not participate in implementation — performs an independent pass.
The verifier must judge the current repository state and the results of checks it reruns itself, not the builder’s narrative or prior chat claims.
The verifier writes two files:
  • verdict.json — always written; contains per-criterion verdicts and overall_verdict
  • problems.md — written only when overall_verdict is not PASS; contains per-criterion fix guidance
The verifier must not:
  • Modify production code
  • Backfill or patch the evidence bundle to make it look complete
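A verifier's output step might look like the following sketch. It assumes a simple verdict.json layout (`criteria` plus `overall_verdict`) and writes problems.md only for a non-PASS verdict. As an added assumption, it also deletes a stale problems.md left from an earlier round when the verdict is PASS. It writes nothing outside the task directory, so it never touches production code or the evidence files.

```python
import json
from pathlib import Path
from typing import Optional


def write_verdict(task_dir: Path, verdicts: dict[str, str], problems_md: Optional[str]) -> str:
    """Write verdict.json (always) and problems.md (only when the verdict is not PASS)."""
    statuses = list(verdicts.values())
    if statuses and all(s == "PASS" for s in statuses):
        overall = "PASS"
    elif "FAIL" in statuses:
        overall = "FAIL"
    else:
        overall = "UNKNOWN"

    task_dir.mkdir(parents=True, exist_ok=True)
    payload = {"criteria": verdicts, "overall_verdict": overall}
    (task_dir / "verdict.json").write_text(json.dumps(payload, indent=2))

    problems_path = task_dir / "problems.md"
    if overall == "PASS":
        # Assumption: remove fix guidance from a previous round so it cannot mislead.
        problems_path.unlink(missing_ok=True)
    else:
        if not problems_md:
            raise ValueError("a non-PASS verdict requires problems.md content")
        problems_path.write_text(problems_md)
    return overall
```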

problems.md requirements

For each non-PASS criterion, problems.md must include:
  • Criterion id and text
  • Status
  • Why it is not proven
  • Minimal reproduction steps
  • Expected vs actual
  • Affected files
  • Smallest safe fix
  • Corrective hint in 1–3 sentences
This structure ensures the fixer has everything needed to make a minimal, targeted correction without re-reading the entire implementation history.
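A structured record makes it harder to omit one of the required fields. The `Problem` dataclass and its Markdown layout below are hypothetical; only the list of fields comes from the requirements above.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    criterion_id: str
    criterion_text: str
    status: str               # FAIL or UNKNOWN; PASS criteria do not belong here
    why_not_proven: str
    repro_steps: list[str]
    expected: str
    actual: str
    affected_files: list[str]
    smallest_safe_fix: str
    hint: str                 # corrective hint, intended to be 1-3 sentences

    def __post_init__(self) -> None:
        if self.status not in {"FAIL", "UNKNOWN"}:
            raise ValueError(f"{self.criterion_id}: problems.md only lists non-PASS criteria")

    def to_markdown(self) -> str:
        steps = "\n".join(f"{i}. {s}" for i, s in enumerate(self.repro_steps, start=1))
        files = "\n".join(f"- {f}" for f in self.affected_files)
        return (
            f"## {self.criterion_id}: {self.criterion_text}\n\n"
            f"Status: {self.status}\n\n"
            f"Why it is not proven: {self.why_not_proven}\n\n"
            f"Reproduction:\n{steps}\n\n"
            f"Expected: {self.expected}\nActual: {self.actual}\n\n"
            f"Affected files:\n{files}\n\n"
            f"Smallest safe fix: {self.smallest_safe_fix}\n\n"
            f"Hint: {self.hint}\n"
        )
```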

The fix loop

If the verdict is not PASS, a fresh fixer reads only spec.md, verdict.json, and problems.md. It:
  1. Reconfirms each listed problem in the codebase before editing
  2. Makes the smallest safe change set
  3. Avoids regressing already-passing criteria
  4. Regenerates evidence.md, evidence.json, and raw artifacts
  5. Does not write verdict.json or claim final sign-off
After the fix, a fresh verifier runs again. The loop repeats until overall_verdict is PASS.
The workflow is considered complete only when a fresh verifier — one that did not participate in the fix — returns overall_verdict: PASS. The fixer never self-certifies completion.
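The control flow of the loop can be sketched as follows. `run_fresh_verifier` and `run_fresh_fixer` are hypothetical hooks that must each start a new session or subagent, so no fixer ever judges its own work. The `max_rounds` cap is an added assumption (the workflow as described simply repeats until PASS) so that a stuck task is handed back instead of looping forever.

```python
from typing import Callable


def fix_loop(
    run_fresh_verifier: Callable[[], str],
    run_fresh_fixer: Callable[[], None],
    max_rounds: int = 5,
) -> str:
    """Alternate fresh fixers and fresh verifiers until a verifier returns PASS."""
    verdict = run_fresh_verifier()
    rounds = 0
    while verdict != "PASS":
        if rounds >= max_rounds:
            return verdict  # give up and escalate rather than loop indefinitely
        run_fresh_fixer()                # reads only spec.md, verdict.json, problems.md
        verdict = run_fresh_verifier()   # sign-off always comes from a new verifier
        rounds += 1
    return verdict
```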
