Spec Freeze

Spec freeze is the step where the task is written down as a precise, unambiguous contract before any implementation begins. The frozen spec is the single source of truth that every subsequent step — build, evidence, verify, fix — is measured against.

What `spec.md` must contain

A frozen spec.md must include at minimum:

Original task statement — preserved verbatim or summarized accurately from the user request
Acceptance criteria — explicit, labeled AC1, AC2, AC3, … Each criterion must be independently verifiable
Constraints — technical or process constraints the implementation must respect
Non-goals — what is explicitly out of scope for this task

It may also include:

Repo guidance sources consulted during freeze
A verification plan
Assumptions resolved narrowly from the user request

Every criterion must carry a label (AC1, AC2, …). Downstream steps — evidence packing, verification, and fix guidance — all reference criteria by these labels. Unlabeled criteria cannot be tracked independently.
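The labeling rule can be checked mechanically. The following is a minimal sketch, not part of the workflow itself: it assumes each criterion in spec.md sits on its own line beginning with its label (optionally after a list bullet), and the helper name `find_label_problems` is hypothetical.

```python
import re

# Assumed line format (not defined by the workflow): a criterion line starts
# with its label, e.g. "AC1: ..." or "- AC2 ...".
AC_LABEL = re.compile(r"^\s*(?:[-*]\s*)?AC(\d+)\b")


def find_label_problems(spec_text: str) -> list[str]:
    """Return human-readable labeling problems; an empty list means none were found."""
    numbers = []
    for line in spec_text.splitlines():
        match = AC_LABEL.match(line)
        if match:
            numbers.append(int(match.group(1)))

    problems = []
    if not numbers:
        problems.append("no labeled acceptance criteria found")
    # A label that appears twice cannot be tracked independently downstream.
    for n in sorted({n for n in numbers if numbers.count(n) > 1}):
        problems.append(f"AC{n} is defined more than once")
    return problems
```

A check like this only confirms that labels exist and are unique. Whether each criterion is independently verifiable still needs human or verifier judgment.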

The no-production-code rule

The spec-freezer must not change production code. Freeze is a planning step only. Writing implementation code during freeze conflates what is promised with what is delivered, and makes it impossible to separate the spec from the implementation in a later audit.

During spec freeze, the freezer may:

Read repo guidance files (AGENTS.md, CLAUDE.md, .claude/CLAUDE.md, .claude/rules/**/*.md)
Read the minimum relevant code needed to understand the task
Write or update .agent/tasks/<TASK_ID>/spec.md

During spec freeze, the freezer must not:

Change production code
Write evidence.md, evidence.json, verdict.json, or problems.md
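The two lists above can be enforced with a simple path check. This sketch makes a stricter assumption than the text: it treats `.agent/tasks/<TASK_ID>/spec.md` as the only file the freezer may write and flags everything else. The function name `freeze_violations` is hypothetical.

```python
from pathlib import PurePosixPath


def freeze_violations(changed_paths: list[str], task_id: str) -> list[str]:
    """Return every changed path that is not the task's spec.md.

    Assumption: paths are repo-relative and use forward slashes.
    """
    allowed = PurePosixPath(".agent/tasks") / task_id / "spec.md"
    return [p for p in changed_paths if PurePosixPath(p) != allowed]
```

The list of changed paths could come from version control, for example the output of `git diff --name-only`. An empty result means the freeze step touched nothing but the spec.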

Resolving assumptions narrowly

When the user request is ambiguous, assumptions must be resolved narrowly — choosing the interpretation that requires less implementation, not more. Each assumption must be listed explicitly in spec.md so the verifier can evaluate whether the implementation matched the stated scope.

Verification plan

The spec may include a concise verification plan: the commands, tools, or checks that a fresh verifier should run to independently confirm each criterion. This plan is advisory — the verifier makes its own judgment — but a well-written verification plan reduces the chance that a criterion is marked UNKNOWN due to ambiguous test coverage.

Evidence packing requirements

After implementation, the builder packs evidence against each acceptance criterion. The evidence must:

Judge each criterion independently with exactly one of PASS, FAIL, or UNKNOWN
For every PASS, cite concrete proof:
- File paths
- Commands run
- Exit codes
- Output summaries
- Artifact paths under raw/
For every FAIL or UNKNOWN, explain the gap

The builder must not claim overall_status: PASS in evidence.json unless every acceptance criterion is PASS. A single FAIL or UNKNOWN criterion means the overall status is not PASS.
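The overall-status rule reduces to a single predicate. The sketch below assumes the per-criterion results are available as a mapping from label to status; that shape and the name `may_claim_overall_pass` are illustrative, since the layout of evidence.json is not specified here.

```python
VALID_STATUSES = {"PASS", "FAIL", "UNKNOWN"}


def may_claim_overall_pass(criteria: dict[str, str]) -> bool:
    """Return True only when every acceptance criterion is PASS."""
    for label, status in criteria.items():
        if status not in VALID_STATUSES:
            raise ValueError(
                f"{label}: status must be PASS, FAIL, or UNKNOWN, got {status!r}"
            )
    # Assumption: an empty criteria set proves nothing, so it cannot be PASS.
    return bool(criteria) and all(s == "PASS" for s in criteria.values())
```

Treating UNKNOWN the same as FAIL here is deliberate: an unproven criterion blocks an overall PASS just as a failed one does.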

Evidence packing may run missing checks (re-executing tests, linters, or build commands). It must not make further changes to production code.

Fresh verification requirements

After evidence is packed, a fresh verifier — a new session or subagent that did not participate in implementation — performs an independent pass.

The verifier must judge the current repository state and current rerun results, not the builder's narrative or prior chat claims.

The verifier writes two files:

verdict.json — always written; contains per-criterion verdicts and overall_verdict
problems.md — written only when overall_verdict is not PASS; contains per-criterion fix guidance

The verifier must not:

Modify production code
Backfill or patch the evidence bundle to make it look complete
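The file-writing rule for the verifier can be sketched as follows. Only the file names and the `overall_verdict` field come from the text above. The `criteria` key, the use of `FAIL` as the overall value for any non-PASS outcome, and the function name are assumptions made for illustration.

```python
import json
from pathlib import Path


def write_verifier_outputs(task_dir: Path, verdicts: dict[str, str], problems_md: str) -> str:
    """Write verdict.json always; write problems.md only when the verdict is not PASS."""
    all_pass = bool(verdicts) and all(v == "PASS" for v in verdicts.values())
    overall = "PASS" if all_pass else "FAIL"  # assumed value for the non-PASS case

    task_dir.mkdir(parents=True, exist_ok=True)
    payload = {"overall_verdict": overall, "criteria": verdicts}  # "criteria" is a hypothetical key
    (task_dir / "verdict.json").write_text(json.dumps(payload, indent=2), encoding="utf-8")

    if overall != "PASS":
        (task_dir / "problems.md").write_text(problems_md, encoding="utf-8")
    return overall
```

Note that the function writes only the verifier's own two files. It never touches the evidence bundle or production code.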

`problems.md` requirements

For each non-PASS criterion, problems.md must include:

Criterion id and text
Status
Why it is not proven
Minimal reproduction steps
Expected vs actual
Affected files
Smallest safe fix
Corrective hint in 1–3 sentences

This structure ensures the fixer has everything needed to make a minimal, targeted correction without re-reading the entire implementation history.
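One way to keep entries complete is to model the eight required items as a record and render them from it. This is a sketch only: the field names, the `Problem` class, and the plain-text layout are hypothetical, as the exact format of problems.md is not prescribed here.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    criterion_id: str            # e.g. "AC2"
    criterion_text: str
    status: str                  # FAIL or UNKNOWN
    why_not_proven: str
    reproduction_steps: list[str]
    expected: str
    actual: str
    affected_files: list[str]
    smallest_safe_fix: str
    corrective_hint: str         # 1-3 sentences


def render_problem(p: Problem) -> str:
    """Render one problems.md entry with every required item present."""
    steps = "\n".join(
        f"  {i}. {step}" for i, step in enumerate(p.reproduction_steps, start=1)
    )
    return "\n".join([
        f"{p.criterion_id}: {p.criterion_text}",
        f"Status: {p.status}",
        f"Why it is not proven: {p.why_not_proven}",
        "Minimal reproduction steps:",
        steps,
        f"Expected: {p.expected}",
        f"Actual: {p.actual}",
        f"Affected files: {', '.join(p.affected_files)}",
        f"Smallest safe fix: {p.smallest_safe_fix}",
        f"Corrective hint: {p.corrective_hint}",
    ])
```

Because the dataclass has no default values, constructing a `Problem` with any item missing raises a `TypeError`, which makes incomplete entries hard to produce by accident.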

The fix loop

If the verdict is not PASS, a fresh fixer reads only spec.md, verdict.json, and problems.md. It:

Reconfirms each listed problem in the codebase before editing
Makes the smallest safe change set
Avoids regressing already-passing criteria
Regenerates evidence.md, evidence.json, and raw artifacts
Does not write verdict.json or claim final sign-off

After the fix, a fresh verifier runs again. The loop repeats until overall_verdict is PASS.

The workflow is considered complete only when a fresh verifier — one that did not participate in the fix — returns overall_verdict: PASS. The fixer never self-certifies completion.
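The control flow of the loop can be sketched as below. The `verify` and `fix` callables stand in for launching a fresh verifier and a fresh fixer and are hypothetical. The `max_rounds` cap is also an assumption added as a safety limit, since the text above simply repeats until PASS.

```python
from typing import Callable


def run_until_pass(
    verify: Callable[[], str],
    fix: Callable[[], None],
    max_rounds: int = 5,
) -> int:
    """Alternate verify and fix until the verifier returns PASS.

    Returns the number of fix rounds that were needed. The last step before
    returning is always a verification, never a fix.
    """
    for rounds in range(max_rounds + 1):
        if verify() == "PASS":   # each call represents a fresh verifier
            return rounds
        if rounds == max_rounds:
            break
        fix()                    # each call represents a fresh fixer
    raise RuntimeError(f"no PASS verdict after {max_rounds} fix rounds")
```

The structure makes the sign-off rule explicit: the only way out of the loop with success is a PASS returned by `verify`, so a fixer cannot certify its own work.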
