The Proof Loop Workflow

The Repo Task Proof Loop enforces a strict ordering: freeze the spec first, implement second, verify in a fresh session third. No step can be skipped or batched with the one before it. The key principle is that a verifier who never touched the implementation cannot rationalize away failures. By separating roles — freezer, builder, verifier, fixer — the workflow makes failures easy to localize and audit trails easy to read.

The loop at a glance

init → freeze → build → evidence → verify → fix → verify again

Every task produces durable artifacts inside the repository under .agent/tasks/<TASK_ID>/. Those artifacts survive session restarts and can be resumed or audited at any point.

Full workflow

init

Run the bundled initializer from the repository root. This is a serial prerequisite — never overlap it with any other step or child-agent invocation.

python3 "$SKILL_PATH/scripts/task_loop.py" init --task-id <TASK_ID>

Optionally seed the task statement from a file or inline text:

python3 "$SKILL_PATH/scripts/task_loop.py" init --task-id <TASK_ID> --task-file docs/tasks/my-task.md
python3 "$SKILL_PATH/scripts/task_loop.py" init --task-id <TASK_ID> --task-text "Implement feature X"

init creates the full .agent/tasks/<TASK_ID>/ directory, all required artifact files (including placeholder JSON), and project-scoped subagent templates for both Codex and Claude Code. It also inserts a managed workflow block into AGENTS.md and the repo’s Claude guide file.

After init, confirm that .agent/tasks/<TASK_ID>/spec.md exists before continuing.

init must always complete before freeze, build, evidence, verify, fix, or any child-agent work begins. The task folder must exist and the spec placeholder must be present, or downstream steps have nothing to read or validate against.
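The confirmation step above can be sketched as a small guard that a parent session might call before starting any downstream role. The helper names `task_dir` and `require_init` are hypothetical and are not part of task_loop.py; the sketch only checks that the spec placeholder exists.

```python
from pathlib import Path


def task_dir(repo_root: Path, task_id: str) -> Path:
    """Return the task folder described above: .agent/tasks/<TASK_ID>/."""
    return repo_root / ".agent" / "tasks" / task_id


def require_init(repo_root: Path, task_id: str) -> Path:
    """Raise if init has not finished; otherwise return the path to spec.md.

    Hypothetical helper: it performs only the existence check the text asks
    for after init and does not inspect the file's contents.
    """
    spec = task_dir(repo_root, task_id) / "spec.md"
    if not spec.is_file():
        raise FileNotFoundError(
            f"{spec} is missing; run init for task {task_id!r} and wait for it to finish"
        )
    return spec
```

Calling `require_init(Path.cwd(), "<TASK_ID>")` from the repository root before freeze, build, evidence, verify, or fix makes a skipped or still-running init fail loudly instead of silently producing an empty task.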

freeze

Freeze the task into .agent/tasks/<TASK_ID>/spec.md. The freezer reads repo guidance and the task source, produces explicit acceptance criteria labeled AC1, AC2, …, adds constraints and non-goals, and resolves ambiguity narrowly.

The freezer must not change any production code. The spec is a contract: writing code during this step undermines the separation between what is promised and what is delivered.

See Spec Freeze for the full requirements.

build

The builder reads spec.md and implements the task. It makes the smallest safe change set that satisfies the acceptance criteria, runs focused checks as needed, and keeps unrelated files untouched.

The builder does not write verdict.json or problems.md, and does not claim final completion.

evidence

Still in the same builder session (by default), pack the evidence bundle. This means writing evidence.md, evidence.json, and the raw artifacts under raw/.

Evidence packing may run missing checks but must not change production code. Every PASS must cite concrete proof: file paths, commands run, exit codes, output summaries, or artifact paths under raw/.

See Task Artifacts for the full file shapes.
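The rule that every PASS must cite concrete proof can be checked mechanically. The sketch below assumes a hypothetical per-criterion shape with `id`, `status`, and `proof` keys; the real evidence.json shape is defined in Task Artifacts and may differ, so treat the field names as placeholders.

```python
def unproven_passes(criteria: list[dict]) -> list[str]:
    """Return the ids of criteria marked PASS that cite no proof.

    ASSUMPTION: each entry looks like
    {"id": "AC1", "status": "PASS", "proof": ["raw/test.log", ...]}.
    These keys are illustrative, not the documented evidence.json schema.
    """
    missing = []
    for entry in criteria:
        # A PASS with an empty or absent proof list violates the rule above.
        if entry.get("status") == "PASS" and not entry.get("proof"):
            missing.append(entry.get("id", "<unknown>"))
    return missing
```

An empty result means every claimed PASS points at something a verifier can open or rerun; any returned id is a claim that should be backed up or downgraded before verification.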

verify (fresh session)

Spawn a fresh verifier: a new session or subagent that has not participated in implementation. The verifier reads spec.md, evidence.md, and evidence.json, then independently inspects the current codebase and reruns verification.

The verifier writes verdict.json. If the overall verdict is not PASS, it also writes problems.md with per-criterion fix guidance.

The verifier must not modify production code or backfill the evidence bundle.

See Fresh verification below.

fix (if needed)

If the verdict is not PASS, a fresh fixer reads spec.md, verdict.json, and problems.md. It reconfirms each listed problem before editing, makes the smallest safe change set, avoids regressing already-passing criteria, and regenerates the evidence bundle.

The fixer does not write verdict.json or claim final sign-off.

verify again

After a fix, spawn another fresh verifier. Repeat the fix → verify cycle until the verifier returns PASS or the user stops the loop.

Heavy-task default workflow

For large tasks, prefer subagents when the platform supports them.

Preferred sequence

1. Run init <TASK_ID> if needed. Wait for it to finish, then confirm .agent/tasks/<TASK_ID>/spec.md and the repo-local task structure exist before continuing.
2. Only after init completes, spawn exactly one spec-freezer subagent and wait for it.
3. Spawn exactly one builder subagent and let it implement.
4. Continue with the same builder session for evidence packing.
5. Spawn exactly one fresh verifier subagent and wait for it.
6. If verdict is not PASS, spawn exactly one fresh fixer subagent.
7. Spawn one fresh verifier subagent again.
8. Repeat steps 6–7 until the verifier returns PASS or the user stops the loop.
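The sequence above, from the freezer onward, can be sketched as a small driver. Each callable is a hypothetical stand-in for spawning exactly one subagent and waiting for it; `run_proof_loop` and `max_rounds` are illustrative names, and the round cap is an assumption standing in for "the user stops the loop". The sketch assumes init has already completed.

```python
from typing import Callable


def run_proof_loop(
    freeze: Callable[[], None],
    build: Callable[[], None],
    pack_evidence: Callable[[], None],
    verify: Callable[[], str],
    fix: Callable[[], None],
    max_rounds: int = 5,
) -> str:
    """Run freeze, build, evidence, verify, then fix and verify until PASS.

    verify() must start a fresh verifier on every call and return the
    overall verdict string. fix() must start a fresh fixer, which also
    regenerates the evidence bundle.
    """
    freeze()
    build()
    pack_evidence()  # same builder session by default
    verdict = verify()
    rounds = 0
    while verdict != "PASS" and rounds < max_rounds:
        fix()
        verdict = verify()
        rounds += 1
    return verdict
```

Keeping the roles as separate callables mirrors the flat orchestration the workflow asks for: the parent selects each role directly, and no role spawns another.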

Platform behavior

Codex

Explicitly ask for subagents. Do not assume they spawn automatically.

Use explicit delegation language. The parent should ask Codex to spawn one named child, wait for it, and then continue. Keep delegation depth flat: one child per role at a time.

Example delegation shape:

Spawn one `task-spec-freezer` agent for TASK_ID <TASK_ID>. Wait for it. Tell it to freeze the spec in .agent/tasks/<TASK_ID>/spec.md using the repo guidance and the task source.

Do not spawn any child until init <TASK_ID> has finished and .agent/tasks/<TASK_ID>/spec.md exists. Do not batch init with other commands or tool calls.

Claude Code

Prefer the installed project subagents from .claude/agents/. Use /agents to inspect the available agents.

If init just created or refreshed .claude/agents/* during a running Claude Code session, start a new session before expecting those updated agents to appear.

Reuse the same builder child for the evidence step by default. Only run a fresh builder in evidence-only mode if the original builder session is unavailable or you intentionally discarded it.

Keep the orchestration flat: the parent session should select each role directly instead of asking one custom task agent to spawn another.

Example delegation shape:

Use the `task-verifier` agent for TASK_ID <TASK_ID>. It must be a fresh verifier pass against the current codebase and must write verdict.json and, if needed, problems.md.

Use claude --agent <name> only when you intentionally want a direct single-agent session instead of the parent-orchestrated proof loop.

If subagents are unavailable on either platform, preserve the same role separation across separate sessions or clear mode changes in the current session.

Fresh verification

Fresh verification means the verifier is a new session or subagent that did not participate in implementation. It judges the current repository state and current rerun results, not the builder’s narrative or prior chat claims.

This matters because an agent that implemented the code is motivated (even unconsciously) to interpret ambiguous output as passing. A verifier that has never seen the implementation process has no such motivation. It either proves the criterion against the current codebase or it does not.

The verifier is the only role that writes verdict.json and the only role that writes problems.md. Neither the builder nor the fixer may write these files.
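The write rules stated across the workflow can be collected into one lookup. This is a minimal sketch: the `ARTIFACT_WRITERS` table and `may_write` helper are hypothetical names, and the mapping covers only the artifact files named in this guide (the raw/ directory and production code are left out).

```python
# Which role may write which artifact, as described in this guide.
ARTIFACT_WRITERS = {
    "spec.md": {"freezer"},
    "evidence.md": {"builder", "fixer"},    # fixer regenerates the bundle
    "evidence.json": {"builder", "fixer"},
    "verdict.json": {"verifier"},
    "problems.md": {"verifier"},
}


def may_write(role: str, artifact: str) -> bool:
    """Return True if the given role is allowed to write the artifact."""
    return role in ARTIFACT_WRITERS.get(artifact, set())
```

A check like this could run before a role's changes are accepted, so that a builder writing verdict.json or a verifier backfilling evidence is caught as a role violation rather than discovered later in the audit trail.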

Inferring the next step

If no explicit command is given, the next step is inferred from repo state:

Condition	Next step
Task folder does not exist	`init` only — stop and wait
`spec.md` is missing or placeholder-only	`freeze`
Implementation is not yet complete	`build`
Evidence is stale or missing	`evidence`
No fresh verdict exists	`verify`
Verdict is not `PASS`	`fix`
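The table can be read as a first-match-wins decision chain. The sketch below assumes the rows are evaluated top to bottom, which the table implies but does not state; `TaskState` and `next_step` are hypothetical names, and how each flag is derived from the repository is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskState:
    task_folder_exists: bool
    spec_frozen: bool              # spec.md present and not placeholder-only
    implementation_complete: bool
    evidence_current: bool         # evidence exists and is not stale
    fresh_verdict: Optional[str]   # None when no fresh verdict exists


def next_step(state: TaskState) -> Optional[str]:
    """Return the next workflow step, or None when the verdict is PASS."""
    if not state.task_folder_exists:
        return "init"  # init only: stop and wait before anything else
    if not state.spec_frozen:
        return "freeze"
    if not state.implementation_complete:
        return "build"
    if not state.evidence_current:
        return "evidence"
    if state.fresh_verdict is None:
        return "verify"
    if state.fresh_verdict != "PASS":
        return "fix"
    return None
```

Ordering the checks this way means an earlier missing prerequisite always wins: a task with stale evidence and a non-PASS verdict is sent back to evidence first, not to fix.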

Validation and status

Before claiming the workflow is correctly initialized or the artifact set is complete:

python3 "$SKILL_PATH/scripts/task_loop.py" validate --task-id <TASK_ID>

For a quick summary of current artifact state:

python3 "$SKILL_PATH/scripts/task_loop.py" status --task-id <TASK_ID>
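When the two commands above are driven from a script instead of typed by hand, a thin wrapper keeps the invocation in one place. This is a sketch under assumptions: `task_loop_command` and `run_task_loop` are hypothetical helpers, `sys.executable` is swapped in for `python3`, and the meaning of the script's exit code is not stated in this guide, so the wrapper only returns it.

```python
import os
import subprocess
import sys


def task_loop_command(subcommand: str, task_id: str, skill_path: str) -> list[str]:
    """Build the argument list for task_loop.py, e.g. validate or status."""
    script = os.path.join(skill_path, "scripts", "task_loop.py")
    return [sys.executable, script, subcommand, "--task-id", task_id]


def run_task_loop(subcommand: str, task_id: str) -> int:
    """Run the command from the current directory and return its exit code.

    Reads SKILL_PATH from the environment, as the shell examples do.
    """
    skill_path = os.environ["SKILL_PATH"]
    completed = subprocess.run(task_loop_command(subcommand, task_id, skill_path))
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting issues when a task ID contains unusual characters.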
