The Repo Task Proof Loop enforces a strict ordering: freeze the spec first, implement second, verify in a fresh session third. No step can be skipped or batched with the one before it. The key principle is that a verifier who never touched the implementation cannot rationalize away failures. By separating roles — freezer, builder, verifier, fixer — the workflow makes failures easy to localize and audit trails easy to read.

The loop at a glance

init → freeze → build → evidence → verify → fix → verify again
Every task produces durable artifacts inside the repository under .agent/tasks/<TASK_ID>/. Those artifacts survive session restarts and can be resumed or audited at any point.
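A minimal sketch of how a script might locate the task folder and report which artifacts are already on disk. The file list covers only the artifact names mentioned on this page, and the helper names `task_dir` and `present_artifacts` are illustrative assumptions, not part of the bundled tooling.

```python
from pathlib import Path

# Artifact names mentioned in this workflow; the real initializer may create more.
ARTIFACT_FILES = ("spec.md", "evidence.md", "evidence.json", "verdict.json", "problems.md")


def task_dir(repo_root: Path, task_id: str) -> Path:
    """Return the durable artifact directory for one task."""
    return repo_root / ".agent" / "tasks" / task_id


def present_artifacts(repo_root: Path, task_id: str) -> dict:
    """Report which known artifacts currently exist on disk."""
    base = task_dir(repo_root, task_id)
    report = {name: (base / name).is_file() for name in ARTIFACT_FILES}
    report["raw/"] = (base / "raw").is_dir()
    return report
```

Because the artifacts live in the repository, a check like this gives the same answer after a session restart.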

Full workflow

1. init

Run the bundled initializer from the repository root. This is a serial prerequisite — never overlap it with any other step or child-agent invocation.
python3 "$SKILL_PATH/scripts/task_loop.py" init --task-id <TASK_ID>
Optionally seed the task statement from a file or inline text:
python3 "$SKILL_PATH/scripts/task_loop.py" init --task-id <TASK_ID> --task-file docs/tasks/my-task.md
python3 "$SKILL_PATH/scripts/task_loop.py" init --task-id <TASK_ID> --task-text "Implement feature X"
init creates the full .agent/tasks/<TASK_ID>/ directory, all required artifact files (including placeholder JSON), and project-scoped subagent templates for both Codex and Claude Code. It also inserts a managed workflow block into AGENTS.md and the repo’s Claude guide file.
After init, confirm that .agent/tasks/<TASK_ID>/spec.md exists before continuing.
init must always complete before freeze, build, evidence, verify, fix, or any child-agent work begins. The task folder must exist and the spec placeholder must be present, or downstream steps have nothing to read or validate against.
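The precondition above can be expressed as a small guard that downstream steps call before doing any work. This is a sketch only: `require_init` and `InitNotCompleteError` are hypothetical names, and the guard checks just the one file the text names.

```python
from pathlib import Path


class InitNotCompleteError(RuntimeError):
    """Raised when a downstream step starts before init has finished."""


def require_init(repo_root: Path, task_id: str) -> Path:
    """Return the spec path, or raise if the spec placeholder is missing."""
    spec = repo_root / ".agent" / "tasks" / task_id / "spec.md"
    if not spec.is_file():
        raise InitNotCompleteError(f"run init first: {spec} does not exist")
    return spec
```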
2. freeze

Freeze the task into .agent/tasks/<TASK_ID>/spec.md. The freezer reads repo guidance and the task source, produces explicit acceptance criteria labeled AC1, AC2, …, adds constraints and non-goals, and resolves ambiguity narrowly.
The freezer must not change any production code. The spec is a contract: writing code during this step undermines the separation between what is promised and what is delivered.
See Spec Freeze for the full requirements.
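As an illustration of why explicit labels help, the sketch below pulls the AC labels out of a spec's text and reports gaps in the numbering. It assumes only that criteria appear as `AC` followed by digits; the function names are hypothetical and the real spec layout is defined in Spec Freeze.

```python
import re

# Assumption: criteria appear as "AC" followed by digits at word boundaries.
AC_LABEL = re.compile(r"\bAC(\d+)\b")


def acceptance_criteria_ids(spec_text: str) -> list:
    """Return distinct AC labels in order of first appearance."""
    seen = []
    for match in AC_LABEL.finditer(spec_text):
        label = f"AC{match.group(1)}"
        if label not in seen:
            seen.append(label)
    return seen


def missing_from_sequence(labels: list) -> list:
    """List numbering gaps, e.g. AC2 when only AC1 and AC3 are present."""
    numbers = {int(label[2:]) for label in labels}
    if not numbers:
        return []
    return [f"AC{n}" for n in range(1, max(numbers) + 1) if n not in numbers]
```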
3. build

The builder reads spec.md and implements the task. It makes the smallest safe change set that satisfies the acceptance criteria, runs focused checks as needed, and keeps unrelated files untouched.
The builder does not write verdict.json or problems.md, and does not claim final completion.
4. evidence

Still in the same builder session (by default), pack the evidence bundle. This means writing evidence.md, evidence.json, and the raw artifacts under raw/.
Evidence packing may run missing checks but must not change production code. Every PASS must cite concrete proof: file paths, commands run, exit codes, output summaries, or artifact paths under raw/.
See Task Artifacts for the full file shapes.
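The "every PASS cites proof" rule lends itself to a mechanical check. The sketch below uses a hypothetical record shape with `id`, `status`, and `proof` keys purely for illustration; the actual evidence.json schema is documented in Task Artifacts and may differ.

```python
def unproven_passes(criteria: list) -> list:
    """Return ids of criteria marked PASS that cite no proof."""
    flagged = []
    for entry in criteria:
        # "id", "status" and "proof" are hypothetical keys, not the real schema.
        if entry.get("status") == "PASS" and not entry.get("proof"):
            flagged.append(entry.get("id", "<unknown>"))
    return flagged
```

An empty result means every PASS in the (assumed) records points at something a verifier can inspect.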
5. verify (fresh session)

Spawn a fresh verifier: a new session or subagent that has not participated in implementation. The verifier reads spec.md, evidence.md, and evidence.json, then independently inspects the current codebase and reruns verification.
The verifier writes verdict.json. If the overall verdict is not PASS, it also writes problems.md with per-criterion fix guidance.
The verifier must not modify production code or backfill the evidence bundle.
See Fresh verification below.
6. fix (if needed)

If the verdict is not PASS, a fresh fixer reads spec.md, verdict.json, and problems.md. It reconfirms each listed problem before editing, makes the smallest safe change set, avoids regressing already-passing criteria, and regenerates the evidence bundle.
The fixer does not write verdict.json or claim final sign-off.
7. verify again

After a fix, spawn another fresh verifier. Repeat the fix → verify cycle until the verifier returns PASS or the user stops the loop.
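The fix → verify cycle can be sketched as a short control loop. The two callables stand in for spawning a fresh verifier and a fresh fixer; `max_rounds` is an assumed stand-in for "the user stops the loop" and is not a setting the workflow itself defines.

```python
from typing import Callable, Tuple


def run_fix_verify_loop(
    verify: Callable[[], str],
    fix: Callable[[], None],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Alternate verify and fix until PASS or the round budget is spent.

    Returns the last verdict and the number of fix rounds performed.
    """
    verdict = verify()
    rounds = 0
    while verdict != "PASS" and rounds < max_rounds:
        fix()
        rounds += 1
        verdict = verify()  # each call stands for a fresh verifier session
    return verdict, rounds
```

Note that a fix is never followed directly by another fix: every change is judged by a new verify call before the loop decides what to do next.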

Heavy-task default workflow

For large tasks, prefer subagents when the platform supports them.

Preferred sequence

  1. Run init <TASK_ID> if needed. Wait for it to finish, then confirm .agent/tasks/<TASK_ID>/spec.md and the repo-local task structure exist before continuing.
  2. Only after init completes, spawn exactly one spec-freezer subagent and wait for it.
  3. Spawn exactly one builder subagent and let it implement.
  4. Continue with the same builder session for evidence packing.
  5. Spawn exactly one fresh verifier subagent and wait for it.
  6. If verdict is not PASS, spawn exactly one fresh fixer subagent.
  7. Spawn one fresh verifier subagent again.
  8. Repeat steps 6–7 until the verifier returns PASS or the user stops the loop.

Platform behavior

Explicitly ask for subagents. Do not assume they spawn automatically.
Use explicit delegation language. The parent should ask Codex to spawn one named child, wait for it, and then continue. Keep delegation depth flat: one child per role at a time.
Example delegation shape:
Spawn one `task-spec-freezer` agent for TASK_ID <TASK_ID>. Wait for it. Tell it to freeze the spec in .agent/tasks/<TASK_ID>/spec.md using the repo guidance and the task source.
Do not spawn any child until init <TASK_ID> has finished and .agent/tasks/<TASK_ID>/spec.md exists. Do not batch init with other commands or tool calls.
If subagents are unavailable on either platform, preserve the same role separation across separate sessions or clear mode changes in the current session.

Fresh verification

Fresh verification means the verifier is a new session or subagent that did not participate in implementation. It judges the current repository state and current rerun results, not the builder’s narrative or prior chat claims.
This matters because an agent that implemented the code is motivated (even unconsciously) to interpret ambiguous output as passing. A verifier that has never seen the implementation process has no such motivation. It either proves the criterion against the current codebase or it does not.
The verifier is the only role that writes verdict.json and problems.md. Neither the builder nor the fixer may write these files.
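The write rules for each role can be summarized as a lookup table. This sketch covers only the named artifact files from this page (production code and raw/ are left out), and `ROLE_WRITES` and `may_write` are illustrative names rather than anything the tooling provides.

```python
# Artifact files each role may write, as described in the workflow steps.
ROLE_WRITES = {
    "freezer": {"spec.md"},
    "builder": {"evidence.md", "evidence.json"},
    "verifier": {"verdict.json", "problems.md"},
    "fixer": {"evidence.md", "evidence.json"},
}


def may_write(role: str, artifact: str) -> bool:
    """Return True when the given role is allowed to write the artifact."""
    return artifact in ROLE_WRITES.get(role, set())
```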

Inferring the next step

If no explicit command is given, the next step is inferred from repo state:
| Condition | Next step |
| --- | --- |
| Task folder does not exist | init only; stop and wait |
| spec.md is missing or placeholder-only | freeze |
| Implementation is not yet complete | build |
| Evidence is stale or missing | evidence |
| No fresh verdict exists | verify |
| Verdict is not PASS | fix |
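The inference rules read naturally as a chain of checks. The sketch below assumes the conditions are evaluated top to bottom and that a PASS verdict means nothing is left to do; the `RepoState` fields and the `"done"` return value are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepoState:
    task_folder_exists: bool
    spec_frozen: bool              # False when spec.md is missing or placeholder-only
    implementation_complete: bool
    evidence_current: bool         # False when evidence is stale or missing
    fresh_verdict: Optional[str]   # None when no fresh verdict exists


def next_step(state: RepoState) -> str:
    """Pick the next workflow step from the observed repository state."""
    if not state.task_folder_exists:
        return "init"
    if not state.spec_frozen:
        return "freeze"
    if not state.implementation_complete:
        return "build"
    if not state.evidence_current:
        return "evidence"
    if state.fresh_verdict is None:
        return "verify"
    if state.fresh_verdict != "PASS":
        return "fix"
    return "done"  # assumption: the conditions listed do not cover a PASS verdict
```

How each field would be determined from the repository (for example, what counts as stale evidence) is left open here.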

Validation and status

Before claiming the workflow is correctly initialized or the artifact set is complete:
python3 "$SKILL_PATH/scripts/task_loop.py" validate --task-id <TASK_ID>
For a quick summary of current artifact state:
python3 "$SKILL_PATH/scripts/task_loop.py" status --task-id <TASK_ID>
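For callers that drive these commands from Python, a small helper can assemble the argument list. This sketch only builds the command; it is limited to the three subcommands shown on this page, and the function name and the `skill_path` parameter are assumptions.

```python
import os
from pathlib import Path
from typing import List, Optional

# Only the subcommands shown on this page; the script may support others.
SUBCOMMANDS = {"init", "validate", "status"}


def task_loop_command(
    subcommand: str, task_id: str, skill_path: Optional[str] = None
) -> List[str]:
    """Build the argv for task_loop.py without running it."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand}")
    root = skill_path or os.environ.get("SKILL_PATH")
    if not root:
        raise RuntimeError("SKILL_PATH is not set")
    script = Path(root) / "scripts" / "task_loop.py"
    return ["python3", str(script), subcommand, "--task-id", task_id]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, one command at a time.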
