The init step installs project-scoped subagent templates for both Codex and Claude Code. Each template is intentionally narrow and role-specific. The roles stay separate to keep implementation, judgment, and correction in distinct hands — this reduces self-justification and makes failures easier to localize.

Installed files

.codex/agents/task-spec-freezer.toml
.codex/agents/task-builder.toml
.codex/agents/task-verifier.toml
.codex/agents/task-fixer.toml
In Claude Code, if init just created or refreshed .claude/agents/* during a running session, start a new Claude Code session before relying on those updated agents. Use /agents to confirm the available agent list.
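A quick way to confirm the Codex templates listed above are in place is a small existence check. The sketch below is illustrative only: the function name `missing_agent_files` is hypothetical, and it covers only the four `.codex/agents/` paths named in this section, not the `.claude/agents/*` files, whose exact names are not listed here.

```python
from pathlib import Path

# Paths copied from the "Installed files" list in this section.
CODEX_AGENT_FILES = [
    ".codex/agents/task-spec-freezer.toml",
    ".codex/agents/task-builder.toml",
    ".codex/agents/task-verifier.toml",
    ".codex/agents/task-fixer.toml",
]


def missing_agent_files(repo_root: str) -> list[str]:
    """Return the expected template paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in CODEX_AGENT_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_agent_files(".")
    if missing:
        print("Missing agent templates:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All Codex agent templates are present.")
```

Run it from the repository root after init; an empty result means every listed template file exists.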

Role definitions

task-spec-freezer

Purpose: Freeze the task into .agent/tasks/<TASK_ID>/spec.md.
Hard boundaries:
  • May read repo guidance and relevant code
  • Must not change production code
  • Must not write verdict or problems files

task-builder

Purpose: Implement the task and later pack evidence.
Modes: BUILD and EVIDENCE
Hard boundaries:
  • In BUILD mode: implement against the spec, make the smallest safe change set, keep unrelated files untouched, do not write verdict.json or problems.md
  • In EVIDENCE mode: do not change production code; write evidence.md, evidence.json, and raw artifacts

task-verifier

Purpose: Fresh-session verification against the current codebase.
Hard boundaries:
  • Must not edit production code
  • Must not patch the evidence bundle to make it look complete
  • Must write verdict.json
  • Must write problems.md only when the verdict is not PASS

task-fixer

Purpose: Repair only what the verifier identified.
Hard boundaries:
  • Must reread the spec and verifier output
  • Must reconfirm the problem before editing
  • Must regenerate evidence after the fix
  • Must not write final sign-off
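The write restrictions in the role definitions above can be expressed as a small table and checked against the list of files a child changed. This is a sketch under stated assumptions, not part of the installed tooling: the `RULES` table, the `task-builder:BUILD` / `task-builder:EVIDENCE` keys, and the `violations` function are hypothetical names; any path outside `.agent/tasks/<TASK_ID>/` is treated as production code; and "final sign-off" is assumed to mean `verdict.json`.

```python
# Per-role write restrictions, read off the hard boundaries above.
# Names ending in "/" match every file under that subdirectory.
EVIDENCE = ("evidence.md", "evidence.json", "raw/")
VERDICT = ("verdict.json", "problems.md")

RULES = {
    "task-spec-freezer": {"may_edit_code": False, "forbidden": VERDICT},
    "task-builder:BUILD": {"may_edit_code": True, "forbidden": VERDICT},
    "task-builder:EVIDENCE": {"may_edit_code": False, "forbidden": ()},
    "task-verifier": {"may_edit_code": False, "forbidden": EVIDENCE},
    # Assumption: "final sign-off" refers to verdict.json.
    "task-fixer": {"may_edit_code": True, "forbidden": ("verdict.json",)},
}


def violations(role: str, task_id: str, changed_paths: list[str]) -> list[str]:
    """Return the changed paths that the given role was not allowed to write."""
    task_dir = f".agent/tasks/{task_id}/"
    rule = RULES[role]
    found = []
    for path in changed_paths:
        if path.startswith(task_dir):
            artifact = path[len(task_dir):]
            blocked = any(
                artifact == name
                or (name.endswith("/") and artifact.startswith(name))
                for name in rule["forbidden"]
            )
            if blocked:
                found.append(path)
        elif not rule["may_edit_code"]:
            # Assumption: everything outside the task directory is production code.
            found.append(path)
    return found
```

For example, passing a verifier's changed-file list through `violations("task-verifier", task_id, paths)` would flag any edit to source files or to the evidence bundle while leaving `verdict.json` and `problems.md` alone.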

Invocation patterns

Use explicit delegation language. The parent asks Codex to spawn one named child, waits for it, and then continues. Do not spawn any child until init <TASK_ID> has finished and .agent/tasks/<TASK_ID>/spec.md exists. Do not batch init with other commands or tool calls. Keep delegation depth flat: one child per role at a time.
Spec freezer:
Spawn one `task-spec-freezer` agent for TASK_ID <TASK_ID>. Wait for it. Tell it to freeze the spec in .agent/tasks/<TASK_ID>/spec.md using the repo guidance and the task source.
Builder:
Spawn one `task-builder` agent for TASK_ID <TASK_ID>. Wait for it. Tell it to implement the task against the frozen spec.
Verifier:
Spawn one `task-verifier` agent for TASK_ID <TASK_ID>. Wait for it. It must be a fresh verifier pass against the current codebase and must write verdict.json and, if needed, problems.md.
Fixer:
Spawn one `task-fixer` agent for TASK_ID <TASK_ID>. Wait for it. It must reconfirm each listed problem before editing and must regenerate the evidence bundle after fixing.
Repeat the same delegation pattern for each role.
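If the parent prompts are generated by a script, the spec.md precondition can be enforced before any prompt text is produced. The following is a minimal sketch: the prompt wording is copied from the patterns above, while `PROMPTS` and `delegation_prompt` are hypothetical names, and the check only tests that spec.md exists, not that init itself has finished.

```python
from pathlib import Path

# Prompt text copied from the invocation patterns above; {task_id} replaces <TASK_ID>.
PROMPTS = {
    "task-spec-freezer": (
        "Spawn one `task-spec-freezer` agent for TASK_ID {task_id}. Wait for it. "
        "Tell it to freeze the spec in .agent/tasks/{task_id}/spec.md "
        "using the repo guidance and the task source."
    ),
    "task-builder": (
        "Spawn one `task-builder` agent for TASK_ID {task_id}. Wait for it. "
        "Tell it to implement the task against the frozen spec."
    ),
    "task-verifier": (
        "Spawn one `task-verifier` agent for TASK_ID {task_id}. Wait for it. "
        "It must be a fresh verifier pass against the current codebase and "
        "must write verdict.json and, if needed, problems.md."
    ),
    "task-fixer": (
        "Spawn one `task-fixer` agent for TASK_ID {task_id}. Wait for it. "
        "It must reconfirm each listed problem before editing and must "
        "regenerate the evidence bundle after fixing."
    ),
}


def delegation_prompt(role: str, task_id: str, repo_root: str = ".") -> str:
    """Return the delegation prompt for one role, refusing if spec.md is absent."""
    spec = Path(repo_root) / ".agent" / "tasks" / task_id / "spec.md"
    if not spec.is_file():
        raise FileNotFoundError(f"{spec} not found; run init for {task_id} first")
    return PROMPTS[role].format(task_id=task_id)
```

Returning one prompt per call mirrors the flat delegation rule: the caller sends it, waits for that child, and only then requests the next role's prompt.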

Same-session evidence packing

The preferred pattern is:
  1. Spawn task-builder
  2. Let it implement (BUILD mode)
  3. Continue with the same child in EVIDENCE mode
In Claude Code, this same-session follow-up is the default path. Only run a second task-builder child with an explicit EVIDENCE-ONLY prompt if the original builder session is unavailable or you intentionally want a fresh evidence-only run.
Follow-up prompt to the same builder session:
PACK EVIDENCE for TASK_ID <TASK_ID>.

Do not change production code.

Read:
- .agent/tasks/<TASK_ID>/spec.md
- the current repository state
- any prior command results from this builder session

Write or update:
- .agent/tasks/<TASK_ID>/evidence.md
- .agent/tasks/<TASK_ID>/evidence.json
- .agent/tasks/<TASK_ID>/raw/build.txt
- .agent/tasks/<TASK_ID>/raw/test-unit.txt
- .agent/tasks/<TASK_ID>/raw/test-integration.txt
- .agent/tasks/<TASK_ID>/raw/lint.txt
- .agent/tasks/<TASK_ID>/raw/screenshot-1.png when a screenshot is useful
Fallback prompt when the original builder session is unavailable:
You are in EVIDENCE-ONLY mode for TASK_ID <TASK_ID>.

Read:
- .agent/tasks/<TASK_ID>/spec.md
- the current repository state

Write the same evidence bundle as above.

Do not change production code.
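
After either evidence prompt has run, the bundle can be sanity-checked before the verifier is spawned. The sketch below assumes every file in the "Write or update" list except the optional screenshot is required; that requirement, the `REQUIRED_EVIDENCE` list, and the `evidence_gaps` function are illustrative assumptions. It checks only that evidence.json parses as JSON, since its schema is not described here.

```python
import json
from pathlib import Path

# File names taken from the evidence prompt above; the screenshot is optional
# and therefore not listed. Treating the rest as required is an assumption.
REQUIRED_EVIDENCE = [
    "evidence.md",
    "evidence.json",
    "raw/build.txt",
    "raw/test-unit.txt",
    "raw/test-integration.txt",
    "raw/lint.txt",
]


def evidence_gaps(task_id: str, repo_root: str = ".") -> list[str]:
    """List missing evidence files and flag evidence.json if it is not valid JSON."""
    task_dir = Path(repo_root) / ".agent" / "tasks" / task_id
    gaps = [
        f"missing: {rel}"
        for rel in REQUIRED_EVIDENCE
        if not (task_dir / rel).is_file()
    ]
    evidence_json = task_dir / "evidence.json"
    if evidence_json.is_file():
        try:
            json.loads(evidence_json.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            gaps.append("invalid JSON: evidence.json")
    return gaps
```

An empty list means the bundle is structurally complete; it says nothing about whether the evidence actually supports the spec, which remains the verifier's job.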

Why the roles stay separate

The workflow keeps three concerns apart:
  • Implementation — building the code
  • Judgment — evaluating whether the code meets the spec
  • Correction — fixing only what the judgment found wanting
Mixing these in a single session creates self-justification risk: an agent that wrote the code is motivated to read ambiguous output as passing. A fresh verifier that never touched the implementation has no such motivation. A fixer that reads only the verifier’s output — not the builder’s narrative — is constrained to the minimum necessary change.
