Skip to main content
The workflow installs four narrow, role-specific subagent templates into the repository. Each role has a defined purpose, a fixed set of files it may read and write, and hard boundaries it must not cross. Keeping roles separate prevents self-justification, makes failures easier to localize, and ensures that implementation, judgment, and correction remain distinct concerns.

Installed files

.codex/agents/task-spec-freezer.toml
.codex/agents/task-builder.toml
.codex/agents/task-verifier.toml
.codex/agents/task-fixer.toml
The agent files are intentionally narrow and role-specific. They are created by init and may be refreshed by re-running init.
In Claude Code, if init just created or refreshed .claude/agents/* during a running session, start a new Claude Code session before relying on the updated agents. Use /agents to inspect what is currently available.

Role definitions

Purpose: Freeze the task into .agent/tasks/<TASK_ID>/spec.md.Hard boundaries:
  • May read repo guidance (AGENTS.md, CLAUDE.md, .claude/CLAUDE.md, .claude/rules/**/*.md) and the minimum relevant code needed to clarify the spec
  • Must not change production code
  • Must not write verdict.json, problems.md, evidence.md, or evidence.json
Purpose: Implement the task and later pack evidence.Modes: BUILD and EVIDENCEHard boundaries:
  • In BUILD mode: implement against the spec; must not write verdict.json or problems.md
  • In EVIDENCE mode: do not change production code; pack artifacts only
Purpose: Fresh-session verification against the current codebase.Hard boundaries:
  • Must not edit production code
  • Must not patch the evidence bundle to make it look complete
  • Must write verdict.json
  • Must write problems.md only when the overall verdict is not PASS
Purpose: Repair only what the verifier identified.Hard boundaries:
  • Must reread spec.md, verdict.json, and problems.md before editing
  • Must reconfirm each listed problem in the live codebase before applying a fix
  • Must regenerate evidence.md, evidence.json, and raw artifacts after the fix
  • Must not write verdict.json or claim final sign-off

Invocation patterns

Use explicit delegation language. The parent asks Codex to spawn exactly one named child, waits for it to complete, then continues to the next step. Never spawn a child before init <TASK_ID> has finished and .agent/tasks/<TASK_ID>/spec.md exists.Suggested delegation shape:
Spawn one `task-spec-freezer` agent for TASK_ID <TASK_ID>. Wait for it. Tell it to freeze the spec in .agent/tasks/<TASK_ID>/spec.md using the repo guidance and the task source.
Repeat the same pattern for task-builder, task-verifier, and task-fixer.
In Codex, explicitly ask for subagents. Do not assume they spawn automatically.

Same-session evidence packing

The preferred pattern for the build and evidence phases is:
1

Spawn task-builder

The parent spawns exactly one task-builder subagent and sends the BUILD prompt.
2

Let it implement

The builder implements the task against the frozen spec and returns files changed, checks run, and open risks.
3

Continue with EVIDENCE in the same session

Without spawning a new child, send the EVIDENCE follow-up prompt to the same builder session. The builder then packs evidence using the command outputs already available in its context.
In Claude Code, the same-session follow-up is the default path. Only run a second task-builder child with an explicit EVIDENCE-ONLY prompt if the original builder session is unavailable or you intentionally discarded it.

Why roles stay separate

The workflow is designed to keep three concerns apart:

Implementation

The builder makes changes to production code. It is the only role permitted to do so.

Judgment

The verifier independently assesses whether the implementation satisfies the spec. A fresh session prevents it from being influenced by the builder’s narrative.

Correction

The fixer applies targeted repairs identified by the verifier. Reading only the verifier’s output keeps the fix scope minimal.
Separating these concerns reduces self-justification (a builder cannot approve its own work) and makes failures easier to localize (a failing verdict points to the implementation or spec, not to a blended session that did both).

Flat delegation principle

The parent orchestrator should select each role directly, one at a time:
  • Spawn spec-freezer → wait → spawn builder → wait → (evidence in same session) → spawn verifier → wait → (if not PASS) spawn fixer → wait → spawn verifier again → repeat
Do not ask one custom task agent to spawn another. Keep delegation depth flat. This makes the workflow easier to observe, interrupt, and resume.

Build docs developers (and LLMs) love