.agent/tasks/<TASK_ID>/ — a frozen spec, build evidence, raw artifacts, and a fresh-session verdict. The workflow never claims completion unless every acceptance criterion is independently verified as PASS.
Quick Start
Initialize your first task and run the full proof loop in minutes
Installation
Install the skill for Codex, Claude Code, or both
Core Concepts
Understand the spec-freeze → build → evidence → verify → fix loop
CLI Reference
Full reference for the bundled task_loop.py script
How it works
Initialize the task
Run init to create .agent/tasks/<TASK_ID>/ with all required artifacts and install project-scoped subagent files for Codex and Claude Code.
Freeze the spec
A spec-freezer subagent reads your task description and writes spec.md with explicit acceptance criteria (AC1, AC2, …), constraints, and non-goals. No production code is touched.
Build and pack evidence
A builder subagent implements the task against the frozen spec, then packs evidence.md, evidence.json, and raw build artifacts into the task folder.
Verify with a fresh session
A fresh verifier subagent independently reruns checks, judges each acceptance criterion, and writes verdict.json. If any criterion fails, problems.md is written with actionable fix guidance.
Key features
Role-separated subagents
Spec-freezer, builder, verifier, and fixer are distinct roles — preventing self-justification bias and making failures easy to localize.
Repo-local proof
All artifacts stay inside the repository under .agent/tasks/<TASK_ID>/. Task state is fully auditable and resumable.
Codex and Claude Code support
Installs project-scoped subagent files for both Codex (.codex/agents/) and Claude Code (.claude/agents/).
Acceptance criteria tracking
Every criterion is independently graded PASS, FAIL, or UNKNOWN in evidence.json and verdict.json.
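
The completion rule (no completion claim unless every acceptance criterion is graded PASS) can be illustrated with a minimal Python sketch. The layout of verdict.json is not documented on this page, so the "criteria" list and its "id" and "status" keys are assumptions made for illustration, and task_is_complete and load_verdict are hypothetical helpers, not part of task_loop.py.

```python
import json
from pathlib import Path

# The three grades named on this page.
VALID_STATUSES = {"PASS", "FAIL", "UNKNOWN"}


def task_is_complete(verdict: dict) -> bool:
    """Return True only if every criterion is graded PASS.

    ASSUMPTION: the verdict looks like
    {"criteria": [{"id": "AC1", "status": "PASS"}, ...]}.
    The real verdict.json schema may differ.
    """
    criteria = verdict.get("criteria", [])
    if not criteria:
        # Nothing graded means nothing proven, so do not claim completion.
        return False
    for criterion in criteria:
        if criterion["status"] not in VALID_STATUSES:
            raise ValueError(f"unexpected status for {criterion['id']}")
    # FAIL and UNKNOWN both block completion.
    return all(c["status"] == "PASS" for c in criteria)


def load_verdict(task_dir: Path) -> dict:
    """Read verdict.json from a task folder such as .agent/tasks/<TASK_ID>/."""
    return json.loads((task_dir / "verdict.json").read_text(encoding="utf-8"))


if __name__ == "__main__":
    example = {
        "criteria": [
            {"id": "AC1", "status": "PASS"},
            {"id": "AC2", "status": "UNKNOWN"},
        ]
    }
    print(task_is_complete(example))  # False: AC2 is not PASS
```

Treating UNKNOWN the same as FAIL mirrors the stated rule: a criterion that was not verified cannot count toward completion.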