Every task managed by the Repo Task Proof Loop lives inside a single directory in the repository:
.agent/tasks/<TASK_ID>/
All artifacts stay inside the repository. They are created at init time (as placeholders where needed) so that the validation script can run immediately after initialization.
Placeholder files for evidence.json and verdict.json are written at init time with valid JSON structure and UNKNOWN status values. This means scripts/task_loop.py validate can run right after init without waiting for the builder or verifier to produce real content.

Directory structure

.agent/tasks/<TASK_ID>/
  spec.md
  evidence.md
  evidence.json
  raw/
    build.txt
    test-unit.txt
    test-integration.txt
    lint.txt
    screenshot-1.png
  verdict.json
  problems.md
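To make the layout concrete, here is a minimal Python sketch of the kind of scaffolding init performs. It is not the actual implementation of scripts/task_loop.py. The helper names scaffold_task and _write_if_missing, the placeholder text, and the choice to skip files that already exist are all assumptions. The placeholder JSON mirrors the example evidence.json and verdict.json documented below.

```python
import json
from pathlib import Path

# Hypothetical sketch of what init sets up; scripts/task_loop.py may differ.
RAW_TEXT_FILES = ["build.txt", "test-unit.txt", "test-integration.txt", "lint.txt"]


def _write_if_missing(path: Path, text: str) -> None:
    # Assumption: re-running init should never clobber real artifacts.
    if not path.exists():
        path.write_text(text)


def scaffold_task(repo_root: Path, task_id: str) -> Path:
    """Create .agent/tasks/<task_id>/ with placeholder artifacts."""
    task_dir = repo_root / ".agent" / "tasks" / task_id
    (task_dir / "raw").mkdir(parents=True, exist_ok=True)

    # spec.md is later filled in by the spec-freezer, evidence.md by the builder.
    _write_if_missing(task_dir / "spec.md", f"# Spec: {task_id}\n\n(placeholder)\n")
    _write_if_missing(task_dir / "evidence.md", f"# Evidence: {task_id}\n\n(placeholder)\n")

    # Valid JSON with UNKNOWN statuses, so validation can run right after init.
    evidence = {
        "task_id": task_id,
        "overall_status": "UNKNOWN",
        "acceptance_criteria": [
            {"id": "AC1", "text": "Describe the criterion",
             "status": "UNKNOWN", "proof": [], "gaps": []}
        ],
        "changed_files": [],
        "commands_for_fresh_verifier": [],
        "known_gaps": [],
    }
    verdict = {
        "task_id": task_id,
        "overall_verdict": "UNKNOWN",
        "criteria": [{"id": "AC1", "status": "UNKNOWN", "reason": "Not yet verified."}],
        "commands_run": [],
        "artifacts_used": [],
    }
    _write_if_missing(task_dir / "evidence.json", json.dumps(evidence, indent=2) + "\n")
    _write_if_missing(task_dir / "verdict.json", json.dumps(verdict, indent=2) + "\n")

    for name in RAW_TEXT_FILES:
        _write_if_missing(task_dir / "raw" / name, "")
    # Left out of this sketch: raw/screenshot-1.png (a tiny valid PNG) and
    # problems.md, which the verifier writes only for a non-PASS verdict.
    return task_dir
```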

File reference

spec.md

Created by: the spec-freezer during the freeze step (the init step creates a placeholder). Contains:
  • Original task statement
  • Explicit acceptance criteria labeled AC1, AC2, …
  • Constraints
  • Non-goals
  • Optionally: repo guidance sources, a verification plan, and assumptions resolved narrowly from the user request
The spec is the contract that every subsequent step is measured against. It must not be modified after freeze except to correct a genuine misunderstanding agreed upon with the user.

evidence.md

Created by: the builder during the evidence step. Contains: a human-readable summary of the evidence bundle — per-criterion status, proof citations, and a summary of commands run. This is the narrative companion to evidence.json.
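Because evidence.md is the narrative companion to evidence.json, it can be drafted from the JSON. The following is a minimal sketch that assumes an evidence.json-shaped dict as documented in the next section. render_evidence_md and its Markdown layout are hypothetical and are not prescribed by the skill.

```python
def render_evidence_md(evidence: dict) -> str:
    """Build a Markdown summary from an evidence.json-shaped dict (hypothetical layout)."""
    lines = [
        f"# Evidence: {evidence['task_id']}",
        "",
        f"Overall status: {evidence['overall_status']}",
        "",
    ]
    commands = []
    for ac in evidence["acceptance_criteria"]:
        lines.append(f"## {ac['id']} ({ac['status']}): {ac['text']}")
        for proof in ac["proof"]:
            # Cite the raw/ file so a reader can open the underlying output.
            lines.append(f"- {proof.get('summary', '')} [{proof.get('path', '')}]")
            if "command" in proof:
                commands.append(proof["command"])
        for gap in ac["gaps"]:
            lines.append(f"- Gap: {gap}")
        lines.append("")
    lines.append("## Commands run")
    # dict.fromkeys drops duplicate commands while keeping first-seen order.
    for cmd in dict.fromkeys(commands):
        lines.append(f"- {cmd}")
    return "\n".join(lines) + "\n"
```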

evidence.json

Created by: the builder during the evidence step. Contains: machine-readable per-criterion judgments (PASS, FAIL, or UNKNOWN), concrete proof citations, changed files, and commands for a fresh verifier to rerun. Required top-level keys:
Key                          Description
task_id                      Must match the <TASK_ID> used at init
overall_status               PASS, FAIL, or UNKNOWN
acceptance_criteria          Array of per-criterion objects
changed_files                Files modified during the build
commands_for_fresh_verifier  Commands the verifier should rerun
known_gaps                   Any gaps in the evidence that cannot be resolved
Each entry in acceptance_criteria requires id, text, status, proof, and gaps.
{
  "task_id": "my-task",
  "overall_status": "UNKNOWN",
  "acceptance_criteria": [
    {
      "id": "AC1",
      "text": "Describe the criterion",
      "status": "UNKNOWN",
      "proof": [
        {
          "type": "command",
          "path": ".agent/tasks/my-task/raw/test-unit.txt",
          "command": "npm test -- --runInBand",
          "exit_code": 0,
          "summary": "Targeted unit tests passed."
        }
      ],
      "gaps": []
    }
  ],
  "changed_files": [],
  "commands_for_fresh_verifier": [],
  "known_gaps": []
}
overall_status may be PASS only if every entry in acceptance_criteria is also PASS. Do not claim an overall PASS in the evidence bundle while any criterion is FAIL or UNKNOWN.
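The PASS rule above can be checked mechanically. This is a minimal sketch that assumes the evidence.json content has already been loaded into a dict (for example with json.load). overall_status_is_consistent is a hypothetical helper, and this page does not say whether the validate command enforces this particular rule.

```python
ALLOWED_STATUSES = {"PASS", "FAIL", "UNKNOWN"}


def overall_status_is_consistent(evidence: dict) -> bool:
    """True unless overall_status claims PASS while some criterion is not PASS."""
    statuses = [ac["status"] for ac in evidence["acceptance_criteria"]]
    for status in [evidence["overall_status"], *statuses]:
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"unexpected status value: {status!r}")
    if evidence["overall_status"] != "PASS":
        # FAIL or UNKNOWN never over-claims, so it is always acceptable here.
        return True
    # all() is True for an empty list; a stricter check could also require
    # at least one criterion.
    return all(status == "PASS" for status in statuses)
```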

verdict.json

Created by: the verifier during the verify step. Only the verifier writes this file. Contains: the fresh verifier’s independent judgment of the current repository state — per-criterion verdicts, commands the verifier ran, and artifacts it used. Required top-level keys:
Key              Description
task_id          Must match the <TASK_ID>
overall_verdict  PASS, FAIL, or UNKNOWN
criteria         Array of per-criterion verdict objects
commands_run     Commands the verifier actually ran
artifacts_used   Artifact files the verifier read
Each entry in criteria requires id, status, and reason.
{
  "task_id": "my-task",
  "overall_verdict": "UNKNOWN",
  "criteria": [
    {
      "id": "AC1",
      "status": "UNKNOWN",
      "reason": "Not yet verified."
    }
  ],
  "commands_run": [],
  "artifacts_used": []
}
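A structural check of verdict.json against the required keys listed above might look like the following. This is a sketch under assumptions: verdict_problems is a hypothetical helper that returns human-readable problem strings, and the real validate command may report errors differently.

```python
ALLOWED_VALUES = {"PASS", "FAIL", "UNKNOWN"}
VERDICT_KEYS = ("task_id", "overall_verdict", "criteria", "commands_run", "artifacts_used")
CRITERION_KEYS = ("id", "status", "reason")


def verdict_problems(verdict: dict) -> list:
    """Return problem strings; an empty list means the structure looks valid."""
    problems = [f"missing top-level key: {key}" for key in VERDICT_KEYS if key not in verdict]
    overall = verdict.get("overall_verdict")
    if overall not in ALLOWED_VALUES:
        problems.append(f"invalid overall_verdict: {overall!r}")
    for i, entry in enumerate(verdict.get("criteria", [])):
        # Every criterion entry needs id, status, and reason.
        problems.extend(
            f"criteria[{i}] missing key: {key}" for key in CRITERION_KEYS if key not in entry
        )
        if entry.get("status") not in ALLOWED_VALUES:
            problems.append(f"criteria[{i}] invalid status: {entry.get('status')!r}")
    return problems
```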

problems.md

Created by: the verifier, only when overall_verdict is not PASS. Contains: per-criterion fix guidance for every non-PASS criterion. The fixer reads this file (along with spec.md and verdict.json) and must not act without it. For each non-PASS criterion, problems.md must include:
  • Criterion id and text
  • Status
  • Why it is not proven
  • Minimal reproduction steps
  • Expected vs actual
  • Affected files
  • Smallest safe fix
  • Corrective hint in 1–3 sentences
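Since problems.md has a fixed set of fields for each non-PASS criterion, a verifier could start from a generated skeleton and then fill in the details. This is a minimal sketch: problems_md_skeleton is hypothetical, and the criterion_text mapping (criterion id to criterion text, taken for example from spec.md or evidence.json) is an assumed input. Only the verifier's recorded reason is pre-filled; every other field is left as TODO.

```python
# Fields after "Status" and "Why it is not proven" that problems.md requires
# for each non-PASS criterion.
TODO_FIELDS = [
    "Minimal reproduction steps",
    "Expected vs actual",
    "Affected files",
    "Smallest safe fix",
    "Corrective hint (1-3 sentences)",
]


def problems_md_skeleton(verdict: dict, criterion_text: dict) -> str:
    """Return a problems.md skeleton, or "" when overall_verdict is PASS."""
    if verdict["overall_verdict"] == "PASS":
        return ""  # problems.md is only written for non-PASS verdicts
    lines = [f"# Problems: {verdict['task_id']}", ""]
    for entry in verdict["criteria"]:
        if entry["status"] == "PASS":
            continue
        text = criterion_text.get(entry["id"], "(criterion text not found)")
        lines.append(f"## {entry['id']}: {text}")
        lines.append(f"Status: {entry['status']}")
        # Pre-fill with the verifier's recorded reason; refine as needed.
        lines.append(f"Why it is not proven: {entry['reason']}")
        lines.extend(f"{field}: TODO" for field in TODO_FIELDS)
        lines.append("")
    return "\n".join(lines)
```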

raw/

Created by: the builder during the evidence step (placeholders created by init). The raw/ subdirectory holds direct command output and screenshots — the concrete proof that evidence.json citations point to.
File                  Contents
build.txt             Build command output
test-unit.txt         Unit test run output
test-integration.txt  Integration test run output
lint.txt              Lint run output
screenshot-1.png      Screenshot when a visual check is useful
raw/screenshot-1.png is created at init time as a tiny valid placeholder PNG so the required path exists from the start. The builder replaces it with a real screenshot when relevant.
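A "tiny valid PNG" only needs the PNG signature followed by IHDR, IDAT, and IEND chunks, each carrying a CRC-32 over its type and data. The sketch below builds a 1x1 grayscale image with the Python standard library. The image dimensions, the helper names, and the rule of never overwriting an existing screenshot are assumptions. Init may write different bytes.

```python
import struct
import zlib
from pathlib import Path


def _chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """Return a valid 1x1 white 8-bit grayscale PNG."""
    signature = b"\x89PNG\r\n\x1a\n"
    # IHDR: width=1, height=1, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0 (13 bytes total).
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    # One scanline: filter-type byte 0 followed by a single 0xFF pixel.
    idat = zlib.compress(b"\x00\xff")
    return signature + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")


def write_placeholder_screenshot(raw_dir: Path) -> Path:
    """Create raw/screenshot-1.png unless a file is already there."""
    path = raw_dir / "screenshot-1.png"
    if not path.exists():  # never overwrite a real screenshot
        path.write_bytes(tiny_png())
    return path
```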

Validation

The validation script checks required file presence, JSON parseability, required top-level key presence, allowed status values, and task ID consistency across files:
python3 "$SKILL_PATH/scripts/task_loop.py" validate --task-id <TASK_ID>
For a quick status summary:
python3 "$SKILL_PATH/scripts/task_loop.py" status --task-id <TASK_ID>
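For readers who want to see what such checks involve, here is a minimal sketch of three of them: file presence, JSON parseability, and task ID consistency. It is not the code behind task_loop.py validate. REQUIRED_FILES is an assumed list derived from the directory layout (problems.md is excluded because it only exists for non-PASS verdicts), validate_task is a hypothetical name, and key and status-value checks are omitted for brevity.

```python
import json
from pathlib import Path

# Assumed list of required files, based on the directory layout.
REQUIRED_FILES = [
    "spec.md", "evidence.md", "evidence.json", "verdict.json",
    "raw/build.txt", "raw/test-unit.txt", "raw/test-integration.txt",
    "raw/lint.txt", "raw/screenshot-1.png",
]


def validate_task(task_dir: Path, task_id: str) -> list:
    """Check file presence, JSON parseability, and task_id consistency."""
    errors = [f"missing file: {name}" for name in REQUIRED_FILES
              if not (task_dir / name).is_file()]
    for name in ("evidence.json", "verdict.json"):
        path = task_dir / name
        if not path.is_file():
            continue  # already reported as missing above
        try:
            data = json.loads(path.read_text())
        except json.JSONDecodeError as exc:
            errors.append(f"{name} is not valid JSON: {exc}")
            continue
        if not isinstance(data, dict):
            errors.append(f"{name} must contain a JSON object")
        elif data.get("task_id") != task_id:
            errors.append(f"{name} has task_id {data.get('task_id')!r}, expected {task_id!r}")
    # The directory itself is named after the task ID.
    if task_dir.name != task_id:
        errors.append(f"directory name {task_dir.name!r} does not match {task_id!r}")
    return errors
```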
