Every task managed by the workflow stores its artifacts inside the repository under .agent/tasks/<TASK_ID>/. This page documents the required file structure, the JSON schemas for evidence.json and verdict.json, and the problems.md format.

File structure

.agent/tasks/<TASK_ID>/
  spec.md
  evidence.md
  evidence.json
  raw/
    build.txt
    test-unit.txt
    test-integration.txt
    lint.txt
    screenshot-1.png
  verdict.json
  problems.md
All files are created as placeholders by init and are progressively populated by freeze, build/evidence, verify, and fix.
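As a rough sketch of the skeleton that init produces (the `init_task_dir` helper below is hypothetical, not the real command; the actual init may create additional state):

```python
# Hypothetical sketch of the placeholder layout created by init,
# based only on the file structure documented above.
from pathlib import Path

# Placeholder files that live directly under .agent/tasks/<TASK_ID>/
REQUIRED_FILES = [
    "spec.md",
    "evidence.md",
    "evidence.json",
    "verdict.json",
    "problems.md",
]
RAW_DIR = "raw"  # raw command output and screenshots land here


def init_task_dir(repo_root: str, task_id: str) -> Path:
    """Create the .agent/tasks/<TASK_ID>/ skeleton with empty placeholders."""
    task_dir = Path(repo_root) / ".agent" / "tasks" / task_id
    (task_dir / RAW_DIR).mkdir(parents=True, exist_ok=True)
    for name in REQUIRED_FILES:
        (task_dir / name).touch()
    return task_dir
```

The raw/ contents (build.txt, test-unit.txt, and so on) are not pre-created here; they accumulate as commands are actually run during the build phase.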

evidence.json schema

evidence.json is written by the builder (or evidence-only subagent) and read by the fresh verifier.

Required top-level keys

task_id (string, required)
  Must match the <TASK_ID> used when init was run. The validation script checks this for consistency.

overall_status (string, required)
  Aggregate status across all acceptance criteria. Allowed values: PASS, FAIL, UNKNOWN. Must be PASS only if every AC entry is PASS.

acceptance_criteria (array, required)
  Array of per-AC objects. Each object must include id, text, status, proof, and gaps.

changed_files (array, required)
  List of file paths modified during the build phase.

commands_for_fresh_verifier (array, required)
  Commands the verifier should rerun independently to reproduce evidence.

known_gaps (array, required)
  Any evidence gaps that apply at the overall level, not tied to a single AC.

Allowed status values

Value     Meaning
PASS      Criterion proven with concrete citations
FAIL      Criterion contradicted or incomplete
UNKNOWN   Criterion cannot be verified locally

Complete example

{
  "task_id": "my-task",
  "overall_status": "UNKNOWN",
  "acceptance_criteria": [
    {
      "id": "AC1",
      "text": "Describe the criterion",
      "status": "UNKNOWN",
      "proof": [
        {
          "type": "command",
          "path": ".agent/tasks/my-task/raw/test-unit.txt",
          "command": "npm test -- --runInBand",
          "exit_code": 0,
          "summary": "Targeted unit tests passed."
        }
      ],
      "gaps": []
    }
  ],
  "changed_files": [],
  "commands_for_fresh_verifier": [],
  "known_gaps": []
}
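The aggregation and status rules above can be expressed as a small consistency check. This is a minimal sketch, not the bundled validation script (which may check more, such as file presence and task_id matching):

```python
# Sketch of the evidence.json consistency rules: required keys, allowed
# status values, and "overall_status is PASS only if every AC is PASS".
ALLOWED_STATUSES = {"PASS", "FAIL", "UNKNOWN"}
REQUIRED_KEYS = {
    "task_id", "overall_status", "acceptance_criteria",
    "changed_files", "commands_for_fresh_verifier", "known_gaps",
}


def check_evidence(doc: dict) -> list:
    """Return a list of problems; an empty list means the bundle is consistent."""
    problems = ["missing key: " + k for k in sorted(REQUIRED_KEYS - doc.keys())]
    if doc.get("overall_status") not in ALLOWED_STATUSES:
        problems.append("overall_status must be PASS, FAIL, or UNKNOWN")
    acs = doc.get("acceptance_criteria", [])
    for ac in acs:
        if ac.get("status") not in ALLOWED_STATUSES:
            problems.append("%s: bad status" % ac.get("id"))
    # Aggregation rule: PASS is only allowed when every AC entry is PASS.
    if doc.get("overall_status") == "PASS" and any(
        ac.get("status") != "PASS" for ac in acs
    ):
        problems.append("overall_status is PASS but not every AC is PASS")
    return problems
```

Running this against the complete example above returns an empty list; flipping overall_status to PASS while AC1 stays UNKNOWN would surface the aggregation violation.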

verdict.json schema

verdict.json is written by the fresh verifier and read by the fixer (if needed).

Required top-level keys

task_id (string, required)
  Must match the <TASK_ID>. The validation script checks this for consistency.

overall_verdict (string, required)
  Aggregate verdict. Allowed values: PASS, FAIL, UNKNOWN. Must be PASS only if every criterion entry is PASS.

criteria (array, required)
  Array of per-AC verdict objects. Each must include id, status, and reason.

commands_run (array, required)
  Commands the verifier ran during independent verification.

artifacts_used (array, required)
  Evidence bundle artifacts the verifier read.

Allowed status values

Value     Meaning
PASS      Criterion proven in the current codebase
FAIL      Criterion contradicted, broken, or incomplete
UNKNOWN   Criterion cannot be verified locally

Complete example

{
  "task_id": "my-task",
  "overall_verdict": "UNKNOWN",
  "criteria": [
    {
      "id": "AC1",
      "status": "UNKNOWN",
      "reason": "Not yet verified."
    }
  ],
  "commands_run": [],
  "artifacts_used": []
}
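Because the verifier judges the same acceptance criteria the builder recorded, the two files should cover the same AC ids. The cross-check below is a hypothetical helper (not part of the bundled script) illustrating that relationship:

```python
# Hypothetical cross-check: the per-criterion ids in verdict.json should
# line up exactly with the acceptance criteria recorded in evidence.json.
def criteria_ids_match(evidence: dict, verdict: dict) -> bool:
    """True when both files cover exactly the same set of AC ids."""
    built = {ac["id"] for ac in evidence.get("acceptance_criteria", [])}
    judged = {c["id"] for c in verdict.get("criteria", [])}
    return built == judged
```

A mismatch usually means the verifier skipped a criterion or judged one that was never specified, either of which should be treated as a verification gap.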

problems.md format

problems.md is written by the verifier when the overall verdict is not PASS. For every non-PASS criterion, the file must contain a dedicated section with all of the following:
Section                       Description
Criterion id and text         The AC1/AC2/… label and the full criterion text from spec.md
Status                        FAIL or UNKNOWN
Why it is not proven          What evidence is missing, contradicted, or unverifiable
Minimal reproduction steps    The smallest sequence of commands or actions that reproduces the failure
Expected vs actual            Expected behavior or state versus what was observed
Affected files                File paths relevant to the failure
Smallest safe fix             The minimal change that would satisfy this criterion without regressing others
Corrective hint               1–3 sentences guiding the fixer toward the correct solution
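A section following that layout can be rendered mechanically. The helper below is illustrative only (not part of the workflow scripts), and the heading style it emits is an assumption, since the exact problems.md markup is not mandated above:

```python
# Illustrative (hypothetical) renderer for one non-PASS criterion section
# in problems.md, following the section layout in the table above.
def render_problem_section(ac_id, text, status, why, repro_steps,
                           expected_vs_actual, files, fix, hint):
    """Return one problems.md section as a string."""
    lines = [
        "## %s: %s" % (ac_id, text),          # criterion id and text
        "Status: %s" % status,                 # FAIL or UNKNOWN
        "Why it is not proven: %s" % why,
        "Minimal reproduction steps:",
    ]
    lines += ["  %d. %s" % (i, step) for i, step in enumerate(repro_steps, 1)]
    lines += [
        "Expected vs actual: %s" % expected_vs_actual,
        "Affected files: " + ", ".join(files),
        "Smallest safe fix: %s" % fix,
        "Corrective hint: %s" % hint,
    ]
    return "\n".join(lines)
```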

Validation script

Run the bundled validation script from inside the repository to check all task artifacts at once:
python3 "$SKILL_PATH/scripts/task_loop.py" validate --task-id <TASK_ID>
The script checks:
  • Required file presence — all files under .agent/tasks/<TASK_ID>/ must exist
  • JSON parseability — evidence.json and verdict.json must be valid JSON
  • Top-level key presence — all required keys must be present in each JSON file
  • Allowed status values — overall_status, overall_verdict, and per-criterion statuses must be PASS, FAIL, or UNKNOWN
  • Task ID consistency — task_id inside each JSON file must match the --task-id argument
The script exits with code 0 when all checks pass and code 1 when any check fails.
Run python3 "$SKILL_PATH/scripts/task_loop.py" status --task-id <TASK_ID> for a quick human-readable summary of current artifact state without strict validation.
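In automation, the exit code is the contract. A minimal sketch of gating a CI step on it, assuming SKILL_PATH is exported in the environment as in the commands above (the `validate_task` wrapper itself is hypothetical):

```python
# Sketch of gating CI on the bundled validator's exit code.
# Assumes the SKILL_PATH environment variable points at the skill directory.
import os
import subprocess
import sys


def validate_task(task_id: str) -> bool:
    """Run the bundled validator; exit code 0 means all checks passed."""
    script = os.path.join(os.environ["SKILL_PATH"], "scripts", "task_loop.py")
    result = subprocess.run(
        [sys.executable, script, "validate", "--task-id", task_id]
    )
    return result.returncode == 0
```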
