.agent/tasks/<TASK_ID>/. This page documents the required file structure, the JSON schemas for evidence.json and verdict.json, and the problems.md format.
File structure
init and are progressively populated by freeze, build/evidence, verify, and fix.
evidence.json schema
evidence.json is written by the builder (or evidence-only subagent) and read by the fresh verifier.
Required top-level keys
- task_id: Must match the <TASK_ID> used when init was run. The validation script checks this for consistency.
- overall_status: Aggregate status across all acceptance criteria. Allowed values: PASS, FAIL, UNKNOWN. Must be PASS only if every AC entry is PASS.
- Array of per-AC objects. Each object must include id, text, status, proof, and gaps.
- List of file paths modified during the build phase.
- Commands the verifier should rerun independently to reproduce evidence.
- Any evidence gaps that apply at the overall level, not tied to a single AC.
Allowed status values
| Value | Meaning |
|---|---|
| PASS | Criterion proven with concrete citations |
| FAIL | Criterion contradicted or incomplete |
| UNKNOWN | Criterion cannot be verified locally |
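
The aggregate rule above (the overall status may be PASS only if every AC entry is PASS) can be sketched as a small check. This is a minimal illustration, not the bundled validation logic; the function name overall_is_consistent is hypothetical, and it takes the per-AC statuses as a plain list so it makes no assumption about the key that holds the per-AC array.

```python
from typing import Iterable

# The three status values the schema allows.
ALLOWED_STATUSES = ("PASS", "FAIL", "UNKNOWN")


def overall_is_consistent(overall: str, entry_statuses: Iterable[str]) -> bool:
    """Return True when the aggregate status respects the PASS-only-if-all-PASS rule.

    Hypothetical helper for illustration; not part of the documented tooling.
    """
    statuses = list(entry_statuses)
    # Every status, aggregate or per-criterion, must be an allowed value.
    if overall not in ALLOWED_STATUSES:
        return False
    if any(s not in ALLOWED_STATUSES for s in statuses):
        return False
    # PASS at the top level is only legitimate when every entry is PASS.
    # (An empty list passes vacuously here; a stricter check may reject it.)
    if overall == "PASS":
        return all(s == "PASS" for s in statuses)
    # FAIL or UNKNOWN at the top level never violates this particular rule.
    return True
```

For example, overall_is_consistent("PASS", ["PASS", "UNKNOWN"]) is False, because one criterion is not proven.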
Complete example
verdict.json schema
verdict.json is written by the fresh verifier and read by the fixer (if needed).
Required top-level keys
- task_id: Must match the <TASK_ID>. The validation script checks this for consistency.
- overall_verdict: Aggregate verdict. Allowed values: PASS, FAIL, UNKNOWN. Must be PASS only if every criterion entry is PASS.
- Array of per-AC verdict objects. Each must include id, status, and reason.
- Commands the verifier ran during independent verification.
- Evidence bundle artifacts the verifier read.
Allowed status values
| Value | Meaning |
|---|---|
| PASS | Criterion proven in the current codebase |
| FAIL | Criterion contradicted, broken, or incomplete |
| UNKNOWN | Criterion cannot be verified locally |
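
The per-AC verdict objects described above must each carry id, status, and reason, with status limited to the allowed values. The sketch below shows one way to check a single entry. The helper name entry_problems and its message wording are assumptions made for illustration; the required field names come from this page.

```python
from typing import Any, Dict, List, Sequence

# Fields each per-AC verdict object must include, per this page.
VERDICT_ENTRY_FIELDS = ("id", "status", "reason")
ALLOWED_STATUSES = ("PASS", "FAIL", "UNKNOWN")


def entry_problems(
    entry: Dict[str, Any],
    required: Sequence[str] = VERDICT_ENTRY_FIELDS,
) -> List[str]:
    """List what is wrong with one per-criterion object (empty list means OK).

    Hypothetical helper; the real validation script may report differently.
    """
    # Report every required field that is absent, in declaration order.
    problems = [f"missing field: {name}" for name in required if name not in entry]
    # Only judge the status value when the field is actually present.
    if "status" in entry and entry["status"] not in ALLOWED_STATUSES:
        problems.append(f"invalid status: {entry['status']!r}")
    return problems
```

The same helper can be pointed at evidence.json entries by passing required=("id", "text", "status", "proof", "gaps").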
Complete example
problems.md format
problems.md is written by the verifier when the overall verdict is not PASS. For every non-PASS criterion, the file must contain a dedicated section with all of the following:
| Section | Description |
|---|---|
| Criterion id and text | The AC1/AC2/… label and full criterion text from spec.md |
| Status | FAIL or UNKNOWN |
| Why it is not proven | What evidence is missing, contradicted, or unverifiable |
| Minimal reproduction steps | The smallest sequence of commands or actions that reproduce the failure |
| Expected vs actual | Expected behavior or state versus what was observed |
| Affected files | File paths relevant to the failure |
| Smallest safe fix | Minimal change that would satisfy this criterion without regressing others |
| Corrective hint | 1–3 sentences guiding the fixer toward the correct solution |
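
One way to keep a problems.md section complete is to model the eight required parts as a record and reject empty ones before writing the file. The sketch below is format-agnostic: the class name ProblemEntry, its attribute names, and the helper missing_parts are all hypothetical, and it does not prescribe how the section is laid out in Markdown.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ProblemEntry:
    """The eight parts required for each non-PASS criterion (names are illustrative)."""
    criterion: str           # AC label plus the full criterion text from spec.md
    status: str              # FAIL or UNKNOWN
    why_not_proven: str      # missing, contradicted, or unverifiable evidence
    reproduction_steps: str  # smallest sequence that reproduces the failure
    expected_vs_actual: str
    affected_files: str
    smallest_safe_fix: str
    corrective_hint: str     # 1-3 sentences for the fixer


def missing_parts(entry: ProblemEntry) -> List[str]:
    """Return the names of empty parts, plus a note when the status is not allowed."""
    missing = [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]
    # problems.md only covers criteria that did not pass.
    if entry.status.strip() and entry.status not in ("FAIL", "UNKNOWN"):
        missing.append("status must be FAIL or UNKNOWN")
    return missing
```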
Validation script
Run the bundled validation script from inside the repository, passing the task ID with the --task-id argument, to check all task artifacts at once. It performs these checks:
- Required file presence: all files under .agent/tasks/<TASK_ID>/ must exist
- JSON parseability: evidence.json and verdict.json must be valid JSON
- Top-level key presence: all required keys must be present in each JSON file
- Allowed status values: overall_status, overall_verdict, and per-criterion statuses must be PASS, FAIL, or UNKNOWN
- Task ID consistency: task_id inside each JSON file must match the --task-id argument

The script exits with code 0 when all checks pass and code 1 when any check fails.
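
The checks listed above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the bundled script: the function names check_task and exit_code are hypothetical, the file list and required keys are passed in by the caller rather than hard-coded, and per-criterion statuses are not inspected because that would require knowing the key that holds the per-AC array.

```python
import json
from pathlib import Path
from typing import Dict, List, Sequence

ALLOWED = ("PASS", "FAIL", "UNKNOWN")


def check_task(
    task_dir: Path,
    task_id: str,
    required_files: Sequence[str],
    required_keys: Dict[str, Sequence[str]],  # JSON file name -> required top-level keys
    status_keys: Dict[str, str],              # JSON file name -> aggregate status key
) -> List[str]:
    """Collect human-readable errors for one task directory (hypothetical helper)."""
    errors: List[str] = []
    # 1. Required file presence (JSON files are implicitly required too).
    for name in sorted(set(required_files) | set(required_keys)):
        if not (task_dir / name).is_file():
            errors.append(f"missing file: {name}")
    for name, keys in required_keys.items():
        path = task_dir / name
        if not path.is_file():
            continue  # already reported as missing
        # 2. JSON parseability.
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors.append(f"{name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(data, dict):
            errors.append(f"{name}: top level is not an object")
            continue
        # 3. Top-level key presence.
        errors += [f"{name}: missing key {k}" for k in keys if k not in data]
        # 4. Allowed status values (aggregate field only in this sketch).
        status_key = status_keys.get(name)
        if status_key in data and data[status_key] not in ALLOWED:
            errors.append(f"{name}: {status_key} has invalid value {data[status_key]!r}")
        # 5. Task ID consistency.
        if "task_id" in data and data["task_id"] != task_id:
            errors.append(f"{name}: task_id does not match {task_id}")
    return errors


def exit_code(errors: List[str]) -> int:
    """Mirror the documented convention: 0 when all checks pass, 1 otherwise."""
    return 0 if not errors else 1
```

A caller would typically print each error and then pass exit_code(errors) to sys.exit.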