Every task managed by the Repo Task Proof Loop lives inside a single directory in the repository.
All artifacts stay inside the repository. They are created at init time (as placeholders where needed) so that the validation script can run immediately after initialization.
Placeholder files for `evidence.json` and `verdict.json` are written at init time with valid JSON structure and `UNKNOWN` status values, so `scripts/task_loop.py validate` can run right after init without waiting for the builder or verifier to produce real content.
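As a sketch, the init-time placeholders might be produced like this. The helper name and layout are hypothetical; only the key sets and `UNKNOWN` values follow the schemas documented in this file:

```python
import json
from pathlib import Path

def write_placeholders(task_dir: Path, task_id: str) -> None:
    """Hypothetical init-time helper: write placeholder artifacts so
    validation can run before the builder or verifier produce real content."""
    task_dir.mkdir(parents=True, exist_ok=True)
    evidence = {
        "task_id": task_id,
        "overall_status": "UNKNOWN",
        "acceptance_criteria": [],
        "changed_files": [],
        "commands_for_fresh_verifier": [],
        "known_gaps": [],
    }
    verdict = {
        "task_id": task_id,
        "overall_verdict": "UNKNOWN",
        "criteria": [],
        "commands_run": [],
        "artifacts_used": [],
    }
    (task_dir / "evidence.json").write_text(json.dumps(evidence, indent=2))
    (task_dir / "verdict.json").write_text(json.dumps(verdict, indent=2))

write_placeholders(Path(".agent/tasks/my-task"), "my-task")
```

Both files parse as JSON immediately, which is what lets validation succeed before any real work has happened.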
## Directory structure

```
.agent/tasks/<TASK_ID>/
  spec.md
  evidence.md
  evidence.json
  raw/
    build.txt
    test-unit.txt
    test-integration.txt
    lint.txt
    screenshot-1.png
  verdict.json
  problems.md
```
## File reference
### spec.md

Created by: the spec-freezer during the freeze step (the init step creates a placeholder).

Contains:

- Original task statement
- Explicit acceptance criteria labeled `AC1`, `AC2`, …
- Constraints
- Non-goals
- Optionally: repo guidance sources, verification plan, and assumptions resolved narrowly from the user request
The spec is the contract that every subsequent step is measured against. It must not be modified after freeze except to correct a genuine misunderstanding agreed upon with the user.
### evidence.md

Created by: the builder during the evidence step.

Contains: a human-readable summary of the evidence bundle — per-criterion status, proof citations, and a summary of commands run. This is the narrative companion to `evidence.json`.
### evidence.json

Created by: the builder during the evidence step.

Contains: machine-readable per-criterion judgments (`PASS`, `FAIL`, or `UNKNOWN`), concrete proof citations, changed files, and commands for a fresh verifier to rerun.

Required top-level keys:

| Key | Description |
|---|---|
| `task_id` | Must match the `<TASK_ID>` used at init |
| `overall_status` | `PASS`, `FAIL`, or `UNKNOWN` |
| `acceptance_criteria` | Array of per-criterion objects |
| `changed_files` | Files modified during the build |
| `commands_for_fresh_verifier` | Commands the verifier should rerun |
| `known_gaps` | Any gaps in the evidence that cannot be resolved |

Each entry in `acceptance_criteria` requires `id`, `text`, `status`, `proof`, and `gaps`:
```json
{
  "task_id": "my-task",
  "overall_status": "UNKNOWN",
  "acceptance_criteria": [
    {
      "id": "AC1",
      "text": "Describe the criterion",
      "status": "UNKNOWN",
      "proof": [
        {
          "type": "command",
          "path": ".agent/tasks/my-task/raw/test-unit.txt",
          "command": "npm test -- --runInBand",
          "exit_code": 0,
          "summary": "Targeted unit tests passed."
        }
      ],
      "gaps": []
    }
  ],
  "changed_files": [],
  "commands_for_fresh_verifier": [],
  "known_gaps": []
}
```
`overall_status` must be `PASS` only if every criterion in `acceptance_criteria` is also `PASS`. Do not claim an overall `PASS` in the evidence bundle while any criterion is `FAIL` or `UNKNOWN`.
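The aggregation rule can be sketched as a small helper. This is hypothetical code, not part of the skill's scripts, and the FAIL-over-UNKNOWN precedence is an assumption — the rule above only constrains when PASS is allowed:

```python
def overall_status(criteria):
    """PASS only if every criterion is PASS; FAIL if any criterion is
    FAIL (assumed precedence over UNKNOWN); otherwise UNKNOWN."""
    statuses = [c["status"] for c in criteria]
    if statuses and all(s == "PASS" for s in statuses):
        return "PASS"
    if any(s == "FAIL" for s in statuses):
        return "FAIL"
    return "UNKNOWN"

overall_status([{"status": "PASS"}, {"status": "UNKNOWN"}])  # → "UNKNOWN"
```

Note that an empty criteria list yields `UNKNOWN`, matching the freshly initialized placeholder.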
### verdict.json

Created by: the verifier during the verify step. Only the verifier writes this file.

Contains: the fresh verifier's independent judgment of the current repository state — per-criterion verdicts, commands the verifier ran, and artifacts it used.

Required top-level keys:

| Key | Description |
|---|---|
| `task_id` | Must match the `<TASK_ID>` |
| `overall_verdict` | `PASS`, `FAIL`, or `UNKNOWN` |
| `criteria` | Array of per-criterion verdict objects |
| `commands_run` | Commands the verifier actually ran |
| `artifacts_used` | Artifact files the verifier read |

Each entry in `criteria` requires `id`, `status`, and `reason`:
```json
{
  "task_id": "my-task",
  "overall_verdict": "UNKNOWN",
  "criteria": [
    {
      "id": "AC1",
      "status": "UNKNOWN",
      "reason": "Not yet verified."
    }
  ],
  "commands_run": [],
  "artifacts_used": []
}
```
### problems.md

Created by: the verifier, only when `overall_verdict` is not `PASS`.

Contains: per-criterion fix guidance for every non-PASS criterion. The fixer reads this file (along with `spec.md` and `verdict.json`) and must not act without it.

For each non-PASS criterion, `problems.md` must include:
- Criterion id and text
- Status
- Why it is not proven
- Minimal reproduction steps
- Expected vs actual
- Affected files
- Smallest safe fix
- Corrective hint in 1–3 sentences
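A fixer-facing section covering these fields might be assembled like this. The helper and its `guidance` keys are illustrative, not a schema defined by the skill; only the `criterion` fields come from `verdict.json`:

```python
def problem_section(criterion, guidance):
    """Render one problems.md section for a non-PASS criterion.
    `criterion` comes from verdict.json; `guidance` carries the
    fixer-facing fields listed above (key names are hypothetical)."""
    lines = [
        f"## {criterion['id']}: {guidance['text']}",
        f"- Status: {criterion['status']}",
        f"- Why not proven: {criterion['reason']}",
        f"- Reproduction: {guidance['repro']}",
        f"- Expected vs actual: {guidance['expected_vs_actual']}",
        f"- Affected files: {', '.join(guidance['files'])}",
        f"- Smallest safe fix: {guidance['fix']}",
        f"- Hint: {guidance['hint']}",
    ]
    return "\n".join(lines)
```

One such section per non-PASS criterion gives the fixer everything it needs without re-deriving the verifier's reasoning.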
### raw/

Created by: the builder during the evidence step (placeholders created by init).
The raw/ subdirectory holds direct command output and screenshots — the concrete proof that evidence.json citations point to.
| File | Contents |
|---|---|
| `build.txt` | Build command output |
| `test-unit.txt` | Unit test run output |
| `test-integration.txt` | Integration test run output |
| `lint.txt` | Lint run output |
| `screenshot-1.png` | Screenshot when a visual check is useful |
`raw/screenshot-1.png` is created at init time as a tiny valid placeholder PNG so the required path exists from the start. The builder replaces it with a real screenshot when relevant.
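One way an init script could emit such a placeholder using only the standard library — a sketch under the assumption that any structurally valid PNG will do; the actual script may construct the file differently:

```python
import struct
import zlib
from pathlib import Path

def chunk(tag: bytes, data: bytes) -> bytes:
    """One PNG chunk: big-endian length, tag, payload, CRC over tag+payload."""
    return (struct.pack(">I", len(data)) + tag + data
            + struct.pack(">I", zlib.crc32(tag + data)))

def placeholder_png() -> bytes:
    """A minimal valid PNG: one 8-bit grayscale pixel."""
    sig = b"\x89PNG\r\n\x1a\n"
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # 1x1, grayscale
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return sig + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")

Path("screenshot-1.png").write_bytes(placeholder_png())
```

Image viewers and validators accept the file, which is all the placeholder needs to achieve.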
## Validation

The validation script checks required file presence, JSON parseability, required top-level key presence, allowed status values, and task ID consistency across files:

```shell
python3 "$SKILL_PATH/scripts/task_loop.py" validate --task-id <TASK_ID>
```
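Those checks can be sketched as a simplified reimplementation — not the script itself; the error messages and structure here are illustrative:

```python
import json
from pathlib import Path

ALLOWED = {"PASS", "FAIL", "UNKNOWN"}
REQUIRED = {
    "evidence.json": ["task_id", "overall_status", "acceptance_criteria",
                      "changed_files", "commands_for_fresh_verifier", "known_gaps"],
    "verdict.json": ["task_id", "overall_verdict", "criteria",
                     "commands_run", "artifacts_used"],
}

def validate(task_dir: Path, task_id: str) -> list[str]:
    """Collect validation errors: missing files, unparseable JSON,
    missing top-level keys, bad status values, task-id mismatches."""
    errors = []
    for name, keys in REQUIRED.items():
        path = task_dir / name
        if not path.exists():
            errors.append(f"missing {name}")
            continue
        try:
            data = json.loads(path.read_text())
        except json.JSONDecodeError:
            errors.append(f"{name} is not valid JSON")
            continue
        errors += [f"{name} lacks key {k}" for k in keys if k not in data]
        if data.get("task_id") != task_id:
            errors.append(f"{name} task_id mismatch")
        status_key = "overall_status" if name == "evidence.json" else "overall_verdict"
        if data.get(status_key) not in ALLOWED:
            errors.append(f"{name} has invalid {status_key}")
    return errors
```

An empty error list is what the init-time placeholders are designed to produce.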
For a quick status summary:

```shell
python3 "$SKILL_PATH/scripts/task_loop.py" status --task-id <TASK_ID>
```