evidence

The evidence command directs the builder subagent (or an evidence-only subagent) to collect and record proof for each acceptance criterion. The resulting evidence bundle is what the fresh verifier reads before independently re-running checks.

Evidence packing must not change production code. It may run checks that have not yet been run, but it must not modify any implementation files.

The evidence bundle

The following files make up the complete evidence bundle for a task:

.agent/tasks/TASK_ID/
  evidence.md
  evidence.json
  raw/
    build.txt
    test-unit.txt
    test-integration.txt
    lint.txt
    screenshot-1.png

evidence.md

Human-readable summary of per-AC status with citations. Prefer raw artifact paths and command output over narrative prose.

evidence.json

Machine-readable structured record of per-AC status. See the Artifact Schemas page for the full schema.

raw/build.txt

Full captured output of the build command.

raw/test-unit.txt

Full captured output of the unit test run.

raw/test-integration.txt

Full captured output of the integration test run.

raw/lint.txt

Full captured output of the linter run.

raw/screenshot-1.png

Screenshot artifact, written when visual proof is useful.
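As a rough illustration of how a raw artifact could be produced, the sketch below runs a command and writes its combined output and exit code to a file under raw/. The helper name `capture` and the layout of the written file (command line, output, exit-code trailer) are assumptions made for this example, not part of the documented workflow.

```python
import subprocess
from pathlib import Path


def capture(command: list[str], out_path: Path) -> int:
    """Run `command` and save its full output plus exit code to `out_path`.

    Hypothetical helper: the file layout below is an assumption, not a
    format required by the evidence bundle.
    """
    out_path.parent.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
    )
    out_path.write_text(
        f"$ {' '.join(command)}\n{result.stdout}\n[exit code: {result.returncode}]\n",
        encoding="utf-8",
    )
    return result.returncode
```

A call such as `capture(["make", "build"], Path(".agent/tasks/TASK_ID/raw/build.txt"))` would then record the build output, where `make build` stands in for whatever build command the project actually uses.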

Per-AC status rules

For every acceptance criterion in spec.md, the evidence packer must assign one of three statuses:

Status	Meaning
`PASS`	The criterion is proven in the current codebase. Concrete proof must be cited.
`FAIL`	The criterion is contradicted, broken, or incomplete. The gap must be explained.
`UNKNOWN`	The criterion cannot be verified locally. The gap must be explained.

An overall PASS is valid only when every individual AC is PASS. If even one AC is FAIL or UNKNOWN, the overall status must not be PASS.
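The aggregation rule can be sketched in a few lines. The source only states that an overall PASS requires every AC to be PASS; treating FAIL as outranking UNKNOWN, and treating an empty list of ACs as UNKNOWN, are assumptions made for this example.

```python
from typing import Iterable

VALID_STATUSES = {"PASS", "FAIL", "UNKNOWN"}


def overall_status(ac_statuses: Iterable[str]) -> str:
    """Aggregate per-AC statuses into a single overall status."""
    statuses = list(ac_statuses)
    invalid = set(statuses) - VALID_STATUSES
    if invalid:
        raise ValueError(f"unrecognized statuses: {sorted(invalid)}")
    if not statuses:
        return "UNKNOWN"  # assumption: nothing was proven, so not PASS
    if all(s == "PASS" for s in statuses):
        return "PASS"  # documented rule: PASS only if every AC is PASS
    if "FAIL" in statuses:
        return "FAIL"  # assumption: FAIL takes precedence over UNKNOWN
    return "UNKNOWN"
```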

Every PASS must cite concrete proof, such as:

- File paths confirming the change
- Commands run with exit codes
- Output summaries
- Artifact paths under raw/
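A packer could check the citation rules mechanically before writing the bundle. The sketch below is illustrative only: the record keys `id`, `status`, `proof`, and `gap` are hypothetical and are not taken from the actual evidence.json schema, which is defined on the Artifact Schemas page.

```python
def validate_ac(record: dict) -> list[str]:
    """Return a list of rule violations for one per-AC record.

    The keys used here ("id", "status", "proof", "gap") are hypothetical
    names chosen for this example.
    """
    problems: list[str] = []
    ac_id = record.get("id", "<missing id>")
    status = record.get("status")

    if status == "PASS":
        # Every PASS must cite concrete proof.
        if not record.get("proof"):
            problems.append(f"{ac_id}: PASS without cited proof")
    elif status in ("FAIL", "UNKNOWN"):
        # FAIL and UNKNOWN must explain the gap.
        if not str(record.get("gap", "")).strip():
            problems.append(f"{ac_id}: {status} without an explained gap")
    else:
        problems.append(f"{ac_id}: invalid status {status!r}")
    return problems
```

For example, a record with status `PASS` and an empty `proof` list yields one violation, while a `FAIL` record with a non-empty `gap` yields none.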

What to return

The evidence packer returns only:

- overall_status: the aggregate across all ACs
- Created or updated files: the full list of evidence bundle files written
- Commands a fresh verifier should rerun: so the verifier can independently reproduce results
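One way to picture the return value is as a small three-field record. In the sketch below, the `EvidenceReturn` class, its field names other than `overall_status`, and the `build_return` helper are all hypothetical. The helper also approximates "created or updated files" by listing the bundle files currently on disk, which is a simplification.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class EvidenceReturn:
    """Hypothetical container for the three items the packer returns."""
    overall_status: str
    files: list[str]
    rerun_commands: list[str]


def build_return(task_dir: Path, overall_status: str,
                 rerun_commands: list[str]) -> EvidenceReturn:
    """Collect bundle files under `task_dir` into a return record."""
    top_level = {"evidence.md", "evidence.json"}
    files = []
    for path in task_dir.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(task_dir)
        # Keep only evidence bundle files: the two summaries and raw/ artifacts.
        if rel.as_posix() in top_level or rel.parts[0] == "raw":
            files.append(rel.as_posix())
    return EvidenceReturn(overall_status, sorted(files), list(rerun_commands))
```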

EVIDENCE follow-up prompt

This is the default path. Send this as a follow-up to the same builder session after BUILD completes.

PACK EVIDENCE for TASK_ID <TASK_ID>.

Do not change production code.

Read:
- .agent/tasks/<TASK_ID>/spec.md
- the current repository state
- any prior command results from this builder session

Write or update:
- .agent/tasks/<TASK_ID>/evidence.md
- .agent/tasks/<TASK_ID>/evidence.json
- .agent/tasks/<TASK_ID>/raw/build.txt
- .agent/tasks/<TASK_ID>/raw/test-unit.txt
- .agent/tasks/<TASK_ID>/raw/test-integration.txt
- .agent/tasks/<TASK_ID>/raw/lint.txt
- .agent/tasks/<TASK_ID>/raw/screenshot-1.png when a screenshot is useful

Rules:
- For each AC, assign PASS, FAIL, or UNKNOWN
- Every PASS must cite concrete proof
- FAIL and UNKNOWN must explain the gap
- Overall PASS only if every AC is PASS
- Prefer raw artifacts over narrative prose

Return only:
- overall_status
- created or updated files
- commands a fresh verifier should rerun

EVIDENCE-ONLY fallback prompt

Use this prompt only when the original builder session is unavailable or you intentionally want a fresh evidence-only run.

You are in EVIDENCE-ONLY mode for TASK_ID <TASK_ID>.

Read:
- .agent/tasks/<TASK_ID>/spec.md
- the current repository state

Write the same evidence bundle as above.

Do not change production code.
