This guide walks you through the full Claimcheck workflow: authoring a mapping file that links natural-language requirements to Dafny lemmas, running the audit, understanding what each result status means, and acting on disputed claims based on the specific type of discrepancy detected. It also covers advanced options — adding Dafny verification, using single-prompt mode for richer analysis, and integrating Claimcheck into CI pipelines and programmatic toolchains.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/metareflection/claimcheck/llms.txt
Use this file to discover all available pages before exploring further.
The workflow cycle
Claimcheck is designed to slot into an iterative development loop. You write or revise lemmas, record which requirement each lemma covers, run the audit, fix what fails, and repeat until all claims are confirmed.Write or update Dafny lemmas
Author lemmas in a
.dfy file that formalize your natural-language requirements. Each lemma should express exactly one requirement so that the mapping stays unambiguous.Create the mapping file
Write a JSON file that pairs each requirement string with the name of the lemma that covers it. The array order does not matter.See Structuring the mapping file for best practices.
Run Claimcheck
Point Claimcheck at the mapping and the Claimcheck runs a two-pass round-trip: it first informalizes each lemma back to plain English (without seeing the requirement), then compares the back-translation to the original requirement.
.dfy file:Interpret results and fix disputed claims
Each entry is tagged
confirmed, disputed, or error. Disputed entries include a weakeningType that tells you what kind of fix is needed. See Interpreting results for the full breakdown.Structuring the mapping file
The mapping file is a JSON array of{ requirement, lemmaName } objects.
- One requirement per lemma. Splitting a broad requirement into multiple lemmas makes disputes easier to localize.
- Write requirements before lemmas. Drafting the natural-language requirement first reduces anchoring — you’re less likely to unconsciously shape the requirement to match the lemma you already wrote.
- Use precise language. Vague requirements like “the system behaves correctly” are hard for Claimcheck to evaluate. Prefer specific, observable properties: “the counter never exceeds 100.”
- Keep
lemmaNamein sync. If you rename a lemma in the.dfyfile, update the mapping. A mismatched name produces anerrorresult, not a meaningful audit.
The
--module flag is only needed when using --verify with module-based .dfy files. It is not required for semantic auditing alone.Interpreting results
Each mapping entry resolves to one of three statuses:| Status | Meaning | Action |
|---|---|---|
confirmed | Round-trip passed — the lemma faithfully expresses the requirement | None needed |
disputed | Round-trip failed — discrepancy between lemma meaning and requirement | Fix the lemma or refine the requirement |
error | Lemma not found in the source file | Check that lemmaName matches exactly |
confirmed result means no action is required. The back-translation aligned with the original requirement, so the lemma is doing what it claims.
A disputed result means Claimcheck detected a gap. The output includes a discrepancy field describing what went wrong and a weakeningType field that categorizes the failure. Use the weakeningType to decide how to fix it.
Acting on weakeningType categories
tautology — add a real postcondition
tautology — add a real postcondition
The lemma proves something trivially true, typically because the conclusion was smuggled into the preconditions. The
requires clause contains the same assertion as ensures, making the proof vacuously valid.Fix: Remove the extra requires clause and write a genuine ensures that captures the requirement. If Dafny can’t verify the empty body without the extra precondition, the requirement may not hold in the domain — which is exactly the information you need.weakened-postcondition — strengthen the ensures clause
weakened-postcondition — strengthen the ensures clause
The lemma proves a weaker property than the requirement asks for. The
ensures clause is a genuine consequence of the invariant, but it expresses something narrower or less specific than the original intent.Fix: Replace the ensures clause with one that directly expresses the requirement. This may require discovering or defining the right predicate in your domain spec. If Dafny cannot verify the stronger ensures, the requirement may not be provable from the invariant alone — which is honest and useful information.narrowed-scope — broaden requires or rethink the lemma
narrowed-scope — broaden requires or rethink the lemma
The lemma is valid but only over a subset of the cases the requirement covers. The
requires clause restricts the domain too aggressively, making the lemma true in a narrower scenario than the requirement intends.Fix: Relax or remove the narrowing requires clause and verify whether the property holds over the full intended scope. If it doesn’t, the requirement may need to be split into cases, or the domain invariant may need strengthening.Example from the demo: OrderIrrelevant only proved that prepend and append give the same count (one specific reordering), not that any permutation of ballots produces the same tally. Fix: rewrite using multisets to cover all permutations.wrong-property — rewrite the lemma from scratch
wrong-property — rewrite the lemma from scratch
The lemma proves something unrelated to the requirement. The
ensures clause addresses a different property entirely, not just a weaker or narrower version.Fix: Discard the existing ensures clause and rethink the formalization from scratch, starting from the natural-language requirement. Consider whether the domain spec exposes the right predicates to express the requirement.Example from the demo: TallyMonotonic (requirement: “adding a ballot can’t decrease a tally”) proved Count(...) >= 0 — trivially true since counts are natural numbers. The fix was to compare the tally before and after adding a ballot.Running with —verify
Add--verify to also run dafny verify on each lemma. This checks Dafny correctness in addition to semantic faithfulness.
--verify requires dafny in your PATH. The --module flag is required when using --verify with module-based .dfy files.verify-failed result means the lemma does not compile or the proof is invalid — a separate problem from the semantic audit. Fix Dafny verification failures first, then re-run the semantic check.
Using —single-prompt for richer dispute analysis
By default, Claimcheck uses a two-pass architecture: a fast back-translation model (Haiku) informalizes lemmas without seeing the requirements, and a comparison model (Sonnet) checks alignment. This gives the best accuracy and speed (96.3% at ~108 seconds per run). Add--single-prompt to switch to a single LLM call per pair that performs both passes sequentially and returns richer verdict categories:
confirmed/disputed:
| Verdict | Meaning |
|---|---|
JUSTIFIED | Lemma faithfully expresses the requirement |
PARTIALLY_JUSTIFIED | Partially matches but misses part of the requirement |
NOT_JUSTIFIED | Does not express the requirement |
VACUOUS | Lemma is vacuously true (tautology or empty ensures) |
Integration tips
Running in CI
Add Claimcheck to your CI pipeline with--json to get machine-readable output, then fail the build on any disputed or error results:
Using —json for machine-readable output
--json outputs a JSON object to stdout with domain, timestamp, results, and tokenUsage fields. Each entry in results contains at minimum requirement, lemmaName, and status. Disputed entries also include discrepancy and weakeningType.
Using stdin mode for programmatic use
If you have already extracted lemma code (for example, from a build system or code generation pipeline), use--stdin to pipe pre-extracted claims directly: