Claimcheck bridges the gap that formal verification tools leave open: Dafny can prove a lemma is mathematically correct, but it cannot tell you whether that lemma actually captures what you meant to prove. This page explains the problem Claimcheck solves, how it works, and the failure modes it catches.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/metareflection/claimcheck/llms.txt
Use this file to discover all available pages before exploring further.
The problem: verified doesn’t mean correct
Dafny is a verification language. If your proof compiles, it is mathematically correct — but correct proof does not mean correct meaning. A lemma can verify perfectly and still not say what you think it says.Dafny proves your code matches your spec. Claimcheck asks whether your spec matches your intent. The gap between them is where bugs hide.Consider this concrete example from a verified election tally system. The requirement reads:
“Adding a ballot can’t decrease a tally”An agent writes the following lemma, which Dafny verifies without errors:
How Claimcheck works
Claimcheck verifies claims via a round-trip back-translation in two passes:- Informalize — An LLM reads the Dafny lemma without seeing the natural-language requirement and translates it back to plain English.
- Compare — A second LLM checks whether that back-translation matches the original requirement.
Four failure modes Claimcheck catches
All four of these pass Dafny verification. All four fail Claimcheck.| Failure mode | What it looks like |
|---|---|
tautology | Lemma proves something trivially true (e.g., Count(...) >= 0) rather than the stated property |
weakened-postcondition | Lemma proves a weaker form of the guarantee, allowing cases the requirement forbids |
narrowed-scope | Lemma proves the property for a subset of cases instead of all cases required (e.g., one specific reordering instead of any permutation) |
wrong-property | Lemma proves an unrelated property that happens to be provable in the same context |
In single-prompt mode, disputed results include a
weakeningType field categorizing the failure as one of these four modes, along with a vacuity analysis and description of any surprising restrictions.What Claimcheck is not
Claimcheck does not replace Dafny. It assumes your lemmas already compile and verify. Its role is to audit whether verified lemmas faithfully formalize the requirements they claim to cover — the intent gap that verification alone cannot close.Where to go next
Quickstart
Install Claimcheck and run your first audit in three steps.
Two-pass mode
How structural separation between informalize and compare prevents anchoring bias.
Single-prompt mode
One model, two sequential passes, richer verdicts including vacuity analysis.
CLI options
Full reference for all flags, models, backends, and output formats.