Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/metareflection/claimcheck/llms.txt

Use this file to discover all available pages before exploring further.

Claimcheck bridges the gap that formal verification tools leave open: Dafny can prove a lemma is mathematically correct, but it cannot tell you whether that lemma actually captures what you meant to prove. This page explains the problem Claimcheck solves, how it works, and the failure modes it catches.

The problem: verified doesn’t mean correct

Dafny is a verification language. If your proof compiles, it is mathematically correct — but correct proof does not mean correct meaning. A lemma can verify perfectly and still not say what you think it says.
Dafny proves your code matches your spec. Claimcheck asks whether your spec matches your intent. The gap between them is where bugs hide.
Consider this concrete example from a verified election tally system. The requirement reads:
“Adding a ballot can’t decrease a tally”
An agent writes the following lemma, which Dafny verifies without errors:
lemma TallyMonotonic(ballots: seq<int>)
  ensures Count(ballots) >= 0
{}
The lemma is trivially true — counts are natural numbers and are always non-negative. It says nothing about what happens before versus after adding a ballot. The requirement is about monotonicity; the lemma proves a tautology. Dafny cannot catch this because it does not know your intent. Claimcheck does.

How Claimcheck works

Claimcheck verifies claims via a round-trip back-translation in two passes:
  1. Informalize — An LLM reads the Dafny lemma without seeing the natural-language requirement and translates it back to plain English.
  2. Compare — A second LLM checks whether that back-translation matches the original requirement.
Different models handle each pass, preventing anchoring bias. If the lemma doesn’t say what the requirement says, the mismatch surfaces.

Four failure modes Claimcheck catches

All four of these pass Dafny verification. All four fail Claimcheck.
Failure modeWhat it looks like
tautologyLemma proves something trivially true (e.g., Count(...) >= 0) rather than the stated property
weakened-postconditionLemma proves a weaker form of the guarantee, allowing cases the requirement forbids
narrowed-scopeLemma proves the property for a subset of cases instead of all cases required (e.g., one specific reordering instead of any permutation)
wrong-propertyLemma proves an unrelated property that happens to be provable in the same context
In single-prompt mode, disputed results include a weakeningType field categorizing the failure as one of these four modes, along with a vacuity analysis and description of any surprising restrictions.

What Claimcheck is not

Claimcheck does not replace Dafny. It assumes your lemmas already compile and verify. Its role is to audit whether verified lemmas faithfully formalize the requirements they claim to cover — the intent gap that verification alone cannot close.

Where to go next

Quickstart

Install Claimcheck and run your first audit in three steps.

Two-pass mode

How structural separation between informalize and compare prevents anchoring bias.

Single-prompt mode

One model, two sequential passes, richer verdicts including vacuity analysis.

CLI options

Full reference for all flags, models, backends, and output formats.

Build docs developers (and LLMs) love