Claimcheck: Close the gap between proof and intent

Claimcheck bridges the gap that formal verification tools leave open: Dafny can prove a lemma is mathematically correct, but it cannot tell you whether that lemma actually captures what you meant to prove. This page explains the problem Claimcheck solves, how it works, and the failure modes it catches.

The problem: verified doesn’t mean correct

Dafny is a verification language. If your proof compiles, it is mathematically correct — but correct proof does not mean correct meaning. A lemma can verify perfectly and still not say what you think it says.

Dafny proves your code matches your spec. Claimcheck asks whether your spec matches your intent. The gap between them is where bugs hide.

Consider this concrete example from a verified election tally system. The requirement reads:

“Adding a ballot can’t decrease a tally”

An agent writes the following lemma, which Dafny verifies without errors:

lemma TallyMonotonic(ballots: seq<int>)
  ensures Count(ballots) >= 0
{}

The lemma is trivially true — counts are natural numbers and are always non-negative. It says nothing about what happens before versus after adding a ballot. The requirement is about monotonicity; the lemma proves a tautology. Dafny cannot catch this because it does not know your intent. Claimcheck does.

How Claimcheck works

Claimcheck verifies claims via a round-trip back-translation in two passes:

Informalize — An LLM reads the Dafny lemma without seeing the natural-language requirement and translates it back to plain English.
Compare — A second LLM checks whether that back-translation matches the original requirement.

Different models handle each pass, preventing anchoring bias. If the lemma doesn’t say what the requirement says, the mismatch surfaces.

Four failure modes Claimcheck catches

All four of these pass Dafny verification. All four fail Claimcheck.

Failure mode	What it looks like
`tautology`	Lemma proves something trivially true (e.g., `Count(...) >= 0`) rather than the stated property
`weakened-postcondition`	Lemma proves a weaker form of the guarantee, allowing cases the requirement forbids
`narrowed-scope`	Lemma proves the property for a subset of cases instead of all cases required (e.g., one specific reordering instead of any permutation)
`wrong-property`	Lemma proves an unrelated property that happens to be provable in the same context

In single-prompt mode, disputed results include a weakeningType field categorizing the failure as one of these four modes, along with a vacuity analysis and description of any surprising restrictions.

What Claimcheck is not

Claimcheck does not replace Dafny. It assumes your lemmas already compile and verify. Its role is to audit whether verified lemmas faithfully formalize the requirements they claim to cover — the intent gap that verification alone cannot close.

Where to go next

Quickstart

Install Claimcheck and run your first audit in three steps.

Two-pass mode

How structural separation between informalize and compare prevents anchoring bias.

Single-prompt mode

One model, two sequential passes, richer verdicts including vacuity analysis.

CLI options

Full reference for all flags, models, backends, and output formats.

Get Started

How It Works

CLI Reference

Library & Backends

Benchmarks

Guides

Claimcheck: Close the gap between proof and intent

The problem: verified doesn’t mean correct

How Claimcheck works

Four failure modes Claimcheck catches

What Claimcheck is not

Where to go next

Quickstart

Two-pass mode

Single-prompt mode

CLI options

Build docs developers (and LLMs) love

Get Started

How It Works

CLI Reference

Library & Backends

Benchmarks

Guides

Documentation Index

​The problem: verified doesn’t mean correct

​How Claimcheck works

​Four failure modes Claimcheck catches

​What Claimcheck is not

​Where to go next

Quickstart

Two-pass mode

Single-prompt mode

CLI options

Build docs developers (and LLMs) love

The problem: verified doesn’t mean correct

How Claimcheck works

Four failure modes Claimcheck catches

What Claimcheck is not

Where to go next