Dafny can verify that your proof is mathematically correct — but it cannot verify that your proof says what you intended. Claimcheck fills that gap. It takes your natural-language requirements and your Dafny lemmas, informalizes each lemma back to English without seeing the original requirement, and then checks whether the back-translation matches. If a lemma is a tautology, has a weakened postcondition, or covers a narrower scope than the requirement demands, Claimcheck surfaces it.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/metareflection/claimcheck/llms.txt
Use this file to discover all available pages before exploring further.
Quickstart
Install Claimcheck and audit your first Dafny claims file in minutes.
How it works
Understand the two-pass round-trip algorithm that prevents anchoring bias.
CLI reference
Every flag, option, and output format documented with real examples.
Library API
Use Claimcheck as a Node.js library in your own automation pipelines.
Why Claimcheck?
Dafny proves your code matches your spec. Claimcheck asks whether your spec matches your intent. The gap between them is where bugs hide. A lemma can verify perfectly in Dafny and still be wrong in one of these ways:- Tautology — the postcondition trivially follows from the precondition
- Weakened postcondition — the lemma proves something weaker than required
- Narrowed scope — the lemma covers a subset of the cases the requirement demands
- Wrong property — the lemma proves something entirely different
Create a mapping file
Map each natural-language requirement to its Dafny lemma name in a JSON file.
mapping.json
Key features
Two-pass round-trip
Structural separation prevents the model from anchoring on the requirement while reading the lemma. Only 2 batched API calls for any number of lemmas.
Single-prompt mode
Richer verdicts — JUSTIFIED, PARTIALLY_JUSTIFIED, NOT_JUSTIFIED, VACUOUS — with vacuity analysis and surprising-restriction detection.
Multiple backends
Anthropic API, Vertex AI, AWS Bedrock, or Claude Code CLI. Override any default model.
Benchmark results
96.3% accuracy over 36 requirement-lemma pairs including 8 deliberately bogus lemmas.