Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/metareflection/claimcheck/llms.txt

Use this file to discover all available pages before exploring further.

Dafny can verify that your proof is mathematically correct — but it cannot verify that your proof says what you intended. Claimcheck fills that gap. It takes your natural-language requirements and your Dafny lemmas, informalizes each lemma back to English without seeing the original requirement, and then checks whether the back-translation matches. If a lemma is a tautology, has a weakened postcondition, or covers a narrower scope than the requirement demands, Claimcheck surfaces it.

Quickstart

Install Claimcheck and audit your first Dafny claims file in minutes.

How it works

Understand the two-pass round-trip algorithm that prevents anchoring bias.

CLI reference

Every flag, option, and output format documented with real examples.

Library API

Use Claimcheck as a Node.js library in your own automation pipelines.

Why Claimcheck?

Dafny proves your code matches your spec. Claimcheck asks whether your spec matches your intent. The gap between them is where bugs hide. A lemma can verify perfectly in Dafny and still be wrong in one of these ways:
  • Tautology — the postcondition trivially follows from the precondition
  • Weakened postcondition — the lemma proves something weaker than required
  • Narrowed scope — the lemma covers a subset of the cases the requirement demands
  • Wrong property — the lemma proves something entirely different
1

Install

npm install -g claimcheck
2

Create a mapping file

Map each natural-language requirement to its Dafny lemma name in a JSON file.
mapping.json
[
  { "requirement": "The counter value is always non-negative", "lemmaName": "CounterNonNegative" }
]
3

Run the audit

claimcheck -m mapping.json --dfy claims.dfy -d counter
4

Review results

Each claim is marked confirmed or disputed, with a discrepancy explanation and weakening category for any failures.

Key features

Two-pass round-trip

Structural separation prevents the model from anchoring on the requirement while reading the lemma. Only 2 batched API calls for any number of lemmas.

Single-prompt mode

Richer verdicts — JUSTIFIED, PARTIALLY_JUSTIFIED, NOT_JUSTIFIED, VACUOUS — with vacuity analysis and surprising-restriction detection.

Multiple backends

Anthropic API, Vertex AI, AWS Bedrock, or Claude Code CLI. Override any default model.

Benchmark results

96.3% accuracy over 36 requirement-lemma pairs including 8 deliberately bogus lemmas.

Build docs developers (and LLMs) love