Claimcheck: Verify proof intent, not just proof correctness

Dafny can verify that your proof is mathematically correct — but it cannot verify that your proof says what you intended. Claimcheck fills that gap. It takes your natural-language requirements and your Dafny lemmas, informalizes each lemma back to English without seeing the original requirement, and then checks whether the back-translation matches. If a lemma is a tautology, has a weakened postcondition, or covers a narrower scope than the requirement demands, Claimcheck surfaces it.

Quickstart

Install Claimcheck and audit your first Dafny claims file in minutes.

How it works

Understand the two-pass round-trip algorithm that prevents anchoring bias.

CLI reference

Every flag, option, and output format documented with real examples.

Library API

Use Claimcheck as a Node.js library in your own automation pipelines.

Why Claimcheck?

Dafny proves your code matches your spec. Claimcheck asks whether your spec matches your intent. The gap between them is where bugs hide. A lemma can verify perfectly in Dafny and still be wrong in one of these ways:

Tautology — the postcondition trivially follows from the precondition
Weakened postcondition — the lemma proves something weaker than required
Narrowed scope — the lemma covers a subset of the cases the requirement demands
Wrong property — the lemma proves something entirely different

Install

npm install -g claimcheck

Create a mapping file

Map each natural-language requirement to its Dafny lemma name in a JSON file.

mapping.json

[
  { "requirement": "The counter value is always non-negative", "lemmaName": "CounterNonNegative" }
]

Run the audit

claimcheck -m mapping.json --dfy claims.dfy -d counter

Review results

Each claim is marked confirmed or disputed, with a discrepancy explanation and weakening category for any failures.

Key features

Two-pass round-trip

Structural separation prevents the model from anchoring on the requirement while reading the lemma. Only 2 batched API calls for any number of lemmas.

Single-prompt mode

Richer verdicts — JUSTIFIED, PARTIALLY_JUSTIFIED, NOT_JUSTIFIED, VACUOUS — with vacuity analysis and surprising-restriction detection.

Multiple backends

Anthropic API, Vertex AI, AWS Bedrock, or Claude Code CLI. Override any default model.

Benchmark results

96.3% accuracy over 36 requirement-lemma pairs including 8 deliberately bogus lemmas.

Get Started

How It Works

CLI Reference

Library & Backends

Benchmarks

Guides

Claimcheck: Verify proof intent, not just proof correctness

Quickstart

How it works

CLI reference

Library API

Why Claimcheck?

Key features

Two-pass round-trip

Single-prompt mode

Multiple backends

Benchmark results

Build docs developers (and LLMs) love

Get Started

How It Works

CLI Reference

Library & Backends

Benchmarks

Guides

Documentation Index

Quickstart

How it works

CLI reference

Library API

​Why Claimcheck?

​Key features

Two-pass round-trip

Single-prompt mode

Multiple backends

Benchmark results

Build docs developers (and LLMs) love

Why Claimcheck?

Key features