When you need more granular failure analysis than a binary confirmed/disputed verdict, single-prompt mode trades throughput for detail. Instead of two batched calls shared across all lemmas, it makes one API call per requirement-lemma pair and returns a four-way verdict that distinguishes partial justification from outright mismatch, and flags vacuous lemmas separately from lemmas that prove the wrong thing.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/metareflection/claimcheck/llms.txt
Use this file to discover all available pages before exploring further.
How single-prompt mode works
Enable it with the--single-prompt flag:
claude-sonnet-4-5-20250929 (overridable with --model). The prompt asks the model to work in two named passes:
- Pass 1 (within the prompt) — Informalize the lemma in plain English, stating what it guarantees and under what conditions. The model is instructed to do this before reading the natural-language requirement.
- Pass 2 (within the same prompt) — Read the NL requirement and compare. Answer three structured questions: does the ensures clause express the NL claim (Yes / Partially / No), is the guarantee vacuous, and are there surprising restrictions in the requires clauses?
claimcheck field containing the full verdict detail, informalization, and sub-question answers.
Separation: prompt-level vs structural
The key difference from two-pass mode is where the separation happens:| Two-pass mode | Single-prompt mode | |
|---|---|---|
| Separation type | Structural | Prompt-level |
| How it works | Requirement is absent from the Pass 1 prompt entirely | Model is instructed to informalize before reading the requirement |
| Enforcement | Physical — the requirement string is not in the API call | Behavioral — relies on the model following instructions |
| Look-ahead risk | None — model cannot see what it is not given | Possible — nothing prevents the model from considering both artifacts together |
Two-pass vs single-prompt comparison
| Two-pass | Single-prompt | |
|---|---|---|
| API calls per run | 2 (batched) | N (one per pair) |
| Accuracy (benchmarks) | 96.3% (104/108) | 86.1% (31/36) |
| Time per run (36 pairs) | ~108s | ~353s |
| Verdict richness | confirmed / disputed | JUSTIFIED / PARTIALLY_JUSTIFIED / NOT_JUSTIFIED / VACUOUS |
| Vacuity analysis | No (trivial-strength flag only) | Yes |
| Surprising restrictions | No | Yes |
| Default model | haiku (Pass 1) + sonnet (Pass 2) | sonnet |
Benchmark figures are from 5 domains (counter, kanban, colorwheel, canon, delegation-auth) with 36 requirement-lemma pairs including 8 deliberately bogus lemmas. Two-pass had 3 runs; single-prompt had 1 run.
Verdict categories
Single-prompt mode returns one of four verdicts per pair:JUSTIFIED
The lemma’s contract expresses the requirement. The ensures clause may be stronger than the requirement asks — that is acceptable. This is the only passing verdict.
PARTIALLY_JUSTIFIED
The ensures clause covers some but not all of the NL claim. The model identifies what is missing. Maps to
weakened-postcondition in the weakening type taxonomy.NOT_JUSTIFIED
There is a meaningful discrepancy: the lemma is weaker than required, proves something different, or misses key aspects of the requirement.
VACUOUS
The ensures clause already follows trivially from the requires clauses. The lemma proves nothing beyond its own assumptions. Maps to
tautology in the weakening type taxonomy.Vacuity analysis
A vacuous lemma is one whose ensures clause restates a requires clause without unfolding any definitions. For example:VACUOUS, with a vacuousExplanation describing what the ensures clause repeats.
Important distinction: A lemma that extracts a concrete consequence from an invariant is not vacuous. For example:
Inv(m) entails. Single-prompt mode is calibrated not to flag this pattern.
Surprising restrictions
Beyond vacuity, the model checks whether the requires clauses impose preconditions that the NL requirement does not mention. Standard invariant dependencies (e.g.requires Inv(state)) and well-formedness conditions are expected and are not flagged. But a requires clause that meaningfully restricts when the property holds — in a way the requirement does not acknowledge — is reported in the surprisingRestrictions field.
For example, if the requirement says “adding a ballot cannot decrease any candidate’s tally” but the lemma additionally requires |ballots| > 0, that restriction narrows the scope of the guarantee and would be surfaced here.
Output structure
In single-prompt mode each result includes a nestedclaimcheck field with the full structured output from the model. This preserves the richer sub-question answers alongside the mapped comparison shape used for report compatibility.
When to use single-prompt mode
Use single-prompt mode when:- You want to distinguish partial coverage from outright mismatch
- You need explicit vacuity detection with explanations
- You are debugging a disputed result and want the model’s full reasoning chain
- You are running a small number of pairs where the per-call overhead is acceptable
- You are running a large batch and need results quickly
- You want maximum accuracy — two-pass achieves 96.3% vs 86.1% in benchmarks
- Structural separation is important for your trust model
Related
- Two-pass round-trip verification algorithm — how the default mode works and why structural separation matters
- Benchmark results — full accuracy and timing data across all variants