LeanDojo v2 Provers: HFProver, ExternalProver, Retrieval

Provers are the inference-time engines in LeanDojo v2. They take a theorem and a running Lean environment, and either search for a sequence of tactics that closes all goals or generate a complete proof in a single forward pass. All three backends — HFProver, ExternalProver, and RetrievalProver — inherit from BaseProver, which implements the core DFS search loop using a Pantograph RPC server and a networkx.DiGraph to track the proof tree. You choose a prover based on where your model lives: a local checkpoint, the HuggingFace Inference API, or a trained retrieval-augmented generator.

All provers require Pantograph for Lean RPC communication. Install it directly from source:

pip install git+https://github.com/stanford-centaur/PyPantograph

Reinstall Pantograph whenever Lean upstream changes, as its Lean-side binaries must match the active toolchain exactly.

Proof Strategies

Proof Search
Whole-Proof Generation

Proof search generates tactics one at a time and updates the goal state via Pantograph’s RPC protocol after each step. The search runs a DFS loop for up to 100 steps, attempting up to 6 candidate tactics per goal (max_trials_per_goal = 5; backtracking is forced once trials > 5). If a tactic fails, the failed edge is recorded in the proof graph and the prover backtracks.The guidance function BaseProver.guidance() assigns priorities to open goals by computing 1 / (len(goal.variables) + 1) for each goal, so goals with fewer free variables are explored first.Use proof search when you want fine-grained control over the search process, need to debug which tactics are being tried, or are working with a tactic-generation model (HFProver or RetrievalProver).

from pantograph.server import Server
from lean_dojo_v2.prover import HFProver

server = Server()
prover = HFProver(ckpt_path="outputs-deepseek")

result, used_tactics = prover.search(
    server=server, goal="∀ {p q : Prop}, p ∧ q → q ∧ p", verbose=False
)

print(result)
if result.success:
    for tactic in used_tactics:
        print(tactic)

Whole-proof generation produces an entire Lean 4 proof block in a single forward pass. This is faster than proof search for large models and works well when the model has strong priors about the proof structure.HFProver and ExternalProver both support whole-proof generation. RetrievalProver does not — calling generate_whole_proof() on it raises NotImplementedError.Pass whole_proof=True to agent.prove() or agent.prove_theorem(theorem, repo, whole_proof=True) to use this mode through the agent API. Alternatively, call the prover directly:

from lean_dojo_v2.prover import ExternalProver

theorem = "theorem my_and_comm : ∀ {p q : Prop}, And p q → And q p := by"
prover = ExternalProver()
proof = prover.generate_whole_proof(theorem)
print(proof)

HFProver

HFProver loads a fine-tuned HuggingFace causal language model from a local checkpoint directory. It is the default prover for HFAgent and supports both full-weight checkpoints and LoRA adapters.

Constructor

HFProver(
    ckpt_path: str,         # local directory containing model weights and tokenizer
    use_lora: bool = False, # set True when the checkpoint contains only adapter weights
    device: str = "auto",   # "auto" resolves to CUDA if available, else CPU
)

When use_lora=True, the model is loaded with AutoPeftModelForCausalLM (from the PEFT library) so the adapter layers are merged with the frozen base model at load time.

`next_tactic()`

Formats the current goal state as a three-section prompt (### System / ### User / ### Assistant) and calls model.generate() with:

num_return_sequences=5
temperature=0.7
top_p=0.9
max_new_tokens=64

Strips each generated continuation to the first line, removes any <;> separators, and filters out any tactic equal to "sorry". One tactic is selected at random from the surviving candidates and returned.

`generate_whole_proof()`

Uses the same model but switches to a two-section prompt (### System / ### User) describing the full theorem statement, and generates with max_new_tokens=512, num_return_sequences=1. Returns the decoded text with <;> sequences removed.

Example: proof search with a local checkpoint

from pantograph.server import Server
from lean_dojo_v2.prover import HFProver


def main() -> None:
    server = Server()
    prover = HFProver(ckpt_path="outputs-deepseek")

    result, used_tactics = prover.search(
        server=server, goal="∀ {p q : Prop}, p ∧ q → q ∧ p", verbose=False
    )

    print(result)
    if result.success:
        for tactic in used_tactics:
            print(tactic)


if __name__ == "__main__":
    main()

ExternalProver

ExternalProver delegates all inference to the HuggingFace Inference API (or any compatible hosted endpoint) via the internal HFTacticGenerator client. No local GPU or model weights are required, making it ideal for quick experiments and large-model inference.

Constructor

ExternalProver(
    model_name: str = "deepseek-ai/DeepSeek-Prover-V2-671B:novita",
)

Any model accessible through the HuggingFace Inference API can be substituted. The :novita suffix selects the Novita inference provider.

`next_tactic()` and `generate_whole_proof()`

Both methods delegate directly to HFTacticGenerator:

next_tactic(state, goal_id) — passes the string representation of the goal state to the API and returns the first generated tactic.
generate_whole_proof(theorem) — passes the theorem string and returns the full proof text.

Example: whole-proof generation

from lean_dojo_v2.prover import ExternalProver


def main() -> None:
    theorem = "theorem my_and_comm : ∀ {p q : Prop}, And p q → And q p := by"
    prover = ExternalProver()
    proof = prover.generate_whole_proof(theorem)
    print(proof)


if __name__ == "__main__":
    main()

Set the HF_TOKEN environment variable before using ExternalProver with gated models:

export HF_TOKEN=hf_...

RetrievalProver

RetrievalProver uses a RetrievalAugmentedGenerator (kaiyuy/leandojo-lean4-retriever-tacgen-byt5-small) to generate tactics by first retrieving relevant premises from an indexed corpus and then conditioning the tactic generator on those premises. It is the prover used by LeanAgent.

Constructor

RetrievalProver(
    ret_ckpt_path: str,          # path to the trained PremiseRetriever checkpoint
    gen_ckpt_path: str,          # path to the generator Lightning checkpoint
    indexed_corpus_path: str,    # path to corpus.jsonl for retrieval indexing
)

On construction, the retriever loads the corpus and calls reindex_corpus(batch_size=32) to build the dense index. The generator model is loaded in eval mode with its retriever frozen.

`next_tactic()`

Calls tactic_generator.generate() with num_samples=10, receiving a list of (tactic, log_prob) pairs. Converts log-probabilities to probabilities via softmax (np.exp(log_probs) / np.sum(np.exp(log_probs))) and samples one tactic using the resulting probability distribution as weights.

`generate_whole_proof()`

Not supported. Always raises:

NotImplementedError: RetrievalProver does not support whole proof generation

Use HFProver or ExternalProver if you need whole-proof generation.

Search Algorithm Details

The DFS search loop inside BaseProver.search() builds a networkx.DiGraph where nodes are SearchState objects (goal states) and edges are labelled with the tactic applied and the goal ID it targeted.

Initialise

server.goal_start(goal) creates the root goal state. The initial SearchState is pushed onto a stack and added to the graph as node 0.

Select goal

Pop the top state from the stack. If it is already solved, jump to the path-finding step. Otherwise, select the highest-priority unsolved goal via search_state.next_goal_id. If trials[goal_id] > max_trials_per_goal (i.e. more than 5 attempts have been made), force tactic = None to trigger backtracking.

Generate and apply tactic

Call next_tactic(goal_state, goal_id). If a tactic is returned, apply it with server.goal_tactic(). On success, compute priorities for the new goal state with guidance(), create a new SearchState, and push it onto the stack. On TacticFailure, record the failed edge in the graph and retry.

Backtrack or terminate

If tactic is None (no candidates left or trial limit reached), pop the current state and continue with the next stack entry. If the stack is empty before the step limit, return SearchResult(success=False).

Recover used tactics

On success, call _find_shortest_path_to_solved(), which runs nx.shortest_path(graph, 0, solved_node_id) and extracts the tactic label from each edge along the path. Returns (SearchResult, used_tactics).

The search loop runs for at most 100 steps (max_steps = 100). The SearchResult object carries n_goals_root, duration, success, and steps.

Prover Comparison

Feature	HFProver	ExternalProver	RetrievalProver
Model location	Local checkpoint	HF Inference API	Local Lightning ckpt
LoRA support	✅ (`use_lora=True`)	—	—
Proof search	✅	✅	✅
Whole-proof generation	✅	✅	❌
GPU required	Yes (recommended)	No	Yes (recommended)
Paired agent	`HFAgent`	`ExternalAgent`	`LeanAgent`

Agents

Learn how agents select and configure provers automatically based on the chosen trainer.

Trainers

See how to produce the checkpoints that HFProver and RetrievalProver load at inference time.

Get Started

Concepts

Guides

Reference

LeanDojo v2 Provers: HFProver, ExternalProver, Retrieval

Proof Strategies

HFProver

Constructor

`next_tactic()`

`generate_whole_proof()`

Example: proof search with a local checkpoint

ExternalProver

Constructor

`next_tactic()` and `generate_whole_proof()`

Example: whole-proof generation

RetrievalProver

Constructor

`next_tactic()`

`generate_whole_proof()`

Search Algorithm Details

Prover Comparison

Agents

Trainers

Build docs developers (and LLMs) love

Get Started

Concepts

Guides

Reference

Documentation Index

​Proof Strategies

​HFProver

​Constructor

​next_tactic()

​generate_whole_proof()

​Example: proof search with a local checkpoint

​ExternalProver

​Constructor

​next_tactic() and generate_whole_proof()

​Example: whole-proof generation

​RetrievalProver

​Constructor

​next_tactic()

​generate_whole_proof()

​Search Algorithm Details

​Prover Comparison

Agents

Trainers

Build docs developers (and LLMs) love

Proof Strategies

HFProver

Constructor

`next_tactic()`

`generate_whole_proof()`

Example: proof search with a local checkpoint

ExternalProver

Constructor

`next_tactic()` and `generate_whole_proof()`

Example: whole-proof generation

RetrievalProver

Constructor

`next_tactic()`

`generate_whole_proof()`

Search Algorithm Details

Prover Comparison