Documentation Index
Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt
Use this file to discover all available pages before exploring further.
RetrievalProver implements retrieval-augmented generation (RAG) for Lean 4 proof search. At each step of the DFS loop it queries a RetrievalAugmentedGenerator — built on kaiyuy/leandojo-lean4-retriever-tacgen-byt5-small — which first retrieves the most relevant premises from a pre-indexed corpus and then generates tactics conditioned on both the goal and those premises. Candidate tactics are sampled proportionally to their softmax-normalized log-probabilities, giving higher-confidence suggestions a better chance of being tried first.
Class
lean_dojo_v2.prover.
Internally wraps lean_dojo_v2.lean_agent.generator.model.RetrievalAugmentedGenerator.
Constructor
gen_ckpt_path, injects the retriever checkpoint from ret_ckpt_path, loads the premise corpus from indexed_corpus_path, and rebuilds the retrieval index with batch_size=32. Construction is blocking and may take a minute or more on first run while the index is built.
Path to the retriever checkpoint produced by a
RetrievalTrainer run (typically a .ckpt file such as retriever-epoch=5.ckpt). This checkpoint is injected into the generator’s retriever module.Path to the combined generator Lightning checkpoint (e.g.
model_lightning.ckpt). Defaults in LeanAgent to os.path.join(RAID_DIR, "model_lightning.ckpt").Path to the
corpus.jsonl file produced by DynamicDatabase.export_merged_data(). This file contains the premise strings that the retriever will index and search at inference time.Hardcoded configuration
The following model hyperparameters are fixed at construction and cannot be overridden without subclassing:| Parameter | Value |
|---|---|
model_name | "kaiyuy/leandojo-lean4-retriever-tacgen-byt5-small" |
num_beams | 5 |
eval_num_retrieved | 10 |
max_inp_seq_len | 512 |
max_oup_seq_len | 128 |
For most workflows, use
LeanAgent rather than constructing RetrievalProver directly. LeanAgent._setup_prover() handles checkpoint path resolution, device selection, and corpus wiring automatically.Methods
next_tactic
BaseProver.search() on every DFS step.
Generation process:
- Calls
tactic_generator.generate(state=..., file_path=..., theorem_full_name=..., theorem_pos=..., num_samples=10)to produce 10(tactic, log_prob)pairs. - Computes
probs = softmax(log_probs)using NumPy. - Samples one tactic using
random.choices(..., weights=probs, k=1).
The Pantograph
GoalState at the current DFS node. Converted to a string and passed to the generator as the state argument.Index of the active goal in
state.goals. Used for interface compatibility; the generator operates on the full goal state string.A single tactic string sampled from the model’s top-10 candidates, or
None if self.theorem has not been set.generate_whole_proof
Not used — calling this method always raises
NotImplementedError.Example
The following shows the typical construction pattern used insideLeanAgent._setup_prover():
prover is a fully initialized BaseProver and can be passed directly to search():
Device Handling
RetrievalProver automatically selects cuda if a GPU is available, otherwise falls back to cpu. There is no device constructor argument; override torch.cuda.is_available() or set CUDA_VISIBLE_DEVICES="" in the environment to force CPU execution.
See Also
- BaseProver — search loop and abstract interface
- HFProver — local HuggingFace model with whole-proof support
- ExternalProver — HF Inference API tactic generator