Documentation Index
Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt
Use this file to discover all available pages before exploring further.
LeanAgent is the production-grade, lifelong-learning agent introduced in the ICLR 2025 paper. Rather than fine-tuning a generative model with SFT, it trains a retrieval model — kaiyuy/leandojo-lean4-retriever-byt5-small by default — that learns to surface relevant premises from an ever-growing indexed corpus. Repositories are added incrementally, the retriever is updated with Elastic Weight Consolidation (EWC) to avoid catastrophic forgetting, and a separate generation model uses the retrieved premises to construct proofs at evaluation time.
Class Definition
LeanAgent extends BaseAgent. It builds Lean dependencies during tracing (_get_build_deps() returns True) for full premise coverage, and it initialises a RetrievalProver using the latest checkpoint found by find_latest_checkpoint() when prove() is called.
Constructor
Path to the JSON file used for
DynamicDatabase persistence. Shared with the
base class.Training hyperparameters for the retrieval model. Key fields include
model_name (defaults to "kaiyuy/leandojo-lean4-retriever-byt5-small"),
lambda_value for EWC regularisation strength, max_epochs, lr, and
run_progressive_training. Load from YAML with TrainingConfig.from_yaml().Prover runtime settings including
timeout, num_sampled_tactics,
num_workers, and use_vllm. Defaults to a ProverConfig with sensible
out-of-the-box values.Behaviour
| Method / Attribute | Detail |
|---|---|
_get_build_deps() | Always returns True. Lean dependency trees are traced to maximise available premises. |
self.trainer | A RetrievalTrainer(config=self.config) created at construction time. |
_setup_prover() | Calls find_latest_checkpoint() for the retriever path, then constructs RetrievalProver(ret_ckpt_path, gen_ckpt_path, indexed_corpus_path). |
gen_ckpt_path | Resolved as os.path.join(RAID_DIR, "model_lightning.ckpt"). |
indexed_corpus_path | Resolved as os.path.join(self.data_path, "corpus.jsonl"). |
LeanAgent is the pipeline described in the ICLR 2025 paper LeanDojo:
Theorem Proving with Retrieval-Augmented Language Models. It is the
recommended agent for research experiments that require lifelong, progressive
learning across many repositories.Example
The following example is adapted fromexamples/lean_agent.py. It registers two repositories, trains the retrieval model progressively, then attempts to prove all sorry theorems:
Custom Training Configuration
Override specificTrainingConfig fields to tune the EWC regularisation strength, learning rate, or number of training epochs: