LeanDojo v2 is a Python library that automates the full cycle of AI-assisted formal theorem proving in Lean 4. It handles everything from cloning and tracing Lean repositories, to fine-tuning language models, to driving Pantograph-based proof search — all from a unified Python API.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt
Use this file to discover all available pages before exploring further.
Quickstart
Get a working theorem prover running in minutes with a real Lean 4 repository.
Installation
Install LeanDojo v2 from PyPI or from source with all required dependencies.
Agents Guide
Learn how HFAgent, LeanAgent, and ExternalAgent orchestrate training and proving.
API Reference
Full class and method reference for every public module in the library.
What LeanDojo v2 Does
LeanDojo v2 automates an otherwise manual loop: trace a Lean repository to extract theorem states, build a training dataset, fine-tune a language model on those states, and then use the model to search for proofs of unsolved (sorry) theorems.
Trace Repositories
Instrument any Lean 4 GitHub repo or local project to extract proof states, tactics, and theorem metadata.
Train Models
Fine-tune HuggingFace causal LMs with SFT, GRPO reinforcement, or retrieval objectives — with optional LoRA.
Prove Theorems
Drive Pantograph-based proof search using local HF checkpoints, retrieval-augmented generation, or external API models.
Architecture Overview
The library is organized into four layers that compose cleanly:Trace
lean_dojo_v2.lean_dojo instruments a Lean repository with ExtractData.lean, producing .ast.json and tactic trace files for every theorem.Store
lean_dojo_v2.database.DynamicDatabase stores repository metadata, theorem difficulty, and sorry status, enabling curriculum-ordered training schedules.Train
lean_dojo_v2.trainer provides SFTTrainer, GRPOTrainer, RetrievalTrainer, and ProgressTrainer — all integrated with HuggingFace Transformers and TRL.Choose Your Agent
Pick the agent that matches your setup:HFAgent
Train and prove locally with any HuggingFace causal LM. Supports full fine-tuning and LoRA adapters.
LeanAgent
Lifelong learning pipeline with retrieval-augmented generation (RAG). Builds Lean dependencies and trains a premise retriever.
ExternalAgent
Delegate inference to the HuggingFace Inference API (e.g., DeepSeek-Prover-V2-671B). No local GPU required.
Key Features
- Unified agent API —
setup_github_repository(),train(), andprove()work the same way across all agents - Three training strategies — SFT, GRPO reinforcement learning, and retrieval-augmented training
- LoRA support — pass a
peft.LoraConfigtoSFTTrainerorGRPOTrainerfor memory-efficient fine-tuning - Curriculum learning —
DynamicDatabaseranks theorems by difficulty and orders repositories for progressive training - External API — FastAPI server + LeanCopilot Lean frontend for querying LLMs directly from Lean editors
- LeanProgress — step-prediction model that estimates remaining proof steps in a proof state
LeanDojo v2 requires Python ≥ 3.11, a CUDA-capable GPU for training, Git ≥ 2.25, and the
elan Lean toolchain manager. See the Installation page for full requirements.