LeanDojo v2: AI Theorem Proving for Lean 4

LeanDojo v2 is a Python library that automates the full cycle of AI-assisted formal theorem proving in Lean 4. It handles everything from cloning and tracing Lean repositories, to fine-tuning language models, to driving Pantograph-based proof search — all from a unified Python API.

Quickstart

Get a working theorem prover running in minutes with a real Lean 4 repository.

Installation

Install LeanDojo v2 from PyPI or from source with all required dependencies.

Agents Guide

Learn how HFAgent, LeanAgent, and ExternalAgent orchestrate training and proving.

API Reference

Full class and method reference for every public module in the library.

What LeanDojo v2 Does

LeanDojo v2 automates an otherwise manual loop: trace a Lean repository to extract theorem states, build a training dataset, fine-tune a language model on those states, and then use the model to search for proofs of unsolved (sorry) theorems.

Trace Repositories

Instrument any Lean 4 GitHub repo or local project to extract proof states, tactics, and theorem metadata.

Train Models

Fine-tune HuggingFace causal LMs with SFT, GRPO reinforcement, or retrieval objectives — with optional LoRA.

Prove Theorems

Drive Pantograph-based proof search using local HF checkpoints, retrieval-augmented generation, or external API models.

Architecture Overview

The library is organized into four layers that compose cleanly:

Trace

lean_dojo_v2.lean_dojo instruments a Lean repository with ExtractData.lean, producing .ast.json and tactic trace files for every theorem.

Store

lean_dojo_v2.database.DynamicDatabase stores repository metadata, theorem difficulty, and sorry status, enabling curriculum-ordered training schedules.

Train

lean_dojo_v2.trainer provides SFTTrainer, GRPOTrainer, RetrievalTrainer, and ProgressTrainer — all integrated with HuggingFace Transformers and TRL.

Prove

lean_dojo_v2.prover connects a trained model to Pantograph’s Lean 4 RPC server and runs best-first search to fill in sorry placeholders.

Choose Your Agent

Pick the agent that matches your setup:

HFAgent

Train and prove locally with any HuggingFace causal LM. Supports full fine-tuning and LoRA adapters.

LeanAgent

Lifelong learning pipeline with retrieval-augmented generation (RAG). Builds Lean dependencies and trains a premise retriever.

ExternalAgent

Delegate inference to the HuggingFace Inference API (e.g., DeepSeek-Prover-V2-671B). No local GPU required.

Key Features

Unified agent API — setup_github_repository(), train(), and prove() work the same way across all agents
Three training strategies — SFT, GRPO reinforcement learning, and retrieval-augmented training
LoRA support — pass a peft.LoraConfig to SFTTrainer or GRPOTrainer for memory-efficient fine-tuning
Curriculum learning — DynamicDatabase ranks theorems by difficulty and orders repositories for progressive training
External API — FastAPI server + LeanCopilot Lean frontend for querying LLMs directly from Lean editors
LeanProgress — step-prediction model that estimates remaining proof steps in a proof state

LeanDojo v2 requires Python ≥ 3.11, a CUDA-capable GPU for training, Git ≥ 2.25, and the elan Lean toolchain manager. See the Installation page for full requirements.

Get Started

Concepts

Guides

Reference

LeanDojo v2: AI Theorem Proving for Lean 4

Quickstart

Installation

Agents Guide

API Reference

What LeanDojo v2 Does

Trace Repositories

Train Models

Prove Theorems

Architecture Overview

Choose Your Agent

HFAgent

LeanAgent

ExternalAgent

Key Features

Build docs developers (and LLMs) love

Get Started

Concepts

Guides

Reference

Documentation Index

Quickstart

Installation

Agents Guide

API Reference

​What LeanDojo v2 Does

Trace Repositories

Train Models

Prove Theorems

​Architecture Overview

​Choose Your Agent

HFAgent

LeanAgent

ExternalAgent

​Key Features

Build docs developers (and LLMs) love

What LeanDojo v2 Does

Architecture Overview

Choose Your Agent

Key Features