Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt

Use this file to discover all available pages before exploring further.

LeanDojo v2 is a Python library that automates the full cycle of AI-assisted formal theorem proving in Lean 4. It handles everything from cloning and tracing Lean repositories, to fine-tuning language models, to driving Pantograph-based proof search — all from a unified Python API.

Quickstart

Get a working theorem prover running in minutes with a real Lean 4 repository.

Installation

Install LeanDojo v2 from PyPI or from source with all required dependencies.

Agents Guide

Learn how HFAgent, LeanAgent, and ExternalAgent orchestrate training and proving.

API Reference

Full class and method reference for every public module in the library.

What LeanDojo v2 Does

LeanDojo v2 automates an otherwise manual loop: trace a Lean repository to extract theorem states, build a training dataset, fine-tune a language model on those states, and then use the model to search for proofs of unsolved (sorry) theorems.

Trace Repositories

Instrument any Lean 4 GitHub repo or local project to extract proof states, tactics, and theorem metadata.

Train Models

Fine-tune HuggingFace causal LMs with SFT, GRPO reinforcement, or retrieval objectives — with optional LoRA.

Prove Theorems

Drive Pantograph-based proof search using local HF checkpoints, retrieval-augmented generation, or external API models.

Architecture Overview

The library is organized into four layers that compose cleanly:
1

Trace

lean_dojo_v2.lean_dojo instruments a Lean repository with ExtractData.lean, producing .ast.json and tactic trace files for every theorem.
2

Store

lean_dojo_v2.database.DynamicDatabase stores repository metadata, theorem difficulty, and sorry status, enabling curriculum-ordered training schedules.
3

Train

lean_dojo_v2.trainer provides SFTTrainer, GRPOTrainer, RetrievalTrainer, and ProgressTrainer — all integrated with HuggingFace Transformers and TRL.
4

Prove

lean_dojo_v2.prover connects a trained model to Pantograph’s Lean 4 RPC server and runs best-first search to fill in sorry placeholders.

Choose Your Agent

Pick the agent that matches your setup:

HFAgent

Train and prove locally with any HuggingFace causal LM. Supports full fine-tuning and LoRA adapters.

LeanAgent

Lifelong learning pipeline with retrieval-augmented generation (RAG). Builds Lean dependencies and trains a premise retriever.

ExternalAgent

Delegate inference to the HuggingFace Inference API (e.g., DeepSeek-Prover-V2-671B). No local GPU required.

Key Features

  • Unified agent APIsetup_github_repository(), train(), and prove() work the same way across all agents
  • Three training strategies — SFT, GRPO reinforcement learning, and retrieval-augmented training
  • LoRA support — pass a peft.LoraConfig to SFTTrainer or GRPOTrainer for memory-efficient fine-tuning
  • Curriculum learningDynamicDatabase ranks theorems by difficulty and orders repositories for progressive training
  • External API — FastAPI server + LeanCopilot Lean frontend for querying LLMs directly from Lean editors
  • LeanProgress — step-prediction model that estimates remaining proof steps in a proof state
LeanDojo v2 requires Python ≥ 3.11, a CUDA-capable GPU for training, Git ≥ 2.25, and the elan Lean toolchain manager. See the Installation page for full requirements.

Build docs developers (and LLMs) love