Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt

Use this file to discover all available pages before exploring further.

Getting LeanDojo v2 running requires a few more steps than a typical Python package because it bridges two runtimes: a Python process that orchestrates training and proof search, and a Lean 4 kernel (accessed through Pantograph’s RPC server) that validates every tactic. This page walks you through every prerequisite, the two install paths, and all environment variables the framework consumes.

System Requirements

Before installing, make sure your machine meets these requirements:
RequirementMinimum VersionNotes
Python3.11Required by pyproject.toml
CUDA-capable GPUCUDA 12.6 (tested)Required for training and local inference
Git2.25Used by the tracing pipeline to clone repos
elanlatestLean toolchain manager — installs Lean 4
wgetanyUsed by tracing scripts
Disk spacevariesraid/ working directory can grow large — use fast storage

Install elan (Lean Toolchain Manager)

If you do not already have elan installed:
curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh
source ~/.bashrc   # or restart your shell
The tracing pipeline looks for installed toolchains under ~/.elan/toolchains/. Ensure that the Lean 4 version required by your target repository is available:
elan toolchain install leanprover/lean4:nightly

Install LeanDojo v2

pip install lean-dojo-v2

Install Pantograph (Required)

Pantograph provides the Lean 4 RPC server that all provers communicate with. It must be installed separately because it is built directly from the upstream GitHub repository:
pip install git+https://github.com/stanford-centaur/PyPantograph
Pantograph must be reinstalled whenever Lean upstream changes break the RPC interface. If you encounter PantographError at runtime, reinstall with the same command above.

Install PyTorch with CUDA Support

The core package declares torch>=1.12.0 as a dependency, but a plain pip install will fetch a CPU-only wheel. Install a CUDA-enabled build explicitly — adjust the index URL to match your CUDA driver version:
# CUDA 12.6 (tested)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

Install from Source (Development)

If you want to contribute to LeanDojo v2 or need the latest unreleased changes, install directly from the repository:
1

Clone the repository

git clone https://github.com/lean-dojo/LeanDojo-v2.git
cd LeanDojo-v2
2

Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate
3

Upgrade pip and install in editable mode with dev extras

pip install --upgrade pip
pip install -e ".[dev]"
The dev extras include pytest, pytest-cov, black, isort, flake8, and mypy.
4

Install Pantograph and CUDA PyTorch

pip install git+https://github.com/stanford-centaur/PyPantograph
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
5

Verify the installation

python -c "import lean_dojo_v2; print('LeanDojo v2 ready')"

Environment Variables

LeanDojo v2 reads several environment variables at import and runtime. Export them in your shell profile (.bashrc, .zshrc, or .env) before running any agent.

Required

GITHUB_ACCESS_TOKEN is validated at import time. If it is missing or invalid, the framework raises a ValueError immediately. The token must have repo and read:org scopes — the tracing pipeline uses the GitHub API extensively to resolve commits, list files, and fetch repository metadata.
export GITHUB_ACCESS_TOKEN=ghp_your_token_here

Optional

VariableDefaultDescription
HF_TOKEN(none)Hugging Face token for gated models (e.g., DeepSeek-Prover-V2 variants). Set this when ExternalAgent or any HF download requires authentication.
CACHE_DIR~/.cache/lean_dojoDirectory where traced repository metadata and AST caches are stored. Point this to fast storage if tracing large repositories.
TMP_DIRsystem tempScratch space used during tracing and dataset generation.
NUM_PROCSmin(cpu_count, 32)Parallelism for trace and dataset generation steps.
TACTIC_CPU_LIMIT1Number of CPU cores allocated per tactic evaluation in Pantograph.
TACTIC_MEMORY_LIMIT"32g"Memory ceiling per tactic evaluation. Raise this for memory-intensive theories.
DISABLE_REMOTE_CACHE(unset)When set (to any value), disables the remote cache and forces all repositories to be built locally from source.
LOAD_USED_PACKAGES_ONLY(unset)When set, only loads dependency files that are actually imported by the target repository, reducing tracing time and disk usage.
# Example .env file
GITHUB_ACCESS_TOKEN=ghp_your_token_here
HF_TOKEN=hf_your_token_here
CACHE_DIR=/fast-ssd/.cache/lean_dojo
NUM_PROCS=16
TACTIC_MEMORY_LIMIT=64g
# DISABLE_REMOTE_CACHE=1      # uncomment to build all repos locally
# LOAD_USED_PACKAGES_ONLY=1   # uncomment to skip unused dependency files

Working Directories

By default, all datasets, model checkpoints, and tracing artifacts are written under a raid/ directory relative to where you run LeanDojo v2. The layout looks like:
raid/
├── data/          # JSONL training datasets
├── cache/         # Lean tracing cache
└── checkpoints/   # Model checkpoints from SFTTrainer / GRPOTrainer
Point raid/ to your fastest available storage — NVMe SSDs or a high-throughput network filesystem. Tracing large Lean repositories and writing training checkpoints can generate tens of gigabytes of data. You can change the root by editing lean_dojo_v2/utils/constants.py or by symlinking raid/ to your preferred location.

Verify Your Setup

Run this quick sanity check after completing the steps above:
import os

# Confirm token is set
assert os.environ.get("GITHUB_ACCESS_TOKEN"), "GITHUB_ACCESS_TOKEN is not set!"

# Import core components
from lean_dojo_v2.agent.hf_agent import HFAgent
from lean_dojo_v2.agent.external_agent import ExternalAgent
from lean_dojo_v2.trainer.sft_trainer import SFTTrainer

print("All imports successful — LeanDojo v2 is ready.")
If the import succeeds without a ValueError, your token is valid and the framework is correctly installed. Proceed to the Quickstart to run your first proof search.

Build docs developers (and LLMs) love