Install LeanDojo v2 and Configure Your Environment

Getting LeanDojo v2 running requires a few more steps than a typical Python package because it bridges two runtimes: a Python process that orchestrates training and proof search, and a Lean 4 kernel (accessed through Pantograph’s RPC server) that validates every tactic. This page walks you through every prerequisite, the two install paths, and all environment variables the framework consumes.

System Requirements

Before installing, make sure your machine meets these requirements:

Requirement	Minimum Version	Notes
Python	3.11	Required by `pyproject.toml`
CUDA-capable GPU	CUDA 12.6 (tested)	Required for training and local inference
Git	2.25	Used by the tracing pipeline to clone repos
elan	latest	Lean toolchain manager — installs Lean 4
wget	any	Used by tracing scripts
Disk space	varies	`raid/` working directory can grow large — use fast storage

Install elan (Lean Toolchain Manager)

If you do not already have elan installed:

curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh
source ~/.bashrc   # or restart your shell

The tracing pipeline looks for installed toolchains under ~/.elan/toolchains/. Ensure that the Lean 4 version required by your target repository is available:

elan toolchain install leanprover/lean4:nightly

Install LeanDojo v2

pip install lean-dojo-v2

Install Pantograph (Required)

Pantograph provides the Lean 4 RPC server that all provers communicate with. It must be installed separately because it is built directly from the upstream GitHub repository:

pip install git+https://github.com/stanford-centaur/PyPantograph

Pantograph must be reinstalled whenever Lean upstream changes break the RPC interface. If you encounter PantographError at runtime, reinstall with the same command above.

Install PyTorch with CUDA Support

The core package declares torch>=1.12.0 as a dependency, but a plain pip install will fetch a CPU-only wheel. Install a CUDA-enabled build explicitly — adjust the index URL to match your CUDA driver version:

# CUDA 12.6 (tested)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

Install from Source (Development)

If you want to contribute to LeanDojo v2 or need the latest unreleased changes, install directly from the repository:

Clone the repository

git clone https://github.com/lean-dojo/LeanDojo-v2.git
cd LeanDojo-v2

Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate

Upgrade pip and install in editable mode with dev extras

pip install --upgrade pip
pip install -e ".[dev]"

The dev extras include pytest, pytest-cov, black, isort, flake8, and mypy.

Install Pantograph and CUDA PyTorch

pip install git+https://github.com/stanford-centaur/PyPantograph
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

Verify the installation

python -c "import lean_dojo_v2; print('LeanDojo v2 ready')"

Environment Variables

LeanDojo v2 reads several environment variables at import and runtime. Export them in your shell profile (.bashrc, .zshrc, or .env) before running any agent.

Required

GITHUB_ACCESS_TOKEN is validated at import time. If it is missing or invalid, the framework raises a ValueError immediately. The token must have repo and read:org scopes — the tracing pipeline uses the GitHub API extensively to resolve commits, list files, and fetch repository metadata.

export GITHUB_ACCESS_TOKEN=ghp_your_token_here

Optional

Variable	Default	Description
`HF_TOKEN`	(none)	Hugging Face token for gated models (e.g., DeepSeek-Prover-V2 variants). Set this when `ExternalAgent` or any HF download requires authentication.
`CACHE_DIR`	`~/.cache/lean_dojo`	Directory where traced repository metadata and AST caches are stored. Point this to fast storage if tracing large repositories.
`TMP_DIR`	system temp	Scratch space used during tracing and dataset generation.
`NUM_PROCS`	`min(cpu_count, 32)`	Parallelism for trace and dataset generation steps.
`TACTIC_CPU_LIMIT`	`1`	Number of CPU cores allocated per tactic evaluation in Pantograph.
`TACTIC_MEMORY_LIMIT`	`"32g"`	Memory ceiling per tactic evaluation. Raise this for memory-intensive theories.
`DISABLE_REMOTE_CACHE`	(unset)	When set (to any value), disables the remote cache and forces all repositories to be built locally from source.
`LOAD_USED_PACKAGES_ONLY`	(unset)	When set, only loads dependency files that are actually imported by the target repository, reducing tracing time and disk usage.

# Example .env file
GITHUB_ACCESS_TOKEN=ghp_your_token_here
HF_TOKEN=hf_your_token_here
CACHE_DIR=/fast-ssd/.cache/lean_dojo
NUM_PROCS=16
TACTIC_MEMORY_LIMIT=64g
# DISABLE_REMOTE_CACHE=1      # uncomment to build all repos locally
# LOAD_USED_PACKAGES_ONLY=1   # uncomment to skip unused dependency files

Working Directories

By default, all datasets, model checkpoints, and tracing artifacts are written under a raid/ directory relative to where you run LeanDojo v2. The layout looks like:

raid/
├── data/          # JSONL training datasets
├── cache/         # Lean tracing cache
└── checkpoints/   # Model checkpoints from SFTTrainer / GRPOTrainer

Point raid/ to your fastest available storage — NVMe SSDs or a high-throughput network filesystem. Tracing large Lean repositories and writing training checkpoints can generate tens of gigabytes of data. You can change the root by editing lean_dojo_v2/utils/constants.py or by symlinking raid/ to your preferred location.

Verify Your Setup

Run this quick sanity check after completing the steps above:

import os

# Confirm token is set
assert os.environ.get("GITHUB_ACCESS_TOKEN"), "GITHUB_ACCESS_TOKEN is not set!"

# Import core components
from lean_dojo_v2.agent.hf_agent import HFAgent
from lean_dojo_v2.agent.external_agent import ExternalAgent
from lean_dojo_v2.trainer.sft_trainer import SFTTrainer

print("All imports successful — LeanDojo v2 is ready.")

If the import succeeds without a ValueError, your token is valid and the framework is correctly installed. Proceed to the Quickstart to run your first proof search.

Get Started

Concepts

Guides

Reference

Install LeanDojo v2 and Configure Your Environment

System Requirements

Install elan (Lean Toolchain Manager)

Install LeanDojo v2

Install Pantograph (Required)

Install PyTorch with CUDA Support

Install from Source (Development)

Environment Variables

Required

Optional

Working Directories

Verify Your Setup

Build docs developers (and LLMs) love

Get Started

Concepts

Guides

Reference

Documentation Index

​System Requirements

​Install elan (Lean Toolchain Manager)

​Install LeanDojo v2

​Install Pantograph (Required)

​Install PyTorch with CUDA Support

​Install from Source (Development)

​Environment Variables

​Required

​Optional

​Working Directories

​Verify Your Setup

Build docs developers (and LLMs) love

System Requirements

Install elan (Lean Toolchain Manager)

Install LeanDojo v2

Install Pantograph (Required)

Install PyTorch with CUDA Support

Install from Source (Development)

Environment Variables

Required

Optional

Working Directories

Verify Your Setup