Troubleshooting Common LeanDojo v2 Issues

This reference collects the most common problems encountered when installing, configuring, or running LeanDojo v2, along with step-by-step fixes. Each accordion covers a specific failure mode — expand the ones that match your error message or symptom.

If your issue is not listed here, search open issues or file a new one at github.com/lean-dojo/LeanDojo-v2/issues.

ValueError: GITHUB_ACCESS_TOKEN environment variable must be set

constants.py is evaluated the moment any lean_dojo_v2 submodule is imported. If GITHUB_ACCESS_TOKEN is not in the environment at that point, a ValueError is raised and the import aborts.Fix: Export the token before running any Python that imports the package:

export GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
python your_script.py

Or add it to a .env file in your working directory — python-dotenv loads it automatically at import time:

echo "GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx" >> .env

The token must have at minimum repo and read:org scopes. Create one at github.com/settings/tokens.

401 Bad Credentials or GitHub API rate limit exceeded

This happens when GITHUB_ACCESS_TOKEN is set but the token is expired, revoked, has insufficient scopes, or is malformed.Fix:

Verify the token is valid by testing it manually:
```
curl -H "Authorization: token $GITHUB_ACCESS_TOKEN" https://api.github.com/user
```
A successful response returns your GitHub user JSON. A 401 means the token is invalid.
Regenerate the token at github.com/settings/tokens and ensure it has the repo and read:org scopes checked.
If you are hitting rate limits (HTTP 403 with "API rate limit exceeded"), your token is unauthenticated or shared. Authenticated tokens have a limit of 5,000 requests/hour vs. 60/hour for unauthenticated.

Lean tracing fails or hangs indefinitely

Tracing invokes the Lean 4 compiler via lake build internally. Failures here are almost always caused by a missing or incompatible Lean toolchain.Fix:

Confirm elan is installed and on your PATH:
```
elan --version
```
Check that the required toolchain is installed. The toolchain version is specified in the target repo’s lean-toolchain file:
```
cat path/to/repo/lean-toolchain
# e.g., leanprover/lean4:v4.14.0
elan toolchain install leanprover/lean4:v4.14.0
```
List installed toolchains to verify:
```
ls ~/.elan/toolchains/
```
A good timing estimate from trace.py: tracing takes approximately 1.5× the time to compile the repo with lake build. Check compilation first:
```
cd path/to/repo && lake build
```

Missing CUDA libraries or torch.cuda.is_available() returns False

PyTorch is installed but CUDA is not available, causing all models to fall back to CPU (or crash if device="cuda" is forced).Fix:

Check your driver version:
```
nvidia-smi
```

Reinstall PyTorch with the wheel that matches your CUDA version. For CUDA 12.6:

pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cu126

For CUDA 12.1:

pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cu121

Verify after reinstallation:

import torch
print(torch.cuda.is_available())   # True
print(torch.version.cuda)          # e.g., "12.6"

Ensure the CUDA toolkit version in the wheel matches your driver. Driver ≥ 520 supports CUDA 12.x.

Dataset location errors or disk running full

LeanDojo v2 writes all datasets, checkpoints, and traces to <cwd>/raid. On machines with limited root or home partition space, this fills up quickly — especially when tracing large repos like mathlib4.Fix:

Before running anything, symlink raid/ to a high-throughput storage volume:
```
ln -s /mnt/fast-storage/lean-raid ./raid
```

Alternatively, set CACHE_DIR and TMP_DIR to paths on the larger partition:

export CACHE_DIR=/mnt/fast-storage/lean-cache
export TMP_DIR=/mnt/fast-storage/lean-tmp

Monitor usage:
```
du -sh raid/
```
Remove cached traces for specific repos from ~/.cache/lean_dojo/ if you need to reclaim space. The remote cache at https://dl.fbaipublicfiles.com/lean-dojo will repopulate them on the next run.

Pantograph errors or Lean RPC connection failures

HFProver, RetrievalProver, and ExternalProver all depend on PyPantograph for Lean 4 RPC communication. If the Lean version changes or Pantograph falls out of sync, you will see connection errors or RuntimeError from the prover.Fix:Reinstall Pantograph directly from the latest source:

pip install git+https://github.com/stanford-centaur/PyPantograph

Then restart your Python process. Pantograph embeds a compiled Lean binary, so a clean reinstall is always the safest fix after any Lean upstream change.

Do not install Pantograph from PyPI for development work — the GitHub head often contains fixes for the latest Lean nightly toolchains that are not yet published to PyPI.

Tracing is extremely slow

Tracing is compute-bound by Lean compilation. For large repositories the first trace can take hours.Options to speed things up:

Remote cache: LeanDojo automatically checks https://dl.fbaipublicfiles.com/lean-dojo for pre-traced repositories before building locally. Most popular repos are cached. This is enabled by default — do not set DISABLE_REMOTE_CACHE unless you have a specific reason.
Local cache: Once a repo is traced, the result is stored in CACHE_DIR (default ~/.cache/lean_dojo). Subsequent calls to the same (url, commit) are nearly instant.
Parallelism: Increase NUM_PROCS if your machine has spare CPU cores:
```
export NUM_PROCS=24
```
Skip dependency tracing: Keep build_deps=False unless you need full premise coverage for LeanAgent. Tracing dependencies can multiply the total time by 5–10×.

build_deps=True vs build_deps=False — which do I need?

The build_deps parameter controls whether LeanDojo instruments dependency packages (e.g., Mathlib) in addition to the target repository.

Scenario	Recommended setting
`HFAgent` fine-tuning	`False` (default) — faster tracing, sufficient tactic data.
`ExternalAgent` proof generation	`False` — no retrieval corpus needed.
`LeanAgent` lifelong learning	`True` — the retrieval corpus must include all reachable premises.

Using build_deps=True on a repo that depends on mathlib4 can take many hours and tens of gigabytes of disk space. Only enable it when the full premise corpus is genuinely required.

trl or peft version mismatch errors

The trainers (SFTTrainer, GRPOTrainer) depend on trl and peft. Older or mismatched versions can produce AttributeError, ImportError, or silent incorrect behavior.Fix:Ensure your environment meets the minimum versions from pyproject.toml:

pip install "trl>=0.25.1" "peft>=0.17.0"

Verify installed versions:

python -c "import trl, peft; print(trl.__version__, peft.__version__)"

If you installed from source using pip install -e ".[dev]", the constraints in pyproject.toml are applied automatically. A fresh install in a clean virtual environment is the most reliable fix for persistent version conflicts.

HuggingFace model loading errors (DeepSeek or other gated models)

Loading gated models from the HuggingFace Hub requires authentication, and some models additionally require trust_remote_code=True.Fix:

Set HF_TOKEN:

export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx

Then log in via the CLI to persist the token locally:

huggingface-cli login

For models that use custom modeling code (e.g., DeepSeek variants), pass trust_remote_code=True when loading manually:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-Prover-V2-7B",
    trust_remote_code=True,
)

Accept the model’s license agreement on the HuggingFace model page if prompted — some gated models block access until the license is accepted through the web UI.
For the HFTacticGenerator class (used in the external API), the HF_TOKEN variable is read directly from os.environ — ensure it is exported in the same shell session that starts the server.

Get Started

Concepts

Guides

Reference

Troubleshooting Common LeanDojo v2 Issues

Build docs developers (and LLMs) love

Get Started

Concepts

Guides

Reference

Documentation Index

Build docs developers (and LLMs) love