This reference collects the most common problems encountered when installing, configuring, or running LeanDojo v2, along with step-by-step fixes. Each section covers a specific failure mode; jump to the ones that match your error message or symptom.
ValueError: GITHUB_ACCESS_TOKEN environment variable must be set
`constants.py` is evaluated the moment any `lean_dojo_v2` submodule is imported. If `GITHUB_ACCESS_TOKEN` is not in the environment at that point, a `ValueError` is raised and the import aborts.

Fix: Export the token before running any Python that imports the package, for example with `export GITHUB_ACCESS_TOKEN=<your-token>`. Alternatively, put `GITHUB_ACCESS_TOKEN=<your-token>` in a `.env` file in your working directory; python-dotenv loads it automatically at import time.

The token needs the `repo` and `read:org` scopes. Create one at github.com/settings/tokens.
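A minimal preflight sketch, assuming you want a missing token to fail at the top of your own script rather than inside an import. `require_github_token` and `load_dotenv_if_available` are hypothetical helpers, not part of `lean_dojo_v2`; the error message simply mirrors the one `constants.py` raises, and the `.env` loading is optional (it only runs if python-dotenv is installed).

```python
import os


def require_github_token(env=None):
    """Fail early with the same message lean_dojo_v2's constants.py raises (hypothetical helper)."""
    env = os.environ if env is None else env
    token = env.get("GITHUB_ACCESS_TOKEN", "").strip()
    if not token:
        raise ValueError("GITHUB_ACCESS_TOKEN environment variable must be set")
    return token


def load_dotenv_if_available(path=".env"):
    """Load a .env file via python-dotenv if it is installed; return whether anything was loaded."""
    try:
        from dotenv import load_dotenv
    except ImportError:
        return False
    return bool(load_dotenv(path))


# Typical entry-point order (kept as comments so nothing runs on import):
#   load_dotenv_if_available()
#   require_github_token()
#   import lean_dojo_v2  # constants.py now finds the token
```

Checking before the import keeps the failure next to your own code instead of deep in a package import traceback.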
401 Bad Credentials or GitHub API rate limit exceeded
`GITHUB_ACCESS_TOKEN` is set, but the token is expired, revoked, missing required scopes, or malformed.

Fix:

- Verify the token manually, for example with `curl -H "Authorization: token $GITHUB_ACCESS_TOKEN" https://api.github.com/user`. A successful response returns your GitHub user JSON; a `401` means the token is invalid.
- Regenerate the token at github.com/settings/tokens and make sure the `repo` and `read:org` scopes are checked.
- If you are hitting rate limits (HTTP 403 with "API rate limit exceeded"), your requests are probably going out unauthenticated, or the token is shared with other jobs. Authenticated requests are limited to 5,000 per hour, versus 60 per hour for unauthenticated requests.
Lean tracing fails or hangs indefinitely
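Tracing runs `lake build` internally. Failures here are almost always caused by a missing or incompatible Lean toolchain.

Fix:

- Confirm `elan` is installed and on your `PATH`: `elan --version` should print a version.
- Check that the required toolchain is installed. The toolchain version is pinned in the target repo's `lean-toolchain` file; from the repo root, `elan toolchain install $(cat lean-toolchain)` installs it.
- List installed toolchains to verify: `elan toolchain list`.
- Use the timing estimate from `trace.py` as a guide: tracing takes roughly 1.5× as long as compiling the repo with `lake build`. Check that the repo compiles on its own first by running `lake build` in its root.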
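A minimal preflight sketch of those checks, using only the standard library. The helper names are hypothetical. The parser assumes `elan toolchain list` prints one toolchain per line, optionally followed by a marker such as `(default)`; that format, and whether the names exactly match the string in `lean-toolchain`, may vary between elan versions. The 1.5× factor is the rough estimate quoted above.

```python
import shutil
import subprocess
from pathlib import Path

TRACE_TO_BUILD_RATIO = 1.5  # rough estimate quoted from trace.py


def pinned_toolchain(repo_dir):
    """Toolchain string from <repo_dir>/lean-toolchain, e.g. 'leanprover/lean4:v4.9.0'."""
    return (Path(repo_dir) / "lean-toolchain").read_text().strip()


def parse_toolchain_list(output):
    """Collect the first token of every non-empty line of `elan toolchain list` (assumed format)."""
    return {line.split()[0] for line in output.splitlines() if line.strip()}


def preflight(repo_dir):
    """Return a list of problems likely to make tracing fail (empty list means OK)."""
    elan = shutil.which("elan")
    if elan is None:
        return ["elan is not on PATH"]
    wanted = pinned_toolchain(repo_dir)
    listing = subprocess.run(
        [elan, "toolchain", "list"], capture_output=True, text=True, check=True
    ).stdout
    if wanted not in parse_toolchain_list(listing):
        return [f"toolchain {wanted} not installed; try `elan toolchain install {wanted}`"]
    return []


def estimated_trace_hours(build_seconds):
    """Expected tracing time given how long `lake build` took on the same repo."""
    return TRACE_TO_BUILD_RATIO * build_seconds / 3600
```

If `lake build` took 40 minutes, `estimated_trace_hours(2400)` suggests budgeting about an hour for tracing.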
Missing CUDA libraries or torch.cuda.is_available() returns False
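PyTorch cannot detect a usable GPU, so work either runs on the CPU or fails outright (for example, when `device="cuda"` is forced).

Fix:

- Check your driver version with `nvidia-smi`.
- Reinstall PyTorch with the wheel that matches your CUDA version. For CUDA 12.6: `pip install torch --index-url https://download.pytorch.org/whl/cu126`. For CUDA 12.1: `pip install torch --index-url https://download.pytorch.org/whl/cu121`.
- Verify after reinstalling: `python -c "import torch; print(torch.cuda.is_available())"` should print `True`.
- Make sure your driver supports the CUDA version the wheel was built for. On Linux, CUDA 12.x wheels need driver 525.60.13 or newer.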
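A small sketch for comparing your driver against the CUDA version of a wheel. The thresholds are the Linux minimums for CUDA minor-version compatibility from NVIDIA's CUDA compatibility guide (Windows minimums differ and are not modeled); all helper names are hypothetical. `installed_driver_version` shells out to `nvidia-smi`, and `torch_cuda_report` only reports what an installed PyTorch sees.

```python
import re
import subprocess

# Minimum Linux driver versions for CUDA minor-version compatibility
# (per NVIDIA's CUDA compatibility guide). Windows thresholds are not modeled.
MIN_LINUX_DRIVER = {11: (450, 80, 2), 12: (525, 60, 13)}


def parse_driver_version(text):
    """Extract (major, minor, patch) from text like '535.104.05'."""
    m = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        raise ValueError(f"no driver version found in {text!r}")
    return tuple(int(g) if g else 0 for g in m.groups())


def driver_supports(driver_version, cuda_version):
    """True if the driver is new enough for a wheel built against cuda_version (e.g. '12.6')."""
    major = int(cuda_version.split(".")[0])
    return parse_driver_version(driver_version) >= MIN_LINUX_DRIVER[major]


def installed_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()


def torch_cuda_report():
    """Summarize what the installed PyTorch wheel sees, if PyTorch is installed."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    return (f"torch {torch.__version__} (built for CUDA {torch.version.cuda}), "
            f"cuda available: {torch.cuda.is_available()}")
```

On a GPU machine, `driver_supports(installed_driver_version(), "12.6")` tells you whether the `cu126` wheel can work before you reinstall anything.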
Dataset location errors or disk running full
By default, LeanDojo writes its data under `<cwd>/raid`. On machines with limited space on the root or home partition, this fills up quickly, especially when tracing large repos like mathlib4.

Fix:

- Before running anything, symlink `raid/` to a high-throughput storage volume, e.g. `ln -s /path/to/large/volume raid`.
- Alternatively, set `CACHE_DIR` and `TMP_DIR` to paths on the larger partition, e.g. `export CACHE_DIR=/path/to/large/volume/cache`.
- Monitor usage with `df -h` and `du -sh raid ~/.cache/lean_dojo`.
- Remove cached traces for specific repos from `~/.cache/lean_dojo/` if you need to reclaim space. Repos available in the remote cache at https://dl.fbaipublicfiles.com/lean-dojo are downloaded again on the next run.
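A small sketch for checking free space wherever LeanDojo will write. It assumes the defaults described above (`<cwd>/raid`, and `CACHE_DIR` falling back to `~/.cache/lean_dojo`); `lean_dojo_dirs` and `free_gib` are hypothetical helpers. Resolving the path first means a `raid/` symlink is followed to the volume it actually points at.

```python
import os
import shutil
from pathlib import Path


def lean_dojo_dirs(env=None, cwd=None):
    """Directories LeanDojo is expected to write to, assuming the documented defaults."""
    env = os.environ if env is None else env
    base = Path.cwd() if cwd is None else Path(cwd)
    dirs = {
        "raid": base / "raid",
        "cache": Path(env.get("CACHE_DIR", Path.home() / ".cache" / "lean_dojo")).expanduser(),
    }
    if "TMP_DIR" in env:
        dirs["tmp"] = Path(env["TMP_DIR"]).expanduser()
    return dirs


def free_gib(path):
    """Free space (GiB) on the volume holding path, following symlinks."""
    p = Path(path).resolve()
    # A directory that does not exist yet will be created on its nearest existing parent's volume.
    while not p.exists():
        p = p.parent
    return shutil.disk_usage(p).free / 2**30
```

For example, `for name, p in lean_dojo_dirs().items(): print(name, p, f"{free_gib(p):.1f} GiB free")` shows at a glance whether the symlink or `CACHE_DIR` override landed on the large volume.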
Pantograph errors or Lean RPC connection failures
`HFProver`, `RetrievalProver`, and `ExternalProver` all depend on PyPantograph for Lean 4 RPC communication. If the Lean version changes or Pantograph falls out of sync with it, you will see connection errors or a `RuntimeError` from the prover.

Fix: Reinstall Pantograph directly from its latest source.
Tracing is extremely slow
- Remote cache: LeanDojo automatically checks https://dl.fbaipublicfiles.com/lean-dojo for pre-traced repositories before building locally, and most popular repos are cached there. This is enabled by default; do not set `DISABLE_REMOTE_CACHE` unless you have a specific reason.
- Local cache: Once a repo is traced, the result is stored in `CACHE_DIR` (default `~/.cache/lean_dojo`). Subsequent calls for the same `(url, commit)` pair are nearly instant.
- Parallelism: Increase `NUM_PROCS` if your machine has spare CPU cores, e.g. `export NUM_PROCS=32`.
- Skip dependency tracing: Keep `build_deps=False` unless you need full premise coverage for `LeanAgent`. Tracing dependencies can multiply the total time by 5–10×.
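A sketch of setting these knobs from Python. It assumes, by analogy with `GITHUB_ACCESS_TOKEN`, that `NUM_PROCS` and `DISABLE_REMOTE_CACHE` are read when `lean_dojo_v2` is first imported, so overrides must be applied before that import. `tracing_env` is a hypothetical helper, and leaving one core free is an arbitrary choice.

```python
import os


def tracing_env(cpu_count=None, reserve=1, env=None):
    """Suggest env overrides for faster tracing and flag settings that slow it down (hypothetical)."""
    env = os.environ if env is None else env
    cores = cpu_count or os.cpu_count() or 1
    overrides = {"NUM_PROCS": str(max(1, cores - reserve))}  # keep `reserve` cores free
    warnings = []
    if "DISABLE_REMOTE_CACHE" in env:
        warnings.append("DISABLE_REMOTE_CACHE is set: pre-traced repos will not be downloaded")
    return overrides, warnings


# Intended use, before importing the package (comments so nothing runs on import):
#   overrides, warnings = tracing_env()
#   os.environ.update(overrides)
#   for w in warnings: print("warning:", w)
#   import lean_dojo_v2
```

Returning the overrides instead of mutating `os.environ` directly keeps the helper side-effect free and easy to log.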
build_deps=True vs build_deps=False — which do I need?
The `build_deps` parameter controls whether LeanDojo instruments dependency packages (e.g., Mathlib) in addition to the target repository.

| Scenario | Recommended setting |
|---|---|
| `HFAgent` fine-tuning | `False` (default): faster tracing, sufficient tactic data |
| `ExternalAgent` proof generation | `False`: no retrieval corpus needed |
| `LeanAgent` lifelong learning | `True`: the retrieval corpus must include all reachable premises |
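The table can be encoded as a lookup so that scripts choose the setting explicitly instead of relying on a default. `recommended_build_deps` is a hypothetical helper; the call site in the comment is illustrative only, since the exact function that accepts `build_deps` is not shown here. Unknown agents fall back to `False`, matching the advice to enable dependency tracing only when the full premise corpus is required.

```python
# Recommendations from the table above, keyed by agent class name.
BUILD_DEPS_BY_AGENT = {
    "HFAgent": False,        # tactic data from the target repo is enough
    "ExternalAgent": False,  # no retrieval corpus needed
    "LeanAgent": True,       # retrieval corpus must include all reachable premises
}


def recommended_build_deps(agent_name: str) -> bool:
    """Return the suggested build_deps value for an agent, defaulting to False."""
    return BUILD_DEPS_BY_AGENT.get(agent_name, False)


# Illustrative (hypothetical) call site:
#   build_deps = recommended_build_deps(type(agent).__name__)
```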
Setting `build_deps=True` on a repo that depends on mathlib4 can take many hours and tens of gigabytes of disk space. Only enable it when the full premise corpus is genuinely required.
trl or peft version mismatch errors
The trainers (`SFTTrainer`, `GRPOTrainer`) depend on `trl` and `peft`. Older or mismatched versions can produce `AttributeError`, `ImportError`, or silently incorrect behavior.

Fix: Ensure your environment meets the minimum versions declared in `pyproject.toml`; `pip show trl peft` reports what is currently installed. If you install with `pip install -e ".[dev]"`, the constraints in `pyproject.toml` are applied automatically. A fresh install in a clean virtual environment is the most reliable fix for persistent version conflicts.
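A sketch for checking installed versions against minimums from Python, using `importlib.metadata` from the standard library. The `MINIMUMS` values are hypothetical placeholders: copy the real lower bounds from `pyproject.toml`. The comparison only looks at the numeric release part (so `0.12.0.dev0` counts as `0.12.0`); `packaging.version.Version` handles pre-releases properly if that package is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# HYPOTHETICAL placeholders: replace with the lower bounds declared in pyproject.toml.
MINIMUMS = {"trl": "0.0.0", "peft": "0.0.0"}


def release_tuple(v):
    """Numeric release part of a version string: '0.12.0.dev0' -> (0, 12, 0)."""
    nums = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        nums.append(int(m.group()))
    return tuple(nums)


def is_older(installed, minimum):
    """Compare release tuples after zero-padding, so '1.0' equals '1.0.0'."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) < b + (0,) * (width - len(b))


def version_problems(minimums=MINIMUMS):
    """List packages that are missing or older than their minimum."""
    problems = []
    for pkg, minimum in minimums.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            problems.append(f"{pkg}: not installed (need >= {minimum})")
            continue
        if is_older(installed, minimum):
            problems.append(f"{pkg}: {installed} installed, need >= {minimum}")
    return problems
```

An empty list from `version_problems()` means every listed package meets its minimum; anything else is a candidate for the clean-environment reinstall.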
HuggingFace model loading errors (DeepSeek or other gated models)
Some models are gated behind a HuggingFace access token, or require `trust_remote_code=True` to load.

Fix:

- Set `HF_TOKEN` (e.g. `export HF_TOKEN=<your-token>`), then log in via the CLI with `huggingface-cli login` to persist the token locally.
- For models that use custom modeling code (e.g., DeepSeek variants), pass `trust_remote_code=True` when loading them manually.
- If prompted, accept the model's license agreement on its HuggingFace model page. Some gated models block access until the license is accepted through the web UI.
- For the `HFTacticGenerator` class (used in the external API), `HF_TOKEN` is read directly from `os.environ`, so make sure it is exported in the same shell session that starts the server.
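A sketch of loading a gated, custom-code model by hand, assuming a recent `transformers` release where `from_pretrained` accepts a `token` argument (older releases used `use_auth_token` instead). `hf_token` and `load_gated_model` are hypothetical helpers, and the model ID is whatever checkpoint you are loading. The explicit token check makes a missing `HF_TOKEN` fail with a clear message instead of a generic 401 from the Hub.

```python
import os


def hf_token(env=None):
    """Return HF_TOKEN from the environment, failing loudly if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set in this process; export it in the shell that starts Python or the server"
        )
    return token


def load_gated_model(model_id, env=None):
    """Load a (possibly gated, custom-code) causal LM and its tokenizer from the HuggingFace Hub."""
    # Imported here so hf_token() can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    token = hf_token(env)
    # trust_remote_code=True executes Python shipped in the model repo:
    # only enable it for repositories you trust.
    tokenizer = AutoTokenizer.from_pretrained(model_id, token=token, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, token=token, trust_remote_code=True)
    return tokenizer, model
```

Because `hf_token` reads the current process environment, running it at server startup also catches the case where `HF_TOKEN` was exported in a different shell.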