Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt

Use this file to discover all available pages before exploring further.

LeanDojo v2 is configured entirely through environment variables, which are read at import time from lean_dojo_v2/utils/constants.py. A .env file in the working directory is supported via python-dotenv — any variable defined there is loaded automatically before the module constants are evaluated.
# .env — loaded automatically at import time
GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
CACHE_DIR=/mnt/fast-storage/lean-cache
NUM_PROCS=16
constants.py is evaluated as soon as any lean_dojo_v2 submodule is imported. If GITHUB_ACCESS_TOKEN is not set at that point, a ValueError is raised immediately and the import fails. Export the token before running any Python code that imports lean_dojo_v2.

Required Variables

GITHUB_ACCESS_TOKEN
string
required
A GitHub Personal Access Token (PAT) with at minimum the repo and read:org scopes. Used by the tracing pipeline to clone private or rate-limited repositories and by DynamicDatabase.discover_repositories to search GitHub’s API.
export GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
Generate one at github.com/settings/tokens. The token is validated at import time — a missing token raises ValueError: GITHUB_ACCESS_TOKEN environment variable must be set.

Optional Variables

HF_TOKEN
string
HuggingFace access token for gated models (e.g., deepseek-ai/DeepSeek-Prover-V2-671B). Required by HFTacticGenerator and any HFAgent workflow that pulls a gated model from the Hub.
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
CACHE_DIR
string
default:"~/.cache/lean_dojo"
Absolute path to the directory where LeanDojo caches traced repository artifacts. Traced repos are stored here after the first run and reused on subsequent calls to avoid re-tracing the same (url, commit) pair.
export CACHE_DIR=/mnt/fast-storage/lean-cache
TMP_DIR
string
default:"system temp (None)"
Temporary directory for intermediate files created during tracing. Defaults to the OS temp directory when unset. Set this explicitly if your system temp partition is small.
export TMP_DIR=/scratch/lean-tmp
NUM_PROCS
integer
default:"min(cpu_count, 32)"
Number of parallel processes used during repository tracing. Capped at 32 regardless of available CPU count. Reduce this on memory-constrained machines.
export NUM_PROCS=8
TACTIC_CPU_LIMIT
integer
default:"1"
Maximum number of CPUs allocated per tactic execution when interacting with Lean during tracing. Must parse as an integer.
export TACTIC_CPU_LIMIT=2
TACTIC_MEMORY_LIMIT
string
default:"32g"
Maximum memory allocated per tactic execution. Must match the pattern \d+g (digits followed by g). LeanDojo asserts this pattern at import time — an invalid value causes an AssertionError.
export TACTIC_MEMORY_LIMIT=64g
DISABLE_REMOTE_CACHE
flag
When this variable is set (to any value), LeanDojo skips downloading pre-traced artifacts from the remote cache at https://dl.fbaipublicfiles.com/lean-dojo and always builds repos locally. Useful for offline environments or to force a fresh trace.
export DISABLE_REMOTE_CACHE=1
LOAD_USED_PACKAGES_ONLY
flag
When set, the tracing pipeline loads only the Lean packages that are actually imported by the target repository, rather than all packages declared in lakefile.lean. Reduces memory usage for repos with large but partially-used dependency trees.
export LOAD_USED_PACKAGES_ONLY=1

External API Server Variables

The following environment variables are read by the external API server (lean_dojo_v2/external_api/python/server.py) at startup. They are not part of constants.py and do not affect the core tracing or training pipeline.
LEANPROGRESS_MODEL
string
Path (local directory or HuggingFace repo ID) of a fine-tuned LeanProgressScorer checkpoint. When set, the external API server attaches a LeanProgressScorer instance to each model bundle and enables steps_remaining / reward fields in /generate responses.
export LEANPROGRESS_MODEL=outputs-progress
See the LeanProgress guide for training instructions.
LEANPROGRESS_DEVICE
string
default:"auto"
Device string passed to LeanProgressScorer when LEANPROGRESS_MODEL is set. Accepts "cuda", "cpu", or "auto" (auto-selects CUDA if available).
export LEANPROGRESS_DEVICE=cuda
LEANPROGRESS_TEMPLATE
string
Optional prompt template string for LeanProgressScorer. Must be a Python format string containing {goal}, {tactic}, and {prefix} placeholders. Defaults to the built-in template in leanprogress.py when unset.
export LEANPROGRESS_TEMPLATE="Goal:\n{goal}\n\nPrefix:\n{prefix}\n\nCandidate tactic:\n{tactic}"

Directory Layout

All working directories are rooted under RAID_DIR. The constants below are defined in lean_dojo_v2/utils/constants.py and can be changed by editing that file directly (they are not environment-variable-controlled):
ConstantValueDescription
RAID_DIR<cwd>/raidBase working directory for all LeanDojo v2 outputs.
REPO_DIRrepos_newSubdirectory under RAID_DIR for cloned repositories.
DATA_DIRdataSubdirectory under RAID_DIR for exported datasets and traced artifacts.
CHECKPOINT_DIRcheckpointsSubdirectory under RAID_DIR for model checkpoints.
FISHER_DIRfisherSubdirectory under RAID_DIR for Elastic Weight Consolidation (EWC) Fisher matrices.
The full path for, e.g., a dataset is: <cwd>/raid/data/<repo_name>_<commit>/.
RAID_DIR is always set to os.path.join(os.getcwd(), "raid"). If you need it elsewhere, either cd into the desired directory before launching Python or create a symlink from ./raid to your target volume.
Well-known file paths derived from these constants:
PathDescription
raid/eval_results.jsonCumulative evaluation results written by agents.
raid/dynamic_database.jsonDynamicDatabase state: repositories, theorems, curricula.
raid/proof_log.txtLine-by-line log of proof attempts.
raid/encountered_theorems.jsonSet of theorem identifiers seen across all tracing runs.

Lean 4 Path Constants

ConstantValueDescription
LEAN4_PACKAGES_DIR.lake/packagesDirectory where lake stores dependency packages inside a Lean project.
LEAN4_BUILD_DIR.lake/buildDirectory where compiled .olean files are written.
LEAN4_URLhttps://github.com/leanprover/lean4Canonical URL used to identify Lean 4 itself as a repository.

Ray Configuration

LeanDojo v2 uses Ray for distributed data processing. The RAY_TMPDIR environment variable is set unconditionally at import time:
os.environ["RAY_TMPDIR"] = "/tmp/ray"
Override it before importing lean_dojo_v2 if /tmp is on a small partition:
export RAY_TMPDIR=/scratch/ray
python your_script.py

Version

The package version is defined in constants.py and available as lean_dojo_v2.utils.constants.__version__:
from lean_dojo_v2.utils.constants import __version__
print(__version__)  # e.g., "4.20.0"

Build docs developers (and LLMs) love