LeanDojo v2 is configured entirely through environment variables, which are read at import time fromDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt
Use this file to discover all available pages before exploring further.
lean_dojo_v2/utils/constants.py. A .env file in the working directory is supported via python-dotenv — any variable defined there is loaded automatically before the module constants are evaluated.
Required Variables
A GitHub Personal Access Token (PAT) with at minimum the Generate one at github.com/settings/tokens. The token is validated at import time — a missing token raises
repo and read:org scopes. Used by the tracing pipeline to clone private or rate-limited repositories and by DynamicDatabase.discover_repositories to search GitHub’s API.ValueError: GITHUB_ACCESS_TOKEN environment variable must be set.Optional Variables
HuggingFace access token for gated models (e.g.,
deepseek-ai/DeepSeek-Prover-V2-671B). Required by HFTacticGenerator and any HFAgent workflow that pulls a gated model from the Hub.Absolute path to the directory where LeanDojo caches traced repository artifacts. Traced repos are stored here after the first run and reused on subsequent calls to avoid re-tracing the same
(url, commit) pair.Temporary directory for intermediate files created during tracing. Defaults to the OS temp directory when unset. Set this explicitly if your system temp partition is small.
Number of parallel processes used during repository tracing. Capped at
32 regardless of available CPU count. Reduce this on memory-constrained machines.Maximum number of CPUs allocated per tactic execution when interacting with Lean during tracing. Must parse as an integer.
Maximum memory allocated per tactic execution. Must match the pattern
\d+g (digits followed by g). LeanDojo asserts this pattern at import time — an invalid value causes an AssertionError.When this variable is set (to any value), LeanDojo skips downloading pre-traced artifacts from the remote cache at
https://dl.fbaipublicfiles.com/lean-dojo and always builds repos locally. Useful for offline environments or to force a fresh trace.When set, the tracing pipeline loads only the Lean packages that are actually imported by the target repository, rather than all packages declared in
lakefile.lean. Reduces memory usage for repos with large but partially-used dependency trees.External API Server Variables
The following environment variables are read by the external API server (lean_dojo_v2/external_api/python/server.py) at startup. They are not part of constants.py and do not affect the core tracing or training pipeline.
Path (local directory or HuggingFace repo ID) of a fine-tuned See the LeanProgress guide for training instructions.
LeanProgressScorer checkpoint. When set, the external API server attaches a LeanProgressScorer instance to each model bundle and enables steps_remaining / reward fields in /generate responses.Device string passed to
LeanProgressScorer when LEANPROGRESS_MODEL is set. Accepts "cuda", "cpu", or "auto" (auto-selects CUDA if available).Optional prompt template string for
LeanProgressScorer. Must be a Python format string containing {goal}, {tactic}, and {prefix} placeholders. Defaults to the built-in template in leanprogress.py when unset.Directory Layout
All working directories are rooted underRAID_DIR. The constants below are defined in lean_dojo_v2/utils/constants.py and can be changed by editing that file directly (they are not environment-variable-controlled):
| Constant | Value | Description |
|---|---|---|
RAID_DIR | <cwd>/raid | Base working directory for all LeanDojo v2 outputs. |
REPO_DIR | repos_new | Subdirectory under RAID_DIR for cloned repositories. |
DATA_DIR | data | Subdirectory under RAID_DIR for exported datasets and traced artifacts. |
CHECKPOINT_DIR | checkpoints | Subdirectory under RAID_DIR for model checkpoints. |
FISHER_DIR | fisher | Subdirectory under RAID_DIR for Elastic Weight Consolidation (EWC) Fisher matrices. |
<cwd>/raid/data/<repo_name>_<commit>/.
RAID_DIR is always set to os.path.join(os.getcwd(), "raid"). If you need it elsewhere, either cd into the desired directory before launching Python or create a symlink from ./raid to your target volume.| Path | Description |
|---|---|
raid/eval_results.json | Cumulative evaluation results written by agents. |
raid/dynamic_database.json | DynamicDatabase state: repositories, theorems, curricula. |
raid/proof_log.txt | Line-by-line log of proof attempts. |
raid/encountered_theorems.json | Set of theorem identifiers seen across all tracing runs. |
Lean 4 Path Constants
| Constant | Value | Description |
|---|---|---|
LEAN4_PACKAGES_DIR | .lake/packages | Directory where lake stores dependency packages inside a Lean project. |
LEAN4_BUILD_DIR | .lake/build | Directory where compiled .olean files are written. |
LEAN4_URL | https://github.com/leanprover/lean4 | Canonical URL used to identify Lean 4 itself as a repository. |
Ray Configuration
LeanDojo v2 uses Ray for distributed data processing. TheRAY_TMPDIR environment variable is set unconditionally at import time:
lean_dojo_v2 if /tmp is on a small partition:
Version
The package version is defined inconstants.py and available as lean_dojo_v2.utils.constants.__version__: