Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt

Use this file to discover all available pages before exploring further.

LeanGitRepo is the fundamental unit of identity throughout LeanDojo v2. It is a frozen dataclass — immutable after construction — that binds a repository URL (GitHub HTTPS, SSH, any remote, or a local path) to a specific commit SHA. Every tracing, caching, and theorem-lookup operation takes a LeanGitRepo as its starting point. Tags and branch names are automatically resolved to full 40-character commit hashes, and SSH URLs (git@github.com:...) are normalised to HTTPS on construction.
from lean_dojo_v2.lean_dojo.data_extraction.lean import LeanGitRepo

Constructor Fields

url
str
required
The repository URL. Accepted formats:
  • GitHub HTTPShttps://github.com/owner/repo (preferred)
  • GitHub SSHgit@github.com:owner/repo (automatically converted to HTTPS)
  • Any other Git remote — any valid git clone-able URL
  • Local filesystem path — an absolute or relative path to a directory that is already a Git repository
Trailing slashes and .git suffixes are stripped during normalisation.
commit
str
required
The commit to pin the repository to. You may supply:
  • A full 40-character hex SHA — "3e23ab0bfdcfdbd5b11ab53c2cd8b5d16492e9c2"
  • A tag — "v4.0.0"
  • A branch name — "main"
Tags and branches are resolved to commit hashes during __post_init__, so repo.commit is always a 40-character SHA after construction.
LeanGitRepo is a frozen dataclass (frozen=True). Its fields cannot be reassigned after construction. Internal mutations (e.g., resolving a tag to a commit SHA) use object.__setattr__ during __post_init__.

RepoType Enum

The library classifies repositories into three types, stored in repo.repo_type:
ValueMeaning
RepoType.GITHUBA GitHub HTTPS URL (https://github.com/…). The GitHub REST API is used for metadata.
RepoType.REMOTEAny other publicly accessible Git remote that is not GitHub.
RepoType.LOCALA local directory that is itself a Git repository.
from lean_dojo_v2.lean_dojo.data_extraction.lean import RepoType

repo = LeanGitRepo("https://github.com/durant42040/lean4-example", "3e23ab0...")
assert repo.repo_type == RepoType.GITHUB

Class Method

LeanGitRepo.from_path()

Constructs a LeanGitRepo from a local directory by reading the current HEAD commit from the Git metadata.
path
Union[Path, str]
required
Absolute or relative path to a local Lean 4 Git repository.
return
LeanGitRepo
A LeanGitRepo whose url is the absolute path and whose commit is the resolved HEAD SHA.
local_repo = LeanGitRepo.from_path("/path/to/local/lean/project")
print(local_repo.commit)  # e.g. "a1b2c3d4e5..."

Instance Properties and Methods

name

The final path component of url — equivalent to the repository name on GitHub.
repo = LeanGitRepo("https://github.com/durant42040/lean4-example", "3e23ab0...")
print(repo.name)  # "lean4-example"

is_lean4

Returns True if and only if this repository is the Lean 4 standard library itself (https://github.com/leanprover/lean4). Theorems belonging to this repo are excluded from exported benchmarks.

commit_url

The full URL to the specific commit tree on the hosting service, e.g.:
https://github.com/durant42040/lean4-example/tree/3e23ab0bfdcfdbd5b11ab53c2cd8b5d16492e9c2

exists()

Checks whether the repository at this URL and commit is accessible right now.
return
bool
True if the commit URL responds (GitHub) or if the commit hash can be resolved locally (local / remote repos). False otherwise, with a warning logged.

clone_and_checkout()

Clones the repository into the current working directory and checks out the pinned commit, including submodule initialisation.
This method clones into a directory named self.name relative to the current working directory. Ensure you call it from within an appropriate temporary directory to avoid polluting the project root.

get_cache_dirname()

Returns the subdirectory name used by LeanDojo’s cache for this specific repo and commit, e.g.:
durant42040-lean4-example-3e23ab0bfdcfdbd5b11ab53c2cd8b5d16492e9c2
return
Path
A single-component Path identifying this repo’s cache slot. Append repo.name to get the full relative cache path.

get_dependencies()

Parses lean-toolchain, lake-manifest.json, and lakefile.lean / lakefile.toml to return the full set of declared dependencies.
path
Optional[Union[str, Path]]
default:"None"
If provided, reads dependency files from this local directory rather than fetching them from the remote.
return
Dict[str, LeanGitRepo]
A mapping from dependency name to its LeanGitRepo. Always includes "lean4" pointing to the required Lean 4 stdlib commit.

Theorem

Identifies a single theorem within a LeanGitRepo. Used as the key type for looking up traced theorems.
FieldTypeDescription
repoLeanGitRepoThe repository containing the theorem.
file_pathPathPath to the .lean source file, relative to the repository root. Must end in .lean.
full_namestrFully qualified Lean 4 name, e.g. MyNamespace.myLemma.
from lean_dojo_v2.lean_dojo.data_extraction.lean import Theorem

thm = Theorem(
    repo=repo,
    file_path=Path("MyProject/Basic.lean"),
    full_name="MyProject.add_comm",
)
print(thm.uid)  # unique string key usable as a dict key or DB identifier

Pos

A 1-indexed line/column position within a .lean source file. Comparable (<, <=), iterable (yields line_nb, column_nb), and hashable.
FieldTypeDescription
line_nbintLine number, 1-indexed.
column_nbintColumn number, 1-indexed.
from lean_dojo_v2.lean_dojo.data_extraction.lean import Pos

start = Pos(10, 1)
end   = Pos(15, 42)
assert start < end
line, col = start       # unpacking via __iter__

LeanFile

Represents an open .lean source file. Provides random access to source text and conversion between byte indices and Pos objects.
Property / MethodDescription
abs_pathAbsolute Path of the file.
num_linesTotal number of lines.
start_pos / end_posPos at the first / last character.
convert_pos(byte_idx)Convert a Lean 4 String.Pos byte index to a Pos.
offset(pos, delta)Advance pos by delta UTF-8 characters.
lean_file[start:end]Slice the source text between two Pos values.
from lean_dojo_v2.lean_dojo.data_extraction.lean import LeanFile, Pos
from pathlib import Path

lf = LeanFile(root_dir=Path("/project").resolve(), path=Path("MyProject/Basic.lean"))
snippet = lf[Pos(3, 1) : Pos(5, 20)]
print(snippet)

Usage Examples

from lean_dojo_v2.lean_dojo.data_extraction.lean import LeanGitRepo

# ── GitHub repository (full commit SHA) ─────────────────────────────────────
repo = LeanGitRepo(
    url="https://github.com/durant42040/lean4-example",
    commit="3e23ab0bfdcfdbd5b11ab53c2cd8b5d16492e9c2",
)
print(repo.name)        # "lean4-example"
print(repo.commit_url)  # "https://github.com/durant42040/lean4-example/tree/3e23ab0..."
print(repo.exists())    # True / False

# ── GitHub repository (tag — resolved automatically) ────────────────────────
repo_tagged = LeanGitRepo(
    url="https://github.com/leanprover-community/mathlib4",
    commit="v4.0.0",
)
# repo_tagged.commit is now the full 40-char SHA for v4.0.0

# ── Local repository ────────────────────────────────────────────────────────
local_repo = LeanGitRepo.from_path("/path/to/local/lean/project")
print(local_repo.commit)  # HEAD SHA of the local repo

# ── Inspect dependencies ─────────────────────────────────────────────────────
deps = repo.get_dependencies()
for name, dep_repo in deps.items():
    print(f"{name}: {dep_repo.url} @ {dep_repo.commit[:8]}")

Build docs developers (and LLMs) love