Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt

Use this file to discover all available pages before exploring further.

BaseAgent is the abstract foundation of the LeanDojo v2 agent hierarchy. Every concrete agent — HFAgent, LeanAgent, and ExternalAgent — extends this class and inherits its full suite of repository management and proving methods. You cannot instantiate BaseAgent directly; instead, subclasses must implement _get_build_deps() and _setup_prover() to wire in their specific training and inference backends.

Class Definition

lean_dojo_v2.agent.base_agent.BaseAgent
BaseAgent is an abstract base class (ABC). It manages a DynamicDatabase instance for persisting repository metadata across runs, maintains an internal list of traced LeanGitRepo objects, and routes the full prove–train–evaluate lifecycle through its concrete subclass implementations.

Constructor

BaseAgent(database_path: str = "dynamic_database.json")
database_path
str
default:"dynamic_database.json"
Path to the JSON file used by DynamicDatabase for persisting repository and theorem metadata between sessions. Relative paths are resolved from the current working directory at runtime.

Abstract Methods

Subclasses must implement both of these methods before BaseAgent will function correctly.

_get_build_deps

@abstractmethod
def _get_build_deps(self) -> bool
Returns True if the agent should build Lean dependencies (i.e., pass build_deps=True to the tracer) when tracing a repository, or False to skip dependency tracing. Building dependencies provides fuller premise coverage but significantly increases tracing time and disk usage.

_setup_prover

@abstractmethod
def _setup_prover(self)
Initialises self.prover with the agent’s specific prover backend (e.g., HFProver, RetrievalProver, or ExternalProver). Called automatically by initialize_prover() after the sorry-theorem list has been collected. Subclasses should assign the constructed prover to self.prover.

Public Methods

setup_github_repository

setup_github_repository(url: str, commit: str)
Clones, traces, and registers a GitHub repository with the agent. Internally calls trace_repository() with build_deps=self._get_build_deps() and then add_repository(). Raises ValueError if tracing fails.
url
str
required
The HTTPS URL of the GitHub repository, e.g. "https://github.com/durant42040/lean4-example".
commit
str
required
The exact commit SHA to trace. Pinning to a specific commit ensures reproducible traces.

setup_local_repository

setup_local_repository(path: str)
Traces a Lean project already present on disk. Constructs a LeanGitRepo from the local path, then delegates to trace_repository() and add_repository(). Raises ValueError if tracing fails.
path
str
required
Absolute or relative path to the root of the local Lean project (the directory that contains lakefile.toml or lakefile.lean).

train

train()
Sorts all registered repositories by difficulty using DynamicDatabase.sort_repositories_by_difficulty(), then calls self.trainer.train(repos=..., database=..., data_path=...). Raises ValueError if no repository has been loaded yet.
Call setup_github_repository() or setup_local_repository() at least once before calling train(), otherwise a ValueError is raised.

evaluate

evaluate()
Delegates directly to self.trainer.evaluate(). Use this after train() to measure retrieval or generation metrics on held-out data.

prove

prove(whole_proof: bool = False)
Collects all unproved sorry theorems across every registered repository by calling initialize_prover(), then iterates over them and calls prove_theorem() on each one.
whole_proof
bool
default:"False"
When True, the prover generates a complete proof in a single forward pass via prover.generate_whole_proof(). When False (default), the prover uses tactic-by-tactic best-first search via prover.search().

prove_theorem

prove_theorem(theorem, repo, whole_proof: bool = False)
Proves a single theorem. When whole_proof=True, calls self.prover.generate_whole_proof(theorem) and prints the result. When whole_proof=False, constructs a Pantograph Server pointing at the traced repository path, then calls self.prover.search(server=server, theorem=theorem, verbose=False).
theorem
TracedTheorem
required
A traced theorem object retrieved from the database’s sorry_theorems_unproved collection.
repo
LeanGitRepo
required
The LeanGitRepo that owns the theorem, used to resolve the traced-repo path and construct the Pantograph server.
whole_proof
bool
default:"False"
Controls whether to use whole-proof generation or tactic search. Passed through to the underlying prover.
success
bool
Returns True if a proof was found, False otherwise. Returns None when whole_proof=True (the proof text is printed directly instead).

initialize_prover

initialize_prover() -> list
Iterates over all registered repositories, collects every unproved sorry theorem from the database, calls _setup_prover() to initialise self.prover, and returns the collected list of (theorem, repo) tuples.
sorry_theorems
list[tuple[TracedTheorem, LeanGitRepo]]
A flat list of (theorem, repo) pairs for every unproved sorry theorem found across all registered repositories.

add_repository

add_repository(repo: Repository)
Adds a Repository object to the DynamicDatabase and appends the corresponding LeanGitRepo to the internal self.repos list. Called automatically by setup_github_repository() and setup_local_repository().
repo
Repository
required
A lean_dojo_v2.database.models.Repository instance, typically returned by trace_repository().

trace_repository

trace_repository(url: str, commit: str, build_deps: bool) -> Repository
Delegates to DynamicDatabase.trace_repository(). Handles caching — if the repository has already been traced with the same settings, the cached Repository is returned immediately without re-tracing.
url
str
required
HTTPS URL of the repository to trace.
commit
str
required
Exact commit SHA to trace.
build_deps
bool
required
Whether to build and trace Lean dependencies alongside the repository itself.
repo
Repository
A populated Repository database model containing all extracted theorems, premises, and metadata. Returns None if tracing fails.

Writing a Custom Agent

Subclass BaseAgent and implement the two abstract methods to integrate your own prover or trainer:
from lean_dojo_v2.agent.base_agent import BaseAgent
from lean_dojo_v2.lean_agent.config import ProverConfig, TrainingConfig


class MyCustomAgent(BaseAgent):
    def __init__(
        self,
        database_path: str = "dynamic_database.json",
        prover_config: ProverConfig = ProverConfig(),
    ):
        super().__init__(database_path)
        self.prover_config = prover_config

    def _get_build_deps(self) -> bool:
        # Return True to trace dependencies, False to skip
        return False

    def _setup_prover(self):
        # Assign your prover to self.prover
        from my_package import MyProver
        self.prover = MyProver(config=self.prover_config)


# Usage
agent = MyCustomAgent()
agent.setup_github_repository(
    url="https://github.com/durant42040/lean4-example",
    commit="3e23ab0bfdcfdbd5b11ab53c2cd8b5d16492e9c2",
)
agent.prove()
If your custom agent also needs training, assign a trainer to self.trainer in __init__. The inherited train() and evaluate() methods will then work without any additional overrides.

Build docs developers (and LLMs) love