BaseAgent: Abstract Base for Theorem Proving Agents

BaseAgent is the abstract foundation of the LeanDojo v2 agent hierarchy. Every concrete agent — HFAgent, LeanAgent, and ExternalAgent — extends this class and inherits its full suite of repository management and proving methods. You cannot instantiate BaseAgent directly; instead, subclasses must implement _get_build_deps() and _setup_prover() to wire in their specific training and inference backends.

Class Definition

lean_dojo_v2.agent.base_agent.BaseAgent

BaseAgent is an abstract base class (ABC). It manages a DynamicDatabase instance for persisting repository metadata across runs, maintains an internal list of traced LeanGitRepo objects, and routes the full prove–train–evaluate lifecycle through its concrete subclass implementations.

Constructor

BaseAgent(database_path: str = "dynamic_database.json")

database_path

str

default:"dynamic_database.json"

Path to the JSON file used by DynamicDatabase for persisting repository and theorem metadata between sessions. Relative paths are resolved from the current working directory at runtime.

Abstract Methods

Subclasses must implement both of these methods before BaseAgent will function correctly.

`_get_build_deps`

@abstractmethod
def _get_build_deps(self) -> bool

Returns True if the agent should build Lean dependencies (i.e., pass build_deps=True to the tracer) when tracing a repository, or False to skip dependency tracing. Building dependencies provides fuller premise coverage but significantly increases tracing time and disk usage.

`_setup_prover`

@abstractmethod
def _setup_prover(self)

Initialises self.prover with the agent’s specific prover backend (e.g., HFProver, RetrievalProver, or ExternalProver). Called automatically by initialize_prover() after the sorry-theorem list has been collected. Subclasses should assign the constructed prover to self.prover.

Public Methods

`setup_github_repository`

setup_github_repository(url: str, commit: str)

Clones, traces, and registers a GitHub repository with the agent. Internally calls trace_repository() with build_deps=self._get_build_deps() and then add_repository(). Raises ValueError if tracing fails.

url

str

required

The HTTPS URL of the GitHub repository, e.g. "https://github.com/durant42040/lean4-example".

commit

str

required

The exact commit SHA to trace. Pinning to a specific commit ensures reproducible traces.

`setup_local_repository`

setup_local_repository(path: str)

Traces a Lean project already present on disk. Constructs a LeanGitRepo from the local path, then delegates to trace_repository() and add_repository(). Raises ValueError if tracing fails.

path

str

required

Absolute or relative path to the root of the local Lean project (the directory that contains lakefile.toml or lakefile.lean).

`train`

train()

Sorts all registered repositories by difficulty using DynamicDatabase.sort_repositories_by_difficulty(), then calls self.trainer.train(repos=..., database=..., data_path=...). Raises ValueError if no repository has been loaded yet.

Call setup_github_repository() or setup_local_repository() at least once before calling train(), otherwise a ValueError is raised.

`evaluate`

evaluate()

Delegates directly to self.trainer.evaluate(). Use this after train() to measure retrieval or generation metrics on held-out data.

`prove`

prove(whole_proof: bool = False)

Collects all unproved sorry theorems across every registered repository by calling initialize_prover(), then iterates over them and calls prove_theorem() on each one.

whole_proof

bool

default:"False"

When True, the prover generates a complete proof in a single forward pass via prover.generate_whole_proof(). When False (default), the prover uses tactic-by-tactic best-first search via prover.search().

`prove_theorem`

prove_theorem(theorem, repo, whole_proof: bool = False)

Proves a single theorem. When whole_proof=True, calls self.prover.generate_whole_proof(theorem) and prints the result. When whole_proof=False, constructs a Pantograph Server pointing at the traced repository path, then calls self.prover.search(server=server, theorem=theorem, verbose=False).

theorem

TracedTheorem

required

A traced theorem object retrieved from the database’s sorry_theorems_unproved collection.

repo

LeanGitRepo

required

The LeanGitRepo that owns the theorem, used to resolve the traced-repo path and construct the Pantograph server.

whole_proof

bool

default:"False"

Controls whether to use whole-proof generation or tactic search. Passed through to the underlying prover.

success

bool

Returns True if a proof was found, False otherwise. Returns None when whole_proof=True (the proof text is printed directly instead).

`initialize_prover`

initialize_prover() -> list

Iterates over all registered repositories, collects every unproved sorry theorem from the database, calls _setup_prover() to initialise self.prover, and returns the collected list of (theorem, repo) tuples.

sorry_theorems

list[tuple[TracedTheorem, LeanGitRepo]]

A flat list of (theorem, repo) pairs for every unproved sorry theorem found across all registered repositories.

`add_repository`

add_repository(repo: Repository)

Adds a Repository object to the DynamicDatabase and appends the corresponding LeanGitRepo to the internal self.repos list. Called automatically by setup_github_repository() and setup_local_repository().

repo

Repository

required

A lean_dojo_v2.database.models.Repository instance, typically returned by trace_repository().

`trace_repository`

trace_repository(url: str, commit: str, build_deps: bool) -> Repository

Delegates to DynamicDatabase.trace_repository(). Handles caching — if the repository has already been traced with the same settings, the cached Repository is returned immediately without re-tracing.

url

str

required

HTTPS URL of the repository to trace.

commit

str

required

Exact commit SHA to trace.

build_deps

bool

required

Whether to build and trace Lean dependencies alongside the repository itself.

repo

Repository

A populated Repository database model containing all extracted theorems, premises, and metadata. Returns None if tracing fails.

Writing a Custom Agent

Subclass BaseAgent and implement the two abstract methods to integrate your own prover or trainer:

from lean_dojo_v2.agent.base_agent import BaseAgent
from lean_dojo_v2.lean_agent.config import ProverConfig, TrainingConfig


class MyCustomAgent(BaseAgent):
    def __init__(
        self,
        database_path: str = "dynamic_database.json",
        prover_config: ProverConfig = ProverConfig(),
    ):
        super().__init__(database_path)
        self.prover_config = prover_config

    def _get_build_deps(self) -> bool:
        # Return True to trace dependencies, False to skip
        return False

    def _setup_prover(self):
        # Assign your prover to self.prover
        from my_package import MyProver
        self.prover = MyProver(config=self.prover_config)


# Usage
agent = MyCustomAgent()
agent.setup_github_repository(
    url="https://github.com/durant42040/lean4-example",
    commit="3e23ab0bfdcfdbd5b11ab53c2cd8b5d16492e9c2",
)
agent.prove()

If your custom agent also needs training, assign a trainer to self.trainer in __init__. The inherited train() and evaluate() methods will then work without any additional overrides.

Agents

Trainers

Provers

Data & Database

BaseAgent: Abstract Base for Theorem Proving Agents

Class Definition

Constructor

Abstract Methods

`_get_build_deps`

`_setup_prover`

Public Methods

`setup_github_repository`

`setup_local_repository`

`train`

`evaluate`

`prove`

`prove_theorem`

`initialize_prover`

`add_repository`

`trace_repository`

Writing a Custom Agent

Build docs developers (and LLMs) love

Agents

Trainers

Provers

Data & Database

Documentation Index

​Class Definition

​Constructor

​Abstract Methods

​_get_build_deps

​_setup_prover

​Public Methods

​setup_github_repository

​setup_local_repository

​train

​evaluate

​prove

​prove_theorem

​initialize_prover

​add_repository

​trace_repository

​Writing a Custom Agent

Build docs developers (and LLMs) love

Class Definition

Constructor

Abstract Methods

`_get_build_deps`

`_setup_prover`

Public Methods

`setup_github_repository`

`setup_local_repository`

`train`

`evaluate`

`prove`

`prove_theorem`

`initialize_prover`

`add_repository`

`trace_repository`

Writing a Custom Agent