Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt

Use this file to discover all available pages before exploring further.

Curriculum learning in LeanDojo v2 is the strategy of ordering what a model trains on — starting from mathematically simpler repositories and theorems, then gradually progressing to harder ones. Rather than shuffling all available theorem data together, the pipeline measures the complexity of every theorem, bins them into Easy / Medium / Hard categories, ranks repositories by how many easy theorems they contain, and feeds them to trainers in that order. The approach mirrors how a human student builds intuition on foundational material before tackling advanced topics.
Curriculum learning is most impactful for LeanAgent, which trains a retrieval encoder progressively across many repositories. Starting on simpler theorems gives the retriever a stable signal before it encounters long, multi-step proofs where the premise space is large and noisy.

Difficulty Calculation

Theorem difficulty is computed by calculate_theorem_difficulty() in lean_dojo_v2/utils/difficulty.py.
from lean_dojo_v2.utils.difficulty import calculate_theorem_difficulty

score = calculate_theorem_difficulty(theorem)
The scoring rules are:
ConditionScore
Theorem contains sorry (no proof)float("inf") — always Hard
Theorem has zero proof stepsNone — distributed evenly across categories later
Theorem has S proof stepsmath.exp(S)
The exponential function e^S accounts for the combinatorial explosion of valid proof paths as proofs grow longer. A one-step proof scores e¹ ≈ 2.7; a ten-step proof scores e¹⁰ ≈ 22026. This ensures the difficulty scale is non-linear and reflects real search complexity.

Difficulty Categories

categorize_difficulty() places each score into a named bucket using the 33rd and 67th percentiles of all finite difficulties across the entire database:
from lean_dojo_v2.utils.difficulty import categorize_difficulty
import numpy as np

percentiles = np.percentile(all_finite_scores, [33, 67])
category = categorize_difficulty(score, percentiles)
# → "Easy", "Medium", "Hard", "Hard (No proof)", or "To_Distribute"
Theorems with a None score (zero proof steps) fall into "To_Distribute" and are split evenly across the three main categories to maintain training balance.

Sorting Repositories

DynamicDatabase.sort_repositories_by_difficulty() computes difficulty for every theorem in the database, categorises them, then sorts repositories by the count of Easy theorems they contain — descending, so the most approachable repository comes first.
sorted_repos = db.sort_repositories_by_difficulty()
# Returns: List[Repository] ordered easy → hard
Internally the method:
  1. Iterates over all Repository objects and all their theorems.
  2. Calls calculate_theorem_difficulty() on each theorem and stores the rating back on the Theorem.difficulty_rating field.
  3. Collects all finite difficulty scores and computes np.percentile(..., [33, 67]).
  4. Assigns every theorem a category, distributing "To_Distribute" theorems evenly across Easy / Medium / Hard.
  5. Sorts repositories by len(categorised[repo]["Easy"]) descending.
  6. Writes the updated difficulty ratings back to the JSON file.
  7. Prints a summary table of counts and percentile thresholds.

How Agents Use the Curriculum

BaseAgent.train() calls database.sort_repositories_by_difficulty() before handing the ordered list to the trainer. Each trainer then iterates in order, training for epochs_per_repo epochs on each repository before advancing to the next. For automated discovery workflows, pass curriculum_learning=True to discover_repositories():
lean_git_repos = db.discover_repositories(
    num_repos=20,
    curriculum_learning=True,   # sorts after tracing
    build_deps=False,
)
When curriculum_learning=True the method calls sort_repositories_by_difficulty() after all repositories have been traced and added, so the returned list is already in difficulty order.

Minimum Theorem Threshold

Repositories with fewer than 3 × BATCH_SIZE theorems are rejected during trace_repository():
if total_theorems < 3 * BATCH_SIZE:
    logger.info(f"Not enough theorems found in {repo.url}")
    return None
BATCH_SIZE defaults to 1 in lean_dojo_v2/utils/constants.py, so the minimum is 3 theorems (enough for at least one example in each of train, val, and test). In practice most useful repositories contain hundreds to thousands of theorems.

LeanAgent and Lifelong Learning

LeanAgent (ICLR 2025) is the concrete implementation of the curriculum-driven lifelong learning pipeline. Its key properties directly build on the difficulty framework described above:
  • ByT5-based retriever — trained progressively, one epoch per repository, in curriculum order.
  • Best-first tree search — driven by cumulative log-probability, with a 10-minute per-theorem timeout.
  • Ray distributed compute — parallel proof search across a cluster.
  • PyTorch Lightning DDP — training on 4× A100 GPUs with the Distributed Data Parallel strategy.
  • Checkpoint selection — based on Recall@10 (R@10) on the current repository’s validation set.
  • Plasticity / stability evaluation — after each repository, LeanAgent reports performance on both the current repository (plasticity) and the average over all previously seen repositories (stability).

Elastic Weight Consolidation (Optional)

EWC is an ablation technique for preventing catastrophic forgetting. It is not the default configuration and performed sub-optimally in the paper, but it is supported as an option:
  1. After training on a repository, compute_fisher.py computes the Fisher Information Matrix (FIM) by squaring and averaging gradients of the model’s log-likelihood across training examples.
  2. During the next training phase, the FIM is loaded and a regularisation term penalises large updates to parameters that were important for previous tasks.
  3. A starting FIM computed on Mathlib4 is available for download via the setup instructions in lean_dojo_v2/lean_agent/README.md.
EWC adds significant overhead (one extra forward-backward pass per training batch for FIM computation, plus checkpoint storage). Unless you are reproducing the ablation results from the LeanAgent paper, the default progressive training approach without EWC is recommended.

Example: Full Curriculum Workflow

from pathlib import Path
from lean_dojo_v2.database import DynamicDatabase

db = DynamicDatabase(json_path="raid/data/dynamic_database.json")

# Trace several repositories
urls_and_commits = [
    ("https://github.com/durant42040/lean4-example",
     "3e23ab0bfdcfdbd5b11ab53c2cd8b5d16492e9c2"),
    ("https://github.com/my-org/lean-algebra",
     "abc123def456..."),
]
for url, commit in urls_and_commits:
    repo = db.trace_repository(url=url, commit=commit, build_deps=True)
    if repo:
        db.add_repository(repo)

# Sort by difficulty and inspect the order
sorted_repos = db.sort_repositories_by_difficulty()
for r in sorted_repos:
    print(r.url, "→", r.total_theorems, "theorems")

# Export a merged curriculum dataset
db.export_merged_data(sorted_repos, output_path=Path("raid/data/curriculum_merged"))

Citation

LeanAgent was presented at ICLR 2025. If you use the curriculum learning or lifelong learning components in your research, please cite:
@inproceedings{kumarappan2025leanagent,
  title={{LeanAgent}: Lifelong Learning for Formal Theorem Proving},
  author={Kumarappan, Adarsh and Tiwari, Mo and Song, Peiyang and George, Robert Joseph
          and Xiao, Chaowei and Anandkumar, Anima},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}

Build docs developers (and LLMs) love