Curriculum learning in LeanDojo v2 is the strategy of ordering what a model trains on — starting from mathematically simpler repositories and theorems, then gradually progressing to harder ones. Rather than shuffling all available theorem data together, the pipeline measures the complexity of every theorem, bins them into Easy / Medium / Hard categories, ranks repositories by how many easy theorems they contain, and feeds them to trainers in that order. The approach mirrors how a human student builds intuition on foundational material before tackling advanced topics.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt
Use this file to discover all available pages before exploring further.
Difficulty Calculation
Theorem difficulty is computed bycalculate_theorem_difficulty() in lean_dojo_v2/utils/difficulty.py.
| Condition | Score |
|---|---|
Theorem contains sorry (no proof) | float("inf") — always Hard |
| Theorem has zero proof steps | None — distributed evenly across categories later |
| Theorem has S proof steps | math.exp(S) |
e^S accounts for the combinatorial explosion of valid proof paths as proofs grow longer. A one-step proof scores e¹ ≈ 2.7; a ten-step proof scores e¹⁰ ≈ 22026. This ensures the difficulty scale is non-linear and reflects real search complexity.
Difficulty Categories
categorize_difficulty() places each score into a named bucket using the 33rd and 67th percentiles of all finite difficulties across the entire database:
None score (zero proof steps) fall into "To_Distribute" and are split evenly across the three main categories to maintain training balance.
Sorting Repositories
DynamicDatabase.sort_repositories_by_difficulty() computes difficulty for every theorem in the database, categorises them, then sorts repositories by the count of Easy theorems they contain — descending, so the most approachable repository comes first.
- Iterates over all
Repositoryobjects and all their theorems. - Calls
calculate_theorem_difficulty()on each theorem and stores the rating back on theTheorem.difficulty_ratingfield. - Collects all finite difficulty scores and computes
np.percentile(..., [33, 67]). - Assigns every theorem a category, distributing
"To_Distribute"theorems evenly across Easy / Medium / Hard. - Sorts repositories by
len(categorised[repo]["Easy"])descending. - Writes the updated difficulty ratings back to the JSON file.
- Prints a summary table of counts and percentile thresholds.
How Agents Use the Curriculum
BaseAgent.train() calls database.sort_repositories_by_difficulty() before handing the ordered list to the trainer. Each trainer then iterates in order, training for epochs_per_repo epochs on each repository before advancing to the next.
For automated discovery workflows, pass curriculum_learning=True to discover_repositories():
curriculum_learning=True the method calls sort_repositories_by_difficulty() after all repositories have been traced and added, so the returned list is already in difficulty order.
Minimum Theorem Threshold
Repositories with fewer than3 × BATCH_SIZE theorems are rejected during trace_repository():
BATCH_SIZE defaults to 1 in lean_dojo_v2/utils/constants.py, so the minimum is 3 theorems (enough for at least one example in each of train, val, and test). In practice most useful repositories contain hundreds to thousands of theorems.
LeanAgent and Lifelong Learning
LeanAgent (ICLR 2025) is the concrete implementation of the curriculum-driven lifelong learning pipeline. Its key properties directly build on the difficulty framework described above:- ByT5-based retriever — trained progressively, one epoch per repository, in curriculum order.
- Best-first tree search — driven by cumulative log-probability, with a 10-minute per-theorem timeout.
- Ray distributed compute — parallel proof search across a cluster.
- PyTorch Lightning DDP — training on 4× A100 GPUs with the Distributed Data Parallel strategy.
- Checkpoint selection — based on Recall@10 (R@10) on the current repository’s validation set.
- Plasticity / stability evaluation — after each repository, LeanAgent reports performance on both the current repository (plasticity) and the average over all previously seen repositories (stability).
Elastic Weight Consolidation (Optional)
EWC is an ablation technique for preventing catastrophic forgetting. It is not the default configuration and performed sub-optimally in the paper, but it is supported as an option:- After training on a repository,
compute_fisher.pycomputes the Fisher Information Matrix (FIM) by squaring and averaging gradients of the model’s log-likelihood across training examples. - During the next training phase, the FIM is loaded and a regularisation term penalises large updates to parameters that were important for previous tasks.
- A starting FIM computed on Mathlib4 is available for download via the setup instructions in
lean_dojo_v2/lean_agent/README.md.
EWC adds significant overhead (one extra forward-backward pass per training batch for FIM computation, plus checkpoint storage). Unless you are reproducing the ablation results from the LeanAgent paper, the default progressive training approach without EWC is recommended.