LeanProgress: Predicting Remaining Proof Steps

LeanProgress is a regression model fine-tuned on proof states that predicts how many tactic steps remain before a proof is complete. By assigning a steps_remaining score to each candidate tactic, LeanProgress transforms open-ended proof search into a guided process — the prover can prioritize paths that are predicted to terminate sooner, reducing wasted compute on dead ends. The model is fine-tuned on top of any HuggingFace encoder (e.g., bert-base-uncased) with a single scalar regression head using MSE loss.

Two-Step Workflow

Generate a training dataset

The create_sample_dataset module produces a JSONL file where each record describes a proof state together with its steps_remaining target. For a quick smoke-test, use the built-in sample generator:

python -m lean_dojo_v2.lean_progress.create_sample_dataset \
    --output raid/data/sample_leanprogress_dataset.jsonl

The command creates the output directory if it does not exist and writes three illustrative examples:

{"goal": "n : ℕ ⊢ gcd n n = n", "prefix": "have : n.gcd n ∣ n := by\n  exact gcd_dvd_right n n", "tactic": "simpa [Nat.gcd_comm] using Nat.gcd_self n", "steps_remaining": 1}
{"goal": "a b : ℝ, ha : a = b ⊢ b = a", "prefix": "", "tactic": "exact ha.symm", "steps_remaining": 0}
{"goal": "G : Group, g h : G ⊢ g * h = h * g", "prefix": "", "tactic": "sorry", "steps_remaining": 12}

Each JSONL record has four fields:

Field	Type	Description
`goal`	`string`	The Lean proof state (hypotheses + goal).
`prefix`	`string`	Optional proof prefix already written before this tactic.
`tactic`	`string`	Candidate tactic being evaluated.
`steps_remaining`	`int ≥ 0`	Target label: `0` means the tactic closes the goal.

The sample generator produces synthetic data sufficient only for verifying the training pipeline. For best model quality, replace or supplement it with real proof traces extracted via DynamicDatabase.export_merged_data(). Real traces carry accurate steps_remaining labels derived from the actual proof tree depth.

Train the regression model

Use ProgressTrainer to fine-tune a HuggingFace encoder with a regression head on the JSONL dataset:

from pathlib import Path
from lean_dojo_v2.trainer.progress_trainer import ProgressTrainer

trainer = ProgressTrainer(
    model_name="bert-base-uncased",
    data_path="raid/data/sample_leanprogress_dataset.jsonl",
    output_dir="outputs-progress",
)
trainer.train()

Or run the examples/lean_progress.py script directly:

python examples/lean_progress.py

When training finishes, ProgressTrainer saves the fine-tuned model and tokenizer to output_dir and prints:

LeanProgress model saved to .../outputs-progress.
Set LEANPROGRESS_MODEL to this path to enable step predictions.

`ProgressTrainer` Details

ProgressTrainer wraps the HuggingFace Trainer API with sensible defaults for the regression task:

from lean_dojo_v2.trainer.progress_trainer import ProgressTrainer

trainer = ProgressTrainer(
    model_name="bert-base-uncased",   # Any HF encoder identifier
    data_path="raid/data/sample_leanprogress_dataset.jsonl",
    output_dir="outputs-progress",
    max_length=512,       # Max token length per example
    batch_size=8,         # Per-device batch size
    epochs=3.0,           # Training epochs
    learning_rate=1e-5,   # AdamW learning rate
    eval_ratio=0.2,       # Fraction of data held out for validation
    seed=42,
)
trainer.train()

Under the hood, ProgressTrainer:

Loads the JSONL file into a HuggingFace Dataset via ProgressDataset.
Splits data into train and eval sets (default 80 / 20).
Tokenizes each example by concatenating goal, prefix, and tactic into a single prompt:
```
Goal:
<goal>

Prefix:
<prefix>

Candidate tactic:
<tactic>
```
Initializes AutoModelForSequenceClassification with num_labels=1 and problem_type="regression" — this adds a single linear head on top of the encoder’s [CLS] representation.
Trains with MSE loss, evaluating on both MSE and MAE after every epoch. The best checkpoint (lowest eval MSE) is loaded at the end.
Saves the model and tokenizer to output_dir.

Supported ProgressDataset formats

ProgressDataset accepts a JSONL file produced by create_sample_dataset.py. Each line must be a JSON object with at minimum goal, tactic, and steps_remaining keys. The prefix key is optional and defaults to an empty string if absent.

{"goal": "⊢ 1 + 1 = 2", "tactic": "norm_num", "steps_remaining": 0}
{"goal": "⊢ n + 0 = n", "prefix": "", "tactic": "simp", "steps_remaining": 0}

Lines that are missing any required key are silently skipped during loading.

Command-Line Training Script

For automated pipelines or hyperparameter sweeps, use train_steps_model.py directly:

python -m lean_dojo_v2.lean_progress.train_steps_model \
    --dataset raid/data/sample_leanprogress_dataset.jsonl \
    --output-dir outputs-progress \
    --model-name bert-base-uncased \
    --epochs 5 \
    --batch-size 16 \
    --learning-rate 2e-5 \
    --max-length 512

Argument	Default	Description
`--dataset`	(required)	Path to the JSONL training file.
`--output-dir`	(required)	Directory to save the fine-tuned model.
`--model-name`	`bert-base-uncased`	HuggingFace encoder model identifier.
`--max-length`	`512`	Maximum token length per example.
`--batch-size`	`8`	Per-device train and eval batch size.
`--epochs`	`3.0`	Number of training epochs.
`--learning-rate`	`1e-5`	AdamW learning rate.
`--eval-ratio`	`0.2`	Fraction of data used for evaluation.
`--seed`	`42`	Random seed for reproducibility.

Using the Trained Model

In custom proof-search loops

After training, load LeanProgressScorer from leanprogress.py to score candidate tactics at inference time:

from lean_dojo_v2.external_api.python.leanprogress import LeanProgressScorer

scorer = LeanProgressScorer(
    model_name="outputs-progress",   # Path to the saved checkpoint
    device="cuda",                    # "cuda", "cpu", or "auto"
)

steps_left = scorer.predict(
    goal="n : ℕ\n⊢ gcd n n = n",
    tactic="exact Nat.gcd_self n",
    prefix=None,
)
print(f"Predicted steps remaining: {steps_left:.2f}")
# → Predicted steps remaining: 0.03

LeanProgressScorer.predict returns a non-negative float. A prediction near 0.0 signals that the tactic is likely to close the goal. Use reward = -steps_remaining to turn this into a maximisation objective compatible with GRPO-style RL training.

Via the External API server

Export LEANPROGRESS_MODEL before starting the server to attach the scorer to every model bundle:

export LEANPROGRESS_MODEL=outputs-progress
export LEANPROGRESS_DEVICE=cuda
cd lean_dojo_v2/external_api/python/
uvicorn server:app --port 23337

Then include "use_reward": true in /generate requests to receive per-tactic steps_remaining and reward values alongside the generated tactics. See the External API guide for full endpoint documentation.

LeanProgressScorer uses a custom PromptTemplate to render inputs. If you fine-tune on a dataset that uses a different input format, pass template as a Python format string with {goal}, {tactic}, and {prefix} placeholders when instantiating LeanProgressScorer — the scorer and trainer will then use the same text format.

Get Started

Concepts

Guides

Reference

LeanProgress: Predicting Remaining Proof Steps

Two-Step Workflow

`ProgressTrainer` Details

Command-Line Training Script

Using the Trained Model

In custom proof-search loops

Via the External API server

Build docs developers (and LLMs) love

Get Started

Concepts

Guides

Reference

Documentation Index

​Two-Step Workflow

​ProgressTrainer Details

​Command-Line Training Script

​Using the Trained Model

​In custom proof-search loops

​Via the External API server

Build docs developers (and LLMs) love

Two-Step Workflow

`ProgressTrainer` Details

Command-Line Training Script

Using the Trained Model

In custom proof-search loops

Via the External API server