SFTTrainer: Supervised Fine-Tuning for Lean Provers

SFTTrainer wraps TRL’s SFTTrainer to deliver a zero-boilerplate path from traced Lean repositories to a fine-tuned tactic generator. Given a list of LeanGitRepo objects and a DynamicDatabase, it iterates over repos in curriculum order, exports the accumulated tactic dataset, and runs one or more epochs of supervised fine-tuning — saving the checkpoint after each repository so training is never lost to a failure partway through.

Class

lean_dojo_v2.trainer.sft_trainer.SFTTrainer

Constructor Parameters

model_name

str

required

HuggingFace model identifier or local path to load with AutoModelForCausalLM.from_pretrained. The model is loaded in bfloat16. Example: "deepseek-ai/DeepSeek-Prover-V2-7B".

output_dir

str

default:"outputs"

Directory where model checkpoints and the tokenizer are written after each repository. Created automatically if it does not exist.

epochs_per_repo

int

default:"1"

Number of training epochs to run on the merged dataset after adding each new repository. Passed directly to trl.SFTConfig.num_train_epochs.

batch_size

int

default:"1"

Per-device training batch size (per_device_train_batch_size). Increase when GPU memory allows; combine with gradient accumulation at the TRL config level for larger effective batch sizes.

float

default:"2e-5"

Peak learning rate passed to trl.SFTConfig.learning_rate. The default works well for large pre-trained models such as DeepSeek-Prover-V2.

lora_config

Optional[peft.LoraConfig]

default:"None"

When provided, the model is wrapped with peft.get_peft_model before training begins, enabling parameter-efficient fine-tuning. When None, all model weights are updated (full fine-tuning).

`train` Method

SFTTrainer.train(
    repos: List[LeanGitRepo],
    database: DynamicDatabase,
    data_path: Path,
) -> None

Iterates over repos one at a time, accumulating an ever-growing training set:

Appends the current repo to an internal list and calls database.export_merged_data(repos_so_far, data_path) to materialise the merged split.
Loads <data_path>/random/train.json into an SFTDataset and converts it to a HuggingFace Dataset.
Constructs a trl.SFTTrainer with packing=False and completion_only_loss=True, then calls .train().
Saves the model (or LoRA adapter) to output_dir and the tokenizer after the full loop completes.

train() is called for you by HFAgent.train(). You only need to call it directly when building a custom training loop outside of an agent.

`SFTDataset` — Internal Dataset Class

SFTDataset reads the JSON file produced by DynamicDatabase.export_merged_data and converts every traced tactic into a three-turn chat conversation:

Role	Content
`system`	Fixed instruction: “You are a Lean 4 tactic generator…”
`user`	`state_before` — the proof state presented to the model
`assistant`	`tactic` — the single Lean 4 tactic to predict

sorry tactics are silently skipped. The .to_hf() method returns a HuggingFace Dataset with a single messages column, which TRL’s chat template machinery processes automatically.

Full Example — Minimal SFT Run

This mirrors examples/sft.py exactly:

from lean_dojo_v2.agent.hf_agent import HFAgent
from lean_dojo_v2.trainer.sft_trainer import SFTTrainer

trainer = SFTTrainer(
    model_name="deepseek-ai/DeepSeek-Prover-V2-7B",
    output_dir="outputs-deepseek",
    epochs_per_repo=1,
    batch_size=2,
    lr=2e-5,
)

agent = HFAgent(trainer=trainer)
agent.setup_github_repository(
    url="https://github.com/durant42040/lean4-example",
    commit="b14fef0ceca29a65bc3122bf730406b33c7effe5",
)
agent.train()
agent.prove()

LoRA Example

Use peft.LoraConfig to cut GPU memory usage while retaining most of the fine-tuning quality:

from peft import LoraConfig
from lean_dojo_v2.agent.hf_agent import HFAgent
from lean_dojo_v2.trainer.sft_trainer import SFTTrainer

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.1,
)

trainer = SFTTrainer(
    model_name="deepseek-ai/DeepSeek-Prover-V2-7B",
    output_dir="outputs-lora",
    epochs_per_repo=1,
    batch_size=2,
    lr=2e-5,
    lora_config=lora_config,
)

agent = HFAgent(trainer=trainer)
agent.setup_github_repository(
    url="https://github.com/durant42040/lean4-example",
    commit="b14fef0ceca29a65bc3122bf730406b33c7effe5",
)
agent.train()
agent.prove()

When using LoRA, SFTTrainer calls model.save_pretrained(output_dir) to save only the adapter weights. Load the adapter at inference time with PeftModel.from_pretrained(base_model, output_dir), or merge it into the base model with model.merge_and_unload() before passing the path to HFProver.

SFTTrainer loads the full model into GPU memory at construction time (in bfloat16). For 7B-parameter models this requires roughly 14 GB of VRAM. Make sure a CUDA-capable device is available before instantiating the trainer, or use LoRA to reduce the active parameter count.

TRL Configuration Details

Internally, SFTTrainer creates a trl.SFTConfig with the following fixed settings alongside the user-supplied hyperparameters:

`SFTConfig` field	Value
`packing`	`False`
`completion_only_loss`	`True`
`output_dir`	`output_dir` argument
`num_train_epochs`	`epochs_per_repo` argument
`per_device_train_batch_size`	`batch_size` argument
`learning_rate`	`lr` argument

Setting completion_only_loss=True means only the assistant turns contribute to the loss, which prevents the model from being penalised for not reproducing the system prompt or user goal verbatim.

Agents

Trainers

Provers

Data & Database

SFTTrainer: Supervised Fine-Tuning for Lean Provers

Class

Constructor Parameters

`train` Method

`SFTDataset` — Internal Dataset Class

Full Example — Minimal SFT Run

LoRA Example

TRL Configuration Details

Build docs developers (and LLMs) love

Agents

Trainers

Provers

Data & Database

Documentation Index

​Class

​Constructor Parameters

​train Method

​SFTDataset — Internal Dataset Class

​Full Example — Minimal SFT Run

​LoRA Example

​TRL Configuration Details

Build docs developers (and LLMs) love

Class

Constructor Parameters

`train` Method

`SFTDataset` — Internal Dataset Class

Full Example — Minimal SFT Run

LoRA Example

TRL Configuration Details