Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt

Use this file to discover all available pages before exploring further.

SFTTrainer wraps TRL’s SFTTrainer to deliver a zero-boilerplate path from traced Lean repositories to a fine-tuned tactic generator. Given a list of LeanGitRepo objects and a DynamicDatabase, it iterates over repos in curriculum order, exports the accumulated tactic dataset, and runs one or more epochs of supervised fine-tuning — saving the checkpoint after each repository so training is never lost to a failure partway through.

Class

lean_dojo_v2.trainer.sft_trainer.SFTTrainer

Constructor Parameters

model_name
str
required
HuggingFace model identifier or local path to load with AutoModelForCausalLM.from_pretrained. The model is loaded in bfloat16. Example: "deepseek-ai/DeepSeek-Prover-V2-7B".
output_dir
str
default:"outputs"
Directory where model checkpoints and the tokenizer are written after each repository. Created automatically if it does not exist.
epochs_per_repo
int
default:"1"
Number of training epochs to run on the merged dataset after adding each new repository. Passed directly to trl.SFTConfig.num_train_epochs.
batch_size
int
default:"1"
Per-device training batch size (per_device_train_batch_size). Increase when GPU memory allows; combine with gradient accumulation at the TRL config level for larger effective batch sizes.
lr
float
default:"2e-5"
Peak learning rate passed to trl.SFTConfig.learning_rate. The default works well for large pre-trained models such as DeepSeek-Prover-V2.
lora_config
Optional[peft.LoraConfig]
default:"None"
When provided, the model is wrapped with peft.get_peft_model before training begins, enabling parameter-efficient fine-tuning. When None, all model weights are updated (full fine-tuning).

train Method

SFTTrainer.train(
    repos: List[LeanGitRepo],
    database: DynamicDatabase,
    data_path: Path,
) -> None
Iterates over repos one at a time, accumulating an ever-growing training set:
  1. Appends the current repo to an internal list and calls database.export_merged_data(repos_so_far, data_path) to materialise the merged split.
  2. Loads <data_path>/random/train.json into an SFTDataset and converts it to a HuggingFace Dataset.
  3. Constructs a trl.SFTTrainer with packing=False and completion_only_loss=True, then calls .train().
  4. Saves the model (or LoRA adapter) to output_dir and the tokenizer after the full loop completes.
train() is called for you by HFAgent.train(). You only need to call it directly when building a custom training loop outside of an agent.

SFTDataset — Internal Dataset Class

SFTDataset reads the JSON file produced by DynamicDatabase.export_merged_data and converts every traced tactic into a three-turn chat conversation:
RoleContent
systemFixed instruction: “You are a Lean 4 tactic generator…”
userstate_before — the proof state presented to the model
assistanttactic — the single Lean 4 tactic to predict
sorry tactics are silently skipped. The .to_hf() method returns a HuggingFace Dataset with a single messages column, which TRL’s chat template machinery processes automatically.

Full Example — Minimal SFT Run

This mirrors examples/sft.py exactly:
from lean_dojo_v2.agent.hf_agent import HFAgent
from lean_dojo_v2.trainer.sft_trainer import SFTTrainer

trainer = SFTTrainer(
    model_name="deepseek-ai/DeepSeek-Prover-V2-7B",
    output_dir="outputs-deepseek",
    epochs_per_repo=1,
    batch_size=2,
    lr=2e-5,
)

agent = HFAgent(trainer=trainer)
agent.setup_github_repository(
    url="https://github.com/durant42040/lean4-example",
    commit="b14fef0ceca29a65bc3122bf730406b33c7effe5",
)
agent.train()
agent.prove()

LoRA Example

Use peft.LoraConfig to cut GPU memory usage while retaining most of the fine-tuning quality:
from peft import LoraConfig
from lean_dojo_v2.agent.hf_agent import HFAgent
from lean_dojo_v2.trainer.sft_trainer import SFTTrainer

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.1,
)

trainer = SFTTrainer(
    model_name="deepseek-ai/DeepSeek-Prover-V2-7B",
    output_dir="outputs-lora",
    epochs_per_repo=1,
    batch_size=2,
    lr=2e-5,
    lora_config=lora_config,
)

agent = HFAgent(trainer=trainer)
agent.setup_github_repository(
    url="https://github.com/durant42040/lean4-example",
    commit="b14fef0ceca29a65bc3122bf730406b33c7effe5",
)
agent.train()
agent.prove()
When using LoRA, SFTTrainer calls model.save_pretrained(output_dir) to save only the adapter weights. Load the adapter at inference time with PeftModel.from_pretrained(base_model, output_dir), or merge it into the base model with model.merge_and_unload() before passing the path to HFProver.
SFTTrainer loads the full model into GPU memory at construction time (in bfloat16). For 7B-parameter models this requires roughly 14 GB of VRAM. Make sure a CUDA-capable device is available before instantiating the trainer, or use LoRA to reduce the active parameter count.

TRL Configuration Details

Internally, SFTTrainer creates a trl.SFTConfig with the following fixed settings alongside the user-supplied hyperparameters:
SFTConfig fieldValue
packingFalse
completion_only_lossTrue
output_diroutput_dir argument
num_train_epochsepochs_per_repo argument
per_device_train_batch_sizebatch_size argument
learning_ratelr argument
Setting completion_only_loss=True means only the assistant turns contribute to the loss, which prevents the model from being penalised for not reproducing the system prompt or user goal verbatim.

Build docs developers (and LLMs) love