Documentation Index
Fetch the complete documentation index at: https://mintlify.com/lean-dojo/LeanDojo-v2/llms.txt
Use this file to discover all available pages before exploring further.
SFTTrainer wraps TRL’s SFTTrainer to deliver a zero-boilerplate path from traced Lean repositories to a fine-tuned tactic generator. Given a list of LeanGitRepo objects and a DynamicDatabase, it iterates over repos in curriculum order, exports the accumulated tactic dataset, and runs one or more epochs of supervised fine-tuning — saving the checkpoint after each repository so training is never lost to a failure partway through.
Class
Constructor Parameters
HuggingFace model identifier or local path to load with
AutoModelForCausalLM.from_pretrained. The model is loaded in bfloat16.
Example: "deepseek-ai/DeepSeek-Prover-V2-7B".Directory where model checkpoints and the tokenizer are written after each
repository. Created automatically if it does not exist.
Number of training epochs to run on the merged dataset after adding each new
repository. Passed directly to
trl.SFTConfig.num_train_epochs.Per-device training batch size (
per_device_train_batch_size). Increase when
GPU memory allows; combine with gradient accumulation at the TRL config level
for larger effective batch sizes.Peak learning rate passed to
trl.SFTConfig.learning_rate. The default works
well for large pre-trained models such as DeepSeek-Prover-V2.When provided, the model is wrapped with
peft.get_peft_model before
training begins, enabling parameter-efficient fine-tuning. When None,
all model weights are updated (full fine-tuning).train Method
repos one at a time, accumulating an ever-growing training set:
- Appends the current repo to an internal list and calls
database.export_merged_data(repos_so_far, data_path)to materialise the merged split. - Loads
<data_path>/random/train.jsoninto anSFTDatasetand converts it to a HuggingFaceDataset. - Constructs a
trl.SFTTrainerwithpacking=Falseandcompletion_only_loss=True, then calls.train(). - Saves the model (or LoRA adapter) to
output_dirand the tokenizer after the full loop completes.
train() is called for you by HFAgent.train(). You only need to call it
directly when building a custom training loop outside of an agent.SFTDataset — Internal Dataset Class
SFTDataset reads the JSON file produced by DynamicDatabase.export_merged_data
and converts every traced tactic into a three-turn chat conversation:
| Role | Content |
|---|---|
system | Fixed instruction: “You are a Lean 4 tactic generator…” |
user | state_before — the proof state presented to the model |
assistant | tactic — the single Lean 4 tactic to predict |
sorry tactics are silently skipped. The .to_hf() method returns a
HuggingFace Dataset with a single messages column, which TRL’s chat
template machinery processes automatically.
Full Example — Minimal SFT Run
This mirrorsexamples/sft.py exactly:
LoRA Example
Usepeft.LoraConfig to cut GPU memory usage while retaining most of the
fine-tuning quality:
TRL Configuration Details
Internally,SFTTrainer creates a trl.SFTConfig with the following
fixed settings alongside the user-supplied hyperparameters:
SFTConfig field | Value |
|---|---|
packing | False |
completion_only_loss | True |
output_dir | output_dir argument |
num_train_epochs | epochs_per_repo argument |
per_device_train_batch_size | batch_size argument |
learning_rate | lr argument |
completion_only_loss=True means only the assistant turns contribute
to the loss, which prevents the model from being penalised for not reproducing
the system prompt or user goal verbatim.