
Supervised Fine-Tuning (SFT) with adapter methods lets you train only a small fraction of parameters while keeping the base model frozen. This module provides 25 ready-to-run pipelines — five adapter techniques applied to five question answering datasets — all using SFTTrainer from TRL with apply_chat_template formatting on meta-llama/Llama-3.2-3B.
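Every pipeline renders question-answer pairs into chat-formatted training text before handing them to SFTTrainer. A minimal sketch of that formatting step, assuming the tokenizer in use provides a chat template and using an invented QA pair:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")

# Invented QA pair; real examples come from the datasets listed below.
messages = [
    {"role": "user", "content": "Which gas do plants absorb during photosynthesis?"},
    {"role": "assistant", "content": "Carbon dioxide."},
]

# apply_chat_template renders the conversation as a single string, which
# becomes the "text" field that SFTTrainer consumes during training.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```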

Techniques

| Technique | Description | Key parameter |
| --- | --- | --- |
| LoRA | Injects trainable low-rank matrices into attention and feed-forward projection layers | `lora_r=8`, `lora_alpha=32` |
| QLoRA | LoRA on a 4-bit NF4-quantized base model via BitsAndBytes; reduces VRAM by ~50% | `bnb_4bit_quant_type="nf4"` |
| DoRA | Decomposes weight updates into magnitude and direction; applies LoRA to the directional component | `use_dora=True` |
| P-Tuning | Trains a small MLP encoder that generates continuous soft-prompt embeddings prepended to every input | `num_virtual_tokens=20` |
| Prefix-Tuning | Prepends trainable prefix vectors to the key and value tensors of every attention layer | `num_virtual_tokens=20` |
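LoRA, QLoRA, and DoRA all build on `LoraConfig` (shown later on this page), while the two prompt-based techniques use dedicated PEFT config classes. A minimal sketch of those, where `encoder_hidden_size` is an assumed value not taken from these pipelines:

```python
from peft import PrefixTuningConfig, PromptEncoderConfig, TaskType

# P-Tuning: a small MLP encoder generates 20 continuous soft-prompt
# embeddings that are prepended to every input.
p_tuning_config = PromptEncoderConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,
    encoder_hidden_size=128,  # assumed value; not specified in these pipelines
)

# Prefix-Tuning: 20 trainable prefix vectors are prepended to the key and
# value tensors of every attention layer.
prefix_tuning_config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,
)
```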

Datasets

| Dataset | Domain | HuggingFace ID |
| --- | --- | --- |
| ARC | Science QA | `allenai/ai2_arc` |
| TriviaQA | Open-domain QA | `mandarjoshi/trivia_qa` |
| FactScore | Factual QA | `awinml/factscore_unlabelled_alpaca_13b_retrieval` |
| PopQA | Entity QA | `akariasai/PopQA` |
| Earnings Calls | Financial QA | `lamini/earnings-calls-qa` |
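Each of these loads with `datasets.load_dataset`. As a quick check, note that ARC also takes a subset name, matching the `config.yaml` keys shown at the end of this page:

```python
from datasets import load_dataset

# Dataset ID, subset, and split mirror the config override example below.
dataset = load_dataset("allenai/ai2_arc", "ARC-Challenge", split="train")
print(dataset[0])
```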

All 25 pipelines

All paths are relative to `src/llm_finetuning/supervised_finetuning/`.

| Dataset | LoRA | QLoRA | DoRA | P-Tuning | Prefix-Tuning |
| --- | --- | --- | --- | --- | --- |
| ARC | `lora/arc/train.py` | `qlora/arc/train.py` | `dora/arc/train.py` | `p_tuning/arc/train.py` | `prefix_tuning/arc/train.py` |
| Earnings Calls | `lora/earnings_call/train.py` | `qlora/earnings_call/train.py` | `dora/earnings_call/train.py` | `p_tuning/earnings_call/train.py` | `prefix_tuning/earnings_call/train.py` |
| FactScore | `lora/factscore/train.py` | `qlora/factscore/train.py` | `dora/factscore/train.py` | `p_tuning/factscore/train.py` | `prefix_tuning/factscore/train.py` |
| PopQA | `lora/popqa/train.py` | `qlora/popqa/train.py` | `dora/popqa/train.py` | `p_tuning/popqa/train.py` | `prefix_tuning/popqa/train.py` |
| TriviaQA | `lora/triviaqa/train.py` | `qlora/triviaqa/train.py` | `dora/triviaqa/train.py` | `p_tuning/triviaqa/train.py` | `prefix_tuning/triviaqa/train.py` |

Running a pipeline

Each pipeline is self-contained — pick a technique and dataset combination, then run the corresponding script from the repository root.
```bash
python src/llm_finetuning/supervised_finetuning/lora/arc/train.py
python src/llm_finetuning/supervised_finetuning/lora/triviaqa/train.py
python src/llm_finetuning/supervised_finetuning/lora/factscore/train.py
python src/llm_finetuning/supervised_finetuning/lora/popqa/train.py
python src/llm_finetuning/supervised_finetuning/lora/earnings_call/train.py
```
Each script reads its co-located `config.yaml`, downloads the dataset, wraps the base model with the adapter, trains, and saves the adapter weights plus tokenizer to `output_dir`.
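Only the adapter weights are saved, not a merged model, so inference requires reattaching them to the frozen base model. A minimal sketch, where the output path is a hypothetical stand-in for your configured `output_dir`:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reattach the saved adapter to the base model for inference.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
model = PeftModel.from_pretrained(base, "outputs/lora-arc")  # hypothetical output_dir
tokenizer = AutoTokenizer.from_pretrained("outputs/lora-arc")
```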

LoRA configuration

The following parameters from `lora/arc/config.yaml` control the adapter size and which layers are targeted. The same defaults apply to QLoRA and DoRA.

```yaml
lora_r: 8
lora_alpha: 32
lora_dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```
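These keys map one-to-one onto `peft.LoraConfig`, with an effective update scale of `lora_alpha / lora_r = 4`. A quick, self-contained sketch for sanity-checking how small the trainable fraction actually is:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=32,  # effective scaling: lora_alpha / r = 4
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # trainable vs. total parameter counts
```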
Use QLoRA to cut VRAM requirements by approximately 50% compared to LoRA. The base model is quantized to 4-bit NF4 format via BitsAndBytes; the LoRA adapters themselves are still trained in bfloat16. This makes 3B-parameter models trainable on 6–8 GB of GPU memory.

Training script pattern

All five techniques share the same high-level structure. Here is the LoRA variant from `lora/arc/train.py`:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Earlier in the full script (omitted here): config is parsed from the
# co-located config.yaml, the base model and tokenizer are loaded via
# AutoModelForCausalLM / AutoTokenizer, and the dataset is downloaded
# and chat-formatted into a "text" field.

# Adapter configuration
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=config["lora_r"],
    lora_alpha=config["lora_alpha"],
    lora_dropout=config["lora_dropout"],
    bias="none",
    target_modules=config["target_modules"],
)
model = get_peft_model(model, peft_config)

# Training
training_args = SFTConfig(
    output_dir=config["output_dir"],
    num_train_epochs=config["num_train_epochs"],
    per_device_train_batch_size=config["per_device_train_batch_size"],
    gradient_accumulation_steps=config["gradient_accumulation_steps"],
    learning_rate=config["learning_rate"],
    bf16=True,
    dataset_text_field="text",
    packing=False,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    processing_class=tokenizer,
    args=training_args,
)

trainer.train()
model.save_pretrained(config["output_dir"])
tokenizer.save_pretrained(config["output_dir"])
```
For QLoRA, `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")` is passed to `AutoModelForCausalLM.from_pretrained`, and `peft_config` is passed directly to `SFTTrainer` rather than calling `get_peft_model` manually.
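Concretely, that variation looks roughly like the following sketch, reusing the names from the script above; the bfloat16 compute dtype is an assumed (though common) choice, not stated on this page.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTTrainer

# Quantize the frozen base model to 4-bit NF4 at load time.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=bnb_config,
)

# Hand peft_config to SFTTrainer and let it wrap the model itself,
# instead of calling get_peft_model manually.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    processing_class=tokenizer,
    args=training_args,
    peft_config=peft_config,
)
trainer.train()
```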

GPU memory guidance

| Technique | Typical VRAM |
| --- | --- |
| LoRA (3B model) | 8–12 GB |
| QLoRA (3B model) | 6–8 GB |
Reduce `per_device_train_batch_size` to 1 and increase `gradient_accumulation_steps` if you run out of memory.
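For example, this hypothetical `config.yaml` change keeps an effective batch size of 8 while minimizing per-step memory:

```yaml
per_device_train_batch_size: 1   # one example per forward/backward pass
gradient_accumulation_steps: 8   # accumulate to an effective batch of 8
```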

Overriding the dataset

To use a different dataset split or HuggingFace ID, edit `config.yaml` in the pipeline directory:

```yaml
dataset_id: "allenai/ai2_arc"
dataset_subset: "ARC-Challenge"
split: "train"
```
All three keys are optional. Omitting them falls back to the loader’s class-level defaults.
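For example, a hypothetical override pointing the same pipeline at TriviaQA (the `rc` subset name is an assumption, not taken from these pipelines):

```yaml
dataset_id: "mandarjoshi/trivia_qa"
dataset_subset: "rc"
split: "train"
```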
