This quickstart walks you through installing the project with uv, setting up HuggingFace credentials, and running your first training pipeline. By the end you will have a working fine-tuned adapter saved to ./outputs/. All 39 pipelines follow the same pattern, so once you have run one, you know how to run the rest.
Step 1: Install uv and clone the repository

The project uses uv for fast, reproducible dependency management. Install it with pip if you do not already have it, then clone the repository.
pip install uv
git clone https://github.com/avnlp/llm-finetuning
cd llm-finetuning
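To confirm the install before continuing:
uv --version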
Step 2: Install dependencies

uv sync creates a virtual environment and installs all runtime dependencies — transformers, trl, peft, unsloth, deepeval, evidently, and more — in one step.
uv sync
source .venv/bin/activate
For linting and type-checking tools (Ruff, MyPy, Bandit), include the dev group:
uv sync --dev
source .venv/bin/activate
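With the dev group installed, the checks run through uv. Exact flags depend on the project's pyproject.toml configuration, but typical invocations look like:
# Lint, type-check, and scan for security issues
uv run ruff check .
uv run mypy src
uv run bandit -r src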
Step 3: Set up credentials

All pipelines download model weights from HuggingFace. Gated models (Llama, Mistral, Gemma) require you to be logged in.
# Required for gated HuggingFace models (Llama, Mistral, Gemma, etc.)
huggingface-cli login
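On a headless machine or in CI, the interactive login can be replaced by the standard HF_TOKEN environment variable, which huggingface_hub picks up automatically:
# Non-interactive alternative to huggingface-cli login
export HF_TOKEN="your-hf-token"
# Confirm the token is valid
huggingface-cli whoami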
An OPENAI_API_KEY is required only for multi_hop_question_answering and medical_question_answering pipelines, which use DeepEval and Evidently LLM-as-a-Judge reward functions. SFT, GRPO math reasoning, and preference alignment pipelines do not need it.
For pipelines that do require it, export the key before running:
# Required only for multi_hop_question_answering and medical_question_answering
export OPENAI_API_KEY="your-key"
Step 4: Run your first pipeline

Each pipeline is a single Python script that reads its config.yaml, downloads the dataset, trains the model, and writes adapter weights to ./outputs/.
The recommended starting point is the SFT LoRA pipeline, which fine-tunes Llama-3.2-3B with LoRA (rank 8, alpha 32) on the ARC-Challenge science QA dataset and requires 8–12 GB of VRAM.
python src/llm_finetuning/supervised_finetuning/lora/arc/train.py
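If you would rather not activate the virtual environment, the same script runs through uv:
uv run python src/llm_finetuning/supervised_finetuning/lora/arc/train.py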
The script reads src/llm_finetuning/supervised_finetuning/lora/arc/config.yaml:
model_id: "meta-llama/Llama-3.2-3B"
dataset_name: "allenai/ai2_arc"
dataset_config: "ARC-Challenge"
split: "train"
output_dir: "./outputs/supervised_finetuning/lora/arc"

num_train_epochs: 3
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 2.0e-4
save_strategy: "epoch"
logging_steps: 10

lora_r: 8
lora_alpha: 32
lora_dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
Adapter weights are saved to ./outputs/supervised_finetuning/lora/arc after training.
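Once training finishes, you can sanity-check the adapter by attaching it to the base model with PEFT. A minimal sketch, assuming the default output path from the config above (the test prompt is illustrative):
# Load the base model and attach the trained LoRA adapter
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_dir = "./outputs/supervised_finetuning/lora/arc"
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
model = PeftModel.from_pretrained(base, adapter_dir)
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)

# Quick generation check with an ARC-style science question
inputs = tokenizer("Which gas do plants absorb during photosynthesis?", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))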

Overriding config.yaml settings

You can change the dataset, model, or any hyperparameter by editing the pipeline’s config.yaml — no code changes required.

Changing the dataset

For supervised_finetuning/ pipelines, which use the split key:
dataset_id: "allenai/ai2_arc"
dataset_subset: "ARC-Challenge"
split: "train"
For GRPO, math reasoning, medical QA, and preference alignment pipelines, which use the dataset_split key:
dataset_id: "openai/gsm8k"
dataset_subset: "main"
dataset_split: "train"
All three keys are optional; if omitted, each loader falls back to its built-in defaults.

Changing the base model

Open the pipeline’s config.yaml and update model_id:
# Before
model_id: "meta-llama/Llama-3.2-3B"

# After
model_id: "meta-llama/Llama-3.1-8B"
For GRPO and preference alignment pipelines, use Unsloth-quantized variants to reduce VRAM:
model_id: "unsloth/Llama-3.1-8B-Instruct"

Output location

All pipelines write adapter weights and the tokenizer to ./outputs/<module>/<method>/<dataset>/ by default. For example:
  • SFT LoRA on ARC → ./outputs/supervised_finetuning/lora/arc/
  • GRPO on GSM8K → ./outputs/math_reasoning/grpo/gsm8k/
  • DPO on UltraFeedback → ./outputs/preference_alignment/dpo/ultrafeedback/
Override the location by setting output_dir in config.yaml:
output_dir: "./my_custom_output_dir"
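After a run completes, the output directory holds the PEFT adapter plus tokenizer files. The exact listing varies with library versions, but it typically looks like:
ls ./outputs/supervised_finetuning/lora/arc
# adapter_config.json   adapter_model.safetensors
# tokenizer.json        tokenizer_config.json   special_tokens_map.json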

GPU memory guidance

Choose a pipeline that fits your available VRAM. Lower-memory techniques like QLoRA are available for most training paradigms.
| Technique | Typical VRAM |
| --- | --- |
| SFT LoRA (3B model) | 8–12 GB |
| SFT QLoRA (3B model) | 6–8 GB |
| GRPO QLoRA (3B model) | 12–16 GB |
| DPO QLoRA (7B model) | 16–24 GB |
| ORPO QLoRA (7B model) | 16–24 GB |
| KTO QLoRA (1.5B model) | 8–12 GB |
| PPO (8B model) | 40+ GB |
If you run out of memory, set per_device_train_batch_size: 1 and increase gradient_accumulation_steps to maintain effective batch size.
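For example, a config that runs out of memory at per_device_train_batch_size: 4 with gradient_accumulation_steps: 2 (effective batch size 4 × 2 = 8) keeps the same effective batch size with:
# Same effective batch size (1 × 8 = 8), lower peak VRAM
per_device_train_batch_size: 1
gradient_accumulation_steps: 8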

Pipelines that require OPENAI_API_KEY

The following pipeline groups use DeepEval or Evidently AI LLM-as-a-Judge reward functions, which call the OpenAI API during training. Set OPENAI_API_KEY before running any of these.
Running multi-hop QA or medical QA pipelines without OPENAI_API_KEY will raise an authentication error at the first reward function call.
Multi-hop question answering:
export OPENAI_API_KEY="your-key"
python src/llm_finetuning/multi_hop_question_answering/grpo/hotpotqa/train.py
python src/llm_finetuning/multi_hop_question_answering/grpo/freshqa/train.py
python src/llm_finetuning/multi_hop_question_answering/grpo/musique/train.py
Medical question answering:
export OPENAI_API_KEY="your-key"
python src/llm_finetuning/medical_question_answering/medqa/train.py
python src/llm_finetuning/medical_question_answering/bioasq/train.py
python src/llm_finetuning/medical_question_answering/pubmedqa/train.py
All SFT, GRPO math reasoning, and preference alignment pipelines run without an OpenAI key.
