This quickstart walks you through installing the project with uv, setting up HuggingFace credentials, and executing your first training pipeline. By the end you will have a working fine-tuned adapter saved to ./outputs/. All 39 pipelines follow the same pattern, so once you run one, the rest are identical in structure.
Install uv and clone the repository
The project uses uv for fast, reproducible dependency management. Install it with
pip if you do not already have it, then clone the repository.
Install dependencies
uv sync creates a virtual environment and installs all runtime dependencies (transformers, trl, peft, unsloth, deepeval, evidently, and more) in one step.
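Concretely, the install steps look something like the commands below. The repository URL is an assumption inferred from the project slug (avnlp/llm-finetuning) and may differ; check the project page for the real one.

```shell
# Install uv via pip if it is not already available
pip install uv

# Clone the repository (URL assumed from the project slug; adjust if needed)
git clone https://github.com/avnlp/llm-finetuning.git
cd llm-finetuning

# Create a virtual environment and install all dependencies in one step
uv sync
```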
Set up credentials
All pipelines download model weights from HuggingFace. Gated models (Llama, Mistral, Gemma) require you to be logged in.
An OPENAI_API_KEY is required only for the multi_hop_question_answering and medical_question_answering pipelines, which use DeepEval and Evidently LLM-as-a-Judge reward functions. SFT, GRPO math reasoning, and preference alignment pipelines do not need it. For pipelines that do require it, export the key before running.
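A typical credential setup is sketched below with placeholder values; substitute your real tokens. The interactive `huggingface-cli login` command is an equivalent alternative to setting the environment variable.

```shell
# Authenticate to HuggingFace so gated checkpoints (Llama, Mistral, Gemma) can download.
# Interactive alternative: huggingface-cli login
export HF_TOKEN="hf_xxxxxxxxxxxxxxxx"    # placeholder; create a token at huggingface.co/settings/tokens

# Needed only for the DeepEval / Evidently LLM-as-a-Judge pipelines
export OPENAI_API_KEY="sk-xxxxxxxxxxxx"  # placeholder
```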
Run your first pipeline
Each pipeline is a single Python script that reads its config.yaml, downloads the dataset, trains the model, and writes adapter weights to ./outputs/. Pick one of the options below.
- SFT — LoRA on ARC
- GRPO — GSM8K
- Preference alignment — DPO
The recommended starting point. Fine-tunes Llama-3.2-3B with LoRA (rank 8, alpha 32) on the ARC-Challenge science QA dataset. Requires 8–12 GB VRAM. The script reads src/llm_finetuning/supervised_finetuning/lora/arc/config.yaml. Adapter weights are saved to ./outputs/supervised_finetuning/lora/arc after training.
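Launching the SFT LoRA / ARC pipeline might look like the following. Only the config.yaml path is confirmed by this page; the script filename below is a guess, so check the pipeline directory for the actual entry point.

```shell
# Run from the repository root inside the uv-managed environment.
# The script filename is illustrative; the directory comes from the docs.
uv run python src/llm_finetuning/supervised_finetuning/lora/arc/train.py
```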
Overriding config.yaml settings
You can change the dataset, model, or any hyperparameter by editing the pipeline’s config.yaml; no code changes are required.
Changing the dataset
For supervised_finetuning/ pipelines, edit the split key; other pipelines use the dataset_split key instead.
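As an illustration, a supervised_finetuning config could point at a different dataset. The split key name is documented on this page; the dataset field name and value below are assumptions meant only to show the shape of the override.

```yaml
# Illustrative config.yaml fragment; "split" is documented,
# the dataset field name and value are assumptions.
dataset: allenai/ai2_arc
split: train
```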
Changing the base model
Open the pipeline’s config.yaml and update model_id:
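A hedged sketch of the override; the checkpoint id shown is an example, not necessarily the pipeline default.

```yaml
# config.yaml: point the pipeline at a different base checkpoint
model_id: meta-llama/Llama-3.2-3B-Instruct
```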
Output location
All pipelines write adapter weights and the tokenizer to ./outputs/<module>/<method>/<dataset>/ by default. For example:
- SFT LoRA on ARC → ./outputs/supervised_finetuning/lora/arc/
- GRPO on GSM8K → ./outputs/math_reasoning/grpo/gsm8k/
- DPO on UltraFeedback → ./outputs/preference_alignment/dpo/ultrafeedback/
To change the location, set output_dir in config.yaml:
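For example, a config.yaml override might look like this; the path shown is an arbitrary example.

```yaml
# Replaces the default ./outputs/<module>/<method>/<dataset>/ location
output_dir: ./outputs/my_experiment
```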
GPU memory guidance
Choose a pipeline that fits your available VRAM. Lower-memory techniques like QLoRA are available for most training paradigms.
| Technique | Typical VRAM |
|---|---|
| SFT LoRA (3B model) | 8–12 GB |
| SFT QLoRA (3B model) | 6–8 GB |
| GRPO QLoRA (3B model) | 12–16 GB |
| DPO QLoRA (7B model) | 16–24 GB |
| ORPO QLoRA (7B model) | 16–24 GB |
| KTO QLoRA (1.5B model) | 8–12 GB |
| PPO (8B model) | 40+ GB |
Pipelines that require OPENAI_API_KEY
The following pipeline groups use DeepEval or Evidently AI LLM-as-a-Judge reward functions, which call the OpenAI API during training. Set OPENAI_API_KEY before running any of these.
Multi-hop question answering: