LLM Fine-tuning provides ready-to-run training pipelines for adapting large language models to specialized tasks. It covers three paradigms: Supervised Fine-Tuning (SFT) with adapter methods, reinforcement learning with Group Relative Policy Optimization (GRPO), and Preference Alignment. The pipelines span 16 datasets in math reasoning, multi-hop QA, medical QA, and general question answering.

Documentation Index
Fetch the complete documentation index at: https://mintlify.com/avnlp/llm-finetuning/llms.txt
Use this file to discover all available pages before exploring further.
Introduction
Learn the project structure, supported techniques, and which pipeline to use for your task.
Quickstart
Install dependencies and run your first fine-tuning pipeline in minutes.
Training Paradigms
Explore SFT, GRPO, and Preference Alignment pipelines with real command examples.
Config Reference
Full reference for all YAML configuration fields across every pipeline.
What’s included
Supervised Fine-Tuning
25 pipelines across LoRA, QLoRA, DoRA, P-Tuning, and Prefix-Tuning on 5 QA datasets.
GRPO Math Reasoning
GRPO on GSM8K with 5 reward functions, plus a two-stage SFT + GRPO pipeline for Qwen3.
Multi-Hop QA
GRPO on HotpotQA, FreshQA, and MuSiQue with 8 reward functions.
Medical QA
GRPO on MedQA, BioASQ, and PubMedQA with LLM-as-a-Judge evaluation.
Preference Alignment
DPO, ORPO, KTO, and PPO pipelines for aligning models with human preferences.
Reward Functions
Composable reward functions for correctness and format, backed by DeepEval and Evidently AI.
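The reward functions compose as plain Python callables that each score one aspect of a generated completion. Below is a minimal sketch of this pattern in the style of TRL's GRPOTrainer reward functions, assuming plain-string completions and a reference `answer` dataset column; the function names and the `<answer>` tag format are illustrative, not the repository's exact API.

```python
import re

# Illustrative composable reward functions in the TRL GRPOTrainer style:
# each callable receives the batch of generated completions (plus dataset
# columns as keyword arguments) and returns one float score per completion.

def format_reward(completions, **kwargs):
    """Reward completions that wrap their final answer in <answer> tags."""
    return [
        1.0 if re.search(r"<answer>.*?</answer>", c, re.DOTALL) else 0.0
        for c in completions
    ]

def correctness_reward(completions, answer, **kwargs):
    """Reward completions whose extracted answer matches the reference."""
    rewards = []
    for completion, reference in zip(completions, answer):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        predicted = match.group(1).strip() if match else ""
        rewards.append(1.0 if predicted == str(reference).strip() else 0.0)
    return rewards
```

Because each function scores one aspect independently, a pipeline can mix and match them, for example by passing `reward_funcs=[format_reward, correctness_reward]` to TRL's `GRPOTrainer`.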
Built on industry-standard tooling
LLM Fine-tuning is built on HuggingFace TRL, PEFT, and Unsloth for training, with reward evaluation powered by DeepEval and Evidently AI. Every pipeline follows the same pattern: a config.yaml for hyperparameters, a dataset loader, and a train.py script you run directly.
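As a purely illustrative sketch of that pattern (the field names here are hypothetical; the Config Reference documents the actual schema for each pipeline), a config.yaml for a LoRA SFT run might look like:

```yaml
# Hypothetical config.yaml sketch; field names are illustrative.
# See the Config Reference for the real schema of each pipeline.
model:
  name: Qwen/Qwen2.5-7B-Instruct     # base model to adapt
adapter:
  method: lora                       # lora | qlora | dora | p_tuning | prefix_tuning
  r: 16                              # LoRA rank
  lora_alpha: 32
training:
  learning_rate: 2.0e-4
  num_train_epochs: 3
  per_device_train_batch_size: 4
dataset:
  name: gsm8k                        # one of the supported QA / reasoning datasets
```

The pipeline's train.py reads this file and launches training directly; the Quickstart page shows the exact invocation.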