Documentation Index

Fetch the complete documentation index at: https://mintlify.com/avnlp/llm-finetuning/llms.txt

Use this file to discover all available pages before exploring further.

LLM Fine-tuning provides ready-to-run training pipelines for adapting large language models to specialized tasks. It covers three paradigms — Supervised Fine-Tuning (SFT) with adapter methods, Reinforcement Learning with Group Relative Policy Optimization (GRPO), and Preference Alignment — across 16 datasets in math reasoning, multi-hop QA, medical QA, and general question answering.

Introduction

Learn the project structure, supported techniques, and which pipeline to use for your task.

Quickstart

Install dependencies and run your first fine-tuning pipeline in minutes.

Training Paradigms

Explore SFT, GRPO, and Preference Alignment pipelines with real command examples.

Config Reference

Full reference for all YAML configuration fields across every pipeline.

What’s included

Supervised Fine-Tuning

25 pipelines across LoRA, QLoRA, DoRA, P-Tuning, and Prefix-Tuning on 5 QA datasets.
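
For a feel of the adapter approach, here is a minimal LoRA sketch built on TRL and PEFT. The model, dataset, and hyperparameter values are illustrative placeholders, not the values from this project's configs:

```python
# Minimal LoRA SFT sketch with TRL + PEFT. Model, dataset, and
# hyperparameters are placeholders, not this project's config values.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # any SFT-formatted dataset

peft_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor for the LoRA update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",    # TRL accepts a model id or a loaded model
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-lora-out", max_steps=100),
    peft_config=peft_config,
)
trainer.train()
```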

GRPO Math Reasoning

GRPO on GSM8K with 5 reward functions, plus a two-stage SFT + GRPO pipeline for Qwen3.
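
As a sketch of what a GRPO run looks like in TRL, here is a minimal example with a single toy correctness reward; the actual five GSM8K rewards live in the pipeline itself:

```python
# Minimal GRPO sketch with TRL. The reward below is a toy stand-in,
# not one of the five GSM8K rewards shipped with this project.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a "prompt" column; GSM8K ships "question"/"answer".
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

def exact_answer_reward(completions, answer, **kwargs):
    """1.0 when the gold final answer (after '####') appears in the completion."""
    golds = [a.split("####")[-1].strip() for a in answer]
    return [1.0 if g in c else 0.0 for c, g in zip(completions, golds)]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=exact_answer_reward,   # a list of reward functions also works
    args=GRPOConfig(output_dir="grpo-gsm8k-out", num_generations=8),
    train_dataset=dataset,
)
trainer.train()
```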

Multi-Hop QA

GRPO on HotpotQA, FreshQA, and MuSiQue with 8 reward functions.

Medical QA

GRPO on MedQA, BioASQ, and PubMedQA with LLM-as-a-Judge evaluation.
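
The judge wiring is specific to these pipelines, but one plausible shape for an LLM-as-a-Judge scorer is sketched below with DeepEval's GEval metric. The metric name and criteria text are hypothetical, and GEval calls out to a judge model (OpenAI by default), so an API key must be configured:

```python
# Hypothetical LLM-as-a-Judge scorer built on DeepEval's GEval metric.
# Metric name and criteria are illustrative, not the project's judge setup.
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

judge = GEval(
    name="Medical Correctness",
    criteria=(
        "Decide whether the actual output answers the medical question "
        "and agrees with the expected output."
    ),
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
)

case = LLMTestCase(
    input="Which drug class is first-line for type 2 diabetes?",
    actual_output="Metformin, a biguanide, is the usual first-line agent.",
    expected_output="Metformin",
)
judge.measure(case)
print(judge.score, judge.reason)  # score in [0, 1] plus the judge's rationale
```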

Preference Alignment

DPO, ORPO, KTO, and PPO pipelines for aligning models with human preferences.
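
For orientation, a minimal DPO sketch with TRL; the ORPO, KTO, and PPO pipelines follow the same wiring with their own trainers and configs. Names and values below are placeholders:

```python
# Minimal DPO sketch with TRL. Model and dataset are placeholders; a DPO
# dataset needs "prompt"/"chosen"/"rejected" pairs (plain or conversational).
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",              # a reference model is created internally
    args=DPOConfig(output_dir="dpo-out", beta=0.1),  # beta scales the implicit KL penalty
    train_dataset=dataset,
)
trainer.train()
```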

Reward Functions

Composable reward functions for correctness and format, backed by DeepEval and Evidently AI.
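
To illustrate what composable means here, a hypothetical example of weighting a format reward against a correctness reward; these helpers are not the project's implementations:

```python
# Hypothetical illustration of composing weighted reward functions.
# The tag convention and helpers are illustrative only.
import re

def format_reward(completions, **kwargs):
    """1.0 when the completion wraps its final answer in <answer> tags."""
    return [1.0 if re.search(r"<answer>.*?</answer>", c, re.DOTALL) else 0.0
            for c in completions]

def correctness_reward(completions, answer, **kwargs):
    """1.0 when the gold answer appears inside the <answer> tags."""
    rewards = []
    for c, gold in zip(completions, answer):
        m = re.search(r"<answer>(.*?)</answer>", c, re.DOTALL)
        rewards.append(1.0 if m and gold.strip() in m.group(1) else 0.0)
    return rewards

def combine(reward_fns, weights):
    """Fold several reward functions into one weighted-sum reward."""
    def combined(completions, **kwargs):
        scores = [fn(completions, **kwargs) for fn in reward_fns]
        return [sum(w * s[i] for w, s in zip(weights, scores))
                for i in range(len(completions))]
    return combined

reward = combine([format_reward, correctness_reward], weights=[0.2, 0.8])
```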

Built on industry-standard tooling

LLM Fine-tuning is built on HuggingFace TRL, PEFT, and Unsloth for training, with reward evaluation powered by DeepEval and Evidently AI. Every pipeline follows the same pattern: a config.yaml for hyperparameters, a dataset loader, and a train.py script you run directly.
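
A minimal sketch of that pattern, using an SFT pipeline and hypothetical config field names (the real schema is documented in the Config Reference):

```python
# train.py-style entry point following the config.yaml pattern described
# above. Field names ("model_name", "learning_rate", ...) are hypothetical;
# see the Config Reference for the real schema.
import yaml
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

dataset = load_dataset(cfg["dataset_name"], split="train")

trainer = SFTTrainer(
    model=cfg["model_name"],
    train_dataset=dataset,
    args=SFTConfig(
        output_dir=cfg["output_dir"],
        learning_rate=cfg["learning_rate"],
        num_train_epochs=cfg["num_train_epochs"],
    ),
)
trainer.train()
```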
