LLM Fine-tuning provides ready-to-run training pipelines for adapting large language models to specialized tasks. It covers three paradigms: Supervised Fine-Tuning (SFT) with adapter methods, reinforcement learning with Group Relative Policy Optimization (GRPO), and Preference Alignment. The pipelines span 16 datasets in math reasoning, multi-hop QA, medical QA, and general question answering.

Documentation Index
Fetch the complete documentation index at: https://mintlify.com/avnlp/llm-finetuning/llms.txt
Use this file to discover all available pages before exploring further.
Introduction
Learn the project structure, supported techniques, and which pipeline to use for your task.
Quickstart
Install dependencies and run your first fine-tuning pipeline in minutes.
Training Paradigms
Explore SFT, GRPO, and Preference Alignment pipelines with real command examples.
Config Reference
Full reference for all YAML configuration fields across every pipeline.
What’s included
Supervised Fine-Tuning
25 pipelines across LoRA, QLoRA, DoRA, P-Tuning, and Prefix-Tuning on 5 QA datasets.
GRPO Math Reasoning
GRPO on GSM8K with 5 reward functions, plus a two-stage SFT + GRPO pipeline for Qwen3.
Multi-Hop QA
GRPO on HotpotQA, FreshQA, and MuSiQue with 8 reward functions.
Medical QA
GRPO on MedQA, BioASQ, and PubMedQA with LLM-as-a-Judge evaluation.
Preference Alignment
DPO, ORPO, KTO, and PPO pipelines for aligning models with human preferences.
Reward Functions
Composable reward functions for correctness and format, backed by DeepEval and Evidently AI.
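The reward functions compose as plain Python callables that each score one aspect of a generated completion. Below is a minimal sketch of this pattern in the style of TRL's GRPOTrainer reward functions, assuming plain-string completions and a reference `answer` dataset column; the function names and the `<answer>` tag format are illustrative, not the repository's exact API.

```python
import re

# Illustrative composable reward functions in the TRL GRPOTrainer style:
# each callable receives the batch of generated completions (plus dataset
# columns as keyword arguments) and returns one float score per completion.

def format_reward(completions, **kwargs):
    """Reward completions that wrap their final answer in <answer> tags."""
    return [
        1.0 if re.search(r"<answer>.*?</answer>", c, re.DOTALL) else 0.0
        for c in completions
    ]

def correctness_reward(completions, answer, **kwargs):
    """Reward completions whose extracted answer matches the reference."""
    rewards = []
    for completion, reference in zip(completions, answer):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        predicted = match.group(1).strip() if match else ""
        rewards.append(1.0 if predicted == str(reference).strip() else 0.0)
    return rewards
```

Because each function scores one aspect independently, a pipeline can mix and match them, for example by passing `reward_funcs=[format_reward, correctness_reward]` to TRL's `GRPOTrainer`.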
Built on industry-standard tooling
LLM Fine-tuning is built on HuggingFace TRL, PEFT, and Unsloth for training, with reward evaluation powered by DeepEval and Evidently AI. Every pipeline follows the same pattern: a config.yaml for hyperparameters, a dataset loader, and a train.py script you run directly.
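As a purely illustrative sketch of that pattern (the field names here are hypothetical; the Config Reference documents the actual schema for each pipeline), a config.yaml for a LoRA SFT run might look like:

```yaml
# Hypothetical config.yaml sketch; field names are illustrative.
# See the Config Reference for the real schema of each pipeline.
model:
  name: Qwen/Qwen2.5-7B-Instruct     # base model to adapt
adapter:
  method: lora                       # lora | qlora | dora | p_tuning | prefix_tuning
  r: 16                              # LoRA rank
  lora_alpha: 32
training:
  learning_rate: 2.0e-4
  num_train_epochs: 3
  per_device_train_batch_size: 4
dataset:
  name: gsm8k                        # one of the supported QA / reasoning datasets
```

The pipeline's train.py reads this file and launches training directly; the Quickstart page shows the exact invocation.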