What you’ll learn
By the end of this tutorial, you’ll know how to:
- Set up your environment with TRL and Hugging Face libraries
- Load and prepare datasets for instruction fine-tuning
- Use TRL’s SFTTrainer for supervised fine-tuning
- Configure LoRA for parameter-efficient fine-tuning on constrained hardware
- Test your model with local inference
- Save and export models for deployment
Prerequisites
- GPU: This tutorial requires a GPU. You can run it for free on Google Colab using an NVIDIA T4 GPU
- Python: Python 3.8 or higher
- Basic knowledge: Familiarity with Hugging Face Transformers library
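Before starting, it is worth verifying that PyTorch can actually see a GPU. A minimal check, assuming `torch` is already installed:

```python
import torch

# Quick environment check: the tutorial needs a CUDA GPU
# (a free Colab T4 is enough). Report what PyTorch can see.
if torch.cuda.is_available():
    print(f"GPU detected: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA GPU detected; consider the free T4 runtime on Google Colab.")
```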
Why use TRL?
TRL (Transformer Reinforcement Learning) is a full-stack library from Hugging Face that provides:
- SFTTrainer: Simplified supervised fine-tuning with best practices built-in
- Deep integration with Hugging Face ecosystem (datasets, models, hub)
- LoRA support through PEFT integration for memory-efficient training
- Flexible configuration for various training scenarios
- Active development and community support
Tutorial overview
The tutorial covers the following steps:
- Installation: Install TRL, Transformers, and PEFT libraries
- Model loading: Load the LFM2.5-1.2B-Instruct model from Hugging Face Hub
- Dataset preparation: Format the SmolTalk dataset for instruction tuning
- Training configuration: Set up SFTTrainer with LoRA configuration
- Training: Fine-tune the model on your dataset
- Inference: Test the fine-tuned model
- Model export: Save your model for deployment
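The dataset-preparation step mostly amounts to mapping each example into the chat-messages format that SFTTrainer consumes. A minimal sketch with a hypothetical prompt/response schema (SmolTalk may already ship messages-formatted data, so your mapping can differ):

```python
def to_chat(example):
    """Map a raw prompt/response pair to the "messages" format
    SFTTrainer expects. The input field names are illustrative."""
    return {
        "messages": [
            {"role": "user", "content": example["prompt"]},
            {"role": "assistant", "content": example["response"]},
        ]
    }

row = to_chat({"prompt": "What is LoRA?", "response": "A low-rank adaptation method."})
print(row["messages"][0]["role"])  # user
```

With the `datasets` library you would typically apply this via `dataset.map(to_chat)`.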
Key concepts
TRL’s SFTTrainer
SFTTrainer simplifies supervised fine-tuning by:
- Automatically handling chat templates and formatting
- Providing sensible default hyperparameters
- Integrating with PEFT for LoRA training
- Supporting mixed precision training out of the box
- Including logging and checkpointing utilities
Standard vs parameter-efficient fine-tuning
The tutorial demonstrates both approaches:

Standard fine-tuning:
- Updates all model parameters
- Requires more GPU memory
- Can achieve slightly better performance
- Best for larger GPUs (24GB+)

LoRA fine-tuning:
- Only trains additional low-rank matrices
- Reduces memory requirements by 3-4x
- Enables fine-tuning on consumer GPUs
- Maintains most of the performance
- Best for GPUs with 8-16GB memory
Deployment options
After fine-tuning, you can deploy your model to:
- Mobile: Android and iOS apps using the LEAP SDK
- Desktop: Mac (MLX), Windows/Linux (llama.cpp, Ollama, LM Studio)
- Cloud: vLLM, Modal, Baseten, Fal for production deployments
- Edge: On-device inference for low-latency applications
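Most of these targets expect a single self-contained checkpoint, so a common last step is to merge the LoRA adapters back into the base weights before exporting. A sketch, assuming a PEFT-wrapped model from training (the function name here is ours, not from the notebook):

```python
def export_merged(model, tokenizer, out_dir="./merged-model"):
    """Fold LoRA adapter weights into the base model and save a
    standalone checkpoint that llama.cpp, vLLM, etc. can load or convert."""
    merged = model.merge_and_unload()  # PEFT: materialize base + adapter weights
    merged.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
    return out_dir
```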
Run the tutorial
You can run this tutorial in two ways:
- Google Colab (recommended for beginners): Click the “Open in Colab” badge at the top
- Local environment: Clone the LFM Cookbook repository and run the notebook locally
Access the notebook
The complete notebook is available at:
- GitHub: sft_with_trl.ipynb
- Colab: Click the badge above to open directly in Google Colab
Comparison with Unsloth
If you’re deciding between TRL and Unsloth:

Choose TRL if you:
- Want standard Hugging Face ecosystem integration
- Need maximum flexibility and customization
- Prefer widely-adopted, well-documented tools
- Plan to use other TRL features (PPO, DPO, etc.)

Choose Unsloth if you:
- Want 2x faster training speed
- Need maximum memory efficiency
- Prefer automatic optimizations
- Want simplified export to multiple formats
Next steps
After completing this tutorial, you can:
- Try SFT with Unsloth for faster training
- Explore GRPO fine-tuning for reinforcement learning
- Learn about continued pre-training for language adaptation
- Deploy your model using the inference guides