What you’ll learn
By the end of this tutorial, you’ll know how to:
- Set up your environment with TRL and Hugging Face libraries
- Load and prepare datasets for instruction fine-tuning
- Use TRL’s SFTTrainer for supervised fine-tuning
- Configure LoRA for parameter-efficient fine-tuning on constrained hardware
- Test your model with local inference
- Save and export models for deployment
Prerequisites
- GPU: This tutorial requires a GPU. You can run it for free on Google Colab using an NVIDIA T4 GPU
- Python: Python 3.8 or higher
- Basic knowledge: Familiarity with Hugging Face Transformers library
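Before starting, it is worth verifying that PyTorch can actually see a GPU. A minimal check, assuming `torch` is already installed:

```python
import torch

# Quick environment check: the tutorial needs a CUDA GPU
# (a free Colab T4 is enough). Report what PyTorch can see.
if torch.cuda.is_available():
    print(f"GPU detected: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA GPU detected; consider the free T4 runtime on Google Colab.")
```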
Why use TRL?
TRL (Transformer Reinforcement Learning) is a full-stack library from Hugging Face that provides:
- SFTTrainer: Simplified supervised fine-tuning with best practices built-in
- Deep integration with Hugging Face ecosystem (datasets, models, hub)
- LoRA support through PEFT integration for memory-efficient training
- Flexible configuration for various training scenarios
- Active development and community support
Tutorial overview
The tutorial covers the following steps:
- Installation: Install TRL, Transformers, and PEFT libraries
- Model loading: Load the LFM2.5-1.2B-Instruct model from Hugging Face Hub
- Dataset preparation: Format the SmolTalk dataset for instruction tuning
- Training configuration: Set up SFTTrainer with LoRA configuration
- Training: Fine-tune the model on your dataset
- Inference: Test the fine-tuned model
- Model export: Save your model for deployment
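The dataset-preparation step mostly amounts to mapping each example into the chat-messages format that SFTTrainer consumes. A minimal sketch with a hypothetical prompt/response schema (SmolTalk may already ship messages-formatted data, so your mapping can differ):

```python
def to_chat(example):
    """Map a raw prompt/response pair to the "messages" format
    SFTTrainer expects. The input field names are illustrative."""
    return {
        "messages": [
            {"role": "user", "content": example["prompt"]},
            {"role": "assistant", "content": example["response"]},
        ]
    }

row = to_chat({"prompt": "What is LoRA?", "response": "A low-rank adaptation method."})
print(row["messages"][0]["role"])  # user
```

With the `datasets` library you would typically apply this via `dataset.map(to_chat)`.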
Key concepts
TRL’s SFTTrainer
SFTTrainer simplifies supervised fine-tuning by:
- Automatically handling chat templates and formatting
- Providing sensible default hyperparameters
- Integrating with PEFT for LoRA training
- Supporting mixed precision training out of the box
- Including logging and checkpointing utilities
Standard vs parameter-efficient fine-tuning
The tutorial demonstrates both approaches:

Standard fine-tuning:
- Updates all model parameters
- Requires more GPU memory
- Can achieve slightly better performance
- Best for larger GPUs (24GB+)

LoRA fine-tuning:
- Only trains additional low-rank matrices
- Reduces memory requirements by 3-4x
- Enables fine-tuning on consumer GPUs
- Maintains most of the performance
- Best for GPUs with 8-16GB memory
Deployment options
After fine-tuning, you can deploy your model to:
- Mobile: Android and iOS apps using the LEAP SDK
- Desktop: Mac (MLX), Windows/Linux (llama.cpp, Ollama, LM Studio)
- Cloud: vLLM, Modal, Baseten, Fal for production deployments
- Edge: On-device inference for low-latency applications
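Most of these targets expect a single self-contained checkpoint, so a common last step is to merge the LoRA adapters back into the base weights before exporting. A sketch, assuming a PEFT-wrapped model from training (the function name here is ours, not from the notebook):

```python
def export_merged(model, tokenizer, out_dir="./merged-model"):
    """Fold LoRA adapter weights into the base model and save a
    standalone checkpoint that llama.cpp, vLLM, etc. can load or convert."""
    merged = model.merge_and_unload()  # PEFT: materialize base + adapter weights
    merged.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
    return out_dir
```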
Run the tutorial
You can run this tutorial in two ways:
- Google Colab (recommended for beginners): Click the “Open in Colab” badge at the top
- Local environment: Clone the LFM Cookbook repository and run the notebook locally
Access the notebook
The complete notebook is available at:
- GitHub: sft_with_trl.ipynb
- Colab: Click the badge above to open directly in Google Colab
Comparison with Unsloth
If you’re deciding between TRL and Unsloth:

Choose TRL if you:
- Want standard Hugging Face ecosystem integration
- Need maximum flexibility and customization
- Prefer widely-adopted, well-documented tools
- Plan to use other TRL features (PPO, DPO, etc.)

Choose Unsloth if you:
- Want 2x faster training speed
- Need maximum memory efficiency
- Prefer automatic optimizations
- Want simplified export to multiple formats
Next steps
After completing this tutorial, you can:
- Try SFT with Unsloth for faster training
- Explore GRPO fine-tuning for reinforcement learning
- Learn about continued pre-training for language adaptation
- Deploy your model using the inference guides