What you’ll learn
By the end of this tutorial, you’ll know how to:
- Set up your environment for fine-tuning with Unsloth
- Prepare datasets for supervised fine-tuning
- Configure LoRA adapters for parameter-efficient training
- Train a model 2x faster and with reduced memory usage
- Perform local inference with your fine-tuned model
- Export models for various deployment targets
Prerequisites
- GPU: This tutorial requires a GPU. You can run it for free on Google Colab using an NVIDIA T4 GPU
- Python: Python 3.8 or higher
- Basic knowledge: Familiarity with PyTorch and Hugging Face Transformers
Why use Unsloth?
Unsloth provides several advantages for fine-tuning:
- 2x faster training compared to standard training loops
- Reduced memory consumption enabling larger batch sizes
- Built-in LoRA support for parameter-efficient fine-tuning
- Automatic optimizations including fused kernels and memory management
- Easy export to multiple deployment formats
Tutorial overview
The tutorial covers the following steps:
- Installation: Set up Unsloth and required dependencies
- Model loading: Load the LFM2.5-1.2B-Instruct model with Unsloth optimizations
- Data preparation: Format the FineTome-100k dataset for instruction fine-tuning
- LoRA configuration: Configure LoRA adapters for efficient training
- Training: Run supervised fine-tuning with optimized training settings
- Inference: Test your fine-tuned model locally
- Export: Save and export your model for deployment
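As a preview of the data preparation step, the sketch below converts a ShareGPT-style record (the format used by FineTome-100k) into standard role/content messages and renders it as a single training string. The template shown is illustrative only; in the notebook itself you would use the tokenizer's own chat template (`tokenizer.apply_chat_template`) rather than hand-rolled markers.

```python
# Sketch: turn a ShareGPT-style record (as in FineTome-100k) into
# role/content messages, then render a chat-style prompt string.
# The <|role|> markers below are illustrative, not LFM2's real template.

def to_messages(record):
    """Map ShareGPT 'from'/'value' keys to standard 'role'/'content' messages."""
    role_map = {"human": "user", "gpt": "assistant", "system": "system"}
    return [
        {"role": role_map[turn["from"]], "content": turn["value"]}
        for turn in record["conversations"]
    ]

def render_prompt(messages):
    """Join messages into a single training string (illustrative template)."""
    return "".join(f"<|{m['role']}|>\n{m['content']}\n" for m in messages)

record = {
    "conversations": [
        {"from": "human", "value": "What is LoRA?"},
        {"from": "gpt", "value": "A parameter-efficient fine-tuning method."},
    ]
}

messages = to_messages(record)
text = render_prompt(messages)
```

In the notebook, this mapping is applied across the whole dataset (for example with `datasets.Dataset.map`) before training.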
Key concepts
Supervised fine-tuning
Supervised fine-tuning trains a model on labeled instruction-response pairs. This helps the model learn to follow specific instructions and generate appropriate responses for your use case.

LoRA (Low-Rank Adaptation)
LoRA is a parameter-efficient fine-tuning technique that:
- Only trains a small number of additional parameters
- Reduces memory requirements significantly
- Maintains model quality while being more efficient
- Can be easily merged back into the base model
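The parameter savings follow directly from the low-rank factorization: for a weight matrix W of shape (d_out, d_in), LoRA trains two small factors B (d_out × r) and A (r × d_in) instead of W itself. The dimensions below are illustrative, not the exact shapes of any particular model:

```python
# Why LoRA is parameter-efficient: train B (d_out x r) and A (r x d_in)
# instead of the full weight matrix W (d_out x d_in).
# Illustrative dimensions, not taken from a specific model.

d_out, d_in, r = 2048, 2048, 16   # hypothetical projection shape, LoRA rank 16

full_params = d_out * d_in        # updating W directly: ~4.2M parameters
lora_params = r * (d_out + d_in)  # updating only B and A: 65,536 parameters

reduction = full_params / lora_params  # 64x fewer trainable parameters here
```

Because the update is just the product BA added to the frozen W, it can later be merged back into the base weights at no inference cost.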
Unsloth optimizations
Unsloth applies several optimizations:
- Fused attention kernels for faster forward passes
- Gradient checkpointing for reduced memory usage
- Mixed precision training for speed improvements
- Optimized LoRA implementation
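Most of these optimizations surface as simple options in Unsloth's API. The sketch below shows the general shape of loading a 4-bit model and attaching LoRA adapters; it requires the `unsloth` package and a CUDA GPU, so the calls are wrapped in a function rather than run at import time. The hyperparameters and model name are illustrative placeholders, not the exact values used in the notebook.

```python
# Sketch of how Unsloth's optimizations appear in its API.
# Requires the `unsloth` package and a CUDA GPU at call time.
# Hyperparameters here are illustrative, not the notebook's exact settings.

LORA_CONFIG = {
    "r": 16,                 # LoRA rank
    "lora_alpha": 16,        # LoRA scaling factor
    "lora_dropout": 0.0,     # 0.0 enables Unsloth's fast LoRA path
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

def load_optimized_model(model_name, max_seq_length=2048):
    """Load a 4-bit quantized model and attach LoRA adapters via Unsloth."""
    from unsloth import FastLanguageModel  # deferred: needs a GPU environment

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,
        load_in_4bit=True,                     # reduced memory consumption
    )
    model = FastLanguageModel.get_peft_model(
        model,
        use_gradient_checkpointing="unsloth",  # Unsloth's memory-saving variant
        **LORA_CONFIG,
    )
    return model, tokenizer
```

The returned model then trains with a standard TRL `SFTTrainer` loop; the fused kernels and mixed precision are applied automatically.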
Deployment options
After fine-tuning, you can deploy your model to:
- Mobile: Android and iOS apps using the LEAP SDK
- Desktop: Mac (MLX), Windows/Linux (llama.cpp, Ollama, LM Studio)
- Cloud: vLLM, Modal, Baseten, Fal for production deployments
- Edge: On-device inference for low-latency applications
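The two most common export paths map onto these targets roughly as sketched below: GGUF for the llama.cpp family (Ollama, LM Studio) and merged 16-bit safetensors for Hugging Face-compatible servers such as vLLM. This assumes the `model` and `tokenizer` objects from an Unsloth fine-tuning run; check the notebook for the exact arguments.

```python
# Sketch of export paths after training. Assumes `model`/`tokenizer` come
# from an Unsloth run; save_pretrained_gguf and save_pretrained_merged are
# Unsloth helpers. Argument values here are illustrative.

def export_model(model, tokenizer, out_dir="lfm2-finetuned"):
    """Save the fine-tuned model for common deployment targets."""
    # llama.cpp / Ollama / LM Studio: quantized GGUF file
    model.save_pretrained_gguf(out_dir, tokenizer, quantization_method="q4_k_m")
    # vLLM and other HF-compatible servers: LoRA merged into 16-bit weights
    model.save_pretrained_merged(out_dir, tokenizer, save_method="merged_16bit")
```

For mobile targets, the exported model is then packaged through the LEAP SDK tooling rather than loaded directly.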
Run the tutorial
You can run this tutorial in two ways:
- Google Colab (recommended for beginners): Click the “Open in Colab” badge at the top
- Local environment: Clone the LFM Cookbook repository and run the notebook locally
Access the notebook
The complete notebook is available at:
- GitHub: sft_with_unsloth.ipynb
- Colab: Click the badge above to open directly in Google Colab
Next steps
After completing this tutorial, you can:
- Try SFT with TRL for an alternative training approach
- Explore GRPO fine-tuning for reinforcement learning
- Learn about continued pre-training for language adaptation
- Deploy your model using the inference guides