Open In Colab

This tutorial shows you how to perform supervised fine-tuning (SFT) using Unsloth for efficient, memory-optimized training. We’ll fine-tune the LFM2.5-1.2B-Instruct model on the FineTome-100k dataset using LoRA adapters and Unsloth’s optimizations.

What you’ll learn

By the end of this tutorial, you’ll know how to:
  • Set up your environment for fine-tuning with Unsloth
  • Prepare datasets for supervised fine-tuning
  • Configure LoRA adapters for parameter-efficient training
  • Train a model up to 2x faster while using less memory
  • Perform local inference with your fine-tuned model
  • Export models for various deployment targets

Prerequisites

  • GPU: This tutorial requires a GPU. You can run it for free on Google Colab using an NVIDIA T4 GPU
  • Python: Python 3.8 or higher
  • Basic knowledge: Familiarity with PyTorch and Hugging Face Transformers

Why use Unsloth?

Unsloth provides several advantages for fine-tuning:
  • 2x faster training compared to standard training loops
  • Reduced memory consumption enabling larger batch sizes
  • Built-in LoRA support for parameter-efficient fine-tuning
  • Automatic optimizations including fused kernels and memory management
  • Easy export to multiple deployment formats

Tutorial overview

The tutorial covers the following steps:
  1. Installation: Set up Unsloth and required dependencies
  2. Model loading: Load the LFM2.5-1.2B-Instruct model with Unsloth optimizations
  3. Data preparation: Format the FineTome-100k dataset for instruction fine-tuning
  4. LoRA configuration: Configure LoRA adapters for efficient training
  5. Training: Run supervised fine-tuning with optimized training settings
  6. Inference: Test your fine-tuned model locally
  7. Export: Save and export your model for deployment

Key concepts

Supervised fine-tuning

Supervised fine-tuning trains a model on labeled instruction-response pairs. This helps the model learn to follow specific instructions and generate appropriate responses for your use case.
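To make this concrete, here is a minimal sketch of what one SFT training example looks like, assuming the ShareGPT-style `conversations` layout commonly used by datasets like FineTome-100k; the example content and the `to_chat_messages` helper are illustrative, not part of the dataset or the Unsloth API:

```python
# A single supervised fine-tuning example: the model is trained to produce
# the assistant response given the user instruction. Content is illustrative.
example = {
    "conversations": [
        {"from": "human", "value": "Summarize what LoRA does in one sentence."},
        {"from": "gpt", "value": "LoRA freezes the base model and trains small "
                                 "low-rank adapter matrices instead."},
    ]
}

def to_chat_messages(sample):
    """Map ShareGPT-style roles to the role/content schema chat templates expect."""
    role_map = {"human": "user", "gpt": "assistant", "system": "system"}
    return [
        {"role": role_map[turn["from"]], "content": turn["value"]}
        for turn in sample["conversations"]
    ]

messages = to_chat_messages(example)
print(messages[0]["role"])   # user
print(messages[1]["role"])   # assistant
```

During data preparation, each example is converted this way and then rendered with the model's chat template into a single training string.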

LoRA (Low-Rank Adaptation)

LoRA is a parameter-efficient fine-tuning technique that:
  • Only trains a small number of additional parameters
  • Reduces memory requirements significantly
  • Maintains model quality while being more efficient
  • Can be easily merged back into the base model
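The parameter savings follow directly from the low-rank factorization: instead of updating a full d×k weight matrix, LoRA trains two small matrices B (d×r) and A (r×k) with rank r much smaller than d and k. A quick back-of-the-envelope calculation, with dimensions chosen purely for illustration:

```python
# LoRA replaces the weight update dW (d x k) with the product B @ A, where
# B is d x r and A is r x k. Trainable parameters drop from d*k to r*(d + k).
d, k = 2048, 2048      # hypothetical projection dimensions
r = 16                 # LoRA rank (a typical small value)

full_update_params = d * k      # parameters updated by full fine-tuning
lora_params = r * (d + k)       # parameters in the LoRA adapter

print(full_update_params)                 # 4194304
print(lora_params)                        # 65536
print(full_update_params // lora_params)  # 64
```

At rank 16 this single matrix needs 64x fewer trainable parameters, which is why LoRA fits comfortably on a free Colab T4.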

Unsloth optimizations

Unsloth applies several optimizations:
  • Fused attention kernels for faster forward passes
  • Gradient checkpointing for reduced memory usage
  • Mixed precision training for speed improvements
  • Optimized LoRA implementation
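To see why gradient checkpointing trades compute for memory, compare activation storage with and without it. The sketch below uses the classic √L checkpointing scheme as an approximation; the sizes are hypothetical and Unsloth's exact strategy may differ:

```python
# Without checkpointing, activations from all L layers are kept for the
# backward pass. With ~sqrt(L) checkpoints, only the checkpoints plus one
# recomputed segment (~sqrt(L) layers) are in memory at a time.
import math

layers = 64
act_per_layer_mb = 100          # hypothetical activation size per layer

no_ckpt_mb = layers * act_per_layer_mb
ckpt_mb = 2 * math.isqrt(layers) * act_per_layer_mb

print(no_ckpt_mb)  # 6400
print(ckpt_mb)     # 1600
```

The saved memory is what lets you raise the batch size or sequence length on the same GPU, at the cost of recomputing some forward passes.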

Deployment options

After fine-tuning, you can deploy your model to:
  • Mobile: Android and iOS apps using the LEAP SDK
  • Desktop: Mac (MLX), Windows/Linux (llama.cpp, Ollama, LM Studio)
  • Cloud: vLLM, Modal, Baseten, Fal for production deployments
  • Edge: On-device inference for low-latency applications
See the deployment documentation for detailed guides.

Run the tutorial

You can run this tutorial in two ways:
  1. Google Colab (recommended for beginners): Click the “Open in Colab” badge at the top
  2. Local environment: Clone the LFM Cookbook repository and run the notebook locally

Access the notebook

The complete notebook is available in the LFM Cookbook repository.

Next steps

After completing this tutorial, you can:

Getting help

Need assistance? Join the Liquid AI Discord Community.
