Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can increase multi-GPU training throughput by 20% and reduce memory usage by 60%, enabling up to a 4x increase in context length. Liger Kernel provides Hugging Face-compatible replacements for RMSNorm, RoPE, SwiGLU, CrossEntropy, and FusedLinearCrossEntropy. It works out of the box with FlashAttention, PyTorch FSDP, and Microsoft DeepSpeed.
With the memory reduction from Liger Kernel, you can potentially disable cpu_offloading or gradient checkpointing to further boost performance.
Installation
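The install command for this section is missing from this page. Assuming the package is published on PyPI as liger-kernel and exposes a Python module named liger_kernel, running pip install liger-kernel should be enough. The small check below is a sketch under that assumption, and the helper name liger_kernel_available is hypothetical. It confirms that the module is importable before you enable it in a trainer config.

```python
import importlib.util


def liger_kernel_available() -> bool:
    """Return True if a module named `liger_kernel` can be found.

    The module name is an assumption based on the PyPI package `liger-kernel`.
    """
    return importlib.util.find_spec("liger_kernel") is not None


if liger_kernel_available():
    print("liger_kernel found; use_liger_kernel=True can be enabled.")
else:
    print("liger_kernel not found; try: pip install liger-kernel")
```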
Supported trainers
Liger Kernel is supported in the following TRL trainers:
- SFT: Supervised Fine-Tuning
- DPO: Direct Preference Optimization
- GRPO: Group Relative Policy Optimization
- KTO: Kahneman-Tversky Optimization
- GKD: Generalized Knowledge Distillation
Usage
Set use_liger_kernel=True in your trainer config. No other changes are needed.
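As a minimal sketch of the setting described above, the SFT case might look like the following. The model name, dataset name, and output directory are placeholders, and the helper names liger_config_kwargs and train_sft_with_liger are hypothetical. The trl and datasets imports are deferred into the training function so the config helper can be inspected without them installed.

```python
def liger_config_kwargs(output_dir: str) -> dict:
    """Keyword arguments for a trainer config with Liger Kernel enabled."""
    return {"output_dir": output_dir, "use_liger_kernel": True}


def train_sft_with_liger(output_dir: str = "sft-liger") -> None:
    """Run SFT with Liger Kernel enabled (requires trl, datasets, liger-kernel)."""
    # Imported lazily so liger_config_kwargs works without these packages.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # Placeholder dataset and model; substitute your own.
    dataset = load_dataset("trl-lib/Capybara", split="train")
    config = SFTConfig(**liger_config_kwargs(output_dir))
    trainer = SFTTrainer(
        model="Qwen/Qwen2.5-0.5B",
        args=config,
        train_dataset=dataset,
    )
    trainer.train()


# Inspect the config arguments without starting a training run.
print(liger_config_kwargs("sft-liger"))
```

For the other supported trainers, the same flag would go into the matching config class (for example DPOConfig with DPOTrainer). Each trainer still needs its own dataset format and trainer-specific arguments, which this sketch does not cover.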
Performance benefits
| Metric | Improvement |
|---|---|
| Training throughput | +20% on multi-GPU setups |
| GPU memory usage | −60% |
| Achievable context length | Up to 4x longer |
FusedLinearCrossEntropy fuses the final linear projection with the cross-entropy loss, which removes the need to store the full vocabulary-sized logit tensor.
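To make the memory argument concrete, the sketch below compares two ways of computing the same loss in plain Python: projecting every token to logits up front, and projecting one small chunk at a time so only that chunk's logits exist at once. It is a toy illustration of the idea, not Liger Kernel's Triton implementation, and it covers the forward loss only. All function names and the toy data are hypothetical.

```python
import math


def logits_bytes(num_tokens: int, vocab_size: int, bytes_per_element: int = 4) -> int:
    """Size in bytes of a dense [num_tokens, vocab_size] logit tensor."""
    return num_tokens * vocab_size * bytes_per_element


def project(hidden, weight):
    """Final linear layer for one token: one logit per vocabulary row."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]


def token_loss(logits, target):
    """Numerically stable cross-entropy for one token."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def unfused_loss(hiddens, weight, targets):
    """Materialize logits for every token, then compute the mean loss."""
    all_logits = [project(h, weight) for h in hiddens]
    return sum(token_loss(l, t) for l, t in zip(all_logits, targets)) / len(targets)


def chunked_loss(hiddens, weight, targets, chunk_size):
    """Compute the same mean loss while holding one chunk of logits at a time."""
    total = 0.0
    for start in range(0, len(hiddens), chunk_size):
        stop = start + chunk_size
        chunk_logits = [project(h, weight) for h in hiddens[start:stop]]
        total += sum(
            token_loss(l, t) for l, t in zip(chunk_logits, targets[start:stop])
        )
        # chunk_logits is replaced on the next iteration, so it never grows.
    return total / len(targets)


# Toy data: 5 tokens, hidden size 2, vocabulary of 3.
hiddens = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2], [0.0, 0.7], [0.9, -0.1]]
weight = [[0.3, 0.1], [-0.2, 0.5], [0.7, -0.4]]
targets = [0, 2, 1, 1, 0]

print(unfused_loss(hiddens, weight, targets))
print(chunked_loss(hiddens, weight, targets, chunk_size=2))
print(logits_bytes(num_tokens=4096, vocab_size=32000))
```

The two losses agree up to floating-point rounding, while the chunked version keeps at most chunk_size rows of logits alive. The logits_bytes helper shows why this matters at scale: the dense tensor grows with the product of token count and vocabulary size.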
Additional resources
- Liger Kernel repository: source code, benchmarks, and detailed documentation.