LoRe is a lightweight, modular framework for learning personalized reward models from multi-user preference data. Rather than training a single monolithic reward model that averages over all users, LoRe decomposes the reward signal into a shared low-rank basis and individual user weights, enabling the model to capture genuine diversity in what different people find valuable in language model outputs.
The problem with monolithic reward models
Standard reward model training aggregates preferences from many annotators into a single scalar signal. This works reasonably well when users agree, but human preferences are not uniform. A user who values conciseness may consistently prefer shorter summaries; another who values detail may prefer longer ones. A monolithic Bradley-Terry model trained on pooled data will find a compromise that satisfies neither.
LoRe addresses this directly: instead of one reward function, it learns a family of reward functions parameterized by a shared low-rank basis V (a linear transformation on fixed reward model embeddings) and per-user mixture weights W. Each user’s reward is a softmax-weighted combination of basis vectors, so the model can represent a diverse population of preferences while keeping the number of parameters manageable.
Key insight: low-rank reward decomposition
The core mathematical idea is to factorize the reward in terms of three quantities:
- x is the embedding difference between a chosen and a rejected response (extracted from a pretrained reward model backbone, e.g. Skywork/Skywork-Reward-Llama-3.1-8B-v0.2)
- V ∈ ℝ^(d × K) is the shared low-rank basis matrix, with K basis vectors learned jointly across all users
- w_i ∈ Δ^K is the per-user weight vector (on the probability simplex via softmax)
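Reading these definitions together with the Bradley-Terry setup described earlier, one consistent way to write the factorization is sketched below. The symbols θ_i (unconstrained per-user logits), v_k (the k-th column of V), and σ (the logistic function) are notation assumed here for illustration; the paper gives the authoritative statement.

```latex
% Per-user reward margin on an embedding difference x (chosen minus rejected)
r_i(x) \;=\; w_i^{\top} V^{\top} x \;=\; \sum_{k=1}^{K} w_{i,k}\, v_k^{\top} x,
\qquad w_i = \operatorname{softmax}(\theta_i) \in \Delta^{K}

% Bradley-Terry probability that user i prefers the chosen response
P_i(\text{chosen} \succ \text{rejected}) \;=\; \sigma\!\left( w_i^{\top} V^{\top} x \right)
```

Under this reading, the shared matrix V contributes d × K parameters in total, while each user adds only K.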
The run() function in utils.py sweeps over a configurable K_list to find the right rank K for your dataset.
Once the basis V is learned from seen users, new users can be personalized with only a handful of preference examples by fitting their weight vector w while keeping V fixed — this is the few-shot personalization setting.
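The few-shot setting can be illustrated with a small, dependency-free sketch: the basis V stays frozen and only one new user's weights are fitted by gradient descent on a Bradley-Terry loss. Everything here is an assumption made for illustration, including the function name fit_user_weights, the toy V and preference data, the step count, and the learning rate; it is not the utils.py API.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softmax(theta):
    """Map unconstrained logits to a point on the probability simplex."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def fit_user_weights(V, diffs, steps=500, lr=0.5):
    """Fit w = softmax(theta) for one new user while V (d x K) stays frozen.

    diffs holds embedding differences x = emb(chosen) - emb(rejected).
    Hypothetical helper: plain gradient descent on the mean of
    -log sigmoid(w . V^T x).
    """
    d, K = len(V), len(V[0])
    # Because V is frozen, the per-basis scores S[n][k] = v_k . x_n are computed once.
    S = [[sum(V[j][k] * x[j] for j in range(d)) for k in range(K)] for x in diffs]
    theta = [0.0] * K
    for _ in range(steps):
        w = softmax(theta)
        grad_w = [0.0] * K
        for s in S:
            margin = sum(wk * sk for wk, sk in zip(w, s))
            # d/d(margin) of -log sigmoid(margin) is -sigmoid(-margin)
            coeff = -sigmoid(-margin) / len(S)
            for k in range(K):
                grad_w[k] += coeff * s[k]
        # Chain rule through softmax: dL/dtheta_k = w_k * (g_k - sum_j w_j * g_j)
        avg = sum(wk * gk for wk, gk in zip(w, grad_w))
        theta = [t - lr * wk * (gk - avg) for t, wk, gk in zip(theta, w, grad_w)]
    return softmax(theta)

# Toy frozen basis with d = 3 and K = 2: column 0 reads the first embedding
# coordinate, column 1 reads the second.
V = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]

# Three preference pairs from a new user whose chosen responses score higher
# on the second coordinate and lower on the first.
diffs = [[-1.0, 1.0, 0.2], [-0.5, 0.8, -0.1], [-0.7, 0.4, 0.0]]

w_new = fit_user_weights(V, diffs)
print(w_new)  # most of the weight should land on the second basis vector
```

Only K numbers are optimized per new user, which is why a handful of preference pairs can be enough in this setting.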
Supported datasets
LoRe ships with experiment scripts for three benchmark datasets:
| Dataset | Domain | Script directory |
|---|---|---|
| Reddit TLDR | Summarization preference modeling | RedditTLDR/ |
| PRISM | Multi-turn dialogue response preferences | PRISM/ |
| PersonalLLM | Open-ended LLM response personalization | PersonalLLM/ |
Each script directory runs the same pipeline: prepare.py → train_basis.py → vary_fewshot.py.
Features
- Low-rank joint learning: learns basis V and per-user weights W simultaneously across the full user population
- Few-shot personalization: adapts to new users using only a small number of preference examples by fitting their weight vector against the frozen basis
- Regularization support: LoRe_regularized adds cosine similarity regularization to the SFT reference model, controlled by a warm-up schedule over training iterations
- Modular design: dataset preprocessing, training, and evaluation are separated into composable scripts; utils.py exposes all core primitives
- Evaluation on seen and unseen users: the run() pipeline automatically evaluates train accuracy, seen-user generalization to unseen prompts, and few-shot accuracy for unseen users
Paper and citation
LoRe is described in detail in the paper:
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
Avinandan Bose, Zhihan Xiong, Yuejie Chi, Simon Shaolei Du, Lin Xiao, Maryam Fazel
arXiv:2504.14439 · 2025
Where to go next
Quickstart
Run your first LoRe experiment on the Reddit TLDR dataset in three steps.
Installation
Set up Python, PyTorch, and all required dependencies.
Datasets
Learn how Reddit TLDR, PRISM, and PersonalLLM are preprocessed and structured.
API reference
Full documentation for utils.py classes and functions.
License
LoRe is released under the CC-BY-NC 4.0 license. Non-commercial use only. See the LICENSE file in the repository root for the full text.