LoRe: Low-Rank Personalized Reward Modeling

LoRe is a lightweight, modular framework for learning personalized reward models from multi-user preference data. Rather than training a single monolithic reward model that averages over all users, LoRe decomposes the reward signal into a shared low-rank basis and individual user weights. This lets the model capture genuine diversity in what different people find valuable in language model outputs.

The problem with monolithic reward models

Standard reward model training aggregates preferences from many annotators into a single scalar signal. This works reasonably well when users agree, but human preferences are not uniform. A user who values conciseness may consistently prefer shorter summaries; another who values detail may prefer longer ones. A monolithic Bradley-Terry model trained on pooled data will find a compromise that satisfies neither.

LoRe addresses this directly: instead of one reward function, it learns a family of reward functions parameterized by a shared low-rank basis V (a linear transformation on fixed reward model embeddings) and per-user mixture weights W. Each user’s reward is a softmax-weighted combination of basis vectors, so the model can represent a diverse population of preferences while keeping the number of parameters manageable.
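The compromise can be seen in a toy setting. The sketch below is an illustration only, not code from the LoRe repository: it assumes a single scalar feature difference, two hypothetical users with exactly opposite preferences, and a plain gradient-descent fit of a pooled Bradley-Terry model. The names `pooled_x` and `fit_pooled_bradley_terry` are made up for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy pooled dataset with one scalar feature difference
# x = feature(chosen) - feature(rejected).
# Hypothetical user A always picks the response with the larger feature (x = +1),
# hypothetical user B always picks the smaller one (x = -1).
pooled_x = [+1.0] * 50 + [-1.0] * 50

def fit_pooled_bradley_terry(xs, theta=1.0, lr=1.0, steps=200):
    """Gradient descent on the mean Bradley-Terry loss -log sigmoid(theta * x)."""
    for _ in range(steps):
        grad = sum(-x * sigmoid(-theta * x) for x in xs) / len(xs)
        theta -= lr * grad
    return theta

theta = fit_pooled_bradley_terry(pooled_x)

# Predicted probability that each user's actual choice wins under the pooled model.
p_user_a = sigmoid(theta * +1.0)
p_user_b = sigmoid(theta * -1.0)
print(theta, p_user_a, p_user_b)
```

With the two groups perfectly balanced, the pooled weight is driven toward zero and the model predicts each user's choices at roughly chance level, which is the "satisfies neither" outcome described above.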

Key insight: low-rank reward decomposition

The core mathematical idea is to factorize the reward as:

r(x, user_i) = x · V · w_i

where:

x is the embedding difference between a chosen and a rejected response (extracted from a pretrained reward model backbone, e.g. Skywork/Skywork-Reward-Llama-3.1-8B-v0.2)
V ∈ ℝ^(d × K) is the shared low-rank basis matrix, with K basis vectors learned jointly across all users
w_i ∈ Δ^K is the per-user weight vector (on the probability simplex via softmax)
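Because x is defined as an embedding difference, r(x, user_i) acts as the margin of the chosen response over the rejected one, and a Bradley-Terry model turns that margin into a preference probability with a sigmoid. A minimal NumPy sketch of the factorization follows. It is not the utils.py implementation: the function names, the toy values of V and x, and the choice to store each user's weights as unconstrained logits are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def reward_margin(x, V, user_logits):
    """r(x, user_i) = x · V · w_i, with w_i = softmax(user_logits) on the simplex."""
    w = softmax(user_logits)
    return float(x @ V @ w)

def preference_probability(x, V, user_logits):
    """Bradley-Terry probability that the chosen response beats the rejected one."""
    return 1.0 / (1.0 + np.exp(-reward_margin(x, V, user_logits)))

# Toy sizes: d = 4 embedding dimensions, K = 2 basis vectors (illustrative values).
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, -0.5],
              [0.0, 0.0]])
x = np.array([1.0, -1.0, 0.0, 0.0])  # embedding(chosen) - embedding(rejected)

logits_a = np.array([5.0, -5.0])  # hypothetical user weighted toward basis vector 0
logits_b = np.array([-5.0, 5.0])  # hypothetical user weighted toward basis vector 1

margin_a = reward_margin(x, V, logits_a)
margin_b = reward_margin(x, V, logits_b)
print(margin_a, margin_b)
```

The same pair x gets a positive margin for one user and a negative margin for the other, even though both share the basis V.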

When K = 0 the model reduces to the pretrained reference reward. When K = 1 the softmax assigns weight 1 to the single basis vector for every user, so all users share one reward function and the model is equivalent to a standard Bradley-Terry model. As K increases, the model gains capacity to represent more diverse preference patterns. The run() function in utils.py sweeps over a configurable K_list to find the right rank for your dataset.

Once the basis V is learned from seen users, new users can be personalized with only a handful of preference examples by fitting their weight vector w while keeping V fixed. This is the few-shot personalization setting.
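The few-shot step can be sketched as a small optimization over one user's weights with the basis frozen. The code below is an assumption-laden illustration rather than the repository's routine: `fit_user_weights`, the learning rate, the step count, and the toy data are hypothetical, and plain gradient descent on the Bradley-Terry loss stands in for whatever optimizer the actual scripts use.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def fit_user_weights(X, V, steps=500, lr=0.5):
    """Fit one new user's simplex weights w with the basis V held fixed.

    X: (n, d) embedding differences (chosen minus rejected) for this user.
    V: (d, K) frozen basis. Returns w with shape (K,).
    """
    F = X @ V                      # (n, K): margin of each pair under each basis vector
    logits = np.zeros(V.shape[1])  # unconstrained parameters, w = softmax(logits)
    for _ in range(steps):
        w = softmax(logits)
        margins = F @ w
        # Derivative of -log sigmoid(margin) with respect to each margin.
        g_margin = -1.0 / (1.0 + np.exp(margins))
        g_w = F.T @ g_margin / len(margins)
        # Chain rule through softmax: w * (g - <w, g>).
        g_logits = w * (g_w - w @ g_w)
        logits -= lr * g_logits
    return softmax(logits)

# Toy frozen basis (d = 3, K = 2) and three preference pairs from a new user.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
X_new_user = np.array([[1.0, -1.0, 0.3],
                       [0.5, -1.0, -0.2],
                       [1.0, -0.2, 0.0]])

w_new = fit_user_weights(X_new_user, V)
print(w_new)
```

Only K numbers are fitted per new user, which is why a handful of preference pairs can be enough; in this toy case the fitted weights concentrate on the first basis vector because every pair has a positive margin under it.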

Supported datasets

LoRe ships with experiment scripts for three benchmark datasets:

Dataset	Domain	Script directory
Reddit TLDR	Summarization preference modeling	`RedditTLDR/`
PRISM	Multi-turn dialogue response preferences	`PRISM/`
PersonalLLM	Open-ended LLM response personalization	`PersonalLLM/`

Each dataset directory follows the same three-script workflow: prepare.py → train_basis.py → vary_fewshot.py.

Features

Low-rank joint learning: learns basis V and per-user weights W simultaneously across the full user population
Few-shot personalization: adapts to new users using only a small number of preference examples by fitting their weight vector against the frozen basis
Regularization support: LoRe_regularized adds a cosine-similarity regularizer toward the SFT reference model, controlled by a warm-up schedule over training iterations
Modular design: dataset preprocessing, training, and evaluation are separated into composable scripts; utils.py exposes all core primitives
Evaluation on seen and unseen users: the run() pipeline automatically evaluates train accuracy, seen-user generalization to unseen prompts, and few-shot accuracy for unseen users
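The accuracy figures mentioned in the last item are pairwise: a pair counts as correct when the model scores the chosen response above the rejected one. A minimal sketch of that metric, assuming the same x · V · w margin defined earlier, is shown here; `pairwise_accuracy` and the toy arrays are hypothetical and are not taken from run() in utils.py.

```python
import numpy as np

def pairwise_accuracy(X, V, w):
    """Fraction of preference pairs whose margin x · V · w is positive.

    X: (n, d) embedding differences (chosen minus rejected).
    V: (d, K) basis. w: (K,) simplex weights for one user.
    """
    margins = X @ V @ w
    return float(np.mean(margins > 0))

# Toy held-out pairs for one user (d = 2, K = 2, identity basis for readability).
V = np.eye(2)
X_eval = np.array([[1.0, -1.0],
                   [1.0, -0.5],
                   [2.0, -1.0],
                   [-1.0, 1.0]])

w_good = np.array([0.9, 0.1])  # weights that match most of this user's choices
w_poor = np.array([0.1, 0.9])  # weights that contradict most of them

acc_good = pairwise_accuracy(X_eval, V, w_good)
acc_poor = pairwise_accuracy(X_eval, V, w_poor)
print(acc_good, acc_poor)
```

The same function applies to all three settings in the list; only the source of the pairs and of w changes (training pairs, unseen prompts for seen users, or few-shot-fitted weights for unseen users).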

Paper and citation

LoRe is described in detail in the paper:

LoRe: Personalizing LLMs via Low-Rank Reward Modeling
Avinandan Bose, Zhihan Xiong, Yuejie Chi, Simon Shaolei Du, Lin Xiao, Maryam Fazel
arXiv:2504.14439 · 2025

@misc{bose2025lorepersonalizingllmslowrank,
      title={LoRe: Personalizing LLMs via Low-Rank Reward Modeling},
      author={Avinandan Bose and Zhihan Xiong and Yuejie Chi and Simon Shaolei Du and Lin Xiao and Maryam Fazel},
      year={2025},
      eprint={2504.14439},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.14439},
}

Where to go next

Quickstart

Run your first LoRe experiment on the Reddit TLDR dataset in three steps.

Installation

Set up Python, PyTorch, and all required dependencies.

Datasets

Learn how Reddit TLDR, PRISM, and PersonalLLM are preprocessed and structured.

API reference

Full documentation for utils.py classes and functions.

License

LoRe is released under the CC-BY-NC 4.0 license. Non-commercial use only. See the LICENSE file in the repository root for the full text.
