LoRe is a lightweight, modular research codebase for training personalized reward models in multi-user environments. Instead of a single monolithic reward function, LoRe learns a shared low-rank basis of reward directions and per-user mixture weights, enabling efficient personalization and strong few-shot generalization to new users.
Quickstart
Run your first personalized reward model in minutes using the RedditTLDR dataset.
Installation
Set up Python dependencies and prepare your environment for LoRe experiments.
Core Concepts
Understand the low-rank reward decomposition and alternating minimization algorithm.
API Reference
Explore the full API: model classes, training functions, and evaluation utilities.
Supported datasets
LoRe includes ready-to-run experiment scripts for three benchmark preference datasets:
Reddit TLDR
Headline preference modeling across crowd workers.
PRISM
Multi-turn dialogue response preferences with diverse user profiles.
PersonalLLM
Open-ended LLM response personalization with simulated user populations.
How it works
Prepare embeddings
Run `prepare.py` for your chosen dataset to extract reward model embeddings from preference pairs.
Train the shared basis
Run `train_basis.py` to jointly learn the low-rank reward basis `V` and per-user weights `W` via alternating minimization.
Key capabilities
- Low-rank decomposition: Shared basis `V` captures diverse reward directions; users are represented by mixture weights `W`
- Few-shot adaptation: New users can be personalized from as few as 1-5 preference examples
- Regularization: Optional cosine similarity regularization keeps the learned basis aligned with the pretrained reward model
- Modular design: Each dataset lives in its own directory with independent prepare/train/evaluate scripts
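To make the low-rank decomposition and few-shot adaptation above more concrete, here is a minimal NumPy sketch. It is not the LoRe API: the function names (`bt_loss_and_grads`, `train`, `adapt_new_user`) are hypothetical, the data is synthetic, and several modeling choices are assumptions made for illustration, namely a Bradley-Terry preference loss on embedding differences, softmax-parameterized mixture weights, and alternating gradient steps standing in for a full alternating minimization.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def bt_loss_and_grads(V, A, diffs, users):
    """Bradley-Terry loss plus gradients w.r.t. basis V and weight logits A.

    diffs[n] is embedding(chosen) - embedding(rejected) for pair n (assumed input).
    """
    W = softmax(A)                                   # (n_users, rank), rows sum to 1
    proj = diffs @ V                                 # (n_pairs, rank)
    margins = np.sum(proj * W[users], axis=1)        # per-pair reward margin
    loss = np.mean(np.logaddexp(0.0, -margins))      # mean of -log(sigmoid(margin))
    g = -1.0 / (1.0 + np.exp(margins)) / len(margins)  # d loss / d margin
    grad_V = diffs.T @ (g[:, None] * W[users])
    grad_W = np.zeros_like(W)
    np.add.at(grad_W, users, g[:, None] * proj)      # accumulate per-user gradients
    # Chain rule through the softmax that maps logits A to weights W.
    grad_A = W * (grad_W - np.sum(grad_W * W, axis=1, keepdims=True))
    return loss, grad_V, grad_A

def train(diffs, users, n_users, rank=2, steps=300, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    V = 0.1 * rng.standard_normal((diffs.shape[1], rank))
    A = np.zeros((n_users, rank))                    # uniform weights at start
    history = []
    for _ in range(steps):
        loss, grad_V, _ = bt_loss_and_grads(V, A, diffs, users)
        V -= lr * grad_V                             # basis step, weights held fixed
        _, _, grad_A = bt_loss_and_grads(V, A, diffs, users)
        A -= lr * grad_A                             # weight step, basis held fixed
        history.append(loss)
    return V, softmax(A), history

def adapt_new_user(V, diffs_new, steps=200, lr=0.5):
    """Few-shot adaptation: fit only the new user's weights, keeping V frozen."""
    A = np.zeros((1, V.shape[1]))
    users = np.zeros(len(diffs_new), dtype=int)
    for _ in range(steps):
        _, _, grad_A = bt_loss_and_grads(V, A, diffs_new, users)
        A -= lr * grad_A
    return softmax(A)[0]

# Synthetic data: 6 users whose true rewards mix 2 hidden directions.
rng = np.random.default_rng(0)
d, n_users, n_pairs = 8, 6, 600
true_dirs = rng.standard_normal((d, 2))
true_w = rng.dirichlet(np.ones(2), size=n_users)
users = rng.integers(0, n_users, size=n_pairs)
raw = rng.standard_normal((n_pairs, d))
true_margin = np.sum((raw @ true_dirs) * true_w[users], axis=1)
diffs = raw * np.sign(true_margin)[:, None]          # orient as chosen - rejected

V, W, history = train(diffs, users, n_users)
w_new = adapt_new_user(V, diffs[:5])                 # 5 preference examples
```

In this sketch, adapting to a new user only optimizes a vector of size `rank` rather than the full embedding dimension, which is one way to read the few-shot claim: with the basis frozen, a handful of preference pairs has far fewer parameters to pin down.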
LoRe is a research codebase accompanying the paper LoRe: Personalizing LLMs via Low-Rank Reward Modeling. It is designed for reproducing and extending the experiments described in that work.