LoRe is a lightweight, modular research codebase for training personalized reward models in multi-user environments. Instead of a single monolithic reward function, LoRe learns a shared low-rank basis of reward directions and per-user mixture weights, which enables efficient personalization and strong few-shot generalization to new users.
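
As a rough sketch of the decomposition described above, the reward for user $i$ and the resulting preference probability might be written as follows. The notation is an assumption and is not taken from this page: $e(x, y)$ denotes the embedding of a prompt-response pair, $d$ the embedding dimension, $B$ the number of basis directions, and $\sigma$ the logistic function. The Bradley-Terry style preference model is a common choice for reward modeling and is likewise assumed here.

```latex
\begin{align}
  r_i(x, y) &= w_i^\top V \, e(x, y),
    \qquad V \in \mathbb{R}^{B \times d},\; w_i \in \mathbb{R}^{B} \\
  \Pr(y_c \succ y_r \mid x, i) &= \sigma\bigl(r_i(x, y_c) - r_i(x, y_r)\bigr)
\end{align}
```

Under this reading, the rows of $W$ are the per-user vectors $w_i$, so adding a user introduces only $B$ new parameters while $V$ is shared by everyone.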

Quickstart

Train your first personalized reward model in minutes using the Reddit TLDR dataset.

Installation

Set up Python dependencies and prepare your environment for LoRe experiments.

Core Concepts

Understand the low-rank reward decomposition and alternating minimization algorithm.

API Reference

Explore the full API: model classes, training functions, and evaluation utilities.

Supported datasets

LoRe includes ready-to-run experiment scripts for three benchmark preference datasets:

Reddit TLDR

Headline preference modeling across crowd workers.

PRISM

Multi-turn dialogue response preferences with diverse user profiles.

PersonalLLM

Open-ended LLM response personalization with simulated user populations.

How it works

1. Prepare embeddings

Run prepare.py for your chosen dataset to extract reward model embeddings from preference pairs.

2. Train the shared basis

Run train_basis.py to jointly learn the low-rank reward basis V and per-user weights W via alternating minimization.

3. Evaluate few-shot personalization

Run vary_fewshot.py to measure how quickly LoRe adapts to unseen users with limited preference data.
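
To make the training step more concrete, here is a minimal, dependency-free sketch of alternating updates for a low-rank reward model on toy data. It is not the code in train_basis.py. Every function name, the Bradley-Terry loss, the unconstrained weights, and the synthetic data are assumptions made for illustration, and the sketch takes one gradient step per block instead of fully minimizing each block.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def margin(V, w, x):
    # Reward gap w^T (V x), where x = e(chosen) - e(rejected).
    return dot(w, [dot(row, x) for row in V])


def mean_loss(V, W, pairs):
    # Average Bradley-Terry negative log-likelihood, -log sigmoid(margin),
    # computed as a stable softplus(-margin).
    total = 0.0
    for u, x in pairs:
        m = margin(V, W[u], x)
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total / len(pairs)


def step_weights(V, W, pairs, lr):
    # Gradient step on the per-user weights W; the basis V is held fixed.
    grad = [[0.0] * len(V) for _ in W]
    for u, x in pairs:
        z = [dot(row, x) for row in V]
        g = sigmoid(dot(W[u], z)) - 1.0  # d(loss)/d(margin)
        for b, zb in enumerate(z):
            grad[u][b] += g * zb
    for u in range(len(W)):
        for b in range(len(V)):
            W[u][b] -= lr * grad[u][b] / len(pairs)


def step_basis(V, W, pairs, lr):
    # Gradient step on the shared basis V; the weights W are held fixed.
    grad = [[0.0] * len(V[0]) for _ in V]
    for u, x in pairs:
        g = sigmoid(margin(V, W[u], x)) - 1.0
        for b in range(len(V)):
            for j, xj in enumerate(x):
                grad[b][j] += g * W[u][b] * xj
    for b in range(len(V)):
        for j in range(len(V[0])):
            V[b][j] -= lr * grad[b][j] / len(pairs)


def fit(pairs, n_users, dim, rank, rounds=300, lr=0.2, seed=0):
    # Small random initialization, then alternate the two block updates.
    rng = random.Random(seed)
    V = [[0.1 * rng.gauss(0, 1) for _ in range(dim)] for _ in range(rank)]
    W = [[0.1 * rng.gauss(0, 1) for _ in range(rank)] for _ in range(n_users)]
    for _ in range(rounds):
        step_weights(V, W, pairs, lr)
        step_basis(V, W, pairs, lr)
    return V, W


def synthetic_pairs(n_users, dim, rank, per_user, seed=1):
    # Toy data: each user ranks items with a hidden low-rank reward.
    rng = random.Random(seed)
    V_true = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(rank)]
    pairs = []
    for u in range(n_users):
        w_true = [rng.gauss(0, 1) for _ in range(rank)]
        for _ in range(per_user):
            x = [rng.gauss(0, 1) for _ in range(dim)]
            if margin(V_true, w_true, x) < 0:
                x = [-v for v in x]  # orient x as chosen minus rejected
            pairs.append((u, x))
    return pairs


if __name__ == "__main__":
    pairs = synthetic_pairs(n_users=6, dim=4, rank=2, per_user=30)
    V0, W0 = fit(pairs, 6, 4, 2, rounds=0)  # same seed, so this is the init
    V, W = fit(pairs, 6, 4, 2)
    print("loss at init:", mean_loss(V0, W0, pairs))
    print("loss after training:", mean_loss(V, W, pairs))
```

Each pair is stored as a difference of embeddings, so the loss depends only on the reward gap between the chosen and rejected response. The source describes the per-user parameters as mixture weights, which may imply a constraint (for example, non-negativity or summing to one) that this sketch does not enforce.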

Key capabilities

  • Low-rank decomposition: Shared basis V captures diverse reward directions; users are represented by mixture weights W
  • Few-shot adaptation: LoRe can personalize to a new user from as few as 1–5 preference examples
  • Regularization: Optional cosine similarity regularization keeps the learned basis aligned with the pretrained reward model
  • Modular design: Each dataset lives in its own directory with independent prepare/train/evaluate scripts
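
The few-shot and regularization capabilities listed above can be illustrated with a small, hedged sketch. Both functions are hypothetical and are not part of the LoRe API: `adapt_new_user` fits weights for one unseen user by logistic-loss gradient descent while the basis V stays frozen, and `cosine_alignment_penalty` shows one plausible form of a cosine similarity regularizer against a reference direction `ref` (assumed here to stand in for the pretrained reward model). The exact loss and regularizer used by LoRe are not specified on this page.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def adapt_new_user(V, few_shot_diffs, steps=200, lr=0.5):
    # Fit weights w for one new user; the shared basis V is never updated.
    # Each element of few_shot_diffs is e(chosen) - e(rejected).
    feats = [[dot(row, x) for row in V] for x in few_shot_diffs]  # project once
    w = [0.0] * len(V)
    for _ in range(steps):
        grad = [0.0] * len(V)
        for z in feats:
            g = sigmoid(dot(w, z)) - 1.0  # d(loss)/d(margin)
            for b, zb in enumerate(z):
                grad[b] += g * zb
        w = [wb - lr * gb / len(feats) for wb, gb in zip(w, grad)]
    return w


def prefers(V, w, emb_a, emb_b):
    # True if the personalized reward ranks emb_a above emb_b.
    def score(e):
        return dot(w, [dot(row, e) for row in V])
    return score(emb_a) > score(emb_b)


def cosine_alignment_penalty(V, ref, strength=1.0):
    # Hypothetical regularizer: strength * sum_b (1 - cos(V_b, ref)).
    # Assumes every basis row and ref are nonzero vectors.
    ref_norm = math.sqrt(dot(ref, ref))
    penalty = 0.0
    for row in V:
        row_norm = math.sqrt(dot(row, row))
        penalty += 1.0 - dot(row, ref) / (row_norm * ref_norm)
    return strength * penalty


if __name__ == "__main__":
    V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy rank-2 basis in 3 dimensions
    diffs = [[1.0, -0.5, 0.3], [0.8, -1.0, -0.2], [0.5, -0.2, 0.9]]
    w = adapt_new_user(V, diffs)
    print("weights for the new user:", w)
    print("penalty:", cosine_alignment_penalty(V, [1.0, 0.0, 0.0]))
```

Because only the weight vector is fitted, the number of free parameters per new user equals the number of basis directions, which is one way to see why a few preference pairs may be enough.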

LoRe is a research codebase accompanying the paper LoRe: Personalizing LLMs via Low-Rank Reward Modeling. It is designed for reproducing and extending the experiments described in that work.
