Get started with LoRe

This guide walks you through cloning the LoRe repository, installing dependencies, extracting reward model embeddings from the Reddit TLDR dataset, and training your first low-rank personalized reward model. The full pipeline takes roughly 30–60 minutes depending on your hardware; embedding extraction requires a CUDA GPU.

Clone the repository

Clone the LoRe codebase from GitHub:

git clone https://github.com/facebookresearch/LoRe.git
cd LoRe

The repository contains one directory per supported dataset (RedditTLDR/, PRISM/, PersonalLLM/) plus the shared utils.py module at the root. All dataset scripts import from utils.py via a sys.path insertion, so you do not need to install LoRe as a package.
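As a rough illustration of the sys.path insertion described above, the sketch below shows one common way a script in a dataset directory can make a root-level module importable. The helper name add_repo_root_to_path is hypothetical, and the exact lines used in the LoRe scripts may differ; the only assumption is that each dataset script sits one directory below utils.py.

```python
import os
import sys


def add_repo_root_to_path(script_path):
    """Put the parent of the script's directory at the front of sys.path.

    Hypothetical helper: assumes the script lives one level below the
    repository root, e.g. LoRe/RedditTLDR/train_basis.py.
    """
    script_dir = os.path.dirname(os.path.abspath(script_path))
    repo_root = os.path.dirname(script_dir)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root


# Inside a dataset script one would then write:
#   add_repo_root_to_path(__file__)
#   from utils import run
```

Because the path is derived from the script's own location, the import resolves whenever the script is run from inside the cloned repository layout, regardless of the current working directory.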

Install dependencies

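LoRe targets Python 3.8+, but several of the pinned packages listed below (numpy 2.2.4, scipy 1.15.2, matplotlib 3.10.1) only support Python 3.10 or newer, so use Python 3.10+ when installing from requirements.txt. A CUDA-capable GPU is recommended throughout and required for embedding extraction. Install all Python dependencies with: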

pip install -r requirements.txt

This installs the following pinned packages:

requirements.txt

datasets==2.21.0
matplotlib==3.10.1
numpy==2.2.4
pandas==2.2.3
pydantic==2.11.1
Requests==2.32.3
safetensors==0.5.3
scikit_learn==1.6.1
scipy==1.15.2
torch==2.3.0
torchinfo==1.8.0
transformers==4.46.3
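To confirm that an existing environment matches these pins, a small check along the following lines can help. This is a minimal sketch using only the standard library; the function names parse_pins and find_mismatches are hypothetical and not part of LoRe.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package_name: pinned_version} for every 'name==version' line."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Map each mismatching package to (pinned, installed); None = not installed."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            # Drop a local build label such as '+cu121' before comparing.
            installed = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Example usage from the repository root:
#   with open("requirements.txt") as fh:
#       print(find_mismatches(parse_pins(fh.read())))
```

An empty dictionary means every pinned package is installed at the pinned version.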

The embedding extraction step (prepare.py) loads Skywork/Skywork-Reward-Llama-3.1-8B-v0.2, an ~16 GB model that requires Flash Attention 2. Install it separately if not already present:

pip install flash-attn --no-build-isolation
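Before launching the embedding extraction, it can be worth checking that the prerequisites are actually visible to Python. The sketch below is an assumption-laden convenience check, not part of LoRe: check_prerequisites and cuda_available are hypothetical names, and flash_attn is the import name of the flash-attn package.

```python
import importlib.util


def check_prerequisites():
    """Report which packages are importable, without importing them."""
    names = ["torch", "transformers", "datasets", "flash_attn"]
    return {name: importlib.util.find_spec(name) is not None for name in names}


def cuda_available():
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also runs without torch

    return torch.cuda.is_available()


# Example usage:
#   print(check_prerequisites())
#   print("CUDA available:", cuda_available())
```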

Prepare the dataset

Move into the RedditTLDR directory and run prepare.py:

cd RedditTLDR
python prepare.py

The script performs three steps:

Downloads the dataset — loads openai/summarize_from_feedback (comparisons split) from the Hugging Face Hub using the datasets library.
Organizes by worker — groups all preference pairs by annotator ID (worker), extracting the post text, the chosen summary, and the rejected summary for each pair.
Extracts embeddings — loads Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 with attn_implementation="flash_attention_2" and extracts the final hidden-state vector of the last token for each chosen and rejected response.
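The "organizes by worker" step can be pictured with the sketch below. It is not the code from prepare.py: group_pairs_by_worker is a hypothetical name, and the row layout (worker, info with a post field, summaries as two entries with a text field, and choice as the index of the preferred summary) is an assumption about the comparisons data that should be checked against the loaded dataset.

```python
from collections import defaultdict


def group_pairs_by_worker(rows):
    """Group preference pairs by annotator ID.

    Assumed row layout: {'worker': str, 'info': {'post': str},
    'summaries': [{'text': str}, {'text': str}], 'choice': 0 or 1}.
    """
    by_worker = defaultdict(list)
    for row in rows:
        chosen_idx = row["choice"]
        rejected_idx = 1 - chosen_idx  # the other of the two summaries
        by_worker[row["worker"]].append(
            {
                "post": row["info"]["post"],
                "chosen": row["summaries"][chosen_idx]["text"],
                "rejected": row["summaries"][rejected_idx]["text"],
            }
        )
    return dict(by_worker)


# With the datasets library, rows could come from something like:
#   from datasets import load_dataset
#   ds = load_dataset("openai/summarize_from_feedback", "comparisons")
#   pairs = group_pairs_by_worker(ds["train"])
```

Each worker then maps to a list of (post, chosen, rejected) records, which is the unit the embedding step iterates over.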

The relevant section of prepare.py:

prepare.py

from transformers import AutoModel, AutoTokenizer
import torch

device = "cuda:0"
model_name = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"
rm = AutoModel.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map=device,
    attn_implementation="flash_attention_2",
    num_labels=1,
)
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)

# For each worker's preference pair:
with torch.no_grad():
    embedding_winning = rm(inputs_winning).last_hidden_state[0][-1].cpu()
    embedding_losing  = rm(inputs_losing).last_hidden_state[0][-1].cpu()

When complete, two pickle files are written to the RedditTLDR/ directory:

tldr_embeddings_train.pkl — embeddings for the training split
tldr_embeddings_val.pkl — embeddings for the validation split

Each file is a dictionary keyed by worker ID. You only need to run prepare.py once; subsequent training runs load directly from these files.
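To sanity-check the output, the files can be opened with the standard pickle module. The sketch below only relies on what is stated above (a dictionary keyed by worker ID); the layout of each worker's value is not assumed, and load_embeddings is a hypothetical helper name.

```python
import pickle


def load_embeddings(path):
    """Load one of the embedding files written by prepare.py."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def count_workers(embeddings):
    """Number of distinct worker IDs in a loaded embedding dictionary."""
    return len(embeddings)


# Example usage from the RedditTLDR/ directory:
#   train = load_embeddings("tldr_embeddings_train.pkl")
#   val = load_embeddings("tldr_embeddings_val.pkl")
#   print(count_workers(train), count_workers(val))
```

Only unpickle files you generated yourself or otherwise trust, since loading a pickle can execute arbitrary code.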

prepare.py requires a CUDA GPU with roughly 18 GB of VRAM or more to load the Skywork model in bfloat16. It will fail on CPU-only machines.

Train the basis model

With embeddings in place, train the shared reward basis and per-user weights:

python train_basis.py

The script takes the following steps:

Loads the pretrained reward head — extracts the final linear layer weights from Skywork-Reward-Llama-3.1-8B-v0.2 as the reference vector V_final:

train_basis.py

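# named_modules() yields modules in registration order, so after the loop
# last_linear_layer holds the last nn.Linear in the model.
for name, module in rm.named_modules():
    if isinstance(module, torch.nn.Linear):
        last_linear_layer = module
# Take column 0 of that layer's weight as a float32 column vector on `device`.
V_final = last_linear_layer.weight[:,0].to(device).to(torch.float32).reshape(-1, 1)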

Splits workers into seen and unseen: workers present in both the train and validation sets are shuffled (after seeding with random.seed(0)) and split 50/50. Seen workers are used for joint basis learning; unseen workers are held out for few-shot evaluation.
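A seeded 50/50 split of this kind can be sketched as follows. This is an illustration, not the code in train_basis.py: split_seen_unseen is a hypothetical name, and sorting the shared IDs before shuffling, assigning the first half to "seen", and rounding down for odd counts are all assumptions.

```python
import random


def split_seen_unseen(train_workers, val_workers, seed=0):
    """Split workers present in both sets into (seen, unseen) halves."""
    # Sort first so the result does not depend on set iteration order.
    common = sorted(set(train_workers) & set(val_workers))
    random.seed(seed)
    random.shuffle(common)
    half = len(common) // 2
    return common[:half], common[half:]


# Example usage with loaded embedding dictionaries:
#   seen, unseen = split_seen_unseen(train.keys(), val.keys())
```

Fixing the seed makes the split reproducible across runs, so accuracy differences between runs cannot be attributed to a different choice of held-out users.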

Runs the full LoRe pipeline via run() from utils.py, sweeping over basis ranks:

train_basis.py

K_list = [0, 1, 2, 3, 4, 5, 6]
alpha_list = [0]

(train_accuracies_joint,
 seen_user_unseen_prompts_accuracies_joint,
 few_shot_train_accuracies_few_shot,
 unseen_user_unseen_prompts_accuracies_few_shot,
 ...) = run(
    K_list, alpha_list, V_final,
    train_features, test_features_sparse,
    train_features_unseen, test_features_sparse_unseen,
    N, N_unseen, device
)

The K_list controls which ranks are evaluated:

K	Model
0	Reference model (pretrained reward head, no adaptation)
1	Standard Bradley-Terry model (single basis vector)
2–6	Low-rank LoRe with K basis vectors

Training uses Adam with learning_rate=0.5 for 1,000 iterations per rank. Few-shot weight fitting for unseen users runs for 500 iterations with learning_rate=0.1.
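For intuition about the few-shot step, the sketch below fits one user's weight vector on a fixed basis by minimizing a Bradley-Terry loss. It is deliberately simplified and is not the implementation behind run(): it uses plain gradient descent in place of Adam, leaves the weights unconstrained (the actual method may constrain or normalize them), and assumes each input is the basis projection of the chosen-minus-rejected embedding for one pair. The names sigmoid and fit_user_weights are hypothetical.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_user_weights(projected_diffs, steps=500, lr=0.1):
    """Fit a length-K weight vector w for one user on a fixed basis.

    projected_diffs: list of length-K lists z, one per preference pair.
    Minimizes the mean of -log(sigmoid(w . z)) with plain gradient
    descent (a stand-in for Adam in this sketch).
    """
    k = len(projected_diffs[0])
    n = len(projected_diffs)
    w = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for z in projected_diffs:
            margin = sum(wi * zi for wi, zi in zip(w, z))
            # Derivative of -log(sigmoid(margin)) with respect to margin.
            coeff = -sigmoid(-margin) / n
            for j in range(k):
                grad[j] += coeff * z[j]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Example usage with K = 2 and three preference pairs:
#   w = fit_user_weights([[1.0, 0.2], [0.8, -0.1], [1.2, 0.0]])
```

A pair counts as correctly ranked when w . z is positive, so the fitted weights can be scored on held-out pairs with the same dot product.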

Interpret results

After training completes, run() returns accuracy arrays for four evaluation settings, each indexed by the K values in K_list. eval_multiple() also prints per-setting results at each rank. For each K, the console output has the form:

K : 0
Train Performance
Average accuracy: 0.XXXX
Standard deviation of accuracy: 0.XXXX
Seen User Unseen Prompts
Average accuracy: 0.XXXX
...
K : 4
Train Performance
Average accuracy: 0.XXXX
...
Few Shot Train Performance
Average accuracy: 0.XXXX
...
Unseen User Unseen Prompts
Average accuracy: 0.XXXX

Actual values depend on the worker split (seeded with random.seed(0)) and hardware. The four metrics to watch are:

Train accuracy — pairwise preference accuracy on seen users’ training prompts. Increases with K as the model gains capacity.
Seen user, unseen prompts — generalization of learned user weights to held-out prompts for the same users. Tests whether personalization transfers beyond training data.
Few-shot train accuracy — accuracy on the few-shot examples used to fit weights for unseen users. Should be high if the basis is expressive enough.
Unseen user, unseen prompts — the hardest setting: new users with new prompts, adapted from the shared basis using only a handful of examples. This is LoRe’s primary evaluation target.
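All four metrics are pairwise preference accuracies, reported as an average and a standard deviation. The sketch below shows one way such numbers can be computed from per-pair rewards; it is not the code in eval_multiple(). The function names are hypothetical, ties are counted as incorrect, and the use of the population standard deviation over per-user accuracies is an assumption.

```python
import statistics


def pairwise_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen response gets the higher reward."""
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("reward lists must have the same length")
    if not chosen_rewards:
        raise ValueError("need at least one preference pair")
    correct = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return correct / len(chosen_rewards)


def summarize(per_user_accuracies):
    """Return (mean, population standard deviation) across users."""
    return (
        statistics.mean(per_user_accuracies),
        statistics.pstdev(per_user_accuracies),
    )


# Example usage for two users:
#   accs = [pairwise_accuracy([1.0, 0.5], [0.2, 0.9]),
#           pairwise_accuracy([2.0, 1.5], [1.0, 0.1])]
#   mean_acc, std_acc = summarize(accs)
```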

A meaningful improvement from K=1 to K=3–5 on the “unseen user, unseen prompts” metric indicates that the learned basis captures genuine population-level preference diversity that transfers to new users. Diminishing returns at high K suggest the low-rank structure is a good fit for the dataset.

Next steps

To run the same workflow on PRISM or PersonalLLM, navigate to the corresponding directory and follow the same prepare.py → train_basis.py sequence. The PRISM pipeline adds a separate embedding generation step (generate-prism-embeddings.py) and supports evaluation on RewardBench 2 via eval_rb2.py.
