This guide walks you through cloning the LoRe repository, installing dependencies, extracting reward model embeddings from the Reddit TLDR dataset, and training your first low-rank personalized reward model. The full pipeline takes roughly 30–60 minutes depending on your hardware; embedding extraction requires a CUDA GPU.
1. Clone the repository

Clone the LoRe codebase from GitHub:
git clone https://github.com/facebookresearch/LoRe.git
cd LoRe
The repository contains one directory per supported dataset (RedditTLDR/, PRISM/, PersonalLLM/) plus the shared utils.py module at the root. All dataset scripts import from utils.py via a sys.path insertion, so you do not need to install LoRe as a package.
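The sys.path insertion mentioned above can be sketched as follows. This is an illustration rather than the exact lines used in the LoRe scripts; `add_repo_root_to_path` and `repo_root` are hypothetical names.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_path: str) -> str:
    """Put the parent of the script's directory (the repo root) on sys.path."""
    repo_root = str(Path(script_path).resolve().parent.parent)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root


# Inside a dataset script such as RedditTLDR/train_basis.py one could write:
#   add_repo_root_to_path(__file__)
#   from utils import run
```

Because the import is resolved relative to the script's own location, the scripts work regardless of where the repository is cloned.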
2. Install dependencies

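LoRe is documented as requiring Python 3.8+, but several of the pinned versions below (numpy==2.2.4, scipy==1.15.2, matplotlib==3.10.1) only install on Python 3.10 or newer, so use 3.10+ in practice. A CUDA-capable GPU is recommended for training and required for embedding extraction. Install all Python dependencies with: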
pip install -r requirements.txt
This installs the following pinned packages:
requirements.txt
datasets==2.21.0
matplotlib==3.10.1
numpy==2.2.4
pandas==2.2.3
pydantic==2.11.1
Requests==2.32.3
safetensors==0.5.3
scikit_learn==1.6.1
scipy==1.15.2
torch==2.3.0
torchinfo==1.8.0
transformers==4.46.3
The embedding extraction step (prepare.py) loads Skywork/Skywork-Reward-Llama-3.1-8B-v0.2, a roughly 16 GB model, with Flash Attention 2 enabled. If the flash-attn package is not already installed, install it separately:
pip install flash-attn --no-build-isolation
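Before launching prepare.py it can help to confirm that the pieces it depends on are present. The sketch below is not part of LoRe; `check_environment` is a hypothetical helper that only checks whether `torch` and `flash_attn` are importable and whether PyTorch can see a CUDA device.

```python
import importlib.util


def check_environment() -> dict:
    """Report which prerequisites for prepare.py are available."""
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        "flash_attn": importlib.util.find_spec("flash_attn") is not None,
        "cuda": False,
    }
    if report["torch"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["cuda"] = bool(torch.cuda.is_available())
    return report


print(check_environment())
```

If any value is False, prepare.py is likely to fail at model load time, so it is cheaper to find out here than after the model download.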
3. Prepare the dataset

Move into the RedditTLDR directory and run prepare.py:
cd RedditTLDR
python prepare.py
This script does three things:
  1. Downloads the dataset — loads openai/summarize_from_feedback (comparisons split) from the Hugging Face Hub using the datasets library.
  2. Organizes by worker — groups all preference pairs by annotator ID (worker), extracting the post text, the chosen summary, and the rejected summary for each pair.
  3. Extracts embeddings — loads Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 with attn_implementation="flash_attention_2" and extracts the final hidden-state vector of the last token for each chosen and rejected response.
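The grouping in step 2 can be illustrated with a small sketch. It assumes each record follows the `comparisons` schema of the dataset (a `worker` ID, the post under `info["post"]`, a two-element `summaries` list, and `choice` as the index of the preferred summary). The function name and toy records are hypothetical, and prepare.py may organize its output differently.

```python
from collections import defaultdict


def group_by_worker(records):
    """Group preference pairs by annotator ID."""
    by_worker = defaultdict(list)
    for rec in records:
        chosen = rec["summaries"][rec["choice"]]["text"]
        rejected = rec["summaries"][1 - rec["choice"]]["text"]
        by_worker[rec["worker"]].append(
            {"post": rec["info"]["post"], "chosen": chosen, "rejected": rejected}
        )
    return dict(by_worker)


# Toy records standing in for rows of the real dataset, which would come from
# datasets.load_dataset("openai/summarize_from_feedback", "comparisons").
toy = [
    {"worker": "w1", "info": {"post": "p1"},
     "summaries": [{"text": "a"}, {"text": "b"}], "choice": 1},
    {"worker": "w1", "info": {"post": "p2"},
     "summaries": [{"text": "c"}, {"text": "d"}], "choice": 0},
    {"worker": "w2", "info": {"post": "p3"},
     "summaries": [{"text": "e"}, {"text": "f"}], "choice": 0},
]
pairs = group_by_worker(toy)
```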
The relevant section of prepare.py:
prepare.py
from transformers import AutoModel, AutoTokenizer
import torch

device = "cuda:0"
model_name = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"
rm = AutoModel.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map=device,
    attn_implementation="flash_attention_2",
    num_labels=1,
)
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)

# For each worker's preference pair:
with torch.no_grad():
    embedding_winning = rm(inputs_winning).last_hidden_state[0][-1].cpu()
    embedding_losing  = rm(inputs_losing).last_hidden_state[0][-1].cpu()
When complete, two pickle files are written to the RedditTLDR/ directory:
  • tldr_embeddings_train.pkl — embeddings for the training split
  • tldr_embeddings_val.pkl — embeddings for the validation split
Each file is a dictionary keyed by worker ID. You only need to run prepare.py once; subsequent training runs load directly from these files.
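To inspect the generated files, something like the following should work. It is a sketch: only the fact that each file is a dictionary keyed by worker ID comes from this guide, so the helper names are hypothetical and the per-worker value is treated as opaque. If the pickles contain PyTorch tensors, torch must be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path


def load_embeddings(path):
    """Load a pickle written by prepare.py and return the worker-keyed dict."""
    with Path(path).open("rb") as f:
        data = pickle.load(f)
    if not isinstance(data, dict):
        raise TypeError(f"expected a dict keyed by worker ID, got {type(data).__name__}")
    return data


def summarize(data):
    """Map each worker ID to the number of items stored under it, if sized."""
    return {w: (len(v) if hasattr(v, "__len__") else None) for w, v in data.items()}


# Usage, from inside RedditTLDR/ after prepare.py has finished:
#   train = load_embeddings("tldr_embeddings_train.pkl")
#   print(len(train), "workers")
```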
prepare.py requires a CUDA GPU with at least ~18 GB VRAM to load the Skywork model in bfloat16. It will fail on CPU-only machines.
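The memory figures above follow from simple arithmetic on the parameter count. As a rough check, assuming about 8 billion parameters (the "8B" in the model name) and 2 bytes per bfloat16 value:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: roughly 8e9 parameters; bfloat16 stores each one in 2 bytes.
weights_gb = weight_memory_gb(8e9)
print(f"weights: ~{weights_gb:.0f} GB")
```

The gap between this figure and the stated ~18 GB is presumably headroom for activations and CUDA runtime overhead; the exact breakdown is not specified here.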
4. Train the basis model

With embeddings in place, train the shared reward basis and per-user weights:
python train_basis.py
The script takes the following steps:
  1. Loads the pretrained reward head — extracts the final linear layer weights from Skywork-Reward-Llama-3.1-8B-v0.2 as the reference vector V_final:
    train_basis.py
    for name, module in rm.named_modules():
        if isinstance(module, torch.nn.Linear):
            last_linear_layer = module
    V_final = last_linear_layer.weight[:,0].to(device).to(torch.float32).reshape(-1, 1)
    
  2. Splits workers into seen and unseen — workers present in both train and validation sets are shuffled with random.seed(0) and split 50/50. Seen workers are used for joint basis learning; unseen workers are held out for few-shot evaluation.
  3. Runs the full LoRe pipeline via run() from utils.py, sweeping over basis ranks:
    train_basis.py
    K_list = [0, 1, 2, 3, 4, 5, 6]
    alpha_list = [0]
    
    (train_accuracies_joint,
     seen_user_unseen_prompts_accuracies_joint,
     few_shot_train_accuracies_few_shot,
     unseen_user_unseen_prompts_accuracies_few_shot,
     ...) = run(
        K_list, alpha_list, V_final,
        train_features, test_features_sparse,
        train_features_unseen, test_features_sparse_unseen,
        N, N_unseen, device
    )
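The seen/unseen split described in step 2 can be sketched as below. Only the seed, the shuffle, and the 50/50 ratio come from this guide. The function name, the sort before shuffling, the choice of which half is "seen", and the handling of an odd worker count are assumptions.

```python
import random


def split_seen_unseen(train_workers, val_workers, seed=0):
    """Shuffle workers present in both splits and divide them 50/50."""
    # Sorting first makes the result independent of set iteration order.
    common = sorted(set(train_workers) & set(val_workers))
    random.seed(seed)
    random.shuffle(common)
    half = len(common) // 2
    return common[:half], common[half:]


seen, unseen = split_seen_unseen(
    ["a", "b", "c", "d", "e"], ["b", "c", "d", "e", "f"]
)
```

Fixing the seed keeps the split identical across runs, which is what makes accuracy numbers from different runs comparable.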
    
The K_list controls which ranks are evaluated:
K     Model
0     Reference model (pretrained reward head, no adaptation)
1     Standard Bradley-Terry model (single basis vector)
2–6   Low-rank LoRe with K basis vectors
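One way to read the ranks above: with K basis vectors, a user's reward for a response is a weighted combination of K shared linear reward heads applied to the response embedding. The sketch below assumes a basis matrix `V` of shape (d, K) and a per-user weight vector `w` of shape (K,). These names, the toy values, and the unconstrained weights are illustrative, and the parameterization in utils.py may differ.

```python
import numpy as np


def user_reward(embeddings, V, w):
    """Reward of each embedding for one user: (embeddings @ V) @ w."""
    return embeddings @ V @ w


def pairwise_accuracy(emb_chosen, emb_rejected, V, w):
    """Fraction of pairs where the chosen response gets the higher reward."""
    margin = user_reward(emb_chosen, V, w) - user_reward(emb_rejected, V, w)
    return float(np.mean(margin > 0))


rng = np.random.default_rng(0)
d, K, n = 8, 3, 5
V = rng.normal(size=(d, K))        # shared basis (toy values)
w = np.array([0.5, 0.3, 0.2])      # one user's weights (toy values)
chosen = rng.normal(size=(n, d))
rejected = rng.normal(size=(n, d))
acc = pairwise_accuracy(chosen, rejected, V, w)
```

With K=1 there is a single shared reward direction, so every user is scored the same way up to scale; larger K lets users differ in how they weight the shared directions.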
Training uses Adam with learning_rate=0.5 for 1,000 iterations per rank. Few-shot weight fitting for unseen users runs for 500 iterations with learning_rate=0.1.
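Few-shot fitting for an unseen user keeps the basis fixed and optimizes only that user's K weights under a Bradley-Terry (logistic) loss. The sketch below uses plain gradient descent in NumPy in place of Adam to stay dependency-light. The 500 steps and 0.1 step size mirror the values quoted above, while the function names, the synthetic data, and the unconstrained weights are assumptions.

```python
import numpy as np


def bt_loss(z, w):
    """Mean Bradley-Terry loss; z[i] = basis rewards(chosen) - basis rewards(rejected)."""
    return float(np.mean(np.logaddexp(0.0, -(z @ w))))


def fit_user_weights(z, steps=500, lr=0.1):
    """Fit one user's weights by gradient descent on the Bradley-Terry loss."""
    w = np.zeros(z.shape[1])
    for _ in range(steps):
        p_wrong = 1.0 / (1.0 + np.exp(z @ w))        # sigmoid(-margin)
        grad = -(z * p_wrong[:, None]).mean(axis=0)  # gradient of the mean loss
        w -= lr * grad
    return w


# Synthetic few-shot data: K=3 basis reward differences for 40 pairs,
# oriented so that a hidden weight vector prefers the "chosen" side.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
z = rng.normal(size=(40, 3))
z *= np.sign(z @ w_true)[:, None]

w_fit = fit_user_weights(z)
```

Because only K numbers are fitted per user, a handful of preference pairs can be enough, which is the point of sharing the basis across users.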
5. Interpret results

After training completes, run() returns four accuracy arrays — one per evaluation setting — across the K values in K_list. The console output from eval_multiple() shows per-setting results at each rank. For each K you will see output of the form:
K : 0
Train Performance
Average accuracy: 0.XXXX
Standard deviation of accuracy: 0.XXXX
Seen User Unseen Prompts
Average accuracy: 0.XXXX
...
K : 4
Train Performance
Average accuracy: 0.XXXX
...
Few Shot Train Performance
Average accuracy: 0.XXXX
...
Unseen User Unseen Prompts
Average accuracy: 0.XXXX
Actual values depend on the worker split (seeded with random.seed(0)) and hardware.

The four metrics to watch are:
  • Train accuracy — pairwise preference accuracy on seen users’ training prompts. Increases with K as the model gains capacity.
  • Seen user, unseen prompts — generalization of learned user weights to held-out prompts for the same users. Tests whether personalization transfers beyond training data.
  • Few-shot train accuracy — accuracy on the few-shot examples used to fit weights for unseen users. Should be high if the basis is expressive enough.
  • Unseen user, unseen prompts — the hardest setting: new users with new prompts, adapted from the shared basis using only a handful of examples. This is LoRe’s primary evaluation target.
A meaningful improvement from K=1 to K=3–5 on the “unseen user, unseen prompts” metric indicates that the learned basis captures genuine population-level preference diversity that transfers to new users. Diminishing returns at high K suggest the low-rank structure is a good fit for the dataset.
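To compare ranks without reading the console by eye, the output can be parsed into a dictionary. This sketch assumes the log follows the layout shown above ("K : n" lines, a setting name on its own line, then an "Average accuracy:" line). Both helpers are hypothetical and not part of LoRe.

```python
import re


def parse_eval_log(text):
    """Parse console output into {K: {setting: average accuracy}}."""
    results, k, setting = {}, None, None
    for raw in text.splitlines():
        line = raw.strip()
        k_match = re.fullmatch(r"K\s*:\s*(\d+)", line)
        acc_match = re.fullmatch(r"Average accuracy:\s*([0-9]*\.?[0-9]+)", line)
        if k_match:
            k = int(k_match.group(1))
            results[k] = {}
            setting = None
        elif acc_match and k is not None and setting is not None:
            results[k][setting] = float(acc_match.group(1))
        elif line and ":" not in line:
            setting = line  # e.g. "Train Performance"
    return results


def gain_over_k1(results, setting="Unseen User Unseen Prompts"):
    """Accuracy gain of each K > 1 over K = 1 for one setting."""
    base = results[1][setting]
    return {k: v[setting] - base for k, v in results.items() if k > 1 and setting in v}
```

For example, after redirecting the training output to a file (say, a hypothetical train.log), `gain_over_k1(parse_eval_log(open("train.log").read()))` gives the per-rank improvement on the unseen-user setting.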

Next steps

To run the same workflow on PRISM or PersonalLLM, navigate to the corresponding directory and follow the same prepare.py → train_basis.py sequence. The PRISM pipeline adds a separate embedding generation step (generate-prism-embeddings.py) and supports evaluation on RewardBench 2 via eval_rb2.py.
