LoRe’s approach to personalization is to give every user a compact fingerprint, a weight vector w_i over K shared reward directions, rather than a full separate model. Because w_i lives on the probability simplex and has only K parameters, it is cheap to learn, easy to interpret, and transferable to users the model has never seen before.
User representation
Each user i is represented by a vector w_i ∈ ℝ^K. After a softmax transformation, w_i lies on the probability simplex in ℝ^K: all entries are positive and sum to 1. The reward for user i on a response with feature vector x is the w_i-weighted sum of the K basis rewards: r_i(x) = w_iᵀ (Vᵀ x).
V (shape [features × K]) is shared across all users. Only w_i is user-specific, making the per-user parameter count equal to K regardless of the embedding dimension.
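As a minimal sketch of the reward described above, the snippet below mixes the K basis rewards Vᵀx with softmax-normalized user weights. The function names `softmax` and `user_reward` and all numeric values are illustrative assumptions, not part of the LoRe codebase.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def user_reward(x, V, w_logits):
    """Reward for one user: K basis rewards (V^T x) mixed by simplex weights."""
    basis_rewards = V.T @ x        # shape [K], one value per shared reward direction
    weights = softmax(w_logits)    # shape [K], positive and summing to 1
    return float(weights @ basis_rewards)


# Toy example: 4 features, K = 2 reward directions (values are made up).
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5],
              [0.0, 0.0]])
x = np.array([2.0, 1.0, 0.0, 3.0])
w_logits = np.zeros(2)             # equal logits give uniform weights [0.5, 0.5]

reward = user_reward(x, V, w_logits)   # basis rewards are [2, 1], so reward is 1.5
```

Changing only `w_logits` changes the reward for a different user while `V` stays the same, which is why the per-user cost is K parameters.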
Joint learning of W and V
W (the matrix of all users’ weight vectors, shape [N × K]) and V are learned simultaneously from the preference data of all training users. Both are nn.Parameter objects inside LoRe_regularized.
Training updates W and V in separate steps each iteration, so each parameter set is optimized while the other is held fixed. See Low-rank reward modeling explained for the full training loop.
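The alternating scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the actual LoRe_regularized implementation: the class `LowRankRewardSketch`, the helper `alternating_step`, the logistic (Bradley-Terry style) pairwise loss, the SGD optimizers, and the synthetic data are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowRankRewardSketch(nn.Module):
    """Hypothetical stand-in for LoRe_regularized: W holds per-user logits, V the shared basis."""

    def __init__(self, num_users, num_features, K):
        super().__init__()
        self.W = nn.Parameter(torch.zeros(num_users, K))
        self.V = nn.Parameter(0.01 * torch.randn(num_features, K))

    def loss(self, user_ids, x_chosen, x_rejected):
        weights = F.softmax(self.W[user_ids], dim=1)        # [B, K], rows on the simplex
        basis_margin = (x_chosen - x_rejected) @ self.V     # [B, K]
        margin = (weights * basis_margin).sum(dim=1)        # [B]
        return F.softplus(-margin).mean()                   # assumed pairwise logistic loss


def alternating_step(model, opt_W, opt_V, batch):
    """One iteration: step W with V held fixed, then step V with W held fixed."""
    for optimizer in (opt_W, opt_V):
        model.zero_grad()
        model.loss(*batch).backward()
        optimizer.step()    # only the parameters owned by this optimizer move


torch.manual_seed(0)
N, D, K, B = 5, 8, 2, 200
model = LowRankRewardSketch(N, D, K)

# Synthetic preferences generated from a hidden basis and hidden user weights.
true_V = torch.randn(D, K)
true_W = F.softmax(3 * torch.randn(N, K), dim=1)
user_ids = torch.randint(0, N, (B,))
a, b = torch.randn(B, D), torch.randn(B, D)
true_margin = (((a - b) @ true_V) * true_W[user_ids]).sum(dim=1)
prefer_a = (true_margin > 0).unsqueeze(1)
batch = (user_ids, torch.where(prefer_a, a, b), torch.where(prefer_a, b, a))

opt_W = torch.optim.SGD([model.W], lr=0.05)
opt_V = torch.optim.SGD([model.V], lr=0.05)

initial_loss = model.loss(*batch).item()
for _ in range(100):
    alternating_step(model, opt_W, opt_V, batch)
final_loss = model.loss(*batch).item()
```

Giving each parameter its own optimizer is one simple way to hold the other block fixed: the backward pass computes gradients for both, but only the stepped optimizer applies them.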
User splits: seen vs. unseen
LoRe experiments divide users into two groups:
| Split | Description |
|---|---|
| Seen users (train_workers) | Appear in both train and test datasets. W is learned jointly for these users. |
| Unseen users (test_workers) | Appear only at test time. Their w_i must be inferred from a small held-out sample. |
Seen user, unseen prompts
The learned w_i for a seen user is applied to prompts that were not part of training. This evaluates how well the reward generalizes to new content for a known user.
In run_regularized, the seen/unseen distinction maps directly to two evaluation blocks.
The PersonalizeBatch class
PersonalizeBatch handles adaptation for unseen users. It takes the pretrained V as a fixed input and optimizes only the per-user weight vectors w.
V is never updated inside PersonalizeBatch; gradients flow only through self.w. This is what makes few-shot adaptation cheap: each user requires optimizing K parameters rather than the full [features × K] basis.
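To make the frozen-basis idea concrete, here is a minimal sketch of few-shot adaptation. It is not the real PersonalizeBatch class: `FewShotUserWeights`, the choice to store V as a buffer, the logistic pairwise loss, the optimizer settings, and the synthetic user are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FewShotUserWeights(nn.Module):
    """Hypothetical analogue of PersonalizeBatch: V is frozen, only w is trainable."""

    def __init__(self, V, num_users):
        super().__init__()
        # A buffer is saved with the module but is not a parameter, so it gets no updates.
        self.register_buffer("V", V.detach().clone())
        self.w = nn.Parameter(torch.zeros(num_users, V.shape[1]))

    def loss(self, user_ids, x_chosen, x_rejected):
        weights = F.softmax(self.w[user_ids], dim=1)        # [B, K]
        basis_margin = (x_chosen - x_rejected) @ self.V     # [B, K]
        return F.softplus(-(weights * basis_margin).sum(dim=1)).mean()


torch.manual_seed(0)
D, K, shots = 8, 3, 20
V = torch.randn(D, K) / D ** 0.5           # stands in for a pretrained basis
true_w = torch.tensor([0.8, 0.1, 0.1])     # hidden preference of one unseen user

# A handful of labeled comparisons from the unseen user.
a, b = torch.randn(shots, D), torch.randn(shots, D)
prefer_a = ((((a - b) @ V) @ true_w) > 0).unsqueeze(1)
x_chosen, x_rejected = torch.where(prefer_a, a, b), torch.where(prefer_a, b, a)
user_ids = torch.zeros(shots, dtype=torch.long)

adapter = FewShotUserWeights(V, num_users=1)
optimizer = torch.optim.SGD(adapter.parameters(), lr=0.05)   # contains only w

initial_loss = adapter.loss(user_ids, x_chosen, x_rejected).item()
for _ in range(300):
    optimizer.zero_grad()
    adapter.loss(user_ids, x_chosen, x_rejected).backward()
    optimizer.step()
final_loss = adapter.loss(user_ids, x_chosen, x_rejected).item()

learned_weights = F.softmax(adapter.w.detach(), dim=1)[0]    # point on the simplex
```

Because the optimizer sees only `w`, each adaptation step touches K numbers per user no matter how large the feature dimension is.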
Generating synthetic user populations
For the PersonalLLM dataset, LoRe can generate synthetic user populations using a Dirichlet distribution. The Dirichlet naturally produces vectors on the simplex, matching the structure of softmax-normalized w_i. Two parameters control the sampling:
- alpha is the Dirichlet concentration parameter (a K-dimensional vector).
- N is the number of synthetic users to generate.
- Small alpha values (e.g. [0.1, 0.1, ...]) produce users who strongly prefer one basis direction.
- Large alpha values (e.g. [10, 10, ...]) produce users with nearly uniform mixtures.
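The effect of alpha listed above can be checked with a short NumPy sketch. The helper name `sample_user_weights`, the seed, and the sizes K = 4 and N = 1000 are illustrative assumptions; LoRe's own sampling code may differ.

```python
import numpy as np


def sample_user_weights(alpha, N, seed=0):
    """Draw N synthetic user weight vectors from Dirichlet(alpha); rows lie on the simplex."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(alpha, size=N)    # shape [N, K]


K, N = 4, 1000
peaked = sample_user_weights(np.full(K, 0.1), N)    # small alpha: one direction dominates
mixed = sample_user_weights(np.full(K, 10.0), N)    # large alpha: close to uniform

# Average size of each user's largest weight, as a rough measure of concentration.
peaked_max = peaked.max(axis=1).mean()
mixed_max = mixed.max(axis=1).mean()
```

In this setup `peaked_max` comes out well above `mixed_max`, reflecting that small alpha yields near one-hot users while large alpha yields near-uniform mixtures.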
[Diagram: conceptual layout of LoRe’s personalization structure]
The rank K determines how many independent reward axes exist in the population. Increasing K allows finer-grained personalization but requires more data per user to learn w_i reliably.