LoRe’s three nn.Module classes form the core of the reward basis learning pipeline. LoRe and LoRe_regularized jointly learn a shared basis matrix V (shape [num_features, K]) and a per-user weight matrix W (shape [N, K]) from preference data. PersonalizeBatch holds V fixed and adapts only the user weights, which enables fast personalization for new users at inference time.
LoRe
LoRe performs joint reward basis learning with optional L2 regularization toward a reference SFT vector V_sft. It uses a single Adam optimizer over all parameters and is the simpler of the two basis-learning modules.
Constructor parameters
- V_sft: reference reward direction from the SFT model, shape [num_features] or [num_features, 1]. Used as the L2 regularization target for each basis vector; pass the final linear-layer weights of your reward model.
- L2 regularization strength: controls how strongly each column of V is pulled toward V_sft. Set to 0 to disable regularization entirely.
- Number of users: the training population size N. Determines the number of rows in the learned weight matrix W.
- Embedding dimensionality F: must match the feature dimension of your training tensors (e.g., 4096 for Llama-3.1-8B embeddings).
- Number of basis vectors K: the rank of the factorization. Larger values increase expressivity at the cost of more compute and a higher risk of overfitting.
- Number of iterations: the number of Adam gradient steps. There is no early stopping; training always runs for exactly this many steps.
- Learning rate: the learning rate of the single Adam optimizer shared by W and V.

Learned parameters
| Parameter | Shape | Description |
|---|---|---|
| W | [num_classes, num_basis_vectors] | Raw logits for the per-user basis weights. softmax(W, dim=1) maps each row onto the probability simplex over basis vectors. |
| V | [num_features, num_basis_vectors] | Reward basis matrix. Each column is a direction in embedding space. |
Methods
Loss computation: X is a list of N tensors, each of shape [m_i, F], where m_i is the number of preference pairs for user i. Returns the tuple (nll, reg).

train(): runs num_iterations steps of Adam, minimizing nll + reg. Returns (softmax(W, dim=1), V), i.e., the softmax-normalized weight matrix and the raw basis matrix.
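The objective and training loop described above can be sketched as follows. This is a minimal, hypothetical sketch rather than the LoRe source: the function names lore_objective and train_lore, the random initialization of V, and above all the Bradley-Terry likelihood over the rows of X[i] (assumed here to be chosen-minus-rejected embedding differences) are assumptions.

```python
import torch
import torch.nn.functional as F


def lore_objective(X, W, V, V_sft, alpha):
    """Return (nll, reg) for a list X of per-user tensors of shape [m_i, F].

    Assumption: each row of X[i] is a chosen-minus-rejected embedding
    difference scored with a Bradley-Terry likelihood.
    """
    weights = torch.softmax(W, dim=1)            # [N, K], each row on the simplex
    nll = 0.0
    for i, X_i in enumerate(X):
        margins = X_i @ V @ weights[i]           # [m_i] predicted reward margins
        nll = nll - F.logsigmoid(margins).sum()
    # L2 pull of every basis column toward the reference direction V_sft
    reg = alpha * ((V - V_sft.reshape(-1, 1)) ** 2).sum()
    return nll, reg


def train_lore(X, num_features, num_basis_vectors, V_sft, alpha,
               num_iterations=200, lr=1e-2, seed=0):
    """Hypothetical training loop: one Adam optimizer over both W and V."""
    torch.manual_seed(seed)
    W = torch.zeros(len(X), num_basis_vectors, requires_grad=True)
    V = (0.01 * torch.randn(num_features, num_basis_vectors)).requires_grad_()
    optimizer = torch.optim.Adam([W, V], lr=lr)
    for _ in range(num_iterations):              # fixed step count, no early stopping
        nll, reg = lore_objective(X, W, V, V_sft, alpha)
        optimizer.zero_grad()
        (nll + reg).backward()
        optimizer.step()
    # Like the documented train(), V is returned without being detached
    return torch.softmax(W, dim=1), V
```

Because V_sft is reshaped to a column and broadcast, the same code accepts V_sft of shape [num_features] or [num_features, 1].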
train() calls self.to(device) internally, so you do not need to move the model to the GPU before calling it. The returned V is not detached; call .detach() before passing it downstream if you do not want gradients to flow.

LoRe_regularized
LoRe_regularized is the production variant used for PRISM experiments. Key differences from LoRe: it uses cosine similarity regularization instead of L2, maintains separate Adam optimizers for W and V with alternating updates, applies a warmup schedule for alpha, and prunes unused basis vectors after training.
Constructor parameters
- V_sft: reference reward direction from the SFT model. Normalized once at construction time (F.normalize(V_sft, dim=0)) and stored as self.V_sft_norm for the cosine similarity regularizer.
- alpha: maximum regularization coefficient, reached after the warmup period. The coefficient actually applied at each step is computed by _alpha_at_step(step).
- Number of users: the training population size N.
- Embedding dimensionality F: hard-coded to 4096 inside solve_regularized_simplex, so your embeddings must have this dimension.
- Number of basis vectors: the initial K before pruning. After training, vectors whose maximum softmax weight across all users is below 1e-2 are discarded.
- Number of iterations: total training steps. run_regularized passes 20000; run passes 1000.
- Learning rate: used by both optimizer_W and optimizer_V.

Methods
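Data packing: concatenates the per-user tensors into a single matrix X_cat of shape [N_total, F] and a label vector y of shape [N_total] (values 0..C-1, where C is the number of users). Called once before the training loop to avoid repeated concatenation.

Loss computation: computes (nll, reg, entropy_loss) from the pre-packed data. Cosine regularization is applied only when alpha_curr > 0.

_alpha_at_step(step) returns alpha_curr according to the following warmup schedule: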
| Step range | Returned alpha |
|---|---|
| < 0.2 * num_iterations | 0.0 |
| 0.2 * num_iterations to 0.8 * num_iterations | Linear ramp from 0.0 to alpha |
| >= 0.8 * num_iterations | alpha |
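The schedule in the table above can be written as a small pure function. This is a sketch of the table, not the actual _alpha_at_step method; alpha_at_step is a hypothetical standalone name that takes num_iterations and alpha explicitly instead of reading them from self.

```python
def alpha_at_step(step: int, num_iterations: int, alpha: float) -> float:
    """Warmup schedule: 0.0, then a linear ramp, then the full alpha."""
    ramp_start = 0.2 * num_iterations
    ramp_end = 0.8 * num_iterations
    if step < ramp_start:
        return 0.0                      # regularization disabled early on
    if step >= ramp_end:
        return alpha                    # full strength for the last 20% of steps
    # Linear interpolation from 0.0 at ramp_start to alpha at ramp_end
    return alpha * (step - ramp_start) / (ramp_end - ramp_start)
```

With the 20000 steps used by run_regularized, the regularizer stays off for the first 4000 steps and reaches full strength at step 16000.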
Training step: each iteration alternates two updates, one per optimizer:
- Freeze V, update W with the NLL only (alpha_curr = 0).
- Freeze W, update V with NLL + alpha_curr * cosine_reg.
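The packing step and one alternating update can be sketched as below. All function names (pack, nll_packed, cosine_reg, alternating_step) are hypothetical, and several details are assumptions rather than the LoRe implementation: the Bradley-Terry NLL over chosen-minus-rejected rows, a mean reduction, a cosine regularizer of the form mean(1 - cos(V[:, k], V_sft)), and the omission of entropy_loss. "Freezing" is done by detaching the other matrix so only the active optimizer's parameters receive gradients.

```python
import torch
import torch.nn.functional as F


def pack(X):
    """Concatenate per-user tensors into X_cat [N_total, F] and user labels y [N_total]."""
    X_cat = torch.cat(X, dim=0)
    y = torch.cat([torch.full((x.shape[0],), i, dtype=torch.long)
                   for i, x in enumerate(X)])
    return X_cat, y


def nll_packed(X_cat, y, W, V):
    weights = torch.softmax(W, dim=1)[y]            # [N_total, K]: each row's user weights
    margins = ((X_cat @ V) * weights).sum(dim=1)    # [N_total] reward margins
    return -F.logsigmoid(margins).mean()            # assumed Bradley-Terry NLL


def cosine_reg(V, V_sft_norm):
    cos = F.normalize(V, dim=0).T @ V_sft_norm      # [K] cosine of each column with V_sft
    return (1.0 - cos).mean()                       # assumed form of the regularizer


def alternating_step(X_cat, y, W, V, V_sft_norm, opt_W, opt_V, alpha_curr):
    # 1) Freeze V (detach it) and update W with the NLL only.
    opt_W.zero_grad()
    nll_packed(X_cat, y, W, V.detach()).backward()
    opt_W.step()
    # 2) Freeze W and update V with NLL + alpha_curr * cosine_reg.
    opt_V.zero_grad()
    loss_V = nll_packed(X_cat, y, W.detach(), V)
    if alpha_curr > 0:                              # regularizer only once the schedule is active
        loss_V = loss_V + alpha_curr * cosine_reg(V, V_sft_norm)
    loss_V.backward()
    opt_V.step()
    return loss_V.item()
```

Packing once up front means each step is a handful of dense matrix products over X_cat instead of a Python loop over users.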
Pruning: after training, discards every basis vector i with max_c softmax(W)[c, i] < 1e-2. Returns (W_kept, V_kept) with shapes [N, K_kept] and [F, K_kept].
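The pruning rule amounts to a column mask over the softmax weights. The sketch below uses a hypothetical helper name, prune_basis, and assumes W_kept holds the raw logit columns; whether the real method returns raw or softmax-normalized weights is not stated here.

```python
import torch


def prune_basis(W, V, threshold=1e-2):
    """Drop basis vectors that no user weights at or above `threshold`."""
    weights = torch.softmax(W, dim=1)               # [N, K] per-user simplex weights
    keep = weights.max(dim=0).values >= threshold   # [K]: True if some user uses vector i
    return W[:, keep], V[:, keep]                   # [N, K_kept], [F, K_kept]
```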
PersonalizeBatch
PersonalizeBatch adapts new users to a fixed reward basis V. Only per-user weight vectors w[i] are learned; V is passed as an argument and receives no gradient. This makes it suitable for few-shot personalization after basis training is complete.
Constructor parameters
- Number of users: how many users to personalize simultaneously. Sets the length of the ParameterList.
- Embedding dimensionality F: used only internally to check tensor shapes; it is not used in the forward computation.
- Number of basis vectors K: the dimensionality of each user weight vector. Must match the number of columns in V.
- Number of iterations: the number of Adam gradient steps for user adaptation.
- Learning rate: for the single Adam optimizer shared by all w[i].

Learned parameters
| Parameter | Shape | Description |
|---|---|---|
| w | ParameterList of num_classes vectors, each of shape [num_basis_vectors] | Raw logits for each user’s mixture over basis vectors. softmax(w[i]) gives user i’s weights. |
Methods
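Loss computation: takes the fixed basis V as an argument and evaluates the loss for each user i. V is not modified; pass V.detach() to prevent accidental gradient flow into the basis.

Training: runs num_iterations Adam steps. Returns a list of len(X) detached softmax weight vectors, one per user, each of shape [num_basis_vectors].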
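A minimal sketch of the personalization step, assuming the same Bradley-Terry NLL as elsewhere (an assumption, not the documented loss). PersonalizeBatchSketch, loss, and fit are hypothetical names; the embedding dimensionality argument is omitted because it does not affect the computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PersonalizeBatchSketch(nn.Module):
    """Hypothetical re-implementation: per-user logits over a fixed basis V."""

    def __init__(self, num_classes, num_basis_vectors, num_iterations=100, lr=1e-1):
        super().__init__()
        # One logit vector per user; V is NOT a parameter of this module.
        self.w = nn.ParameterList(
            [nn.Parameter(torch.zeros(num_basis_vectors)) for _ in range(num_classes)]
        )
        self.num_iterations = num_iterations
        self.lr = lr

    def loss(self, X, V):
        total = 0.0
        for w_i, X_i in zip(self.w, X):
            margins = X_i @ V @ torch.softmax(w_i, dim=0)   # [m_i] reward margins
            total = total - F.logsigmoid(margins).sum()     # assumed Bradley-Terry NLL
        return total

    def fit(self, X, V):
        V = V.detach()                                      # keep gradients out of the basis
        optimizer = torch.optim.Adam(self.parameters(), lr=self.lr)  # shared over all w[i]
        for _ in range(self.num_iterations):
            optimizer.zero_grad()
            self.loss(X, V).backward()
            optimizer.step()
        return [torch.softmax(w_i, dim=0).detach() for w_i in self.w]
```

The sketch names its training method fit rather than train so that it does not shadow nn.Module.train(), which PyTorch uses to toggle training mode.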
PersonalizeBatch is most commonly accessed through the learn_multiple_few_shot() wrapper, which handles instantiation and device placement automatically.