LoRe measures reward model quality by checking how often a personalized reward correctly ranks the chosen response above the rejected one.
evaluate_model() does this for a single user; eval_multiple() aggregates across a user population; sample_shots() creates sub-sampled few-shot datasets for controlled evaluation of sample-efficiency.
evaluate_model()
Computes the fraction of preference pairs for which a user’s personalized reward model correctly ranks chosen over rejected.
X contains feature differences (chosen embedding minus rejected embedding), so the reward correctly ranks a pair when the dot product of that pair's row of X with the user's reward direction (the basis V weighted by w) is positive.
Parameters
X: Feature difference matrix for one user, shape [M, F]. Each row is embedding(chosen) - embedding(rejected) for a single preference pair. Internally converted to a torch.float32 tensor.
V: Reward basis matrix, shape [F, K]. Shared across users after basis training.
w: Per-user weight vector over basis directions, shape [K]. Should be softmax-normalized (output of PersonalizeBatch.train() or LoRe.train()).
Return value
Fraction of preference pairs ranked correctly, in [0, 1]. Random chance yields 0.5; a perfect model yields 1.0.
Example
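A minimal sketch of the computation described above, not the library's own code. It assumes the per-pair score is X @ V @ w and that a pair counts as correct when that score is strictly positive. The function name evaluate_model_sketch and the toy data are hypothetical.

```python
import torch


def evaluate_model_sketch(X, V, w):
    """Hypothetical stand-in for evaluate_model(); assumes score = X @ V @ w."""
    # Accept NumPy arrays or tensors. as_tensor is used here for brevity;
    # this conversion choice is an assumption of the sketch.
    X = torch.as_tensor(X, dtype=torch.float32)
    scores = X @ V @ w  # one score per preference pair, shape [M]
    # A pair is ranked correctly when its score is positive.
    return (scores > 0).float().mean().item()


# Toy data: F = 3 features, K = 2 basis directions, M = 4 preference pairs.
V = torch.tensor([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # [F, K]
w = torch.softmax(torch.tensor([0.0, 0.0]), dim=0)       # [K], equals [0.5, 0.5]
X = torch.tensor(
    [
        [1.0, 1.0, 0.0],    # score  1.0 -> correct
        [-1.0, -1.0, 0.0],  # score -1.0 -> incorrect
        [2.0, -1.0, 5.0],   # score  0.5 -> correct
        [0.5, 0.5, 0.0],    # score  0.5 -> correct
    ]
)

acc = evaluate_model_sketch(X, V, w)
print(acc)  # 3 of 4 pairs have a positive score
```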
evaluate_model() accepts X as either a NumPy array or a PyTorch tensor. It wraps the input in torch.tensor(..., dtype=torch.float32) unconditionally, which creates a copy even if X is already a float32 tensor. Passing a tensor therefore does not avoid the copy; budget for that overhead in tight evaluation loops.
eval_multiple()
Evaluates accuracy for a list of users, each potentially with a different V and w. Prints the population mean and standard deviation, then returns all per-user accuracies.
Parameters
W_list: Per-user weight vectors, one tensor per user, each of shape [K]. Must be the same length as test_features.
V_list: Reward basis matrices, one per user, each of shape [F, K]. In most cases all entries are the same shared V_joint; the list form supports heterogeneous bases if needed.
test_features: Per-user feature difference tensors, shape [M_i, F] per user. Must have the same length as W_list and V_list.
Return value
Per-user accuracy values, same length as test_features. Each value is evaluate_model(test_features[i], V_list[i], W_list[i]).
Example
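A minimal sketch of the aggregation described above, not the library's own code. It assumes the same scoring rule (X @ V @ w > 0) and reports a population standard deviation; which standard deviation the library prints is not stated here. The names accuracy_sketch and eval_multiple_sketch and the toy data are hypothetical.

```python
import statistics

import torch


def accuracy_sketch(X, V, w):
    # Assumed scoring rule: a pair is correct when X @ V @ w is positive.
    return ((X @ V @ w) > 0).float().mean().item()


def eval_multiple_sketch(W_list, V_list, test_features):
    """Hypothetical stand-in for eval_multiple()."""
    if not (len(W_list) == len(V_list) == len(test_features)):
        raise ValueError("W_list, V_list and test_features must have equal length")
    accs = [accuracy_sketch(X, V, w) for X, V, w in zip(test_features, V_list, W_list)]
    # Population standard deviation is an assumption of this sketch.
    print(f"mean accuracy: {statistics.mean(accs):.4f}, std: {statistics.pstdev(accs):.4f}")
    return accs


# Two users sharing one basis (F = 2, K = 2), with different weights.
V_joint = torch.eye(2)
V_list = [V_joint, V_joint]
W_list = [
    torch.softmax(torch.tensor([2.0, 0.0]), dim=0),  # favors basis direction 0
    torch.softmax(torch.tensor([0.0, 2.0]), dim=0),  # favors basis direction 1
]
test_features = [
    torch.tensor([[1.0, -1.0], [2.0, 0.0]]),                            # M_0 = 2
    torch.tensor([[1.0, -1.0], [-1.0, 2.0], [3.0, -2.0], [0.0, -1.0]]),  # M_1 = 4
]

accs = eval_multiple_sketch(W_list, V_list, test_features)
```

Users may have different numbers of test pairs; only the three list lengths need to match.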
sample_shots()
Randomly sub-samples a fixed number of preference pairs from each user’s tensor. Used to construct few-shot datasets for controlled sample-efficiency experiments.
Parameters
Full per-user preference tensors: a list of N_unseen tensors, each of shape [M_i, F]. Each user must have at least shots rows; no bounds checking is performed.
shots: Number of preference pairs to sample per user. Sampling is without replacement, via a random permutation of row indices.
Return value
List of N_unseen tensors, each of shape [shots, F], containing randomly selected rows from the corresponding input tensor.
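A minimal sketch of permutation-based sub-sampling as described above, not the library's own code. The function name sample_shots_sketch, the parameter name features, and the optional generator argument are hypothetical. Unlike the documented behavior, this sketch adds an explicit bounds check.

```python
import torch


def sample_shots_sketch(features, shots, generator=None):
    """Hypothetical stand-in for sample_shots()."""
    sampled = []
    for X in features:
        # Explicit check added by this sketch; the library is documented
        # as performing no bounds checking.
        if X.shape[0] < shots:
            raise ValueError(f"user has {X.shape[0]} rows, fewer than shots={shots}")
        # A random permutation of row indices, truncated to `shots`,
        # gives sampling without replacement.
        idx = torch.randperm(X.shape[0], generator=generator)[:shots]
        sampled.append(X[idx])
    return sampled


# Two unseen users with 6 and 4 preference pairs, F = 2.
features = [
    torch.arange(12, dtype=torch.float32).reshape(6, 2),
    torch.arange(8, dtype=torch.float32).reshape(4, 2),
]
g = torch.Generator().manual_seed(0)  # seeded only to make the sketch repeatable
few_shot = sample_shots_sketch(features, shots=3, generator=g)
print([tuple(t.shape) for t in few_shot])
```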