Before training a reward basis, raw preference data must be converted into feature difference tensors, that is, vectors of the form
embedding(chosen) - embedding(rejected). This page documents LoRe’s data preparation helpers: simulation utilities for synthetic populations built on multi-reward datasets (PersonalLLM), sparse sampling for data-limited regimes, and PRISM-specific loaders that parse embedding dictionaries into the list-of-tensors format expected by all training functions.
simulate_user()
Generates preference feature differences for a single synthetic user given their reward weight vector w. For each prompt, every response is scored as w · scores, and the function returns the difference between the highest-scoring (chosen) and lowest-scoring (rejected) response embeddings.
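A minimal NumPy sketch of the selection rule described above. Because the return value has one row per prompt, it assumes the scores argument spans all prompts, with scores[i] being the [num_responses, num_reward_models] slice for prompt i, and that ties go to the first index. simulate_user_sketch is a hypothetical name; the LoRe helper works on torch tensors and may differ in details.

```python
import numpy as np


def simulate_user_sketch(scores, features, w):
    """Hypothetical NumPy stand-in for the behavior described for simulate_user().

    scores:   [num_prompts, num_responses, num_reward_models] reward scores (assumed layout)
    features: features[i][j] is the [F] embedding of response j to prompt i
    w:        [num_reward_models] user weight vector
    Returns a [num_prompts, F] array of chosen-minus-rejected differences.
    """
    diffs = []
    for i, prompt_scores in enumerate(scores):
        rewards = prompt_scores @ w          # one scalar w · scores per response
        chosen = int(np.argmax(rewards))     # highest-scoring response (first on ties)
        rejected = int(np.argmin(rewards))   # lowest-scoring response (first on ties)
        diffs.append(np.asarray(features[i][chosen]) - np.asarray(features[i][rejected]))
    return np.stack(diffs)


# Toy data: 2 prompts, 3 responses each, 2 reward models, F = 2.
scores = np.array([
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.2, 0.9], [0.8, 0.1], [0.0, 0.0]],
])
features = [
    [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])],
    [np.array([2.0, 2.0]), np.array([3.0, 0.0]), np.array([0.0, 0.0])],
]
w = np.array([0.9, 0.1])

user_diffs = simulate_user_sketch(scores, features, w)
print(user_diffs)  # expected rows: [1, -1] (prompt 0) and [3, 0] (prompt 1)
```

For prompt 0 the rewards are [0.9, 0.1, 0.5], so response 0 is chosen and response 1 rejected; for prompt 1 they are [0.27, 0.73, 0.0], so response 1 is chosen and response 2 rejected. If all responses tie, chosen and rejected coincide and the row is all zeros.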
Parameters
Reward scores for all responses to a single prompt, shape [num_responses, num_reward_models]. Typically the slice of the 3D reward tensor for one prompt.
Embedding tensors for all responses per prompt. features[i] is a list (or other indexable sequence) of num_responses embedding tensors, each of shape [F].
User weight vector over reward models, shape [num_reward_models]. Typically drawn from a Dirichlet distribution via generate_popupulation().
Return value
Stack of chosen-minus-rejected embedding differences, shape [num_prompts, F].
simulate_population()
Calls simulate_user() for every user (row) in W and stacks the results into a 3D tensor.
Parameters
3D array of reward scores, shape [num_prompts, num_responses, num_reward_models].
Nested list of embeddings. features[prompt_idx][response_idx] is a tensor of shape [F].
Population weight matrix, shape [N, num_reward_models]. Each row is one user’s weight vector on the probability simplex over reward models.
Return value
Stacked feature differences, shape [N, num_prompts, F]. Pass this directly to create_sparse_tensor() to produce the list format expected by training functions.
Full PersonalLLM workflow
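A minimal end-to-end sketch of the PersonalLLM preparation path on random synthetic data rather than the real dataset. It assumes the embeddings can be packed into a dense [num_prompts, num_responses, F] array, and it uses NumPy's Dirichlet sampler plus a vectorized argmax/argmin in place of the LoRe helpers. The shapes match the ones documented on this page, but this is not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (arbitrary sizes), not the PersonalLLM data itself.
M, R, K, F = 50, 8, 10, 16  # prompts, responses per prompt, reward models, embedding dim
N = 4                       # simulated users
scores = rng.normal(size=(M, R, K))    # [num_prompts, num_responses, num_reward_models]
features = rng.normal(size=(M, R, F))  # dense stand-in for features[prompt][response]

# Step 1: sample user weights from a Dirichlet, as generate_popupulation() is described to do.
alpha = 0.1 * np.ones(K)
W = rng.dirichlet(alpha, size=N)  # [N, K]; each row sums to 1

# Step 2: vectorized equivalent of running the per-user selection for every row of W.
rewards = np.einsum("mrk,nk->nmr", scores, W)  # [N, M, R] scalar reward per response
chosen = rewards.argmax(axis=-1)               # [N, M] best response per prompt
rejected = rewards.argmin(axis=-1)             # [N, M] worst response per prompt
prompt_idx = np.arange(M)[None, :]             # [1, M], broadcasts against [N, M]
feature_diffs = features[prompt_idx, chosen] - features[prompt_idx, rejected]  # [N, M, F]

print(W.shape, feature_diffs.shape)  # (4, 10) (4, 50, 16)
```

The [N, M, F] array produced here has the layout this page describes passing to create_sparse_tensor(). Vectorizing over users trades the per-user loop for one einsum, which is convenient when all prompts have the same number of responses.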
generate_popupulation()
Samples a population of user weight vectors from a Dirichlet distribution.
The function name contains a typo in the source: generate_popupulation (with a repeated "pu") rather than generate_population. Import and call it with the misspelled name to match the source.
Parameters
Dirichlet concentration parameter array, shape [num_reward_models]. Smaller values (e.g., 0.1) produce more peaked, specialized users; larger values (e.g., 1.0) produce more uniform users. In PersonalLLM experiments, alpha = 0.1 * np.ones(10) is used.
Number of users to sample.
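To make "more peaked" concrete, a small NumPy experiment (not part of LoRe) can compare the average largest weight per sampled user for the two concentration values mentioned above. It uses numpy.random.Generator.dirichlet directly rather than generate_popupulation().

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 10, 2000  # reward models, sampled users

top = {}
for scale in (0.1, 1.0):
    W = rng.dirichlet(scale * np.ones(K), size=N)  # [N, K] user weight vectors
    top[scale] = W.max(axis=1).mean()              # average weight of each user's dominant model
    print(f"alpha = {scale} * ones({K}): mean largest weight = {top[scale]:.2f}")
```

With alpha = 1.0 * ones(10) the Dirichlet is uniform on the simplex, and the expected largest weight is H_10 / 10 ≈ 0.29, where H_10 is the 10th harmonic number. The clearly larger value for alpha = 0.1 reflects users who put most of their weight on one or two reward models.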
Return value
Shape [N, len(alpha)]. Each row is a probability vector over reward models (sums to 1).
create_sparse_tensor()
Randomly sub-samples a fraction of prompts from each user’s feature matrix, simulating limited interaction data.
Parameters
Full feature difference array, shape [N, M, F], where N is the number of users, M the number of prompts, and F the feature dimension. Typically the output of simulate_population().
Fraction of prompts to retain per user, in (0, 1]. The number of sampled rows per user is floor(sample_percentage * M). Sampling is without replacement.
Return value
List of N tensors, each of shape [floor(sample_percentage * M), F], moved to device. Row order within each user is random.
Common sampling fractions in LoRe experiments
| Split | sample_percentage | Interpretation |
|---|---|---|
| Train (seen users) | 0.005 | ~0.5% of prompts (highly limited) |
| Few-shot (unseen users) | 0.001 | ~0.1% of prompts (minimal signal) |
| Test (generalization) | 1.0 | All prompts |
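A hedged NumPy sketch of the sampling rule above: floor(sample_percentage * M) rows per user, drawn without replacement and independently for each user, applied at the three fractions from the table. create_sparse_tensor_sketch is a hypothetical stand-in; the LoRe helper returns torch tensors moved to device, which this sketch omits.

```python
import math

import numpy as np


def create_sparse_tensor_sketch(feature_diffs, sample_percentage, rng):
    """Hypothetical NumPy stand-in for the sub-sampling described for create_sparse_tensor().

    feature_diffs: [N, M, F] array (users, prompts, feature dim)
    Returns a list of N arrays, each of shape [floor(sample_percentage * M), F].
    Unlike the LoRe helper, nothing is converted to torch or moved to a device.
    """
    _, M, _ = feature_diffs.shape
    k = math.floor(sample_percentage * M)
    out = []
    for user_rows in feature_diffs:  # iterate over the N users
        # Independent draw per user, without replacement; the drawn order is random.
        idx = rng.choice(M, size=k, replace=False)
        out.append(user_rows[idx])
    return out


rng = np.random.default_rng(0)
N, M, F = 3, 2000, 4
feature_diffs = rng.normal(size=(N, M, F))

splits = {}
for name, frac in [("train", 0.005), ("few-shot", 0.001), ("test", 1.0)]:
    splits[name] = create_sparse_tensor_sketch(feature_diffs, frac, rng)
    print(name, len(splits[name]), splits[name][0].shape)
# train 3 (10, 4)
# few-shot 3 (2, 4)
# test 3 (2000, 4)
```

With M = 2000 prompts, the train fraction keeps 10 prompts per user and the few-shot fraction keeps 2, while the test fraction keeps every prompt in a shuffled order.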
create_dataset_prism()
Converts a PRISM embedding dictionary into the list-of-tensors format expected by LoRe training functions.
The current PRISM training script, PRISM/train_basis.py, uses group_embeddings_by_user() (defined in that file) rather than create_dataset_prism(). create_dataset_prism() is available in utils.py for use with an alternative embedding format: a nested dict {user_id: {dialog_id: {"chosen": [...], "rejected": [...]}}}.
Parameters
Nested dictionary with structure {user_id: {dialog_id: {"chosen": [...], "rejected": [...]}}}. The innermost lists hold embedding tensors indexed by response position.
Return value
List of N_users tensors, each of shape [total_pairs_for_user, F], on device. Each row is chosen[i] - rejected[i] for one preference pair, collected across all dialogs of that user.
create_dataset_prism_shots()
Same as create_dataset_prism(), but randomly samples a fixed number of dialogs per user instead of using all available dialogs.
Parameters
Same nested dictionary format as create_dataset_prism().
Number of dialogs to randomly sample per user (without replacement from that user’s dialog list). Each selected dialog contributes all of its preference pairs.
Return value
List of N_users tensors on device. Each tensor contains the feature differences from the sampled dialogs only. The number of rows varies by user depending on how many preference pairs fall within the sampled dialogs.
Here, shots refers to the number of dialogs sampled, not individual preference pairs. If a dialog contains multiple (chosen, rejected) pairs, all of them are included. Use sample_shots() instead if you need to control the exact number of preference pairs per user.
Like create_dataset_prism(), this function expects the nested {user_id: {dialog_id: ...}} dict format. The current PRISM pipeline stores embeddings as a flat list of dataset entries; use group_embeddings_by_user() from PRISM/train_basis.py to work with that format.
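Both PRISM loaders can be sketched on a toy nested dict, assuming the innermost lists are embeddings that pair up by position (chosen[i] with rejected[i]) and that every user has at least one pair and at least as many dialogs as the requested shot count. The function name and its shots and seed parameters are hypothetical; the LoRe helpers return torch tensors on device.

```python
import random

import numpy as np


def create_dataset_prism_sketch(nested, shots=None, seed=0):
    """Hypothetical stand-in for create_dataset_prism() (shots=None) and
    create_dataset_prism_shots() (shots=k).

    nested: {user_id: {dialog_id: {"chosen": [...], "rejected": [...]}}}
    Returns one [num_pairs_for_user, F] array per user, in dict order.
    Assumes every user has at least one pair and at least `shots` dialogs.
    """
    rng = random.Random(seed)
    out = []
    for dialogs in nested.values():
        dialog_ids = list(dialogs)
        if shots is not None:
            # Sample whole dialogs without replacement; each keeps all of its pairs.
            dialog_ids = rng.sample(dialog_ids, shots)
        rows = []
        for d in dialog_ids:
            pair = dialogs[d]
            # chosen[i] is paired with rejected[i] by response position.
            for c, r in zip(pair["chosen"], pair["rejected"]):
                rows.append(np.asarray(c) - np.asarray(r))
        out.append(np.stack(rows))
    return out


nested = {
    "u1": {
        "d1": {"chosen": [[1.0, 0.0], [2.0, 2.0]], "rejected": [[0.0, 0.0], [1.0, 1.0]]},
        "d2": {"chosen": [[0.0, 3.0]], "rejected": [[0.0, 1.0]]},
    },
    "u2": {
        "d3": {"chosen": [[5.0, 5.0]], "rejected": [[4.0, 4.0]]},
    },
}

full = create_dataset_prism_sketch(nested)
print([x.shape for x in full])  # [(3, 2), (1, 2)]

one_shot = create_dataset_prism_sketch(nested, shots=1)
print([x.shape for x in one_shot])  # u1 has 1 or 2 rows, depending on which dialog is drawn
```

The variable row count for u1 under shots=1 illustrates the point made above: sampling is over dialogs, so a user's tensor size depends on how many pairs the drawn dialogs happen to contain.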