Few-shot personalization for new users

One of LoRe’s central design goals is efficient adaptation to users the model has never seen. Once the shared basis V is trained, a new user only needs to provide a few preference pairs for LoRe to learn their personal mixture weights w — K parameters instead of a full reward model.

Two-phase approach

Learn the shared basis V

Train V and W jointly on all seen users’ preference data using solve_regularized_simplex. This captures the K most important directions of reward variation across the population. V is saved after training and reused for all future users.

Adapt new users with few-shot examples

Fix V completely. For each new user, collect a small set of preference pairs (the “shots”), then optimize only the K-dimensional weight vector w using PersonalizeBatch. Because V is fixed, this optimization has K free parameters per user and converges quickly.

Why fixing V works

The key insight is that if V’s K columns span the meaningful axes of human reward variation, any new user’s preferences can be expressed as a mixture over those axes. The new user does not need to teach the model new reward directions — only their specific combination. This is analogous to how a new user of a recommendation system does not need to create new item categories, only rate a few items to reveal their taste profile.
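In symbols, the mixture view can be written as follows. The notation is an assumption made for illustration and does not appear in the original text: $\phi(x)$ denotes the embedding of a response, $d$ the embedding dimension, and $\Delta^{K-1}$ the probability simplex. The structure mirrors the forward pass of PersonalizeBatch, which additionally divides the logits by 100.

```latex
r_i(x) = \phi(x)^\top V w_i,
\qquad V \in \mathbb{R}^{d \times K},
\qquad w_i \in \Delta^{K-1} = \Bigl\{ w \in \mathbb{R}^K : w_k \ge 0,\ \sum_{k=1}^{K} w_k = 1 \Bigr\}
```

Under this reading, adapting to a new user means estimating only $w_i$ while $V$ stays the same for everyone.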

There is a direct tradeoff between K (rank) and adaptation speed. Higher K gives V more expressive capacity to represent diverse users, but each new user needs more preference examples to reliably estimate their K-dimensional w. In practice, K should be set to match the diversity of your user population, not maximized blindly.

`PersonalizeBatch`: fixing V and optimizing w

PersonalizeBatch holds a ParameterList of per-user weight vectors and uses a single shared Adam optimizer:

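import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class PersonalizeBatch(nn.Module):
    def __init__(self, num_classes, num_features, num_basis_vectors,
                 num_iterations, learning_rate):
        super().__init__()
        self.num_classes = num_classes              # number of users in the batch
        self.num_basis_vectors = num_basis_vectors  # K, the number of columns of V
        self.num_iterations = num_iterations
        self.learning_rate = learning_rate
        # num_features is accepted for interface symmetry but not stored here

        # One unconstrained weight vector per user; V is NOT a parameter here.
        # Softmax is applied in forward() to map each vector onto the simplex.
        self.w = nn.ParameterList([
            nn.Parameter(torch.randn(num_basis_vectors))
            for _ in range(num_classes)
        ])
        self.optimizer = optim.Adam(self.parameters(), lr=learning_rate)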

Forward pass

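V is passed in as a plain tensor argument rather than registered as a module parameter, so the optimizer never updates it. The caller is expected to pass a detached tensor (as with V_joint.detach() in the sweep shown later), so no gradient is computed for V either: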

def forward(self, X, V):
    nll = 0
    for i, x in enumerate(X):
        # Mix the basis columns with user i's simplex weights -> [features]
        V_w = V @ F.softmax(self.w[i], dim=0)
        logits = x @ V_w / 100.0
        # logsigmoid is the numerically stable form of log(sigmoid(.))
        log_likelihood = F.logsigmoid(logits)
        nll += (-log_likelihood.sum()) / len(x)
    return nll
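Written out, the quantity returned by forward is a sum of per-user average negative log-likelihoods. The symbols below are assumptions introduced for illustration: $\Delta_{ij}$ is the $j$-th row of X[i] (a chosen minus rejected difference), $n_i$ is len(x) for user $i$, $N$ is the number of users, and $\tilde{w}_i$ is the unconstrained parameter self.w[i], so that the mixture weight is $w_i = \mathrm{softmax}(\tilde{w}_i)$.

```latex
\mathcal{L}(\tilde{w}_1, \dots, \tilde{w}_N)
  = \sum_{i=1}^{N} \left( -\frac{1}{n_i} \sum_{j=1}^{n_i}
    \log \sigma\!\left( \frac{\Delta_{ij}^\top V \,\mathrm{softmax}(\tilde{w}_i)}{100} \right) \right)
```

Because each term depends on a single $\tilde{w}_i$, the users do not interact in the loss; batching them only shares the optimizer and the loop.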

Training loop

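# Note: this method shadows nn.Module.train(mode), so avoid calling
# .eval() or .train(True) on a PersonalizeBatch instance.
def train(self, X, V):
    for _ in range(self.num_iterations):
        self.optimizer.zero_grad()
        loss = self.forward(X, V)
        loss.backward()
        self.optimizer.step()
    # Map the learned parameters onto the simplex and drop the graph
    return [F.softmax(self.w[i], dim=0).detach() for i in range(len(X))]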

train returns a list of simplex-constrained weight vectors — one per user — ready for evaluation with eval_multiple.
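To make the fixed-V optimization concrete, here is a minimal, self-contained sketch on synthetic data. It is not the repository's implementation: fit_mixture_weights is a hypothetical functional stand-in for PersonalizeBatch.train, it initializes the parameters at zero rather than with torch.randn, and the data, dimensions, and hyperparameters are made up for illustration.

```python
import torch
import torch.nn.functional as F


def fit_mixture_weights(X, V, num_iterations=200, learning_rate=0.1):
    """Hypothetical stand-in for PersonalizeBatch.train.

    X: list of [n_i, d] tensors of (chosen - rejected) differences, one per user.
    V: frozen [d, K] basis. Returns one simplex weight vector per user.
    """
    K = V.shape[1]
    V = V.detach()  # frozen basis: no gradient is tracked for V
    params = [torch.zeros(K, requires_grad=True) for _ in X]
    optimizer = torch.optim.Adam(params, lr=learning_rate)
    for _ in range(num_iterations):
        optimizer.zero_grad()
        # Sum over users of the mean negative log-likelihood of their pairs
        loss = sum(
            -F.logsigmoid(x @ (V @ F.softmax(w, dim=0)) / 100.0).mean()
            for x, w in zip(X, params)
        )
        loss.backward()
        optimizer.step()
    return [F.softmax(w, dim=0).detach() for w in params]


# Synthetic setup (made up): 2 users, d = 8 features, K = 3, 5 shots each.
torch.manual_seed(0)
V = torch.randn(8, 3)
X = [torch.randn(5, 8) for _ in range(2)]
W = fit_mixture_weights(X, V)
```

Each entry of W is a length-K vector of non-negative weights summing to 1, which is the same form the train method returns.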

`learn_multiple_few_shot`: the entry point

learn_multiple_few_shot wraps PersonalizeBatch and handles all users simultaneously:

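def learn_multiple_few_shot(train_features, V, num_iterations=1000, learning_rate=0.01):
    N = len(train_features)                       # number of new users
    num_features = train_features[0][0].shape[0]  # embedding dimension
    # `device` is not defined in this snippet; it is assumed to be a
    # module-level torch.device set elsewhere in the script.
    fitw = PersonalizeBatch(
        N, num_features, V.shape[1], num_iterations, learning_rate
    ).to(device)
    W = fitw.train(train_features, V)
    return W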

train_features is a list of tensors, one per user, where each tensor contains that user’s few-shot preference differences (chosen − rejected embeddings). V is the frozen basis from phase 1.
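As an illustration of that input format, the sketch below builds a train_features list from made-up chosen and rejected embeddings. The helper name preference_differences and the random embeddings are hypothetical; in the actual pipeline the embeddings would come from the dataset preprocessing, which is not shown here.

```python
import torch


def preference_differences(chosen, rejected):
    """Return chosen - rejected, one row per preference pair.

    chosen, rejected: [n_pairs, d] embedding tensors for the same pairs.
    """
    if chosen.shape != rejected.shape:
        raise ValueError("chosen and rejected must have the same shape")
    return chosen - rejected


# Made-up embeddings: user 0 has 4 pairs, user 1 has 6 pairs, d = 8.
torch.manual_seed(0)
pairs_per_user = [4, 6]
train_features = [
    preference_differences(torch.randn(n, 8), torch.randn(n, 8))
    for n in pairs_per_user
]

# Same lookup that learn_multiple_few_shot uses to infer the feature dimension
num_features = train_features[0][0].shape[0]
```

Users may contribute different numbers of pairs, since each user's tensor is handled separately in the forward pass.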

`sample_shots`: controlling how many examples each user provides

Before calling learn_multiple_few_shot, the sample_shots utility randomly subsamples each user’s training data to a fixed number of preference pairs:

def sample_shots(train_features_unseen, shots):
    """
    Randomly sample up to 'shots' rows (preference pairs) from each user's tensor.
    Args:
        train_features_unseen (list): One tensor of preference differences per user.
        shots (int): The number of rows to sample from each tensor. A user
            with fewer than 'shots' rows keeps all of them.
    Returns:
        list: A list of sampled tensors, one per user.
    """
    sampled_features = [
        tensor[torch.randperm(tensor.size(0))[:shots]]
        for tensor in train_features_unseen
    ]
    return sampled_features

Each call to sample_shots produces a different random subsample, which is why the vary_fewshot.py scripts run multiple independent trials and report mean and standard deviation of accuracy.
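If repeatable trials are wanted, one option is to draw the permutations from a seeded generator. The variant below is a sketch under that assumption; sample_shots_seeded and its seed parameter are hypothetical and not part of the repository.

```python
import torch


def sample_shots_seeded(features, shots, seed):
    """Like sample_shots, but draws permutations from a seeded generator."""
    gen = torch.Generator().manual_seed(seed)
    return [
        t[torch.randperm(t.size(0), generator=gen)[:shots]]
        for t in features
    ]


# Toy data: user 0 has 10 pairs, user 1 has only 3 pairs (2 features each).
features = [torch.arange(20.0).reshape(10, 2), torch.arange(6.0).reshape(3, 2)]
a = sample_shots_seeded(features, shots=4, seed=0)
b = sample_shots_seeded(features, shots=4, seed=0)
```

With the same seed, a and b contain identical rows. The second user has fewer than 4 pairs, so all 3 of their rows are returned, in shuffled order.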

Evaluating accuracy vs. number of shots

The run_few_shot_vary_shots function sweeps over both rank K and shot count, running multiple trials at each combination:

for shots in num_shots:
    for _ in range(trials):
        train_features_unseen_shots = sample_shots(train_features_unseen, shots)

        if K <= 1:
            W_few_shot = [torch.tensor([1.0]).to(device) for _ in range(N_unseen)]
        else:
            W_few_shot = learn_multiple_few_shot(
                train_features_unseen_shots, V_joint.detach(),
                num_iterations=500, learning_rate=0.1
            )

When K <= 1, adaptation is skipped entirely: every user is assigned the constant one-element weight w = [1.0], which corresponds to the shared single-reward baseline. This serves as the reference point against which few-shot gains are measured.
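The loop above stops once W_few_shot is computed. One plausible way to score the result and summarize trials is sketched below. This is an assumption for illustration, not the repository's eval_multiple: pairwise_accuracy and summarize_trials are hypothetical helpers, and the numbers are hand-made.

```python
import torch


def pairwise_accuracy(features, V, W):
    """Per-user fraction of pairs where the personalized reward prefers 'chosen'.

    features: list of [n_i, d] (chosen - rejected) tensors; W: list of [K] weights.
    """
    return [
        ((x @ (V @ w)) > 0).float().mean().item()
        for x, w in zip(features, W)
    ]


def summarize_trials(trial_accuracies):
    """Mean and sample standard deviation of accuracy across trials."""
    acc = torch.tensor(trial_accuracies)
    return acc.mean().item(), acc.std().item()


# Hand-made check: V is the identity, so the reward difference is x @ w.
V = torch.eye(2)
W = [torch.tensor([1.0, 0.0])]
features = [torch.tensor([[1.0, 5.0], [-1.0, 5.0], [2.0, -3.0], [0.5, 0.0]])]
acc = pairwise_accuracy(features, V, W)  # 3 of 4 pairs have a positive first feature

mean_acc, std_acc = summarize_trials([0.7, 0.8, 0.9])
```

A pair counts as correct when the reward difference is positive, which matches the sign convention of the sigmoid likelihood in the forward pass.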

K=0 and K=1 baselines

| Baseline | Behavior |
| --- | --- |
| `K=0` | `V` is fixed to the pretrained `V_sft`; `w = [1.0]` for every user. The model is the reference reward model with no personalization. |
| `K=1` | A single shared reward direction is learned; `w = [1.0]` for every user. Equivalent to Bradley-Terry with no user variation. |
| `K≥2` | Full LoRe: `V` has `K` columns; `w_i` is adapted per user via `PersonalizeBatch`. |

Both K=0 and K=1 use the all-ones weight fallback in the run and run_regularized pipelines, making them directly comparable against LoRe variants without any code changes to the evaluation loop.
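One way to see why nothing is lost by skipping adaptation when there is a single basis vector: the softmax of a one-element vector is always [1.0], so a PersonalizeBatch-style parameter could not change the mixture anyway. The short check below illustrates this with arbitrary values chosen for the example.

```python
import torch
import torch.nn.functional as F

# For a one-element parameter, softmax returns [1.0] whatever the value is.
outputs = [
    F.softmax(torch.tensor([value]), dim=0)
    for value in (-3.0, 0.0, 7.5)
]

# The gradient with respect to that parameter is zero, so an optimizer
# step driven by this mixture weight would leave it unchanged.
p = torch.tensor([0.3], requires_grad=True)
F.softmax(p, dim=0).sum().backward()
grad = p.grad
```

Every entry of outputs equals tensor([1.]) and grad is zero, so assigning w = [1.0] directly gives the same mixture without running the optimizer.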

Expected behavior

Accuracy on unseen users improves along two axes:

More shots: more preference examples give a better estimate of w_i, reducing adaptation error.
Higher K: a richer basis V can represent more diverse users accurately, so a well-adapted w_i achieves higher test accuracy, provided there are enough shots to estimate the larger w_i reliably.

The vary_fewshot.py script in RedditTLDR/ produces the shot-vs-accuracy curves used in the paper’s figures. PRISM evaluates few-shot accuracy as part of train_basis.py rather than a separate script.
