Before training, each dataset must be preprocessed into embedding files that train_basis.py can load directly. The prepare.py script in each dataset subdirectory handles this step: it downloads or reads the raw preference data, tokenizes every prompt–response pair with the Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 tokenizer, runs the pair through the model, takes the final-layer hidden state at the last token position, and writes the resulting embeddings to disk. This step only needs to run once, because all downstream training scripts read from the cached output files.
All three prepare.py scripts load Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 onto a GPU. Expect ~16 GB VRAM usage. The model is loaded with torch_dtype=torch.bfloat16 and attn_implementation="flash_attention_2".

How the model is loaded

Every prepare.py uses the same loading pattern:
import torch
from transformers import AutoModel, AutoTokenizer

device     = "cuda:0"
model_name = "Skywork/Skywork-Reward-Llama-3.1-8B-v0.2"

rm = AutoModel.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map=device,
    attn_implementation="flash_attention_2",
    num_labels=1,
)
rm_tokenizer = AutoTokenizer.from_pretrained(model_name)
The last hidden state of the last token is extracted without gradients:
with torch.no_grad():
    output = rm(conv_tokenized)
    embeddings.append(output.last_hidden_state[0][-1].cpu())
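To make the indexing concrete: last_hidden_state has shape (batch, sequence length, hidden size), so [0][-1] selects the first sequence in the batch and then its final token position. The sketch below uses nested Python lists as a stand-in for the tensor; the toy dimensions and the helper name last_token_vector are illustrative and not part of prepare.py.

```python
# Stand-in for output.last_hidden_state: 1 sequence, 3 tokens, hidden size 4.
# Real hidden states are torch tensors with a much larger hidden size.
last_hidden_state = [
    [
        [0.1, 0.2, 0.3, 0.4],  # token 0
        [0.5, 0.6, 0.7, 0.8],  # token 1
        [0.9, 1.0, 1.1, 1.2],  # token 2 (last token)
    ]
]


def last_token_vector(hidden):
    """Return the vector at the final position of the first sequence."""
    first_sequence = hidden[0]  # drop the batch dimension
    return first_sequence[-1]   # keep only the last token position


vector = last_token_vector(last_hidden_state)
print(len(vector))  # one value per hidden dimension
```

With a batch of right-padded sequences, position -1 could land on a padding token. The snippet above indexes a single sequence, where that problem does not arise.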

Dataset-specific instructions

Input data

The RedditTLDR prepare.py reads the openai/summarize_from_feedback dataset (or a local equivalent) and pairs each Reddit post with its human-preferred and human-rejected summary.
Each preference pair is structured as a conversation:
conv = [
    {"role": "user",      "content": post_text},
    {"role": "assistant", "content": summary_text},
]
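Each preference pair yields two conversations that share the same post, so a small helper can build both at once. This is a sketch under assumptions: the function name build_pair and its argument names are hypothetical, and the actual prepare.py may organize this step differently.

```python
def build_pair(post_text, preferred_summary, rejected_summary):
    """Build the winning and losing conversations for one preference pair."""

    def as_conv(summary_text):
        # Same two-turn layout as above: the post as the user turn,
        # the summary as the assistant turn.
        return [
            {"role": "user", "content": post_text},
            {"role": "assistant", "content": summary_text},
        ]

    return {
        "winning": as_conv(preferred_summary),
        "losing": as_conv(rejected_summary),
    }


# Made-up example texts, for illustration only.
pair = build_pair(
    "My landlord raised the rent twice this year and I am not sure what to do.",
    "Rent went up twice in a year; asking for advice.",
    "A post about apartments.",
)
print(pair["winning"][1]["content"])
```

The two conversations differ only in the assistant turn, so any difference between their embeddings comes from the summary rather than the post.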

What prepare.py does

For every winning and losing summary in the training and validation splits, the script tokenizes the conversation with apply_chat_template and extracts the last hidden-state vector. Results are grouped by worker ID (annotator) and pickled:
tldr_embeddings_train.pkl
tldr_embeddings_val.pkl
Each pickle is a dictionary keyed by worker_id. Each value is a list of dicts with an "embeddings" key containing "winning" and "losing" embedding arrays.
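The layout described above can be illustrated with a small round trip. The sketch below writes a toy dictionary with the same nesting to a temporary pickle and reads it back. The worker IDs, the two-element vectors, and the temporary file name are made up for illustration, and the nesting item["embeddings"]["winning"] is an assumption based on the description; real files hold full-size embeddings produced by the model.

```python
import os
import pickle
import tempfile

# Toy data mirroring the described layout: worker_id -> list of dicts,
# each with an "embeddings" entry holding "winning" and "losing" vectors.
toy = {
    "worker_a": [
        {"embeddings": {"winning": [0.1, 0.2], "losing": [0.3, 0.4]}},
        {"embeddings": {"winning": [0.5, 0.6], "losing": [0.7, 0.8]}},
    ],
    "worker_b": [
        {"embeddings": {"winning": [0.9, 1.0], "losing": [1.1, 1.2]}},
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_embeddings.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(toy, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# Count preference pairs per annotator.
pairs_per_worker = {worker_id: len(items) for worker_id, items in loaded.items()}
print(pairs_per_worker)
```

Loading tldr_embeddings_train.pkl should work the same way with pickle.load, provided whatever library backs the stored arrays is installed.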

Run command

cd LoRe/RedditTLDR
python prepare.py
prepare.py only needs to run once per dataset. The output files are read directly by train_basis.py and vary_fewshot.py on every subsequent run. Re-running prepare.py will overwrite the cached files.
