Neural Vault’s entire pipeline is built on cortical surface predictions produced by TRIBEv2, Meta’s brain-encoding model (facebook/tribev2). TRIBEv2 takes a video stimulus, segments it into stimulus events, and returns a (T, 20484) float32 matrix in which each row is one predicted fMRI timestep and each column is one vertex on the fsaverage5 cortical surface. The data.py script runs TRIBEv2 on each of your five videos, flattens the predictions into long-form rows, and writes the concatenated result to 5classpreds.csv. That single file is the only data input consumed by main.py and model.py.

Generating the CSV

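The data.py script first pre-fetches the gated meta-llama/Llama-3.2-3B weights from Hugging Face (this is why HF_TOKEN must be set in the environment), then loads the TRIBEv2 checkpoint, iterates over five MP4 video files, and writes every vertex activation at every timestep to a CSV: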
from huggingface_hub import snapshot_download
import os

snapshot_download(repo_id="meta-llama/Llama-3.2-3B", repo_type="model", token=os.environ["HF_TOKEN"])

import pandas as pd
from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

all_rows = []

# Loop over 5 videos
for i in range(1, 6):
    filename = f"v{i} (online-video-cutter.com).mp4"
    print(f"Processing: {filename}")

    if not os.path.exists(filename):
        print(f"Error: Video file '{filename}' not found.")
        continue

    df = model.get_events_dataframe(video_path=filename)

    preds, segments = model.predict(events=df)

    n_timesteps, n_vertices = preds.shape

    for t in range(n_timesteps):
        for v in range(n_vertices):
            all_rows.append({
                "video": filename,
                "timestep": t,
                "prediction": preds[t, v]
            })

dataset = pd.DataFrame(all_rows)

dataset.to_csv("5classpreds.csv", index=False)

print("Saved to 5classpreds.csv")
What it does, step by step:
  1. TribeModel.from_pretrained("facebook/tribev2") — loads the brain-encoding checkpoint from the Hugging Face Hub into ./cache.
  2. model.get_events_dataframe(video_path=filename) — segments the video into stimulus events and returns a timing DataFrame that TRIBEv2 uses to align fMRI predictions.
  3. model.predict(events=df) — runs the encoder and returns preds with shape (n_timesteps, n_vertices) where n_vertices = 20484.
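  4. The nested loops unroll preds into one row per (video, timestep, vertex) combination and append it to all_rows. The vertex index is not written as a column; it is implied by row order within each (video, timestep) group.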
  5. The final DataFrame is written to 5classpreds.csv.
The five video files must be named exactly v1 (online-video-cutter.com).mp4 through v5 (online-video-cutter.com).mp4 and must exist in the working directory before you run the script. Missing files are skipped with a printed warning rather than raising an exception — ensure all five are present to produce valid class labels downstream.
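Appending one dictionary per vertex becomes slow at 20,484 vertices per timestep. As a minimal sketch (not part of data.py), the same long-form rows can be built with array operations. The helper name preds_to_long_rows and the short filename v1.mp4 are hypothetical, and the sketch assumes preds is a NumPy array:

```python
import numpy as np
import pandas as pd


def preds_to_long_rows(preds: np.ndarray, video: str) -> pd.DataFrame:
    """Flatten a (T, V) prediction matrix into long-form rows.

    Row order matches the nested loops in data.py: timestep-major,
    vertex-minor, so the vertex index is implied by position.
    """
    n_timesteps, n_vertices = preds.shape
    return pd.DataFrame({
        "video": video,  # scalar, broadcast to every row
        # 0,0,...,0,1,1,...,1,...: each timestep repeated once per vertex
        "timestep": np.repeat(np.arange(n_timesteps), n_vertices),
        # C-order ravel walks each timestep's vertices left to right
        "prediction": preds.ravel(),
    })


# Tiny stand-in for a real (T, 20484) TRIBEv2 output
demo = np.arange(6, dtype=np.float32).reshape(2, 3)
long_df = preds_to_long_rows(demo, "v1.mp4")
```

Because the row order is unchanged, a CSV assembled from frames like this should pivot the same way downstream as one written by the loop version.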

What the pipeline expects

5classpreds.csv must contain exactly three columns:
| Column | Type | Description |
| --- | --- | --- |
| video | string | The video filename. Used as the class label: each unique filename becomes one class (0–4). |
| timestep | int | Zero-indexed temporal frame number within that video. |
| prediction | float32 | The predicted cortical vertex activation at this timestep, from TRIBEv2. |
Dataset shape: the file holds one row per (video, timestep, vertex), so 5 videos × N timesteps × 20,484 vertices rows in total. If each video has 50 timesteps, that is 5 × 50 × 20,484 = 5,121,000 rows. The only requirement on the header row is that it use these exact column names.
Brain surface layout: The 20,484 vertices are those of the fsaverage5 surface mesh, with 10,242 vertices covering the left hemisphere and 10,242 covering the right. Neural Vault treats all 20,484 values as a flat feature vector; no hemisphere-specific processing is applied.
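The row-count arithmetic also covers videos of different lengths. The sketch below is illustrative only: expected_rows is a hypothetical helper, and the per-video timestep counts are made-up inputs rather than measurements.

```python
N_VERTICES = 20_484  # fsaverage5: 10,242 vertices per hemisphere


def expected_rows(timesteps_per_video):
    """Total CSV rows: one per (video, timestep, vertex)."""
    return sum(t * N_VERTICES for t in timesteps_per_video)


# Five videos with 50 timesteps each, as in the example above
rows_equal = expected_rows([50] * 5)

# Five videos with differing (made-up) lengths
rows_mixed = expected_rows([50, 48, 52, 50, 47])

# Size of the pivoted float32 matrix for the equal-length case, in bytes
wide_bytes = 5 * 50 * N_VERTICES * 4
```

Comparing the expected count with the actual number of rows in 5classpreds.csv is a cheap check that no video was skipped during generation.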
main.py raises the following error and exits with code 1 if 5classpreds.csv is not found in the working directory:
FileNotFoundError: 5classpreds.csv is required and cannot be autogenerated.
The file must be generated with data.py against real TRIBEv2 predictions before running main.py.
model.py handles the missing file gracefully. If 5classpreds.csv is not found it prints a warning and substitutes a synthetic stand-in:
raw_np = np.random.randn(20, 20484).astype(np.float32)
This lets you run the full enrollment, training, and key-derivation code path without real fMRI data. Benchmark numbers from synthetic data carry no biometric validity.

Preprocessing steps

Once 5classpreds.csv is on disk, the data passes through three sequential transformations before reaching the Transformer encoder.

Step 1 — Pivot and scale (load_and_prepare_data in main.py)

main.py loads the long-form CSV and pivots it into a (N_samples, N_features) matrix:
def load_and_prepare_data():
    if not os.path.exists("5classpreds.csv"):
        raise FileNotFoundError("5classpreds.csv is required and cannot be autogenerated.")

    df = pd.read_csv("5classpreds.csv")
    label_map = {v: i for i, v in enumerate(sorted(df['video'].unique()))}
    df['label'] = df['video'].map(label_map)
    df['feat_idx'] = df.groupby(['video', 'timestep']).cumcount()

    pivot_df = df.pivot_table(
        index=['video', 'timestep'],
        columns='feat_idx',
        values='prediction',
        aggfunc='first'
    ).reset_index()

    X = pivot_df.drop(columns=['video', 'timestep']).values.astype(np.float32)
    y = pivot_df['video'].map(label_map).values.astype(np.int64)

    X = np.nan_to_num(X, nan=0.0, posinf=1.0, neginf=-1.0)
    X_scaled = StandardScaler().fit_transform(X)
    return X_scaled, y
  • df.pivot_table reshapes the long-form rows into a wide matrix where each row is one (video, timestep) pair and each column is one of the 20,484 vertex feature indices (feat_idx, recovered from row order by cumcount).
  • label_map assigns an integer class label 0–4 to each video filename in sorted alphabetical order.
  • np.nan_to_num replaces any NaN, +inf, or −inf values left by TRIBEv2 edge cases.
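  • StandardScaler().fit_transform(X) z-scores each feature column, with the mean and standard deviation computed over every sample from all five videos. model.py also normalizes per feature, but fits its statistics on the enrolling user’s data only (see Step 2 below).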
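To make the pivot concrete, here is a toy version of the same steps on 2 videos × 2 timesteps × 3 vertices. The filenames and values are invented for illustration; only the groupby/cumcount/pivot_table pattern mirrors load_and_prepare_data.

```python
import numpy as np
import pandas as pd

# Toy long-form table: 2 videos x 2 timesteps x 3 "vertices"
rows = []
for video in ["v2.mp4", "v1.mp4"]:  # deliberately unsorted
    for t in range(2):
        for v in range(3):
            rows.append({"video": video, "timestep": t,
                         "prediction": float(10 * t + v)})
df = pd.DataFrame(rows)

# Class labels follow sorted filename order
label_map = {name: i for i, name in enumerate(sorted(df["video"].unique()))}

# Recover the vertex index from row order within each (video, timestep) group
df["feat_idx"] = df.groupby(["video", "timestep"]).cumcount()

# One row per (video, timestep), one column per vertex index
wide = df.pivot_table(index=["video", "timestep"], columns="feat_idx",
                      values="prediction", aggfunc="first").reset_index()

X = wide.drop(columns=["video", "timestep"]).to_numpy(dtype=np.float32)
y = wide["video"].map(label_map).to_numpy(dtype=np.int64)
```

Here X has shape (4, 3) and y is [0, 0, 1, 1]. Since feat_idx comes purely from row order, reordering or filtering the CSV rows would silently change which vertex lands in which column.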

Step 2 — Per-feature z-score normalization (model.py)

model.py applies a per-feature normalization fit exclusively on the user’s own data, never on impostors:
feat_mean = raw_np.mean(axis=0, keepdims=True)
feat_std  = raw_np.std(axis=0,  keepdims=True) + 1e-8
norm_np   = (raw_np - feat_mean) / feat_std          # shape (N, 20484)
The formula applied is:
$$x_{\text{norm}} = \frac{x - \mu}{\sigma + 10^{-8}}$$
The + 1e-8 epsilon prevents division by zero on constant-valued vertices. The mean and standard deviation are stored as feat_mean and feat_std and reused verbatim at inference time inside get_embedding(), ensuring that new authentication samples are normalized relative to the enrollment distribution.
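The fit-once, reuse-later pattern can be sketched on a small random matrix. Everything here is illustrative: the 4-feature data, the normalize helper, and the probe sample are assumptions, and get_embedding() itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical enrollment matrix: 20 timesteps x 4 features (real data: 20,484)
enroll = rng.normal(loc=3.0, scale=2.0, size=(20, 4)).astype(np.float32)
enroll[:, 2] = 1.5  # a constant-valued "vertex" with zero variance

# Fit statistics on enrollment data only
feat_mean = enroll.mean(axis=0, keepdims=True)
feat_std = enroll.std(axis=0, keepdims=True) + 1e-8


def normalize(x: np.ndarray) -> np.ndarray:
    """Apply the stored enrollment statistics; nothing is re-fit."""
    return (x - feat_mean) / feat_std


enroll_norm = normalize(enroll)

# A later authentication sample reuses the same mean and std
probe = rng.normal(loc=3.0, scale=2.0, size=(5, 4)).astype(np.float32)
probe[:, 2] = 1.5
probe_norm = normalize(probe)
```

The constant column normalizes to zeros rather than NaN because of the epsilon. The probe is only approximately zero-mean after normalization, since its statistics were never used for fitting.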

Step 3 — Sequence chunking (build_sequences in model.py)

The normalized (N, 20484) matrix is chunked into non-overlapping temporal windows before being fed to the Transformer encoder:
def build_sequences(data: np.ndarray, seq_len: int = SEQ_CHUNK) -> torch.Tensor:
    n = (len(data) // seq_len) * seq_len
    data = data[:n]
    seqs = data.reshape(-1, seq_len, data.shape[-1])
    return torch.from_numpy(seqs)
With SEQ_CHUNK = 5, every five consecutive timestep rows are stacked into a single sequence tensor. The output shape is (B, 5, 20484), where B = ⌊N / 5⌋. Any trailing rows that do not fill a complete window are silently dropped.
This tensor is passed directly to the Transformer encoder’s forward() method, which projects the 20,484-dimensional frame vectors to the model width D_MODEL before applying positional encoding and multi-head self-attention:
def forward(self, x):
    """x: (B, T, F) → Embedding (B, latent_dim)"""
    h = self.proj_in(x)       # (B, T, D_MODEL)
    h = self.pos_enc(h)        # adds sinusoidal position
    h = self.transformer(h)    # self-attention over T=5 frames
    h = h.mean(dim=1)          # temporal pooling → (B, D_MODEL)
    return F.normalize(self.embedding_head(h), p=2, dim=1)  # L2-normalized
The final L2-normalized embedding vector is what gets passed to derive_key() for HKDF-SHA256 key derivation.
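The chunking from Step 3 can be checked without PyTorch. This NumPy-only sketch assumes a window of 5 rows and a made-up 13 × 4 input; chunk_rows is a hypothetical stand-in for build_sequences that skips the torch.from_numpy conversion.

```python
import numpy as np


def chunk_rows(data: np.ndarray, seq_len: int = 5) -> np.ndarray:
    """Group consecutive rows into non-overlapping windows of seq_len."""
    n = (len(data) // seq_len) * seq_len  # largest multiple of seq_len
    return data[:n].reshape(-1, seq_len, data.shape[-1])


# 13 timesteps x 4 features; the first column stores the row index
data = np.zeros((13, 4), dtype=np.float32)
data[:, 0] = np.arange(13)

seqs = chunk_rows(data)

# Row indices that ended up in each window
window_rows = seqs[:, :, 0].astype(int).tolist()
```

With 13 input rows, seqs has shape (2, 5, 4) and rows 10 to 12 are dropped, so an input with fewer than 5 rows yields an empty batch.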
