Core Concepts: Metric Learning, Prototypical Networks & HKDF

Neural Vault draws on ideas from three distinct fields — neuroscience signal processing, metric-learning-based deep learning, and cryptographic key derivation — and fuses them into a single enrollment-and-verification pipeline. This page explains each building block in detail so that you can reason about design trade-offs, tune the model, and interpret benchmark results with confidence.

If you want to follow along with the source code while reading, open main.py and model.py side-by-side. All class and function names referenced below appear verbatim in those files.

Concepts

Prototypical Networks & Few-Shot Learning

A prototypical network learns an embedding function that maps raw inputs into a metric space where semantically similar examples cluster together. At inference time, classification is performed without retraining: the model computes a prototype for each class by averaging the embeddings of a small number of labeled support examples, then assigns a query to whichever prototype is closest.

In Neural Vault, each of the five stimulus classes has a prototype:

def class_prototypes(embeddings, labels):
    return np.vstack([
        embeddings[labels == c].mean(axis=0)
        for c in range(N_CLASSES)
    ])

During few-shot evaluation episodes, main.py samples 4 support shots per class (n_shot_train = 4) and 4 query samples per class (n_query = 4), computes class prototypes from the support set, and assigns each query to the prototype at the smallest Euclidean distance:

def evaluate_fewshot(model, X_train, y_train, X_test, y_test):
    model.eval()
    with torch.no_grad():
        X_tr = torch.from_numpy(X_train).float().unsqueeze(1)
        y_tr = torch.from_numpy(y_train).long()
        X_te = torch.from_numpy(X_test).float().unsqueeze(1)
        emb_tr = model(X_tr)
        emb_te = model(X_te)
        prototypes = torch.stack([
            emb_tr[y_tr == c].mean(0) for c in range(N_CLASSES)
        ])
        dists = euclidean_dist(emb_te, prototypes)
        preds = torch.argmin(dists, dim=1).numpy()
        probs = F.softmax(-dists, dim=1).numpy()
    return preds, probs

This few-shot setup is what allows Neural Vault to authenticate users from a small number of enrollment scans — no class-specific fine-tuning required.
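The nearest-prototype rule can be traced by hand on a toy example. The sketch below is a NumPy illustration, not the project's code: pairwise_sq_euclidean is a hypothetical stand-in for the euclidean_dist helper (whose definition is not shown on this page), and the 2-D points are made up. Whether the real helper squares the distance or not does not change which prototype is nearest.

```python
import numpy as np

def pairwise_sq_euclidean(queries, prototypes):
    """Squared Euclidean distance between every query and every prototype.

    queries: (Q, D), prototypes: (C, D) -> distances: (Q, C)
    Hypothetical stand-in for the euclidean_dist helper used above.
    """
    diff = queries[:, None, :] - prototypes[None, :, :]
    return (diff ** 2).sum(axis=-1)

# Toy 2-D "embeddings": two support shots for each of two classes.
support = np.array([[0.0, 0.0], [0.0, 2.0],    # class 0
                    [4.0, 4.0], [6.0, 4.0]])   # class 1
support_labels = np.array([0, 0, 1, 1])

# Prototype = mean of the support embeddings of each class.
prototypes = np.vstack([
    support[support_labels == c].mean(axis=0) for c in range(2)
])

# Each query is assigned to the class of its nearest prototype.
queries = np.array([[1.0, 1.0], [5.0, 5.0]])
dists = pairwise_sq_euclidean(queries, prototypes)
preds = dists.argmin(axis=1)
```

Here the prototypes are (0, 1) and (5, 4), so the first query falls to class 0 and the second to class 1.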

Triplet Metric Learning

Triplet metric learning trains the embedding model by presenting three samples at a time: an anchor (a sample from some class), a positive (another sample from the same class), and a negative (a sample from a different class). The loss function pulls the anchor-positive pair together while pushing the anchor-negative pair apart by at least a margin m:

L = max(0, d(a, p) − d(a, n) + m)

where d is the squared Euclidean distance between embedding vectors. The loss is zero when the negative is already farther from the anchor than the positive by the full margin, so the model only updates when the constraint is violated.

Neural Vault implements this as a static method on NeuralVaultFewShot, with a default margin of 0.3:

@staticmethod
def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard Triplet Loss for Metric Learning"""
    pos_dist = (anchor - positive).pow(2).sum(1)
    neg_dist = (anchor - negative).pow(2).sum(1)
    return F.relu(pos_dist - neg_dist + margin).mean()
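To make the hinge behaviour concrete, here is a small NumPy re-statement of the same formula evaluated on two hand-picked triplets. The function name triplet_loss_per_sample and the example vectors are illustrative assumptions; only the arithmetic mirrors the method above.

```python
import numpy as np

def triplet_loss_per_sample(anchor, positive, negative, margin=0.3):
    """Per-triplet hinge loss using squared Euclidean distance."""
    pos_dist = ((anchor - positive) ** 2).sum(axis=1)
    neg_dist = ((anchor - negative) ** 2).sum(axis=1)
    return np.maximum(pos_dist - neg_dist + margin, 0.0)

anchor   = np.array([[1.0, 0.0], [1.0, 0.0]])
positive = np.array([[1.0, 0.0], [0.0, 1.0]])
negative = np.array([[0.0, 1.0], [1.0, 0.0]])

per_triplet = triplet_loss_per_sample(anchor, positive, negative)
loss = per_triplet.mean()

# Triplet 1: d(a, p) = 0, d(a, n) = 2 -> 0 - 2 + 0.3 < 0, constraint satisfied, loss 0.
# Triplet 2: d(a, p) = 2, d(a, n) = 0 -> 2 - 0 + 0.3 = 2.3, constraint violated.
```

Only the second triplet contributes a gradient signal; the batch loss is the mean of the two values, 1.15.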

Triplet mining in main.py constructs batches by sampling a random anchor index, then selecting a positive from the same class (excluding the anchor itself) and a negative from a randomly chosen different class:

for idx in anchor_idx:
    cls = labels_np[idx]
    same_class = class_indices[cls]
    pos_candidates = same_class[same_class != idx]
    pos_idx.append(np.random.choice(pos_candidates))

    neg_cls = np.random.choice([c for c in range(N_CLASSES) if c != cls])
    neg_idx.append(np.random.choice(class_indices[neg_cls]))
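The excerpt above relies on variables defined elsewhere in main.py (anchor_idx, class_indices, labels_np). A self-contained version of the same sampling idea might look like the following; sample_triplets, its signature, and the toy label array are assumptions made for illustration, and the sketch requires at least two samples per class.

```python
import numpy as np

def sample_triplets(labels, n_triplets, n_classes, rng):
    """Randomly sample (anchor, positive, negative) index triplets.

    Hypothetical helper: positives share the anchor's class (and are never
    the anchor itself), negatives come from a randomly chosen other class.
    """
    class_indices = {c: np.flatnonzero(labels == c) for c in range(n_classes)}
    anchor_idx = rng.integers(0, len(labels), size=n_triplets)
    pos_idx, neg_idx = [], []
    for idx in anchor_idx:
        cls = labels[idx]
        same_class = class_indices[cls]
        pos_idx.append(rng.choice(same_class[same_class != idx]))
        neg_cls = rng.choice([c for c in range(n_classes) if c != cls])
        neg_idx.append(rng.choice(class_indices[neg_cls]))
    return anchor_idx, np.array(pos_idx), np.array(neg_idx)

labels = np.repeat(np.arange(5), 4)          # 5 classes, 4 samples each
a, p, n = sample_triplets(labels, 32, 5, np.random.default_rng(0))
```

This is purely random mining; it makes no attempt to pick hard or semi-hard negatives.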

Gradient norms are clipped to 1.0 to stabilize training:

nn.utils.clip_grad_norm_(model.parameters(), 1.0)

main.py trains for 100 epochs with a batch size of 128 and the AdamW optimizer (lr=1e-4, weight_decay=1e-4). model.py trains for 150 epochs and additionally applies a CosineAnnealingLR scheduler.

The NeuralVaultFewShot Architecture

NeuralVaultFewShot is a Transformer-based encoder that maps a sequence of cortical prediction frames to a single fixed-length embedding on the unit hypersphere. Its five computational stages are:

Feature Projection — nn.Linear(input_dim, d_model=128) lifts each frame from the raw cortical feature dimension to the model’s working width.
Positional Encoding — sinusoidal encoding is added to inject temporal order information into the sequence before it enters the Transformer.
Transformer Encoder — 2 layers of TransformerEncoderLayer with 8 attention heads, feedforward width d_model × 4 = 512, 0.1 dropout, and pre-norm (norm_first=True).
Temporal Mean Pooling — h.mean(dim=1) collapses the sequence dimension, producing a single context vector per sample.
Embedding Head — LayerNorm → Linear(d_model, d_model) → GELU → Linear(d_model, latent_dim) projects the context vector to the latent space, followed by L2 normalization so all outputs lie on the unit hypersphere.

class NeuralVaultFewShot(nn.Module):
    def __init__(self, input_dim, d_model=128,
                 nhead=8, n_layers=2, latent_dim=128):
        super().__init__()
        self.proj_in   = nn.Linear(input_dim, d_model)
        self.pos_enc   = PositionalEncoding(d_model)
        enc_layer      = nn.TransformerEncoderLayer(
                            d_model=d_model, nhead=nhead,
                            dim_feedforward=d_model * 4,
                            dropout=0.1, batch_first=True, norm_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=n_layers)

        self.embedding_head = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, latent_dim)
        )

    def forward(self, x):
        """x: (B, T, F) or (B, F) → Embedding (B, latent_dim)"""
        if x.dim() == 2:
            x = x.unsqueeze(1)          # treat single frame as seq length 1
        h = self.proj_in(x)
        h = self.pos_enc(h)
        h = self.transformer(h)
        h = h.mean(dim=1)               # temporal pooling
        return F.normalize(self.embedding_head(h), p=2, dim=1)

The positional encoding uses the standard sinusoidal formula to generate non-learnable position embeddings up to max_len=512 positions:

class PositionalEncoding(nn.Module):
    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        pe  = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(
            torch.arange(0, d_model, 2).float() * (-np.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe.unsqueeze(0))   # (1, max_len, d_model)

    def forward(self, x):
        return x + self.pe[:, :x.size(1)]
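Two properties of this table are easy to check numerically. The NumPy sketch below rebuilds the same formula outside PyTorch (the function name sinusoidal_encoding is an assumption) so that the values can be inspected directly.

```python
import numpy as np

def sinusoidal_encoding(max_len=512, d_model=128):
    """NumPy re-statement of the sinusoidal table built in PositionalEncoding."""
    pos = np.arange(max_len, dtype=np.float64)[:, None]
    div = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(pos * div)   # even dimensions
    pe[:, 1::2] = np.cos(pos * div)   # odd dimensions
    return pe

pe = sinusoidal_encoding()

# Position 0 is sin(0) = 0 on even dimensions and cos(0) = 1 on odd ones.
first_row_even, first_row_odd = pe[0, 0::2], pe[0, 1::2]

# Each (sin, cos) pair has unit norm, so every row has squared norm d_model / 2.
row_sq_norms = (pe ** 2).sum(axis=1)
```

Because every position has the same norm, the encoding shifts each frame by a fixed-magnitude offset whose direction depends only on the timestep.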

Output shape: (B, latent_dim=128) — unit-sphere embeddings ready for cosine similarity comparison or binarization.

Key Derivation with HKDF-SHA256

Once a stable embedding has been produced, Neural Vault converts it to a 256-bit cryptographic key through a deterministic, standards-compliant derivation function. The process has three steps:

Quantization — the float32 embedding is scaled by 1000 and cast to int16, which truncates toward zero and turns each dimension into a fixed-point integer with a resolution of 0.001. Differences smaller than that resolution, such as floating-point rounding differences across hardware, usually leave the quantized value unchanged, although a component that sits close to a quantization boundary can still flip and produce a different key.
Byte serialization — .tobytes() produces the Input Keying Material (IKM) for HKDF.
HKDF-SHA256 — the cryptography library’s HKDF is invoked with algorithm=SHA256, length=32 (256 bits), salt=None, and a fixed info context string that domain-separates this use of HKDF from all others.
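The effect of the quantization step can be seen on a three-component toy vector. This is a sketch under stated assumptions: quantise copies the arithmetic from derive_key, while the example values and noise levels are invented to show one case where the integers survive a perturbation and one where they do not.

```python
import numpy as np

def quantise(emb_vec):
    # Same arithmetic as derive_key: scale by 1000, then truncate toward zero.
    return (emb_vec * 1000).astype(np.int16)

enrolled = np.array([0.1234, -0.5678, 0.0419], dtype=np.float32)

# A perturbation far below the 0.001 resolution leaves every integer unchanged.
tiny_noise = enrolled + np.float32(1e-5)

# A slightly larger shift pushes the last component (41.9 -> 42.1) across a boundary.
shifted = enrolled + np.float32(2e-4)

q_enrolled = quantise(enrolled)     # [123, -567, 41]
q_tiny = quantise(tiny_noise)
q_shifted = quantise(shifted)

ikm = q_enrolled.tobytes()          # 2 bytes per dimension
```

Any change in the quantized integers changes the IKM bytes and therefore the derived key, so stability depends on how close the embedding components sit to a multiple of 0.001.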

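import hashlib
import numpy as np

# Optional dependency guard (a plausible setup; the project files may differ)
try:
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF
    has_cryptography = True
except ImportError:
    HKDF = None
    has_cryptography = False

def derive_key(emb_vec: np.ndarray, key_len_bits: int = 256):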
    """Derives a stable key from a normalized embedding."""
    quantised = (emb_vec * 1000).astype(np.int16)
    ikm = quantised.tobytes()

    if has_cryptography and HKDF is not None:
        kdf = HKDF(
            algorithm=hashes.SHA256(),
            length=key_len_bits // 8,
            salt=None,
            info=b"neural-vault-few-shot-v1"
        )
        key_bytes = kdf.derive(ikm)
    else:
        # Fallback: plain SHA-256 digest when cryptography is not installed
        key_bytes = hashlib.sha256(ikm).digest()

    return key_bytes, key_bytes.hex()

The info parameter b"neural-vault-few-shot-v1" serves as a domain separator: it cryptographically binds the derived key to this specific application context, preventing key reuse across different systems or protocol versions.

The cryptography package is required for HKDF. If it is not installed, derive_key falls back to a plain hashlib.sha256 digest. The fallback still returns a deterministic 32-byte key (regardless of key_len_bits), but it does not provide HKDF’s extract-and-expand construction and ignores the info domain separator. Install the dependency with pip install cryptography.
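For readers who want to see what extract-and-expand means, the two HKDF stages from RFC 5869 can be written with the standard library alone. This is an illustrative sketch, not a replacement for the cryptography package: the function names are assumptions, the input bytes are placeholders, and the "v2" info string is hypothetical.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256

def hkdf_extract(salt, ikm):
    # RFC 5869: an absent salt is treated as HASH_LEN zero bytes.
    if not salt:
        salt = b"\x00" * HASH_LEN
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk, info, length):
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF-SHA256")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(n) = HMAC(PRK, T(n-1) || info || n)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def hkdf_sha256(ikm, info, length=32, salt=None):
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)

ikm = bytes(range(16))  # placeholder input keying material
key_v1 = hkdf_sha256(ikm, b"neural-vault-few-shot-v1")
key_v2 = hkdf_sha256(ikm, b"neural-vault-few-shot-v2")  # hypothetical second context
plain_digest = hashlib.sha256(ikm).digest()             # what the fallback computes
```

The same IKM yields unrelated keys under different info strings, which is the domain separation that the plain SHA-256 fallback cannot offer.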

The enrollment flow in model.py demonstrates key generation end-to-end:

# Enrollment: compute prototype and derive key
with torch.no_grad():
    all_embs = model(user_seqs.to(DEVICE)).cpu().numpy()
    prototype_vec = all_embs.mean(axis=0)
    prototype_vec /= np.linalg.norm(prototype_vec)

key_bytes, key_hex = derive_key(prototype_vec)
print(f"Derived key (256-bit): {key_hex}")

Equal Error Rate (EER)

The Equal Error Rate is the operating point on the ROC curve where the False Acceptance Rate (FAR) equals the False Rejection Rate (FRR). It is a single-number summary of a biometric system’s discriminative power: a perfect system has EER = 0%, while a system that guesses randomly has EER ≈ 50%.

Neural Vault computes EER using Brent’s root-finding method applied to the interpolated ROC curve:

from scipy.optimize import brentq
from scipy.interpolate import interp1d
from sklearn.metrics import roc_curve

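def compute_eer(genuine, impostor):
    """Calculates the Equal Error Rate using interpolation root-finding."""
    if len(genuine) == 0 or len(impostor) == 0:
        return 50.0
    # compute_roc_from_scores is a project helper that is not shown on this page
    fpr, tpr, _ = compute_roc_from_scores(genuine, impostor)
    roc = interp1d(fpr, tpr)  # build the interpolator once, not on every call
    # EER is the FPR at which FNR (= 1 - TPR) equals FPR
    eer = brentq(lambda x: 1. - x - roc(x), 0., 1.)
    return eer * 100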

The root of 1 − x − TPR(x) = 0 is exactly the point where FPR = FNR = EER, because FNR = 1 − TPR.

Benchmark results (lower is better):

Method	EER
Neural (binarized embeddings)	0.75%
NeuralVault (cosine similarity)	0.94%
BioHashing	15.09%
SHA-256	48.4%
HMAC	52.6%

SHA-256 and HMAC operate near 50% because a single-bit change in the input flips ~50% of output bits (avalanche effect), making genuine and impostor Hamming distances statistically identical.
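The EER definition can also be checked without interpolation by sweeping a decision threshold over the observed scores. The sketch below is a simplified alternative to the brentq approach, not the project's method; it assumes distance-style scores (smaller means more genuine), and the function name and toy score lists are made up.

```python
import numpy as np

def eer_by_threshold_sweep(genuine, impostor):
    """Approximate EER (%) for distance scores, where lower = more genuine."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # Accept a sample when its distance is <= threshold.
    far = np.array([(impostor <= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine > t).mean() for t in thresholds])    # false rejects
    i = np.argmin(np.abs(far - frr))
    return 100.0 * (far[i] + frr[i]) / 2.0

separated = eer_by_threshold_sweep([0.1, 0.2, 0.3], [0.7, 0.8, 0.9])
overlapping = eer_by_threshold_sweep([0.1, 0.2, 0.3, 0.6], [0.4, 0.7, 0.8, 0.9])
identical = eer_by_threshold_sweep([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4])
```

Fully separated scores give 0%, one swapped pair out of four gives 25%, and identical distributions give 50%, which matches the intuition behind the SHA-256 and HMAC rows.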

d-prime (d')

d-prime (d′) is a measure from signal detection theory that quantifies how well two distributions — genuine scores and impostor scores — can be separated. It is defined as:

d' = |μ_impostor − μ_genuine| / sqrt(0.5 × (σ²_genuine + σ²_impostor))

Larger d′ means the distributions overlap less and the system can more reliably distinguish genuine users from impostors. Values above 3 indicate excellent separation; values below 1 indicate poor separation.

def compute_d_prime(genuine, impostor):
    """Calculates the mathematical d-prime separability metric."""
    mu_g,  mu_i  = np.mean(genuine),  np.mean(impostor)
    var_g, var_i = np.var(genuine),    np.var(impostor)
    return abs(mu_i - mu_g) / np.sqrt(0.5 * (var_g + var_i) + 1e-9)

The 1e-9 term added under the square root prevents division by zero when both distributions have zero variance.

Benchmark results (higher is better):

Method	d-prime
Neural	5.95
NeuralVault	4.84
BioHashing	2.21
HMAC	0.25
SHA-256	0.14

A d′ of 5.95 means the genuine and impostor score distributions are separated by nearly 6 pooled standard deviations — vanishingly small overlap. The SHA-256 d′ of 0.14 reflects distributions that are almost entirely coincident.
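A hand-checkable example helps calibrate these numbers. The snippet restates the formula above and applies it to two tiny, invented score sets whose means differ by 0.6 and whose standard deviations are both 0.1.

```python
import numpy as np

def d_prime(genuine, impostor):
    """Same formula as compute_d_prime above, restated for a standalone check."""
    mu_g, mu_i = np.mean(genuine), np.mean(impostor)
    var_g, var_i = np.var(genuine), np.var(impostor)
    return abs(mu_i - mu_g) / np.sqrt(0.5 * (var_g + var_i) + 1e-9)

genuine = np.array([0.1, 0.3])    # mean 0.2, variance 0.01
impostor = np.array([0.7, 0.9])   # mean 0.8, variance 0.01

separated = d_prime(genuine, impostor)   # 0.6 / sqrt(0.01) ~ 6
coincident = d_prime(genuine, genuine)   # identical distributions
```

A gap of six pooled standard deviations gives d′ ≈ 6, close to the best benchmark row, while identical distributions give d′ = 0.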

Cosine Similarity vs. Hamming Distance

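Neural Vault uses two different distance metrics depending on the stage of the pipeline.

Cosine similarity is used for prototype-based verification in the NeuralVault method. Because the Transformer outputs L2-normalized vectors (unit hypersphere), the dot product of any two embeddings equals their cosine similarity directly. The benchmark helper below scores with cdist(..., metric='cosine'), which returns cosine distance (1 − cosine similarity), so smaller values mean a closer match: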

def verify_similarity(embeddings, prototypes, labels):
    genuine, impostor = [], []
    for c in range(N_CLASSES):
        ref = np.atleast_2d(prototypes[c])
        genuine.extend(
            cdist(ref, embeddings[labels == c],  metric='cosine').flatten()
        )
        impostor.extend(
            cdist(ref, embeddings[labels != c], metric='cosine').flatten()
        )
    return np.array(genuine), np.array(impostor)

In model.py, the inline verification function exploits the unit-norm property directly:

def verify(raw_data_np: np.ndarray) -> float:
    """Returns cosine similarity to the user prototype."""
    emb = get_embedding(raw_data_np)
    # L2-normalized vectors → dot product = cosine similarity
    similarity = np.dot(emb, prototype_vec)
    return float(similarity)
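The relationship between the dot product, cosine similarity, and cosine distance can be verified on two small vectors. This is a NumPy sketch with invented inputs; l2_normalize is a hypothetical helper standing in for the model's final normalization.

```python
import numpy as np

def l2_normalize(v):
    return v / np.linalg.norm(v)

a = l2_normalize(np.array([3.0, 4.0]))   # [0.6, 0.8]
b = l2_normalize(np.array([4.0, 3.0]))   # [0.8, 0.6]

# For unit vectors the denominator of the cosine formula is 1.
dot = float(np.dot(a, b))
cosine_similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Cosine distance is one minus cosine similarity.
cosine_distance = 1.0 - cosine_similarity
```

Here the dot product and the cosine similarity are both 0.96, and the corresponding cosine distance is 0.04, so a similarity threshold s corresponds to a distance threshold of 1 − s.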

Hamming distance is used by the traditional methods (SHA-256, HMAC, BioHashing) and the binarized Neural baseline. These methods produce fixed-length binary strings, and the normalized Hamming distance used here is the fraction of positions that differ:

methods_fns = {
    "SHA256":     lambda f: np.frombuffer(hashlib.sha256(f.tobytes()).digest(), dtype=np.uint8),
    "HMAC":       lambda f: np.frombuffer(hmac.new(b"secret_vault", f.tobytes(), hashlib.sha256).digest(), dtype=np.uint8),
    "BioHashing": lambda f: binarize(transformer.transform(f.reshape(1, -1)), threshold=0.0).flatten().astype(np.uint8)
}
# ...
genuine_dists.extend(cdist(ref_key_matrix, gen_keys, metric='hamming').flatten())
impostor_dists.extend(cdist(ref_key_matrix, imp_keys, metric='hamming').flatten())

For SHA-256 and HMAC, the avalanche effect means that even highly similar input vectors produce binary keys with ~50% Hamming distance — identical to the expected distance between two random 256-bit strings. This is why their EERs hover near 50%.
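The avalanche effect is straightforward to measure. In the methods_fns excerpt the digests are loaded as 32 uint8 bytes; the sketch below assumes they are unpacked to 256 bits before comparison (a step not shown in the excerpt), since the ~50% figure is a statement about bits. The helper names, message length, and trial count are arbitrary choices.

```python
import hashlib
import numpy as np

def sha256_bits(data: bytes) -> np.ndarray:
    """SHA-256 digest as a vector of 256 bits."""
    digest = np.frombuffer(hashlib.sha256(data).digest(), dtype=np.uint8)
    return np.unpackbits(digest)   # 32 bytes -> 256 bits

def hamming_fraction(x, y):
    """Fraction of positions at which two equal-length vectors differ."""
    return float(np.mean(x != y))

rng = np.random.default_rng(0)
fractions = []
for _ in range(200):
    msg = bytearray(rng.integers(0, 256, size=64, dtype=np.uint8).tobytes())
    flipped = bytearray(msg)
    flipped[0] ^= 0x01             # flip a single input bit
    fractions.append(
        hamming_fraction(sha256_bits(bytes(msg)), sha256_bits(bytes(flipped)))
    )

mean_fraction = float(np.mean(fractions))
```

Averaged over many trials, mean_fraction lands close to 0.5, the same value expected for two unrelated random 256-bit strings, so the hash output carries no information about how similar the inputs were.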

TRIBEv2 Predictions & the Input Feature Space

TRIBEv2 (the second version of TRIBE, Meta's TRImodal Brain Encoder) is a brain-encoding model that predicts cortical surface activity from video stimuli. data.py drives it to produce the training data:

from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

all_rows = []  # one dict per (video, timestep, vertex) prediction
for i in range(1, 6):
    filename = f"v{i} (online-video-cutter.com).mp4"
    df = model.get_events_dataframe(video_path=filename)
    preds, segments = model.predict(events=df)
    # preds.shape: (n_timesteps, n_vertices)
    n_timesteps, n_vertices = preds.shape
    for t in range(n_timesteps):
        for v in range(n_vertices):
            all_rows.append({
                "video": filename,
                "timestep": t,
                "prediction": preds[t, v]
            })

The output 5classpreds.csv contains one row per (video, timestep, vertex) triple. main.py pivots this into a (n_samples, n_vertices) matrix where:

Rows correspond to unique (video, timestep) pairs — one brain-state snapshot per row.
Columns correspond to individual cortical vertices.
Values are float32 predicted BOLD-like activations at each vertex.
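The reshaping step can be pictured on a miniature example. The sketch assumes the long-format rows are stored in the order the data.py loop writes them (video, then timestep, then vertex), that every video has the same number of timesteps, and that each video corresponds to one class; the sizes and values are invented, and main.py's actual pivot code is not shown here.

```python
import numpy as np

# Hypothetical miniature: 2 videos x 3 timesteps x 4 vertices.
n_videos, n_timesteps, n_vertices = 2, 3, 4

# Stand-in for the "prediction" column, in video -> timestep -> vertex order.
long_predictions = np.arange(
    n_videos * n_timesteps * n_vertices, dtype=np.float32
)

# One row per (video, timestep) snapshot, one column per vertex.
X = long_predictions.reshape(n_videos * n_timesteps, n_vertices)

# Class label per snapshot, assuming one class per video.
y = np.repeat(np.arange(n_videos), n_timesteps)
```

With the real data the same layout would give a matrix with one column per cortical vertex.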

The full feature vector has 20,484 dimensions — the complete fsaverage5 cortical surface:

Hemisphere	Vertices
Left	10,242
Right	10,242
Total	20,484

Before entering the Transformer, features are standardized with sklearn.preprocessing.StandardScaler (zero mean, unit variance per vertex) to remove baseline differences across stimuli and scanning sessions:

X_scaled = StandardScaler().fit_transform(X)

NaN values, positive infinities, and negative infinities (which can appear in TRIBEv2 predictions for occluded vertices) are replaced with 0, 1, and −1 respectively:

X = np.nan_to_num(X, nan=0.0, posinf=1.0, neginf=-1.0)
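A small matrix shows what the two preprocessing steps do together. The sketch assumes cleaning happens before standardization, since non-finite values would otherwise contaminate the column statistics, and it replaces StandardScaler with equivalent NumPy arithmetic (population standard deviation, constant columns left at zero) so that it runs without scikit-learn. The input values are invented.

```python
import numpy as np

X = np.array([[1.0, np.nan, 5.0],
              [3.0, np.inf, 5.0],
              [5.0, -np.inf, 5.0]])

# Step 1: replace NaN, +inf, and -inf with 0, 1, and -1.
X_clean = np.nan_to_num(X, nan=0.0, posinf=1.0, neginf=-1.0)

# Step 2: per-column standardization (zero mean, unit variance).
mean = X_clean.mean(axis=0)
std = X_clean.std(axis=0)
std[std == 0.0] = 1.0            # avoid dividing by zero on constant columns
X_scaled = (X_clean - mean) / std
```

After scaling, the first two columns have zero mean and unit variance, and the constant third column becomes all zeros.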
