Neural Vault draws on ideas from three distinct fields (neuroscience signal processing, metric-learning-based deep learning, and cryptographic key derivation) and fuses them into a single enrollment-and-verification pipeline. This page explains each building block in detail so that you can reason about design trade-offs, tune the model, and interpret benchmark results with confidence.
Concepts
Prototypical Networks & Few-Shot Learning
A prototypical network learns an embedding function that maps raw inputs into a metric space where semantically similar examples cluster together. At inference time, classification requires no retraining: the model computes a prototype for each class by averaging the embeddings of a small number of labeled support examples, then assigns a query to whichever prototype is closest.

In Neural Vault, each of the five stimulus classes has its own prototype. During few-shot evaluation episodes, main.py samples 4 support shots per class (n_shot_train = 4) and 4 query samples per class (n_query = 4), computes class prototypes from the support set, and classifies each query by minimum Euclidean distance to the nearest prototype.

This few-shot setup is what allows Neural Vault to authenticate users from a small number of enrollment scans, with no class-specific fine-tuning required.
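The episode described above can be sketched in a few lines. This is a minimal NumPy illustration rather than the code in main.py: the function names compute_prototypes and classify_queries are hypothetical, and the embeddings are synthetic clusters standing in for real model outputs.

```python
import numpy as np

def compute_prototypes(support_emb, support_labels, n_classes):
    """Average the support embeddings of each class into one prototype."""
    return np.stack([
        support_emb[support_labels == c].mean(axis=0) for c in range(n_classes)
    ])

def classify_queries(query_emb, prototypes):
    """Assign each query to the prototype at the smallest Euclidean distance."""
    # Broadcast (n_query, 1, dim) against (1, n_classes, dim) to get all pairs.
    dists = np.linalg.norm(query_emb[:, None, :] - prototypes[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# Synthetic episode: 5 classes, 4 support shots and 4 queries per class.
rng = np.random.default_rng(0)
n_classes, n_shot, n_query, dim = 5, 4, 4, 128
centers = rng.normal(size=(n_classes, dim))  # assumed well-separated class clusters

support_labels = np.repeat(np.arange(n_classes), n_shot)
query_labels = np.repeat(np.arange(n_classes), n_query)
support_emb = centers[support_labels] + 0.05 * rng.normal(size=(n_classes * n_shot, dim))
query_emb = centers[query_labels] + 0.05 * rng.normal(size=(n_classes * n_query, dim))

prototypes = compute_prototypes(support_emb, support_labels, n_classes)
predictions = classify_queries(query_emb, prototypes)
accuracy = (predictions == query_labels).mean()
print(f"episode accuracy: {accuracy:.2f}")
```

Because the prototypes are recomputed from whatever support set is supplied, adding a new class only requires new support embeddings, not new training.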
Triplet Metric Learning
Triplet metric learning trains the embedding model by presenting three samples at a time: an anchor (a sample from some class), a positive (another sample from the same class), and a negative (a sample from a different class). The loss function pulls the anchor-positive pair together while pushing the anchor-negative pair apart by at least a margin m. For a single triplet this is the standard hinge form L = max(0, d(a, p) − d(a, n) + m), where d is squared Euclidean distance between embedding vectors. The loss is zero when the negative is already further from the anchor than the positive by the full margin, so the model only updates when the constraint is violated.

Neural Vault implements this as a static method on NeuralVaultFewShot.

Triplet mining in main.py constructs batches by sampling a random anchor index, then selecting a positive from the same class (excluding the anchor itself) and a negative from a randomly chosen different class.

main.py trains for 100 epochs with a batch size of 128 and the AdamW optimizer (lr=1e-4, weight_decay=1e-4). model.py trains for 150 epochs and additionally applies a CosineAnnealingLR scheduler. Gradient norms are clipped to 1.0 to stabilize training.
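The loss, the mining rule, and the clipped update can be put together in one short PyTorch sketch. Treat it as an illustration under assumptions, not the project's code: the names triplet_loss and sample_triplets, the margin value of 0.5, and the toy linear encoder are all hypothetical. The optimizer settings and the clipping threshold are the ones quoted above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Batch mean of max(0, d(a, p) - d(a, n) + margin), d = squared Euclidean."""
    d_ap = (anchor - positive).pow(2).sum(dim=1)
    d_an = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_ap - d_an + margin).mean()

def _pick(pool, generator):
    """Draw one element uniformly from a 1-D index tensor."""
    return pool[torch.randint(len(pool), (1,), generator=generator)].item()

def sample_triplets(labels, batch_size, generator=None):
    """Random mining. Assumes every class has at least two samples."""
    classes = labels.unique()
    anchors, positives, negatives = [], [], []
    for _ in range(batch_size):
        a = int(torch.randint(len(labels), (1,), generator=generator))
        same = torch.nonzero(labels == labels[a]).flatten()
        same = same[same != a]                      # exclude the anchor itself
        other = classes[classes != labels[a]]
        neg_class = _pick(other, generator)         # choose a different class first
        neg_pool = torch.nonzero(labels == neg_class).flatten()
        anchors.append(a)
        positives.append(_pick(same, generator))
        negatives.append(_pick(neg_pool, generator))
    return torch.tensor(anchors), torch.tensor(positives), torch.tensor(negatives)

torch.manual_seed(0)
gen = torch.Generator().manual_seed(0)
labels = torch.arange(5).repeat_interleave(8)       # 5 classes, 8 samples each
features = torch.randn(40, 32)                      # synthetic inputs
model = nn.Linear(32, 16)                           # toy stand-in for the encoder
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)

a, p, n = sample_triplets(labels, batch_size=128, generator=gen)
emb = F.normalize(model(features), dim=1)           # unit-norm embeddings
loss = triplet_loss(emb[a], emb[p], emb[n])

optimizer.zero_grad()
loss.backward()
# Rescale gradients so their global norm is at most 1.0 before the update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

Picking the negative class first and then a sample within it gives every other class the same chance of supplying the negative, regardless of class size.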
The NeuralVaultFewShot Architecture
NeuralVaultFewShot is a Transformer-based encoder that maps a sequence of cortical prediction frames to a single fixed-length embedding on the unit hypersphere. Its five computational stages are:

- Feature Projection: nn.Linear(input_dim, d_model=128) lifts each frame from the raw cortical feature dimension to the model's working width.
- Positional Encoding: sinusoidal encoding is added to inject temporal order information into the sequence before it enters the Transformer.
- Transformer Encoder: 2 layers of TransformerEncoderLayer with 8 attention heads, feedforward width d_model × 4 = 512, 0.1 dropout, and pre-norm (norm_first=True).
- Temporal Mean Pooling: h.mean(dim=1) collapses the sequence dimension, producing a single context vector per sample.
- Embedding Head: LayerNorm → Linear(d_model, d_model) → GELU → Linear(d_model, latent_dim) projects the context vector to the latent space, followed by L2 normalization so all outputs lie on the unit hypersphere.

The positional encoding covers up to max_len=512 positions. The output has shape (B, latent_dim=128): unit-sphere embeddings ready for cosine similarity comparison or binarization.
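The five stages can be assembled into a compact PyTorch module. The sketch below follows the hyperparameters listed above but is not the NeuralVaultFewShot source: the class names are hypothetical, batch_first=True is an assumption inferred from the use of h.mean(dim=1), and the demo uses a small input_dim of 64 to keep it light.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SinusoidalPositionalEncoding(nn.Module):
    """Fixed sine/cosine position table, added to the projected frames."""
    def __init__(self, d_model, max_len=512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):
        return x + self.pe[:, : x.size(1)]

class EncoderSketch(nn.Module):
    """Hypothetical re-creation of the five stages described in the text."""
    def __init__(self, input_dim, d_model=128, latent_dim=128,
                 n_heads=8, n_layers=2, dropout=0.1):
        super().__init__()
        self.proj = nn.Linear(input_dim, d_model)                 # 1. feature projection
        self.pos = SinusoidalPositionalEncoding(d_model, 512)     # 2. positional encoding
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=d_model * 4,
            dropout=dropout, batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)  # 3. encoder
        self.head = nn.Sequential(                                # 5. embedding head
            nn.LayerNorm(d_model),
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, latent_dim),
        )

    def forward(self, x):
        # x: (B, T, input_dim) with T <= 512
        h = self.encoder(self.pos(self.proj(x)))
        h = h.mean(dim=1)                                         # 4. temporal mean pooling
        return F.normalize(self.head(h), p=2, dim=-1)             # L2-normalize

torch.manual_seed(0)
model = EncoderSketch(input_dim=64).eval()
with torch.no_grad():
    embeddings = model(torch.randn(3, 10, 64))   # 3 sequences of 10 frames
print(embeddings.shape, embeddings.norm(dim=1))
```

Normalizing at the output is what lets later stages treat a plain dot product as cosine similarity.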
Key Derivation with HKDF-SHA256
Once a stable embedding has been produced, Neural Vault converts it to a 256-bit cryptographic key through a deterministic derivation built on HKDF (RFC 5869). The process has three steps:

- Quantization: the float32 embedding is scaled by 1000 and cast to int16, converting each dimension to a fixed-point integer. Because the embedding is unit-norm, every component lies in [−1, 1], so the scaled values fit comfortably in int16. Quantization absorbs floating-point rounding differences that are much smaller than one quantization step (0.001), so small differences across hardware usually leave the key unchanged. A component that sits very close to a step boundary can still land on either side and change the key, and detail finer than one step is discarded.
- Byte serialization: .tobytes() produces the Input Keying Material (IKM) for HKDF.
- HKDF-SHA256: the cryptography library's HKDF is invoked with algorithm=SHA256, length=32 (256 bits), salt=None, and a fixed info context string that domain-separates this use of HKDF from all others.

The info parameter b"neural-vault-few-shot-v1" serves as a domain separator: it cryptographically binds the derived key to this specific application context, preventing key reuse across different systems or protocol versions.

The cryptography package is required for HKDF. If it is not installed, derive_key falls back to a plain hashlib.sha256 digest. The fallback still produces a deterministic 256-bit key but does not provide HKDF's extract-and-expand security guarantees. Install dependencies with pip install cryptography.

The enrollment flow in model.py demonstrates key generation end-to-end.
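The three steps and the fallback path can be sketched as follows. This is an illustration of the description above, not the project's derive_key: the function name derive_key_sketch is hypothetical, and the embedding is a random unit vector standing in for real model output.

```python
import hashlib
import numpy as np

try:
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF
    HAVE_CRYPTOGRAPHY = True
except ImportError:
    HAVE_CRYPTOGRAPHY = False

INFO = b"neural-vault-few-shot-v1"  # domain-separation string quoted in the text

def derive_key_sketch(embedding):
    """Quantize, serialize, and derive a 32-byte key from a unit-norm embedding."""
    # Step 1: fixed-point quantization. astype(np.int16) truncates toward zero.
    quantized = (np.asarray(embedding, dtype=np.float32) * 1000).astype(np.int16)
    # Step 2: raw bytes become the input keying material.
    # Note: tobytes() uses the array's native byte order.
    ikm = quantized.tobytes()
    # Step 3: HKDF-SHA256 when available, otherwise a plain SHA-256 digest.
    if HAVE_CRYPTOGRAPHY:
        hkdf = HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=INFO)
        return hkdf.derive(ikm)
    return hashlib.sha256(ikm).digest()

rng = np.random.default_rng(0)
embedding = rng.normal(size=128).astype(np.float32)
embedding /= np.linalg.norm(embedding)      # mimic the model's unit-norm output

key = derive_key_sketch(embedding)
print(len(key), key.hex())
```

An HKDF object from the cryptography package can only derive once, which is why the sketch constructs a fresh instance on every call.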
Equal Error Rate (EER)
The Equal Error Rate is the operating point on the ROC curve where the False Acceptance Rate (FAR) equals the False Rejection Rate (FRR). It is a single-number summary of a biometric system's discriminative power: a perfect system has EER = 0%, while a system that guesses randomly has EER ≈ 50%.

Neural Vault computes EER using Brent's root-finding method applied to the interpolated ROC curve. FAR corresponds to the false positive rate (FPR) and FRR to the false negative rate (FNR). Writing x for the FPR, the FNR is 1 − TPR(x), so the root of 1 − x − TPR(x) = 0 is exactly the point where FPR = FNR = EER.

Benchmark results (lower is better):

| Method | EER |
|---|---|
| Neural (binarized embeddings) | 0.75% |
| NeuralVault (cosine similarity) | 0.94% |
| BioHashing | 15.09% |
| SHA-256 | 48.4% |
| HMAC | 52.6% |

SHA-256 and HMAC operate near 50% because a single-bit change in the input flips ~50% of output bits (avalanche effect), making genuine and impostor Hamming distances statistically identical.
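The root-finding formulation of EER can be sketched with SciPy's brentq. The helper names roc_points and equal_error_rate are hypothetical, the ROC curve is built by hand with NumPy rather than with whatever routine the project uses, and the scores are synthetic. Higher scores are assumed to mean "more likely genuine".

```python
import numpy as np
from scipy.optimize import brentq

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays; labels are 1 for genuine, 0 for impostor."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")       # sort by descending score
    scores, labels = scores[order], labels[order]
    tps = np.cumsum(labels)                          # genuine accepted at each cut
    fps = np.cumsum(1 - labels)                      # impostors accepted at each cut
    # Keep one point per distinct score so tied scores share a threshold.
    last = np.r_[np.nonzero(np.diff(scores))[0], len(scores) - 1]
    tpr = np.r_[0.0, tps[last] / labels.sum()]
    fpr = np.r_[0.0, fps[last] / (len(labels) - labels.sum())]
    return fpr, tpr

def equal_error_rate(scores, labels):
    """Solve 1 - x - TPR(x) = 0 on the linearly interpolated ROC curve."""
    fpr, tpr = roc_points(scores, labels)
    return brentq(lambda x: 1.0 - x - np.interp(x, fpr, tpr), 0.0, 1.0)

# Synthetic similarity scores: genuine pairs score higher than impostor pairs.
rng = np.random.default_rng(0)
genuine = rng.normal(0.8, 0.1, size=500)
impostor = rng.normal(0.2, 0.1, size=500)
scores = np.concatenate([genuine, impostor])
labels = np.concatenate([np.ones(500, dtype=int), np.zeros(500, dtype=int)])

eer = equal_error_rate(scores, labels)
print(f"EER = {100 * eer:.2f}%")
```

For distance-based methods such as Hamming distance, negate the distances (or swap the labels) before calling the function so that larger values still indicate a genuine match.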
d-prime (d')
d-prime (d′) is a measure from signal detection theory that quantifies how well two distributions, genuine scores and impostor scores, can be separated. It is defined as the absolute difference between the two means divided by the pooled standard deviation: d′ = |μ_gen − μ_imp| / sqrt((σ_gen² + σ_imp²) / 2). Larger d′ means the distributions overlap less and the system can more reliably distinguish genuine users from impostors. As a rule of thumb, values above 3 indicate excellent separation and values below 1 indicate poor separation.

The + 1e-9 epsilon in the denominator prevents division by zero when distributions are perfectly sharp.

Benchmark results (higher is better):

| Method | d-prime |
|---|---|
| Neural | 5.95 |
| NeuralVault | 4.84 |
| BioHashing | 2.21 |
| HMAC | 0.25 |
| SHA-256 | 0.14 |

A d′ of 5.95 means the genuine and impostor score distributions are separated by nearly 6 pooled standard deviations, leaving vanishingly small overlap. The SHA-256 d′ of 0.14 reflects distributions that are almost entirely coincident.
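A minimal NumPy sketch of the metric is shown here. The function name d_prime is hypothetical, and adding the epsilon to the pooled standard deviation is an assumption about where the project applies it.

```python
import numpy as np

def d_prime(genuine, impostor, eps=1e-9):
    """Separation of two score distributions in pooled standard deviations."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    pooled_std = np.sqrt(0.5 * (genuine.var() + impostor.var()))
    # eps keeps the ratio finite when both distributions have zero variance.
    return abs(genuine.mean() - impostor.mean()) / (pooled_std + eps)

rng = np.random.default_rng(0)
genuine_scores = rng.normal(0.8, 0.1, size=1000)    # synthetic genuine scores
impostor_scores = rng.normal(0.2, 0.1, size=1000)   # synthetic impostor scores
print(f"d' = {d_prime(genuine_scores, impostor_scores):.2f}")
```

Because d′ depends only on means and variances, it is insensitive to the choice of decision threshold, which makes it a useful complement to EER.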
Cosine Similarity vs. Hamming Distance
Neural Vault uses two different comparison measures depending on the stage of the pipeline.

Cosine similarity is used for prototype-based verification in the NeuralVault method. Because the Transformer outputs L2-normalized vectors (unit hypersphere), the dot product of any two embeddings equals their cosine similarity directly. In model.py, the inline verification function exploits this unit-norm property.

Hamming distance is used by the traditional methods (SHA-256, HMAC, BioHashing) and the binarized Neural baseline. These methods produce fixed-length binary bit-strings, and the normalized Hamming distance is the fraction of bits that differ. For SHA-256 and HMAC, the avalanche effect means that even highly similar input vectors produce binary keys with ~50% Hamming distance, identical to the expected distance between two random 256-bit strings. This is why their EERs hover near 50%.
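The contrast between the two measures is easy to reproduce. In the sketch below, two nearly identical unit vectors keep a cosine similarity close to 1, while the SHA-256 digests of their quantized bytes differ in roughly half of their bits. The helper names are hypothetical and the vectors are synthetic; the scale-by-1000 quantization mirrors the key-derivation step.

```python
import hashlib
import numpy as np

def cosine_similarity_unit(a, b):
    """For unit-norm vectors the dot product already is the cosine similarity."""
    return float(np.dot(a, b))

def hamming_fraction(bits_a, bits_b):
    """Fraction of positions at which two equal-length bit arrays differ."""
    return float(np.mean(bits_a != bits_b))

def sha256_bits(data):
    """SHA-256 digest of a byte string as an array of 256 bits."""
    digest = hashlib.sha256(data).digest()
    return np.unpackbits(np.frombuffer(digest, dtype=np.uint8))

rng = np.random.default_rng(0)
a = rng.normal(size=128)
a /= np.linalg.norm(a)
b = a + 0.01 * rng.normal(size=128)      # small perturbation of the same vector
b /= np.linalg.norm(b)

cos = cosine_similarity_unit(a, b)

# Quantize to int16 bytes, then hash each vector.
bytes_a = (a * 1000).astype(np.int16).tobytes()
bytes_b = (b * 1000).astype(np.int16).tobytes()
ham = hamming_fraction(sha256_bits(bytes_a), sha256_bits(bytes_b))

print(f"cosine similarity: {cos:.4f}")
print(f"SHA-256 Hamming fraction: {ham:.3f}")
```

The embedding space preserves closeness, whereas a cryptographic hash deliberately destroys it, so a hash can only be compared for exact equality.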
TRIBEv2 Predictions & the Input Feature Space
TRIBEv2 (Transformer-based Representations for Individual Brain Encoding v2) is a brain-encoding model that predicts cortical surface activity from video stimuli. data.py drives it to produce the training data.

The output 5classpreds.csv contains one row per (video, timestep, vertex) triple. main.py pivots this into a (n_samples, n_vertices) matrix where:

- Rows correspond to unique (video, timestep) pairs, one brain-state snapshot per row.
- Columns correspond to individual cortical vertices.
- Values are float32 predicted BOLD-like activations at each vertex.

The vertices are split evenly across the two hemispheres:

| Hemisphere | Vertices |
|---|---|
| Left | 10,242 |
| Right | 10,242 |
| Total | 20,484 |

Before entering the Transformer, features are standardized with sklearn.preprocessing.StandardScaler (zero mean, unit variance per vertex) to remove baseline differences across stimuli and scanning sessions.

NaN values, positive infinities, and negative infinities (which can appear in TRIBEv2 predictions for occluded vertices) are replaced with 0, 1, and −1 respectively.
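The pivot and the two clean-up steps can be sketched on a toy table. Several things here are assumptions: the column names video, timestep, vertex, and prediction are hypothetical, the data is a 12-row stand-in for the real CSV, non-finite values are replaced before scaling (the order in main.py is not stated here), and the standardization is written out in NumPy to mirror what StandardScaler computes rather than calling scikit-learn.

```python
import numpy as np
import pandas as pd

# Toy long-format table: 2 videos x 2 timesteps x 3 vertices.
rows = [(v, t, k) for v in ["v0", "v1"] for t in [0, 1] for k in [0, 1, 2]]
values = np.arange(len(rows), dtype=np.float32)
values[1], values[5], values[9] = np.nan, np.inf, -np.inf   # simulate bad predictions

long_df = pd.DataFrame(rows, columns=["video", "timestep", "vertex"])
long_df["prediction"] = values

# Pivot: one row per (video, timestep) pair, one column per vertex.
wide = long_df.set_index(["video", "timestep", "vertex"])["prediction"].unstack("vertex")
X = wide.to_numpy(dtype=np.float32)          # shape (n_samples, n_vertices)

# Replace NaN with 0, +inf with 1, and -inf with -1.
X = np.nan_to_num(X, nan=0.0, posinf=1.0, neginf=-1.0)

# Per-vertex standardization: zero mean, unit variance (population std).
mean = X.mean(axis=0)
std = X.std(axis=0)
std[std == 0] = 1.0                          # leave constant columns unscaled
X_scaled = (X - mean) / std

print(X.shape)
print(X_scaled.round(3))
```

In a train/test setting the mean and standard deviation should be estimated on the training rows only and then reused for the test rows, which is what fitting a scaler once and calling transform afterwards achieves.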