Neural Vault exposes a flat Python API spread across two modules: main.py (pipeline orchestration, benchmark utilities, and training helpers) and model.py (the enrolled-user verification interface). Every public class and function is documented below, grouped by purpose: model classes, key derivation, verification, data processing, evaluation, benchmark utilities, and the end-to-end pipeline.

Classes

PositionalEncoding

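Adds fixed (non-learnable) sinusoidal positional encodings to a sequence tensor before it is fed into the Transformer encoder stack. The frequencies follow the standard sinusoidal formula, with the division term computed in log space. Each sine/cosine pair of dimensions shares one wavelength, and the wavelengths grow geometrically across pairs, so every pair encodes a different temporal scale.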
class PositionalEncoding(d_model: int, max_len: int = 512)
d_model
int
required
Width of the Transformer model dimension. Must match the d_model argument passed to NeuralVaultFewShot.
max_len
int
default:"512"
Maximum sequence length for which the encoding buffer is pre-computed. The buffer is sliced to the input length, so any sequence of up to max_len frames works. Longer sequences will raise an error at runtime.

forward(x)

x
torch.Tensor
required
Input tensor of shape (B, T, D): batch × time-steps × model dimension. The encoding is broadcast across the batch dimension and added to every sequence.
Returns torch.Tensor — same shape (B, T, D) with positional signal fused into each time-step.
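
The following is a minimal NumPy sketch of the fixed encoding table and the forward-pass addition. It assumes the common layout (sine on even dimensions, cosine on odd ones) and an even d_model. The names sinusoidal_table and add_positional_encoding are hypothetical; main.py does not export them.

```python
import numpy as np

def sinusoidal_table(d_model: int, max_len: int = 512) -> np.ndarray:
    """Return a (max_len, d_model) table of fixed sine/cosine encodings (even d_model)."""
    position = np.arange(max_len)[:, None]                       # (max_len, 1)
    # 1 / 10000^(2i / d_model), computed in log space
    div_term = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe = np.zeros((max_len, d_model), dtype=np.float32)
    pe[:, 0::2] = np.sin(position * div_term)                    # even dimensions
    pe[:, 1::2] = np.cos(position * div_term)                    # odd dimensions
    return pe

def add_positional_encoding(x: np.ndarray, pe: np.ndarray) -> np.ndarray:
    """x has shape (B, T, D); the first T rows of pe broadcast over the batch."""
    T = x.shape[1]
    if T > pe.shape[0]:
        raise ValueError(f"sequence length {T} exceeds max_len {pe.shape[0]}")
    return x + pe[None, :T, :]

x = np.zeros((2, 5, 8), dtype=np.float32)
out = add_positional_encoding(x, sinusoidal_table(d_model=8, max_len=512))
print(out.shape)   # (2, 5, 8)
```

The table has no parameters, so it can be computed once and sliced per batch. This is why it is pre-computed as a buffer.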

NeuralVaultFewShot

The core Prototypical Transformer Network. The architecture projects raw fMRI feature vectors into d_model space, applies sinusoidal positional encoding, passes the sequence through n_layers Pre-LN Transformer encoder layers (with dim_feedforward = d_model * 4 and 10% dropout), temporally pools the output with a mean-reduce, and projects the result through a 2-layer GELU head down to latent_dim. The final embedding is L2-normalised onto the unit hypersphere so that dot products equal cosine similarities.
class NeuralVaultFewShot(
    input_dim: int,
    d_model: int = 128,
    nhead: int = 8,
    n_layers: int = 2,
    latent_dim: int = 128,
)
input_dim
int
required
Number of input features per time-step. This equals the number of TRIBEv2 cortical vertex predictions in the dataset: typically 20484 for a full cortical surface. More generally, it is the number of feature columns produced when load_and_prepare_data pivots 5classpreds.csv.
d_model
int
default:"128"
Internal Transformer model width. All linear projections, attention layers, and feed-forward sublayers operate in this dimensionality. The feed-forward sublayer width is automatically set to d_model * 4.
nhead
int
default:"8"
Number of parallel attention heads. d_model must be evenly divisible by nhead.
n_layers
int
default:"2"
Depth of the Transformer encoder stack. Each layer applies multi-head self-attention followed by a feed-forward sublayer, both with Pre-LN normalisation.
latent_dim
int
default:"128"
Dimensionality of the output embedding. All downstream key derivation, prototype comparison, and few-shot classification operate in this space.

forward(x)

x
torch.Tensor
required
Input tensor. Accepts either (B, T, F) (explicit sequence of T frames, each with F features) or (B, F) (single-frame, which is automatically unsqueezed to (B, 1, F) before processing).
Returns torch.Tensor — shape (B, latent_dim). Every row lies on the unit hypersphere (L2-norm = 1).
```python
import torch
from main import NeuralVaultFewShot

model = NeuralVaultFewShot(input_dim=512, d_model=128, nhead=8, n_layers=2, latent_dim=128)
x = torch.randn(16, 5, 512)   # batch=16, seq_len=5, features=512
embeddings = model(x)         # → (16, 128), unit-norm rows
```

triplet_loss(anchor, positive, negative, margin=0.3)

Static method. Computes the standard triplet loss used during metric-learning training. Squared Euclidean distances are used so that gradients are smooth near zero.
L = mean(relu(||anchor - positive||² - ||anchor - negative||² + margin))
anchor
torch.Tensor
required
Shape (B, latent_dim) — reference embeddings for each triplet.
positive
torch.Tensor
required
Shape (B, latent_dim) — same-class embeddings that should be pulled closer to the anchor.
negative
torch.Tensor
required
Shape (B, latent_dim) — different-class embeddings that should be pushed away from the anchor.
margin
float
default:"0.3"
Minimum required gap between the anchor-negative and anchor-positive squared distances. Triplets that already satisfy this margin contribute zero loss.
Returns torch.Tensor — scalar loss tensor. Back-propagate directly with .backward().
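```python
# emb_anc, emb_pos, emb_neg: (B, latent_dim) outputs of model(...) for each triplet role
loss = NeuralVaultFewShot.triplet_loss(emb_anc, emb_pos, emb_neg, margin=0.3)
loss.backward()
```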
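
You can check the formula by hand with a NumPy re-implementation on a two-triplet batch. triplet_loss_np is a hypothetical helper for illustration only; the real method operates on torch tensors. The first triplet already satisfies the margin and contributes zero. The second triplet is inverted and contributes 2 + 0.3, which gives a mean of about 1.15.

```python
import numpy as np

def triplet_loss_np(anchor, positive, negative, margin=0.3):
    """Mean hinge over squared Euclidean distances, following the formula above."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)   # ||a - p||^2 per triplet
    d_an = np.sum((anchor - negative) ** 2, axis=1)   # ||a - n||^2 per triplet
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))

a = np.array([[1.0, 0.0], [1.0, 0.0]])
p = np.array([[1.0, 0.0], [0.0, 1.0]])
n = np.array([[0.0, 1.0], [1.0, 0.0]])
print(triplet_loss_np(a, p, n))
```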

Key Derivation

derive_key

Derives a stable cryptographic key from a unit-norm embedding vector. The embedding is first quantised to int16 (multiplied by 1000 and cast), then its raw bytes are used as input keying material (IKM) for HKDF-SHA256 with the fixed info string b"neural-vault-few-shot-v1". If the cryptography package is unavailable, the function falls back to raw hashlib.sha256.
def derive_key(emb_vec: np.ndarray, key_len_bits: int = 256) -> tuple[bytes, str]
emb_vec
np.ndarray
required
L2-normalised embedding vector, shape (latent_dim,). Should come from NeuralVaultFewShot.forward or get_embedding.
key_len_bits
int
default:"256"
Desired key length in bits. Must be a multiple of 8. When cryptography is unavailable, the hashlib.sha256 fallback produces a single 256-bit digest. Rely on other lengths only when cryptography is installed.
Returns tuple[bytes, str]: (key_bytes, key_hex). key_bytes is the raw binary key; key_hex is its lowercase hex representation.
```python
from main import derive_key
import numpy as np

emb = np.random.randn(128).astype(np.float32)
emb /= np.linalg.norm(emb)

key_bytes, key_hex = derive_key(emb, key_len_bits=256)
print(key_hex)   # e.g. "3a9f2c..."  (64 hex characters)
```
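Quantisation to int16 (scale factor 1000) discards detail below 0.001, but it does not make the key tolerant to noise. If any single component crosses a quantisation boundary, however small the change, the hashed bytes differ and the derived key is completely different. Ensure your embedding is genuinely stable before calling derive_key in a production enrollment flow.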
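
The boundary effect is easy to reproduce. The sketch below uses a hypothetical sketch_key helper that mimics only the quantise-then-hash idea with plain SHA-256. This is an assumption: it is not derive_key itself, which uses HKDF-SHA256 when cryptography is installed. A 0.0008 shift that stays inside one quantisation bucket leaves the key unchanged. A 0.0001 shift across a bucket edge produces a different key.

```python
import hashlib
import numpy as np

def sketch_key(emb: np.ndarray) -> str:
    # Hypothetical stand-in for derive_key: quantise (x1000, cast to int16,
    # which truncates toward zero), then hash the raw bytes with SHA-256.
    q = (emb * 1000).astype(np.int16)
    return hashlib.sha256(q.tobytes()).hexdigest()

a = np.array([0.5, 0.0011])       # second component quantises to 1
b = np.array([0.5, 0.0019])       # 0.0008 away, still quantises to 1
c = np.array([0.5, 0.0020001])    # only ~0.0001 away from b, but quantises to 2

print(sketch_key(a) == sketch_key(b))   # True: the shift stays inside one bucket
print(sketch_key(b) == sketch_key(c))   # False: crossing the edge yields an unrelated digest
```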

Verification

These functions live in model.py and operate against a pre-enrolled prototype (prototype_vec) that is computed at module load time from 5classpreds.csv. They encapsulate the full normalisation, sequence-building, and embedding pipeline so callers only need to pass raw fMRI feature arrays.

get_embedding

Normalises raw fMRI data using the per-feature statistics fitted at enrollment time, chunks the result into non-overlapping sequences of SEQ_CHUNK frames, runs the model, and returns the mean embedding across all chunks.
def get_embedding(raw_data_np: np.ndarray) -> np.ndarray
raw_data_np
np.ndarray
required
Shape (N, N_FEAT) — raw (unnormalised) fMRI feature matrix from a new acquisition session. N must be at least 1; if fewer than SEQ_CHUNK rows are present, the array is tiled to pad the first sequence.
Returns np.ndarray: shape (latent_dim,). The mean embedding across chunks is not re-normalised after averaging, so its norm can be below 1. L2-normalise it before comparing raw dot products.
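
The data flow described above can be outlined in NumPy. This is a hypothetical sketch, not the model.py code. The mean/std arguments stand in for the enrollment statistics, embed_fn stands in for the trained model, and the exact tiling rule for short inputs is an assumption.

```python
import numpy as np

def get_embedding_sketch(raw, mean, std, embed_fn, seq_chunk=5):
    """Hypothetical outline of get_embedding's data flow.

    raw: (N, F) raw features; mean/std: per-feature enrollment statistics;
    embed_fn: maps a (B, seq_chunk, F) batch to (B, latent_dim) embeddings.
    """
    x = (raw - mean) / std                      # reuse enrollment statistics
    if x.shape[0] < seq_chunk:                  # too short: tile to fill one window
        reps = -(-seq_chunk // x.shape[0])      # ceiling division
        x = np.tile(x, (reps, 1))[:seq_chunk]
    n_seq = x.shape[0] // seq_chunk             # drop any trailing partial window
    seqs = x[: n_seq * seq_chunk].reshape(n_seq, seq_chunk, -1)
    return embed_fn(seqs).mean(axis=0)          # mean over chunks, NOT re-normalised

def toy_embed(seqs):                            # hypothetical stand-in for the model
    return seqs.mean(axis=1)

emb = get_embedding_sketch(np.random.default_rng(0).standard_normal((12, 4)), 0.0, 1.0, toy_embed)
print(emb.shape)   # (4,)
```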

verify

Computes the cosine similarity between a new fMRI acquisition and the enrolled user prototype. Both the new embedding and the prototype are L2-normalised before comparison, so the similarity reduces to a plain dot product of unit vectors.
def verify(raw_data_np: np.ndarray) -> float
raw_data_np
np.ndarray
required
Shape (N, N_FEAT) raw fMRI feature matrix for the authentication attempt.
Returns float — cosine similarity in the range [-1, 1]. Compare against the EER threshold (≈ 0.316 from benchmark results) to accept or reject.
```python
from model import verify

# Load your acquisition …
score = verify(new_session_data)
if score > 0.316:
    print("Authenticated")
else:
    print("Rejected")
```
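
The score is a cosine, so vector length does not affect it. The following is a minimal NumPy sketch of the comparison. It assumes verify normalises both vectors before taking the dot product. cosine_similarity is a hypothetical helper and is not part of model.py.

```python
import numpy as np

def cosine_similarity(query: np.ndarray, prototype: np.ndarray) -> float:
    # Normalising both vectors makes the dot product equal to the cosine, so an
    # un-normalised mean embedding (as returned by get_embedding) is safe to pass.
    q = query / np.linalg.norm(query)
    p = prototype / np.linalg.norm(prototype)
    return float(np.dot(q, p))

THRESHOLD = 0.316  # the benchmark EER threshold quoted above
score = cosine_similarity(np.array([2.0, 0.0]), np.array([1.0, 1.0]))
print(round(score, 4), score > THRESHOLD)   # 0.7071 True
```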

Data Processing

load_and_prepare_data

Reads 5classpreds.csv from the current working directory, maps the five stimulus-class video labels to integers (0–4), pivots the long-format prediction table into a (timestep × feature) matrix, replaces any NaN / ±Inf with safe defaults, and returns a StandardScaler-normalised feature matrix alongside the integer label vector.
def load_and_prepare_data() -> tuple[np.ndarray, np.ndarray]
Parameters: none. Returns tuple[np.ndarray, np.ndarray]:

| Component | Shape | Description |
|---|---|---|
| X_scaled | (N, N_FEAT) | z-score standardised feature matrix |
| y | (N,) | integer class labels, dtype int64 |

Raises FileNotFoundError if 5classpreds.csv is not found in the current directory.
```python
from main import load_and_prepare_data

X_scaled, y = load_and_prepare_data()
print(X_scaled.shape, y.shape)   # e.g. (120, 512) (120,)
```
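
The NaN/Inf cleanup and z-scoring steps can be approximated in NumPy. The sketch below omits the pivot step because the CSV's column layout is not documented here. Replacing non-finite values with 0.0 is an assumption that stands in for the unspecified "safe defaults". The scaling mirrors StandardScaler's behaviour: it uses the population standard deviation and leaves constant features unscaled. clean_and_standardise is a hypothetical name.

```python
import numpy as np

def clean_and_standardise(X: np.ndarray) -> np.ndarray:
    # Assumption: NaN and +/-Inf become 0.0; the real "safe defaults" may differ.
    X = np.nan_to_num(X.astype(np.float64), nan=0.0, posinf=0.0, neginf=0.0)
    mean = X.mean(axis=0)
    std = X.std(axis=0)          # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0          # constant columns are centred but not scaled
    return (X - mean) / std

X = np.array([[1.0, np.nan], [3.0, np.inf], [5.0, 2.0]])
Z = clean_and_standardise(X)
print(Z.shape)   # (3, 2)
```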

build_sequences

Splits a normalised feature array into non-overlapping temporal windows of seq_len frames. Trailing rows that do not fill a complete window are discarded. This is the sequence builder used by model.py during both training and verification.
def build_sequences(data: np.ndarray, seq_len: int = SEQ_CHUNK) -> torch.Tensor
data
np.ndarray
required
Shape (N, N_FEAT) normalised feature array. Rows are ordered chronologically.
seq_len
int
default:"5"
Temporal window size in frames. Matches the SEQ_CHUNK constant in model.py (default 5).
Returns torch.Tensor — shape (B, seq_len, N_FEAT) where B = floor(N / seq_len).
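
Non-overlapping windowing reduces to trimming and reshaping. The following is a NumPy sketch. build_sequences_np is a hypothetical name. The real function additionally returns a torch.Tensor, which torch.from_numpy could provide.

```python
import numpy as np

def build_sequences_np(data: np.ndarray, seq_len: int = 5) -> np.ndarray:
    # Keep only whole windows, then view the rows as (B, seq_len, N_FEAT).
    n_seq = data.shape[0] // seq_len              # B = floor(N / seq_len)
    return data[: n_seq * seq_len].reshape(n_seq, seq_len, data.shape[1])

data = np.arange(12 * 3, dtype=np.float32).reshape(12, 3)
seqs = build_sequences_np(data)
print(seqs.shape)   # (2, 5, 3): rows 10 and 11 are discarded
```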

build_sequence_tensor

Wraps a feature matrix as single-frame sequences for use with the main.py training loop. Each row becomes a (1, N_FEAT) sequence tensor rather than a chunked temporal window.
def build_sequence_tensor(X: np.ndarray) -> torch.Tensor
X
np.ndarray
required
Shape (N, N_FEAT) feature matrix, dtype float32.
Returns torch.Tensor — shape (N, 1, N_FEAT), dtype float32.
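
In NumPy terms, the single-frame wrapping just adds a new axis; in PyTorch, tensor.unsqueeze(1) does the same. The following is a minimal illustration:

```python
import numpy as np

X = np.random.default_rng(0).standard_normal((4, 6)).astype(np.float32)
single_frame = X[:, np.newaxis, :]   # (N, 1, N_FEAT): each row becomes a length-1 sequence
print(single_frame.shape)            # (4, 1, 6)
```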

class_prototypes

Computes the mean embedding for each class, producing a prototype matrix used for few-shot nearest-centroid classification and vault-key derivation.
def class_prototypes(embeddings: np.ndarray, labels: np.ndarray) -> np.ndarray
embeddings
np.ndarray
required
Shape (N, latent_dim) — L2-normalised embeddings for all samples.
labels
np.ndarray
required
Shape (N,) — integer class labels aligned row-by-row with embeddings. Classes must be contiguous integers starting at 0 up to N_CLASSES - 1.
Returns np.ndarray — shape (N_CLASSES, latent_dim). Row c is the mean embedding of all samples with labels == c.
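
The following is a minimal NumPy sketch of per-class averaging. It assumes contiguous labels 0..N_CLASSES-1, as stated above. class_prototypes_np is hypothetical. The mean of unit vectors is generally shorter than 1. This sketch does not re-normalise, and the source does not say whether the real function does.

```python
import numpy as np

def class_prototypes_np(embeddings, labels, n_classes):
    # Row c is the mean of all embeddings whose label equals c.
    return np.stack([embeddings[labels == c].mean(axis=0) for c in range(n_classes)])

emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, -1.0]])
lab = np.array([0, 0, 1, 1])
P = class_prototypes_np(emb, lab, 2)
print(P)   # rows: [0.5, 0.5] and [0.0, 0.0]
```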

Evaluation

evaluate_fewshot

Runs nearest-centroid few-shot inference on a support/query split. Prototypes are computed from (X_train, y_train) and each query in X_test is assigned the class whose prototype is closest under squared Euclidean distance.
def evaluate_fewshot(
    model: NeuralVaultFewShot,
    X_train: np.ndarray,
    y_train: np.ndarray,
    X_test: np.ndarray,
    y_test: np.ndarray,
) -> tuple[np.ndarray, np.ndarray]
model
NeuralVaultFewShot
required
A trained NeuralVaultFewShot instance. The function calls model.eval() internally and wraps inference in torch.no_grad().
X_train
np.ndarray
required
Support set features, shape (n_shot * N_CLASSES, N_FEAT).
y_train
np.ndarray
required
Support set labels, shape (n_shot * N_CLASSES,).
X_test
np.ndarray
required
Query set features, shape (n_query * N_CLASSES, N_FEAT).
y_test
np.ndarray
required
Query set ground-truth labels, shape (n_query * N_CLASSES,). Used externally for metric computation — not consumed inside this function.
Returns tuple[np.ndarray, np.ndarray]:

| Component | Shape | Description |
|---|---|---|
| preds | (n_query * N_CLASSES,) | Predicted class indices |
| probs | (n_query * N_CLASSES, N_CLASSES) | Softmax of negative distances, usable as class probabilities for ROC-AUC |
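
The nearest-centroid step can be sketched on embeddings that the model has already produced. nearest_centroid is a hypothetical helper. It assigns each query to the prototype with the smallest squared Euclidean distance. It then turns negative distances into probabilities with a numerically stable softmax.

```python
import numpy as np

def nearest_centroid(query_emb, prototypes):
    # Squared Euclidean distance from every query to every prototype: (Q, C).
    d = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    logits = -d                                   # closer prototype -> larger logit
    logits -= logits.max(axis=1, keepdims=True)   # stabilise the softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return d.argmin(axis=1), probs

protos = np.array([[1.0, 0.0], [0.0, 1.0]])
queries = np.array([[0.9, 0.1], [0.2, 0.8]])
preds, probs = nearest_centroid(queries, protos)
print(preds)   # [0 1]
```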

evaluate_keygen_method

Benchmarks a key-generation function against all five stimulus classes by comparing keys derived from class prototypes against per-sample keys using Hamming distance. Produces arrays of genuine and impostor Hamming distances suitable for EER and d-prime computation.
def evaluate_keygen_method(
    X: np.ndarray,
    y: np.ndarray,
    keygen_fn: callable,
) -> tuple[np.ndarray, np.ndarray]
X
np.ndarray
required
Standardised feature matrix, shape (N, N_FEAT).
y
np.ndarray
required
Integer class labels, shape (N,).
keygen_fn
callable
required
Function mapping a feature vector (np.ndarray of shape (N_FEAT,)) to a binary key (np.ndarray of uint8 0/1 values or raw bytes). Built-in options include SHA256, HMAC-SHA256, and BioHashing.
Returns tuple[np.ndarray, np.ndarray]: (genuine_dists, impostor_dists). Values are fractional Hamming distances in [0, 1]. Ideally, genuine distances are near 0 and impostor distances are near 0.5, the expected distance between independent random keys.
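
Each genuine or impostor value is a fractional Hamming distance between two keys. The following is a minimal sketch for byte keys. hamming_fraction is a hypothetical helper. Keygen functions that return 0/1 bit arrays would skip the unpacking step.

```python
import numpy as np

def hamming_fraction(key_a: bytes, key_b: bytes) -> float:
    # Unpack both keys to bits and return the fraction of positions that differ.
    bits_a = np.unpackbits(np.frombuffer(key_a, dtype=np.uint8))
    bits_b = np.unpackbits(np.frombuffer(key_b, dtype=np.uint8))
    return float(np.mean(bits_a != bits_b))

print(hamming_fraction(b"\x00\xff", b"\x00\xff"))   # 0.0: identical keys
print(hamming_fraction(b"\x00\xff", b"\x0f\xff"))   # 0.25: 4 of 16 bits differ
```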

verify_similarity

Computes cosine distances between per-class prototype embeddings and all sample embeddings, splitting results into genuine-match and impostor-match arrays.
def verify_similarity(
    embeddings: np.ndarray,
    prototypes: np.ndarray,
    labels: np.ndarray,
) -> tuple[np.ndarray, np.ndarray]
embeddings
np.ndarray
required
Shape (N, latent_dim).
prototypes
np.ndarray
required
Shape (N_CLASSES, latent_dim) — output of class_prototypes.
labels
np.ndarray
required
Shape (N,) — integer class labels aligned with embeddings.
Returns tuple[np.ndarray, np.ndarray]: (genuine, impostor). Values are cosine distances (1 - cosine_similarity), so smaller means more similar.
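
The following is a vectorised NumPy sketch of the genuine/impostor split. It assumes every sample is compared against every class prototype. It also assumes rows are L2-normalised first, which is harmless if they already are. split_cosine_distances is a hypothetical name.

```python
import numpy as np

def split_cosine_distances(embeddings, prototypes, labels):
    # Normalise rows so that the dot product is the cosine similarity.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    dist = 1.0 - e @ p.T                          # (N, N_CLASSES) cosine distances
    own = np.zeros_like(dist, dtype=bool)
    own[np.arange(len(labels)), labels] = True    # mark each sample's own class
    return dist[own], dist[~own]                  # genuine, impostor

genuine, impostor = split_cosine_distances(
    np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([0, 1])
)
print(genuine, impostor)
```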

euclidean_dist

Computes the full pairwise squared Euclidean distance matrix between two sets of embedding vectors using broadcasting. Used internally by evaluate_fewshot for nearest-centroid assignment.
def euclidean_dist(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor
x
torch.Tensor
required
Shape (N, D) — query embeddings.
y
torch.Tensor
required
Shape (M, D) — prototype embeddings.
Returns torch.Tensor — shape (N, M) pairwise squared Euclidean distances. Raises ValueError if x.size(1) != y.size(1).
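
The following NumPy function mirrors the broadcasting pattern described above. It is a hypothetical helper; the original operates on torch tensors. Adding a new axis to each input produces an (N, M, D) difference tensor, and summing over D yields the (N, M) matrix.

```python
import numpy as np

def euclidean_dist_np(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    if x.shape[1] != y.shape[1]:
        raise ValueError("x and y must have the same embedding dimension")
    # (N, 1, D) - (1, M, D) broadcasts to (N, M, D); summing over D gives (N, M).
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)

x = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([[1.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
print(euclidean_dist_np(x, y))   # rows: [1, 25, 2] and [1, 13, 0]
```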

Benchmark Utilities

compute_d_prime

Computes d-prime (d′), the signal-detection-theory measure of separability between genuine and impostor score distributions.
def compute_d_prime(genuine: np.ndarray, impostor: np.ndarray) -> float
genuine
np.ndarray
required
1-D array of genuine match scores or distances.
impostor
np.ndarray
required
1-D array of impostor scores or distances.
Returns float — d-prime value. A small epsilon (1e-9) is added to the pooled variance denominator to guard against division by zero.
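
The source does not spell out the formula. The sketch below uses the common biometric definition, d′ = |μ_g − μ_i| / sqrt((σ_g² + σ_i²) / 2), with population variances. Treat the exact variance convention and the placement of epsilon as assumptions. d_prime is a hypothetical helper.

```python
import numpy as np

def d_prime(genuine: np.ndarray, impostor: np.ndarray, eps: float = 1e-9) -> float:
    # d' = |mu_g - mu_i| / sqrt((var_g + var_i) / 2); eps guards a zero denominator.
    pooled = (np.var(genuine) + np.var(impostor)) / 2.0
    return float(abs(np.mean(genuine) - np.mean(impostor)) / np.sqrt(pooled + eps))

print(d_prime(np.array([1.0, 3.0]), np.array([5.0, 7.0])))   # ~4.0 (means 2 and 6, variances 1)
```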

compute_eer

Computes the Equal Error Rate by finding the threshold where the false positive rate equals the false negative rate. Uses Brent’s root-finding method (scipy.optimize.brentq) on a linear interpolation of the ROC curve for numerical precision.
def compute_eer(genuine: np.ndarray, impostor: np.ndarray) -> float
genuine
np.ndarray
required
1-D array of genuine match scores.
impostor
np.ndarray
required
1-D array of impostor scores.
Returns float — EER as a percentage in the range [0, 100]. Returns 50.0 if either input array is empty.
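
The source computes EER by applying brentq to an interpolated ROC curve. A simpler, dependency-free approximation sweeps candidate thresholds and takes the point where FAR and FRR are closest. This swapped-in technique is for illustration only. It assumes higher scores indicate genuine matches, and its result can differ slightly from the interpolated value. eer_sweep is a hypothetical name.

```python
import numpy as np

def eer_sweep(genuine, impostor) -> float:
    # Assumes higher scores mean "more genuine" (similarities, not distances).
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    if len(genuine) == 0 or len(impostor) == 0:
        return 50.0
    best_gap, eer = np.inf, 0.5
    for t in np.unique(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostors accepted at threshold t
        frr = np.mean(genuine < t)     # genuine attempts rejected at threshold t
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return float(100.0 * eer)

print(eer_sweep([0.9, 0.8], [0.1, 0.2]))   # 0.0: perfectly separated scores
```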

compute_roc_from_scores

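Builds a combined label and score vector from the genuine and impostor arrays and delegates to sklearn.metrics.roc_curve. Genuine scores are labelled 0 and impostor scores 1, so roc_curve treats impostors as the positive class and reads higher values as more impostor-like. This orientation suits distance inputs.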
def compute_roc_from_scores(
    genuine: np.ndarray,
    impostor: np.ndarray,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]
genuine
np.ndarray
required
1-D array of genuine match scores (labelled 0 internally).
impostor
np.ndarray
required
1-D array of impostor scores (labelled 1 internally).
Returns tuple[np.ndarray, np.ndarray, np.ndarray]: (fpr, tpr, thresholds), exactly as returned by sklearn.metrics.roc_curve.
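
The following sketch shows the label and score construction that precedes the roc_curve call. build_roc_inputs is a hypothetical helper. Only the concatenation is shown, since the rest is delegated to scikit-learn.

```python
import numpy as np

def build_roc_inputs(genuine: np.ndarray, impostor: np.ndarray):
    # Genuine -> label 0, impostor -> label 1, matching the convention above.
    labels = np.concatenate([np.zeros(len(genuine), dtype=int), np.ones(len(impostor), dtype=int)])
    scores = np.concatenate([genuine, impostor])
    return labels, scores

labels, scores = build_roc_inputs(np.array([0.1, 0.2]), np.array([0.7]))
# These arrays are what sklearn.metrics.roc_curve(labels, scores) would receive.
```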

compute_key_entropy_balance

Measures two statistical properties of a matrix of binary keys: per-bit Shannon entropy and bit balance. High values on both are necessary, but not sufficient, for cryptographic strength.
def compute_key_entropy_balance(bit_matrix: np.ndarray) -> tuple[float, float]
bit_matrix
np.ndarray
required
Shape (N, key_bits) binary key matrix (uint8 values 0 or 1). A 1-D array is treated as a single key.
Returns tuple[float, float]:

| Component | Range | Ideal |
|---|---|---|
| entropy | [0, 1] bits/bit | 1.0: every bit position is equally likely to be 0 or 1 |
| balance | [0, 1] | 1.0: perfect 50/50 distribution across all bit positions |
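
The following is a NumPy sketch of both metrics. The entropy is the average binary Shannon entropy across bit positions. The balance formula (1 minus twice the mean deviation of P(bit = 1) from 0.5) is an assumption chosen to match the stated range and ideal; it is not necessarily the implemented definition. entropy_balance is a hypothetical name.

```python
import numpy as np

def entropy_balance(bit_matrix: np.ndarray) -> tuple[float, float]:
    bits = np.atleast_2d(bit_matrix).astype(float)   # a 1-D key becomes one row
    p = bits.mean(axis=0)                            # P(bit = 1) per position
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    h = np.nan_to_num(h)                             # treat 0 * log2(0) as 0
    balance = 1.0 - 2.0 * np.mean(np.abs(p - 0.5))   # assumed definition: 1 at 50/50
    return float(h.mean()), float(balance)

print(entropy_balance(np.array([[0, 1], [1, 0]])))   # (1.0, 1.0)
```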

raw_key_to_bitarray

Converts a key in any of the common Neural Vault formats to a flat uint8 bit array of 0/1 values.
def raw_key_to_bitarray(raw_key: bytes | str | np.ndarray) -> np.ndarray
raw_key
bytes | str | np.ndarray
required
Key in one of three formats: raw bytes/bytearray, a lowercase hex str, or an np.ndarray that is either already a packed byte array or a pre-unpacked bit array.
Returns np.ndarray — flat uint8 array of 0/1 bit values. For a 256-bit key this has length 256.
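
The following sketches the format dispatch. It assumes hex strings decode with bytes.fromhex and that arrays are unpacked with numpy.unpackbits. Distinguishing a packed byte array from an already-unpacked bit array is inherently ambiguous. The 0/1 check below is a hypothetical heuristic: a packed array whose bytes are all 0 or 1 would be misread. to_bits is a hypothetical name.

```python
import numpy as np

def to_bits(raw_key) -> np.ndarray:
    if isinstance(raw_key, (bytes, bytearray)):
        packed = np.frombuffer(bytes(raw_key), dtype=np.uint8)
    elif isinstance(raw_key, str):
        packed = np.frombuffer(bytes.fromhex(raw_key), dtype=np.uint8)
    else:
        arr = np.asarray(raw_key).ravel()
        # Heuristic (assumption): an array holding only 0s and 1s is already unpacked.
        if np.isin(arr, (0, 1)).all():
            return arr.astype(np.uint8)
        packed = arr.astype(np.uint8)
    return np.unpackbits(packed)

print(to_bits("0f"))   # [0 0 0 0 1 1 1 1]
```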

add_gaussian_noise

Injects Additive White Gaussian Noise (AWGN) at a specified signal-to-noise ratio to simulate scanner noise degradation.
def add_gaussian_noise(x: np.ndarray, snr_db: float) -> np.ndarray
x
np.ndarray
required
Input signal array of any shape. NaN values are replaced with zero before noise is added.
snr_db
float
required
Target SNR in decibels. Higher values mean less noise. The benchmark suite tests [30, 20, 15, 10, 5, 0] dB.
Returns np.ndarray — noisy signal, same shape and dtype as input.
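
The usual way to hit a target SNR is to scale the noise power from the measured signal power, since SNR_dB = 10·log10(P_signal / P_noise). The following sketch assumes signal power is the mean squared value over the whole array. awgn is a hypothetical name, and its rng parameter is a hypothetical addition for reproducibility.

```python
import numpy as np

def awgn(x: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    rng = np.random.default_rng() if rng is None else rng
    clean = np.nan_to_num(x)                                  # NaN -> 0 before measuring power
    signal_power = np.mean(clean.astype(np.float64) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))    # SNR_dB = 10 log10(Ps / Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return (clean + noise).astype(x.dtype)                    # keep the caller's dtype

noisy = awgn(np.ones(1000, dtype=np.float32), snr_db=20.0, rng=np.random.default_rng(0))
print(noisy.dtype, noisy.shape)   # float32 (1000,)
```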

add_motion_artifacts

Simulates electrode displacement and motion artifacts by randomly corrupting a fraction of signal values with draws from N(0, 3.0).
def add_motion_artifacts(x: np.ndarray, prob: float) -> np.ndarray
x
np.ndarray
required
Input signal array of any shape.
prob
float
required
Fraction of values to corrupt, in the range [0.0, 1.0]. For example 0.10 corrupts 10% of values. The benchmark suite tests [0.0, 0.05, 0.10, 0.15, 0.20, 0.30].
Returns np.ndarray — corrupted signal, same shape as input.
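
The following sketches the corruption step. It assumes each value is independently selected with probability prob, so the corrupted fraction matches prob in expectation. It also assumes selected values are replaced by draws with standard deviation 3.0. The source does not state whether values are replaced or offset, or whether 3.0 is a standard deviation or a variance. motion_artifacts is a hypothetical name.

```python
import numpy as np

def motion_artifacts(x: np.ndarray, prob: float, rng=None) -> np.ndarray:
    rng = np.random.default_rng() if rng is None else rng
    out = x.copy()                                  # leave the caller's array untouched
    mask = rng.random(x.shape) < prob               # select each value with probability prob
    # Assumption: selected values are replaced (not offset) by N(0, 3.0) draws.
    out[mask] = rng.normal(0.0, 3.0, size=int(mask.sum()))
    return out

corrupted = motion_artifacts(np.zeros((10, 10)), prob=0.10, rng=np.random.default_rng(1))
print(corrupted.shape)   # (10, 10)
```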

Pipeline

run_integrated_pipeline

Executes the complete Neural Vault benchmark pipeline end-to-end: data loading, Transformer training (100 epochs, AdamW, triplet loss), few-shot evaluation over 40 episodes, multi-method key generation benchmarking (SHA256, HMAC, BioHashing, Neural), Neural Vault prototype verification, AWGN noise robustness sweep, and motion-artifact robustness sweep. Saves all numerical results to benchmark/results/keygen_benchmark_results.json.
def run_integrated_pipeline() -> dict
Returns dict: results dictionary with the following top-level keys:

| Key | Type | Description |
|---|---|---|
| timestamp | str | ISO-8601 datetime of the run |
| baseline_metrics | dict | Per-method d-prime, EER, and (for NeuralVault) ROC-AUC and threshold |
| neural_metrics | dict | Few-shot accuracy, F1, and ROC-AUC aggregated over 40 episodes |
| noise_tests | dict | Per-method EER at each SNR level |
| artifact_tests | dict | Per-method EER at each artifact probability |
| vault_scores | dict | Raw genuine/impostor cosine distances and EER threshold |
| vault_prototype_keys | list[str] | Hex-encoded 256-bit prototype keys for each class |
| _plot_data | dict | Plotting inputs (loss_history, embedding_2d, embedding_labels) consumed by generate_visualizations |

generate_visualizations

Renders a 3×2 matplotlib dashboard from the pipeline results and saves it to benchmark/visualizations/neuralvault_extended_dashboard.png at 200 DPI. Panels include: d-prime bar chart, EER bar chart, SNR robustness curves, few-shot metric bars, training loss curve, and a PCA-projected embedding scatter plot.
def generate_visualizations(results: dict) -> None
results
dict
required
The dictionary returned by run_integrated_pipeline(). The _plot_data key must be present and contain loss_history, embedding_2d, and embedding_labels.
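
The embedding panel needs a 2-D projection of latent_dim-dimensional embeddings. The following is a minimal sketch of that step using SVD-based PCA. It assumes this is how embedding_2d is produced; the real code may use a library implementation such as sklearn.decomposition.PCA. pca_2d is a hypothetical helper.

```python
import numpy as np

def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    centred = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T     # (N, 2) coordinates for the scatter plot

xy = pca_2d(np.random.default_rng(0).standard_normal((10, 5)))
print(xy.shape)   # (10, 2)
```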

generate_reports

Writes two files to benchmark/reports/: a machine-readable benchmark_summary.json identifying the highest-separability and lowest-error-rate methods, and a human-readable BENCHMARK_REPORT.md with tables for classification metrics, d-prime rankings, EER rankings, and per-method AWGN robustness curves.
def generate_reports(results: dict) -> None
results
dict
required
The dictionary returned by run_integrated_pipeline().

setup_environment

Creates the three output directories required by the pipeline (benchmark/results, benchmark/visualizations, and benchmark/reports) using Path.mkdir(parents=True, exist_ok=True). Safe to call repeatedly.
def setup_environment() -> None
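
The following is an equivalent sketch using pathlib. The setup_dirs name and its base parameter are hypothetical additions for testability. The documented function takes no arguments and works relative to the current directory.

```python
from pathlib import Path

OUTPUT_DIRS = ("benchmark/results", "benchmark/visualizations", "benchmark/reports")

def setup_dirs(base: Path = Path(".")) -> list[Path]:
    # Hypothetical variant of setup_environment with a configurable base directory.
    created = []
    for rel in OUTPUT_DIRS:
        path = base / rel
        path.mkdir(parents=True, exist_ok=True)   # no error if it already exists
        created.append(path)
    return created
```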
