Neural Vault exposes a flat Python API spread across two modules:
main.py (pipeline orchestration, benchmark utilities, and training helpers) and model.py (the enrolled-user verification interface). Every public function is documented below in the order you would encounter it when running a full benchmark or integrating Neural Vault into an application.
Classes
PositionalEncoding
Adds fixed (non-learnable) sinusoidal positional encodings to a sequence tensor before it is fed into the Transformer encoder stack. Frequencies are computed with the standard log-space division term, so each dimension encodes a different temporal wavelength.
d_model: Width of the Transformer model dimension. Must match the d_model argument passed to NeuralVaultFewShot.
max_len: Maximum sequence length the buffer is pre-computed for. For sequences shorter than max_len the buffer is sliced to the sequence length; longer sequences are not supported and fail at runtime.
forward(x)
x: Input tensor of shape (B, T, D) (batch × time-steps × model dimension). The encoding is added in-place via broadcasting; no extra allocation occurs.
Returns: torch.Tensor of the same shape (B, T, D), with the positional signal fused into each time-step.
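The table that PositionalEncoding pre-computes can be sketched in NumPy to show the arithmetic. This assumes the common layout (sine on even dimensions, cosine on odd ones, base 10000, even d_model); sinusoidal_table and add_positional are hypothetical helpers, not the module's actual code.

```python
import math

import numpy as np


def sinusoidal_table(max_len: int, d_model: int) -> np.ndarray:
    """Hypothetical helper: build a (max_len, d_model) sinusoidal encoding table."""
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    position = np.arange(max_len, dtype=np.float64)[:, None]  # (max_len, 1)
    # Log-space division term: 10000^(-2i / d_model) for each sin/cos pair.
    div_term = np.exp(np.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(position * div_term)  # even dimensions: sine
    pe[:, 1::2] = np.cos(position * div_term)  # odd dimensions: cosine
    return pe


def add_positional(x: np.ndarray, pe: np.ndarray) -> np.ndarray:
    """Add the first T rows of the table to a (B, T, D) batch via broadcasting."""
    t = x.shape[1]
    if t > pe.shape[0]:
        raise ValueError("sequence is longer than the pre-computed max_len")
    return x + pe[None, :t, :]
```

Because the table depends only on max_len and d_model, it can be computed once and reused for every batch.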
NeuralVaultFewShot
The core Prototypical Transformer Network. The architecture projects raw fMRI feature vectors into d_model space, applies sinusoidal positional encoding, passes the sequence through n_layers Pre-LN Transformer encoder layers (with dim_feedforward = d_model * 4 and 10% dropout), temporally pools the output with a mean-reduce, and projects the result through a 2-layer GELU head down to latent_dim. The final embedding is L2-normalised onto the unit hypersphere so that dot products equal cosine similarities.
Input feature count: Number of input features per time-step. This equals the number of TRIBEv2 cortical vertex predictions in the dataset (typically 20484 for a full cortical surface, or the number of feature columns after 5classpreds.csv is pivoted).
d_model: Internal Transformer model width. All linear projections, attention layers, and feed-forward sublayers operate in this dimensionality. The feed-forward sublayer width is automatically set to d_model * 4.
nhead: Number of parallel attention heads. d_model must be evenly divisible by nhead.
n_layers: Depth of the Transformer encoder stack. Each layer applies multi-head self-attention followed by a feed-forward sublayer, both with Pre-LN normalisation.
latent_dim: Dimensionality of the output embedding. All downstream key derivation, prototype comparison, and few-shot classification operate in this space.
forward(x)
x: Input tensor. Accepts either (B, T, F) (an explicit sequence of T frames, each with F features) or (B, F) (a single frame, automatically unsqueezed to (B, 1, F) before processing).
Returns: torch.Tensor of shape (B, latent_dim). Every row lies on the unit hypersphere (L2 norm = 1).
triplet_loss(anchor, positive, negative, margin=0.3)
Static method. Computes the standard triplet loss used during metric-learning training. Squared Euclidean distances are used so that gradients are smooth near zero.
anchor: Shape (B, latent_dim); reference embeddings for each triplet.
positive: Shape (B, latent_dim); same-class embeddings that should be pulled closer to the anchor.
negative: Shape (B, latent_dim); different-class embeddings that should be pushed away from the anchor.
margin: Minimum enforced distance gap between positive and negative pairs (default 0.3). Triplets already satisfying this margin contribute zero loss.
Returns: torch.Tensor, a scalar loss tensor. Back-propagate directly with .backward().
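The loss can be sketched in NumPy as a hinge on the difference of squared Euclidean distances. Averaging over the batch is an assumption (the description does not state the reduction), and triplet_loss_sketch is a hypothetical name.

```python
import numpy as np


def triplet_loss_sketch(anchor, positive, negative, margin=0.3):
    """Hinge triplet loss on squared Euclidean distances, averaged over the batch (assumed)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # (B,) anchor-positive distances
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # (B,) anchor-negative distances
    # Triplets with d_neg >= d_pos + margin contribute exactly zero.
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))
```

For example, an anchor that coincides with its positive and sits at squared distance 2 from its negative already satisfies the 0.3 margin and yields a loss of 0.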
Key Derivation
derive_key
Derives a stable cryptographic key from a unit-norm embedding vector. The embedding is first quantised to int16 (multiplied by 1000 and cast), then its raw bytes are used as input keying material (IKM) for HKDF-SHA256 with the fixed info string b"neural-vault-few-shot-v1". If the cryptography package is unavailable, the function falls back to raw hashlib.sha256.
Input embedding: L2-normalised embedding vector, shape (latent_dim,). Should come from NeuralVaultFewShot.forward or get_embedding.
Key length: Desired key length in bits. Must be a multiple of 8. Values other than 256 fall back to the hashlib path when cryptography is unavailable.
Returns: tuple[bytes, str], (key_bytes, key_hex). key_bytes is the raw binary key; key_hex is its lowercase hex representation.
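The documented steps (int16 quantisation, then HKDF-SHA256 with the fixed info string) can be sketched using only the standard library's hmac and hashlib, which implement the same RFC 5869 construction as the cryptography package's HKDF. Assumptions: no salt (RFC 5869 then uses a zero-filled salt), float64 input, and native byte order from tobytes(); the hashlib fallback is omitted. Keys from this sketch are not guaranteed to match derive_key bit for bit, and both function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
from typing import Optional

import numpy as np

INFO = b"neural-vault-few-shot-v1"  # fixed info string from the description above


def hkdf_sha256(ikm: bytes, length: int, info: bytes, salt: Optional[bytes] = None) -> bytes:
    """RFC 5869 HKDF-SHA256 built from the standard library."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key is too long for HKDF-SHA256")
    if salt is None:
        salt = b"\x00" * hash_len  # RFC 5869 default when no salt is given
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key_sketch(embedding: np.ndarray, key_bits: int = 256) -> tuple[bytes, str]:
    """Hypothetical re-creation of the documented derive_key steps."""
    if key_bits % 8 != 0:
        raise ValueError("key_bits must be a multiple of 8")
    # Scale by 1000, then cast to int16 (astype truncates toward zero).
    quantised = (np.asarray(embedding, dtype=np.float64) * 1000).astype(np.int16)
    key = hkdf_sha256(quantised.tobytes(), key_bits // 8, INFO)
    return key, key.hex()
```

With an embedding of [0.5, 0.25, 0.1], adding 0.0004 to every component keeps the quantised values at [500, 250, 100] and reproduces the same key, whereas subtracting 0.0004 moves them to [499, 249, 99] and produces an unrelated key.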
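Quantisation to int16 (scale factor 1000) places every embedding component on a grid with a step of 0.001. Changes smaller than that step usually leave the quantised value, and therefore the key, unchanged, but any change that crosses a grid boundary produces a completely different key. Ensure your embedding is genuinely stable before calling derive_key in a production enrollment flow.
Verification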
These functions live in model.py and operate against a pre-enrolled prototype (prototype_vec) that is computed at module load time from 5classpreds.csv. They encapsulate the full normalisation, sequence-building, and embedding pipeline, so callers only need to pass raw fMRI feature arrays.
get_embedding
Normalises raw fMRI data using the per-feature statistics fitted at enrollment time, chunks the result into non-overlapping sequences of SEQ_CHUNK frames, runs the model, and returns the mean embedding across all chunks.
Input: Shape (N, N_FEAT); raw (unnormalised) fMRI feature matrix from a new acquisition session. N must be at least 1; if fewer than SEQ_CHUNK rows are present, the array is tiled to pad the first sequence.
Returns: np.ndarray of shape (latent_dim,). This is the mean embedding and is not re-normalised after averaging, so re-normalise it before treating raw dot products as cosine similarities.
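The normalise, tile, chunk, embed, and average flow can be outlined in NumPy. Here mean and std stand in for the enrollment-time scaler statistics, embed_fn is a hypothetical stand-in for the model's forward pass, and tiling by repeating whole rows is an assumption about how the padding works.

```python
import numpy as np

SEQ_CHUNK = 5  # default window size documented for model.py


def get_embedding_sketch(raw, mean, std, embed_fn, seq_chunk=SEQ_CHUNK):
    """Hypothetical outline of get_embedding: normalise, chunk, embed, average."""
    x = (np.asarray(raw, dtype=np.float64) - mean) / std  # enrollment-time z-score
    n = x.shape[0]
    if n < 1:
        raise ValueError("need at least one frame")
    if n < seq_chunk:
        reps = -(-seq_chunk // n)  # ceil(seq_chunk / n)
        x = np.tile(x, (reps, 1))[:seq_chunk]  # repeat rows to fill one window
    b = x.shape[0] // seq_chunk  # trailing rows that do not fill a window are dropped
    seqs = x[: b * seq_chunk].reshape(b, seq_chunk, x.shape[1])
    emb = embed_fn(seqs)  # (b, latent_dim)
    return emb.mean(axis=0)  # mean over chunks; not re-normalised
```

With three input frames and SEQ_CHUNK = 5, the single window is built from frames 0, 1, 2, 0, 1.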
verify
Computes the cosine similarity between a new fMRI acquisition and the enrolled user prototype. When both vectors are L2-normalised, this similarity is equivalent to a plain dot product.
Input: Shape (N, N_FEAT); raw fMRI feature matrix for the authentication attempt.
Returns: float, a cosine similarity in the range [-1, 1]. Compare it against the EER threshold (≈ 0.316 from benchmark results) to accept or reject.
Data Processing
load_and_prepare_data
Reads 5classpreds.csv from the current working directory, maps the five stimulus-class video labels to integers (0–4), pivots the long-format prediction table into a (timestep × feature) matrix, replaces any NaN / ±Inf with safe defaults, and returns a StandardScaler-normalised feature matrix alongside the integer label vector.
Returns: tuple[np.ndarray, np.ndarray]

| Component | Shape | Description |
|---|---|---|
| X_scaled | (N, N_FEAT) | z-score standardised feature matrix |
| y | (N,) | integer class labels, dtype int64 |

Raises FileNotFoundError if 5classpreds.csv is not found in the current directory.
build_sequences
Splits a normalised feature array into non-overlapping temporal windows of seq_len frames. Trailing rows that do not fill a complete window are discarded. This is the sequence builder used by model.py during both training and verification.
Input array: Shape (N, N_FEAT); normalised feature array with rows in chronological order.
seq_len: Temporal window size in frames. Matches the SEQ_CHUNK constant in model.py (default 5).
Returns: torch.Tensor of shape (B, seq_len, N_FEAT), where B = floor(N / seq_len).
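Both sequence builders reduce to a reshape. The NumPy sketch below assumes the real functions wrap equivalent arrays as torch tensors (for example with torch.from_numpy); both helper names are hypothetical.

```python
import numpy as np


def build_sequences_sketch(x: np.ndarray, seq_len: int = 5) -> np.ndarray:
    """Split (N, F) rows into non-overlapping (B, seq_len, F) windows, B = N // seq_len."""
    n, f = x.shape
    b = n // seq_len
    return x[: b * seq_len].reshape(b, seq_len, f)  # trailing rows are discarded


def build_single_frame_sketch(x: np.ndarray) -> np.ndarray:
    """Single-frame variant: each row becomes its own length-1 sequence, (N, 1, F)."""
    return x.astype(np.float32)[:, None, :]
```

Twelve rows with seq_len = 5 give two windows; rows 10 and 11 are dropped.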
build_sequence_tensor
Wraps a feature matrix as single-frame sequences for use with the main.py training loop. Each row becomes a (1, N_FEAT) sequence tensor rather than a chunked temporal window.
Input: Shape (N, N_FEAT) feature matrix, dtype float32.
Returns: torch.Tensor of shape (N, 1, N_FEAT), dtype float32.
class_prototypes
Computes the mean embedding for each class, producing a prototype matrix used for few-shot nearest-centroid classification and vault-key derivation.
embeddings: Shape (N, latent_dim); L2-normalised embeddings for all samples.
labels: Shape (N,); integer class labels aligned row-by-row with embeddings. Classes must be contiguous integers from 0 to N_CLASSES - 1.
Returns: np.ndarray of shape (N_CLASSES, latent_dim). Row c is the mean embedding of all samples with labels == c.
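A NumPy sketch of the per-class mean follows. Passing n_classes explicitly (instead of reading a module-level N_CLASSES) is an assumption of the sketch. Note that the mean of unit vectors is generally shorter than 1, so prototypes are not automatically on the hypersphere.

```python
import numpy as np


def class_prototypes_sketch(embeddings: np.ndarray, labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Row c is the mean of all embeddings whose label is c."""
    protos = np.zeros((n_classes, embeddings.shape[1]), dtype=embeddings.dtype)
    for c in range(n_classes):
        members = embeddings[labels == c]
        if len(members) == 0:
            raise ValueError(f"class {c} has no samples")
        protos[c] = members.mean(axis=0)
    return protos
```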
Evaluation
evaluate_fewshot
Runs nearest-centroid few-shot inference on a support/query split. Prototypes are computed from (X_train, y_train) and each query in X_test is assigned the class whose prototype is closest under squared Euclidean distance.
model: A trained NeuralVaultFewShot instance. The function calls model.eval() internally and wraps inference in torch.no_grad().
X_train: Support set features, shape (n_shot * N_CLASSES, N_FEAT).
y_train: Support set labels, shape (n_shot * N_CLASSES,).
X_test: Query set features, shape (n_query * N_CLASSES, N_FEAT).
Query labels: Query set ground-truth labels, shape (n_query * N_CLASSES,). Used externally for metric computation; not consumed inside this function.
Returns: tuple[np.ndarray, np.ndarray]

| Component | Shape | Description |
|---|---|---|
| preds | (n_query * N_CLASSES,) | Predicted class indices |
| probs | (n_query * N_CLASSES, N_CLASSES) | Softmax of negative distances, usable as class probabilities for ROC-AUC |
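The nearest-centroid step can be sketched on pre-computed embeddings; the model forward pass, model.eval(), and torch.no_grad() are omitted, and nearest_centroid_sketch is a hypothetical name. Subtracting the row maximum before exponentiating is a standard numerical-stability trick and does not change the softmax.

```python
import numpy as np


def nearest_centroid_sketch(support_emb, support_labels, query_emb, n_classes):
    """Prototype classification: nearest class mean under squared Euclidean distance."""
    protos = np.stack([support_emb[support_labels == c].mean(axis=0) for c in range(n_classes)])
    # Squared Euclidean distances, shape (n_query, n_classes).
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    preds = d.argmin(axis=1)
    logits = -d - (-d).max(axis=1, keepdims=True)  # stabilise before exp
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)  # softmax of negative distances
    return preds, probs
```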
evaluate_keygen_method
Benchmarks a key-generation function against all five stimulus classes by comparing keys derived from class prototypes against per-sample keys using Hamming distance. Produces arrays of genuine and impostor Hamming distances suitable for EER and d-prime computation.
Features: Standardised feature matrix, shape (N, N_FEAT).
Labels: Integer class labels, shape (N,).
Key-generation function: Maps a feature vector (np.ndarray of shape (N_FEAT,)) to a binary key (an np.ndarray of uint8 0/1 values, or raw bytes). Built-in options include SHA256, HMAC-SHA256, and BioHashing.
Returns: tuple[np.ndarray, np.ndarray], (genuine_dists, impostor_dists). Values are Hamming distances expressed as fractions in [0, 1]. Ideal genuine distances are near 0; ideal impostor distances are near 0.5.
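The sketch below shows the fractional Hamming distance and a genuine/impostor split at the key level. sha256_bits is a hypothetical exact-hash key generator (hashing float32 bytes), not necessarily how the built-in SHA256 method hashes features. It illustrates why exact hashes push even near-identical inputs to an impostor-like distance of about 0.5.

```python
import hashlib

import numpy as np


def hamming_fraction(a, b) -> float:
    """Fraction of differing bits between two equal-length 0/1 arrays."""
    a = np.asarray(a, dtype=np.uint8)
    b = np.asarray(b, dtype=np.uint8)
    if a.shape != b.shape:
        raise ValueError("keys must have the same length")
    return float(np.mean(a != b))


def sha256_bits(features) -> np.ndarray:
    """Hypothetical exact-hash key: SHA-256 of the float32 bytes, unpacked to 256 bits."""
    digest = hashlib.sha256(np.asarray(features, dtype=np.float32).tobytes()).digest()
    return np.unpackbits(np.frombuffer(digest, dtype=np.uint8))


def split_distances(sample_keys, proto_keys, labels):
    """Genuine: sample vs its own class prototype key; impostor: vs every other one."""
    genuine, impostor = [], []
    for key, y in zip(sample_keys, labels):
        for c, proto in enumerate(proto_keys):
            (genuine if c == y else impostor).append(hamming_fraction(key, proto))
    return np.array(genuine), np.array(impostor)
```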
verify_similarity
Computes cosine distances between per-class prototype embeddings and all sample embeddings, splitting results into genuine-match and impostor-match arrays.
Embeddings: Shape (N, latent_dim).
Prototypes: Shape (N_CLASSES, latent_dim); the output of class_prototypes.
Labels: Shape (N,); integer class labels aligned with embeddings.
Returns: tuple[np.ndarray, np.ndarray], (genuine, impostor). Values are cosine distances (1 - cosine_similarity), so smaller means more similar.
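A NumPy sketch of the genuine/impostor cosine-distance split follows. Re-normalising both inputs is an assumption of the sketch; it matters because class_prototypes returns plain means, which are generally not unit length.

```python
import numpy as np


def verify_similarity_sketch(embeddings, prototypes, labels):
    """Cosine distances to every class prototype, split by own class vs. other classes."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    dist = 1.0 - e @ p.T  # (N, N_CLASSES) cosine distances
    own = np.zeros(dist.shape, dtype=bool)
    own[np.arange(len(labels)), labels] = True  # mark each sample's own class
    return dist[own], dist[~own]
```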
euclidean_dist
Computes the full pairwise squared Euclidean distance matrix between two sets of embedding vectors using broadcasting. Used internally by evaluate_fewshot for nearest-centroid assignment.
x: Shape (N, D); query embeddings.
y: Shape (M, D); prototype embeddings.
Returns: torch.Tensor of shape (N, M) containing pairwise squared Euclidean distances.
Raises ValueError if x.size(1) != y.size(1).
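The broadcasting trick can be shown in NumPy: inserting singleton axes turns (N, D) and (M, D) into (N, 1, D) and (1, M, D), whose difference is (N, M, D). This is a sketch under that reading, not the torch implementation itself.

```python
import numpy as np


def euclidean_dist_sketch(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances between rows of x (N, D) and y (M, D)."""
    if x.shape[1] != y.shape[1]:
        raise ValueError("x and y must have the same embedding dimension")
    # (N, 1, D) - (1, M, D) -> (N, M, D), then sum over D.
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
```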
Benchmark Utilities
compute_d_prime
Computes d-prime (d′), the signal-detection-theory measure of separability between genuine and impostor score distributions.
Genuine scores: 1-D array of genuine match scores or distances.
Impostor scores: 1-D array of impostor scores or distances.
Returns: float, the d-prime value. A small epsilon (1e-9) is added to the pooled-variance denominator to guard against division by zero.
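A common form of d-prime is d' = |mu_g - mu_i| / sqrt((var_g + var_i) / 2). The sketch below uses that form. It assumes population variances (ddof=0) and that the epsilon is added inside the square root, since the description does not pin down either detail.

```python
import numpy as np


def d_prime_sketch(genuine, impostor, eps=1e-9):
    """d' = |mean_g - mean_i| / sqrt(pooled variance + eps)."""
    g = np.asarray(genuine, dtype=float)
    i = np.asarray(impostor, dtype=float)
    pooled_var = 0.5 * (g.var() + i.var())  # population variances (ddof=0), assumed
    return float(abs(g.mean() - i.mean()) / np.sqrt(pooled_var + eps))
```

Two distributions with unit variance and means 1 and 5 give d' of about 4.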
compute_eer
Computes the Equal Error Rate by finding the threshold where the false positive rate equals the false negative rate. Uses Brent’s root-finding method (scipy.optimize.brentq) on a linear interpolation of the ROC curve for numerical precision.
Genuine scores: 1-D array of genuine match scores.
Impostor scores: 1-D array of impostor scores.
Returns: float, the EER as a percentage in the range [0, 100]. Returns 50.0 if either input array is empty.
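The EER is the point x on the interpolated ROC where the false positive rate x equals the false negative rate 1 - TPR(x). The sketch below finds it with plain bisection in place of scipy.optimize.brentq, so it needs NumPy only. It assumes distance-like scores with impostors as the positive class (larger means more impostor-like), matching the labelling described for compute_roc_from_scores.

```python
import numpy as np


def roc_points(genuine, impostor):
    """ROC points for distance-like scores; impostors are the positive class."""
    thresholds = np.unique(np.concatenate([genuine, impostor]))[::-1]  # descending
    fpr = np.array([0.0] + [float(np.mean(genuine >= t)) for t in thresholds])
    tpr = np.array([0.0] + [float(np.mean(impostor >= t)) for t in thresholds])
    return fpr, tpr


def eer_sketch(genuine, impostor, iters=100):
    """Equal Error Rate in percent, via bisection on the linearly interpolated ROC."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    if genuine.size == 0 or impostor.size == 0:
        return 50.0
    fpr, tpr = roc_points(genuine, impostor)

    def gap(x):
        # FNR(x) - FPR(x); decreasing in x, non-negative at 0 and -1 at x = 1.
        return 1.0 - x - np.interp(x, fpr, tpr)

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 100.0 * 0.5 * (lo + hi)
```

Identical genuine and impostor distributions give a diagonal ROC and an EER of 50%, while perfectly separated ones give an EER of about 0%.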
compute_roc_from_scores
Constructs a combined label and score vector from genuine/impostor arrays and delegates to sklearn.metrics.roc_curve.
Genuine scores: 1-D array of genuine match scores (labelled 0 internally).
Impostor scores: 1-D array of impostor scores (labelled 1 internally).
Returns: tuple[np.ndarray, np.ndarray, np.ndarray], (fpr, tpr, thresholds) as returned by sklearn.metrics.roc_curve.
compute_key_entropy_balance
Measures the cryptographic quality of a matrix of binary keys by computing per-bit Shannon entropy and a balance score.
Keys: Shape (N, key_bits) binary key matrix (uint8 values 0 or 1). A 1-D array is treated as a single key.
Returns: tuple[float, float]

| Component | Range | Ideal |
|---|---|---|
| entropy | [0, 1] bits/bit | 1.0 (every bit position is equally likely to be 0 or 1) |
| balance | [0, 1] | 1.0 (perfect 50/50 distribution across all bit positions) |
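Per-bit entropy can be sketched as the mean binary Shannon entropy of each bit position's 1-probability across keys. The balance formula used here, 1 - 2 * mean|p - 0.5|, is an assumption chosen to match the stated ideal of 1.0 at a perfect 50/50 split. It may differ from the actual function.

```python
from __future__ import annotations

import numpy as np


def _binary_entropy(p: np.ndarray) -> np.ndarray:
    """H(p) in bits, with 0 * log2(0) taken as 0."""
    out = np.zeros_like(p, dtype=float)
    m = (p > 0) & (p < 1)
    q = p[m]
    out[m] = -(q * np.log2(q) + (1 - q) * np.log2(1 - q))
    return out


def key_entropy_balance_sketch(keys) -> tuple[float, float]:
    k = np.atleast_2d(np.asarray(keys, dtype=np.uint8))  # 1-D input -> one key
    p = k.mean(axis=0)  # P(bit == 1) for each bit position
    entropy = float(_binary_entropy(p).mean())
    balance = float(1.0 - 2.0 * np.abs(p - 0.5).mean())  # assumed balance definition
    return entropy, balance
```

A single key always scores 0 on both measures, because each bit position has only one observed value.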
raw_key_to_bitarray
Converts a key in any of the common Neural Vault formats to a flat uint8 bit array of 0/1 values.
Key: Key in one of three formats: raw bytes/bytearray, a lowercase hex str, or an np.ndarray that is either a packed byte array or an already-unpacked bit array.
Returns: np.ndarray, a flat uint8 array of 0/1 bit values. For a 256-bit key this has length 256.
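A sketch of the format dispatch uses bytes.fromhex and np.unpackbits (most significant bit first within each byte). Treating any array whose values are all 0 or 1 as already unpacked is an assumption. Under that rule, a packed array made only of 0x00 and 0x01 bytes would be misread as bits.

```python
import numpy as np


def raw_key_to_bits_sketch(key) -> np.ndarray:
    """Convert bytes, a hex string, or an ndarray key into a flat 0/1 uint8 array."""
    if isinstance(key, (bytes, bytearray)):
        packed = np.frombuffer(bytes(key), dtype=np.uint8)
    elif isinstance(key, str):
        packed = np.frombuffer(bytes.fromhex(key), dtype=np.uint8)
    elif isinstance(key, np.ndarray):
        flat = key.astype(np.uint8).ravel()
        if flat.size and flat.max() <= 1:  # assumption: 0/1-only arrays are already bits
            return flat
        packed = flat
    else:
        raise TypeError(f"unsupported key type: {type(key).__name__}")
    return np.unpackbits(packed)  # MSB-first within each byte
```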
add_gaussian_noise
Injects Additive White Gaussian Noise (AWGN) at a specified signal-to-noise ratio to simulate scanner noise degradation.
Signal: Input signal array of any shape. NaN values are replaced with zero before noise is added.
SNR (dB): Target SNR in decibels; higher values mean less noise. The benchmark suite tests [30, 20, 15, 10, 5, 0] dB.
Returns: np.ndarray, the noisy signal, with the same shape and dtype as the input.
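From SNR_dB = 10 * log10(P_signal / P_noise), the required noise power is P_signal / 10^(SNR_dB / 10). The sketch below applies that with zero-mean Gaussian noise. It assumes a floating-point input and an optional NumPy Generator for reproducibility; the rng parameter is hypothetical.

```python
import numpy as np


def add_awgn_sketch(signal, snr_db, rng=None):
    """Add white Gaussian noise scaled to hit the target SNR (float inputs assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(signal)
    clean = np.where(np.isnan(x), 0.0, x)  # NaN -> 0 before noise is added
    signal_power = np.mean(clean.astype(np.float64) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return (clean + noise).astype(x.dtype)
```

At 10 dB the noise power is one tenth of the signal power; at 0 dB the two are equal.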
add_motion_artifacts
Simulates electrode displacement and motion artifacts by randomly corrupting a fraction of signal values with draws from N(0, 3.0).
Signal: Input signal array of any shape.
Artifact probability: Fraction of values to corrupt, in the range [0.0, 1.0]; for example, 0.10 corrupts 10% of values. The benchmark suite tests [0.0, 0.05, 0.10, 0.15, 0.20, 0.30].
Returns: np.ndarray, the corrupted signal, with the same shape as the input.
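The corruption step can be sketched with a Bernoulli mask, so the given probability is the expected fraction of corrupted values. Two choices here are assumptions: selected values are replaced rather than offset, and 3.0 is read as the standard deviation of N(0, 3.0). The rng and scale parameters are hypothetical.

```python
import numpy as np


def add_motion_artifacts_sketch(signal, artifact_prob, rng=None, scale=3.0):
    """Replace a random fraction of values with draws from N(0, scale**2)."""
    if not 0.0 <= artifact_prob <= 1.0:
        raise ValueError("artifact_prob must be in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(signal, dtype=np.float64, copy=True)
    mask = rng.random(out.shape) < artifact_prob  # each value corrupted with this probability
    out[mask] = rng.normal(0.0, scale, size=int(mask.sum()))
    return out
```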
Pipeline
run_integrated_pipeline()
Executes the complete Neural Vault benchmark pipeline end-to-end: data loading, Transformer training (100 epochs, AdamW, triplet loss), few-shot evaluation over 40 episodes, multi-method key generation benchmarking (SHA256, HMAC, BioHashing, Neural), Neural Vault prototype verification, an AWGN noise robustness sweep, and a motion-artifact robustness sweep. Saves all numerical results to benchmark/results/keygen_benchmark_results.json.

Returns: dict, a results dictionary with the following top-level keys:

| Key | Type | Description |
|---|---|---|
| timestamp | str | ISO-8601 datetime of the run |
| baseline_metrics | dict | Per-method d-prime, EER, and (for NeuralVault) ROC-AUC and threshold |
| neural_metrics | dict | Few-shot accuracy, F1, and ROC-AUC aggregated over 40 episodes |
| noise_tests | dict | Per-method EER at each SNR level |
| artifact_tests | dict | Per-method EER at each artifact probability |
| vault_scores | dict | Raw genuine/impostor cosine distances and EER threshold |
| vault_prototype_keys | list[str] | Hex-encoded 256-bit prototype keys for each class |

generate_visualizations(results)
Renders a 3×2 matplotlib dashboard from the pipeline results and saves it to benchmark/visualizations/neuralvault_extended_dashboard.png at 200 DPI. Panels include a d-prime bar chart, an EER bar chart, SNR robustness curves, few-shot metric bars, the training loss curve, and a PCA-projected embedding scatter plot.
results: The dictionary returned by run_integrated_pipeline(). The _plot_data key must be present and contain loss_history, embedding_2d, and embedding_labels.
generate_reports(results)
Writes two files to benchmark/reports/: a machine-readable benchmark_summary.json identifying the highest-separability and lowest-error-rate methods, and a human-readable BENCHMARK_REPORT.md with tables for classification metrics, d-prime rankings, EER rankings, and per-method AWGN robustness curves.
results: The dictionary returned by run_integrated_pipeline().
setup_environment()
Creates the three output directories required by the pipeline (benchmark/results, benchmark/visualizations, and benchmark/reports) using Path.mkdir(parents=True, exist_ok=True). Safe to call repeatedly.