Neural Vault Python API Reference

Neural Vault exposes a flat Python API spread across two modules: main.py (pipeline orchestration, benchmark utilities, and training helpers) and model.py (the enrolled-user verification interface). The public classes and functions are documented below, grouped by role: model classes, key derivation, verification, data processing, evaluation, benchmark utilities, and pipeline entry points.

Classes

`PositionalEncoding`

Adds fixed (non-learnable) sinusoidal positional encodings to a sequence tensor before it enters the Transformer encoder stack. The frequencies are computed in log space with the standard sinusoidal formula, so each dimension encodes a different temporal wavelength.

class PositionalEncoding(d_model: int, max_len: int = 512)

d_model

int

required

Width of the Transformer model dimension. Must match the d_model argument passed to NeuralVaultFewShot.

max_len

int

default:"512"

Maximum sequence length for which the encoding buffer is pre-computed. For an input of length T ≤ max_len, only the first T rows of the buffer are used; inputs longer than max_len fail at runtime.

`forward(x)`

x

torch.Tensor

required

Input tensor of shape (B, T, D), i.e. batch × time-steps × model dimension. The pre-computed encoding is broadcast across the batch dimension and added to x.

Returns torch.Tensor — same shape (B, T, D) with positional signal fused into each time-step.
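The sinusoidal table can be sketched in NumPy as follows. This assumes the common Transformer convention (base 10000, sine on even dimensions, cosine on odd ones, even d_model); PositionalEncoding itself stores an equivalent torch buffer, and `sinusoidal_table` and `add_positional_encoding` are hypothetical helper names used only for illustration.

```python
import math

import numpy as np


def sinusoidal_table(max_len: int, d_model: int) -> np.ndarray:
    """Return a (max_len, d_model) table of fixed sinusoidal encodings.

    Assumes the usual base of 10000 and an even d_model.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even for this sketch")
    position = np.arange(max_len, dtype=np.float64)[:, None]   # (max_len, 1)
    # Frequencies computed in log space: 1 / 10000^(2i / d_model)
    div_term = np.exp(np.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = np.zeros((max_len, d_model), dtype=np.float64)
    pe[:, 0::2] = np.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = np.cos(position * div_term)   # odd dimensions
    return pe


def add_positional_encoding(x: np.ndarray, pe: np.ndarray) -> np.ndarray:
    """Add the first T rows of `pe` to a (B, T, D) batch via broadcasting."""
    t = x.shape[1]
    if t > pe.shape[0]:
        raise ValueError(f"sequence length {t} exceeds max_len {pe.shape[0]}")
    return x + pe[None, :t, :]
```

Because the table depends only on position and dimension, it is computed once and reused for every batch; no parameters are learned.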

`NeuralVaultFewShot`

The core Prototypical Transformer Network. The architecture projects raw fMRI feature vectors into d_model space, applies sinusoidal positional encoding, passes the sequence through n_layers Pre-LN Transformer encoder layers (with dim_feedforward = d_model * 4 and 10% dropout), mean-pools the output over the time axis, and projects the result through a 2-layer GELU head down to latent_dim. The final embedding is L2-normalised onto the unit hypersphere, so the dot product of two embeddings equals their cosine similarity.

class NeuralVaultFewShot(
    input_dim: int,
    d_model: int = 128,
    nhead: int = 8,
    n_layers: int = 2,
    latent_dim: int = 128,
)

input_dim

int

required

Number of input features per time-step — equals the number of TRIBEv2 cortical vertex predictions in the dataset (typically 20484 for a full cortical surface or the number of columns in 5classpreds.csv).

d_model

int

default:"128"

Internal Transformer model width. All linear projections, attention layers, and feed-forward sublayers operate in this dimensionality. The feed-forward sublayer width is automatically set to d_model * 4.

nhead

int

default:"8"

Number of parallel attention heads. d_model must be evenly divisible by nhead.

n_layers

int

default:"2"

Depth of the Transformer encoder stack. Each layer applies multi-head self-attention followed by a feed-forward sublayer, both with Pre-LN normalisation.

latent_dim

int

default:"128"

Dimensionality of the output embedding. All downstream key derivation, prototype comparison, and few-shot classification operate in this space.

`forward(x)`

x

torch.Tensor

required

Input tensor. Accepts either (B, T, F), an explicit sequence of T frames with F features each, or (B, F), a single frame that is automatically unsqueezed to (B, 1, F) before processing.

Returns torch.Tensor — shape (B, latent_dim). Every row lies on the unit hypersphere (L2-norm = 1).

import torch
from main import NeuralVaultFewShot

model = NeuralVaultFewShot(input_dim=512, d_model=128, nhead=8, n_layers=2, latent_dim=128)
x = torch.randn(16, 5, 512)   # batch=16, seq_len=5, features=512
embeddings = model(x)         # → (16, 128), unit-norm rows
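The last two stages of forward, temporal mean pooling and L2 normalisation, can be illustrated in NumPy. This sketch mirrors only that part of the description above (not the input projection, encoder layers, or GELU head), and `pool_and_normalise` is a hypothetical name.

```python
import numpy as np


def pool_and_normalise(frames: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Mean-pool (B, T, D) frame outputs over time and L2-normalise each row."""
    pooled = frames.mean(axis=1)                          # (B, D) temporal mean
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.maximum(norms, eps)                # rows on the unit hypersphere


rng = np.random.default_rng(0)
emb = pool_and_normalise(rng.standard_normal((4, 5, 16)))
# For unit-norm rows, the plain dot product is already the cosine similarity.
cos = emb[0] @ emb[1]
```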

`triplet_loss(anchor, positive, negative, margin=0.3)`

Static method. Computes the standard triplet loss used during metric-learning training. Squared Euclidean distances are used because, unlike the plain Euclidean norm, they are differentiable everywhere, including where two embeddings coincide.

L = mean(relu(||anchor - positive||² - ||anchor - negative||² + margin))

anchor

torch.Tensor

required

Shape (B, latent_dim) — reference embeddings for each triplet.

positive

torch.Tensor

required

Shape (B, latent_dim) — same-class embeddings that should be pulled closer to the anchor.

negative

torch.Tensor

required

Shape (B, latent_dim) — different-class embeddings that should be pushed away from the anchor.

margin

float

default:"0.3"

Minimum enforced distance gap between positive and negative pairs. Triplets already satisfying this margin contribute zero loss.

Returns torch.Tensor — scalar loss tensor. Back-propagate directly with .backward().

loss = NeuralVaultFewShot.triplet_loss(emb_anc, emb_pos, emb_neg, margin=0.3)
loss.backward()
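To check the formula by hand, the same loss can be restated in NumPy. This is a sketch for illustration only (`triplet_loss_np` is a hypothetical name); the real static method operates on torch tensors so that gradients flow through it.

```python
import numpy as np


def triplet_loss_np(anchor, positive, negative, margin=0.3):
    """NumPy restatement of mean(relu(||a - p||^2 - ||a - n||^2 + margin))."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)   # squared distance to positives
    d_neg = np.sum((anchor - negative) ** 2, axis=1)   # squared distance to negatives
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


a = np.array([[0.0, 0.0], [0.0, 0.0]])
p = np.array([[0.0, 0.0], [1.0, 0.0]])
n = np.array([[1.0, 0.0], [0.0, 0.0]])
# Row 0 already satisfies the margin (contributes 0); row 1 contributes 1 + 0.3.
loss = triplet_loss_np(a, p, n)   # ≈ 0.65
```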

Key Derivation

`derive_key`

Derives a stable cryptographic key from a unit-norm embedding vector. The embedding is first quantised to int16 (multiplied by 1000 and cast), then its raw bytes are used as input keying material (IKM) for HKDF-SHA256 with the fixed info string b"neural-vault-few-shot-v1". If the cryptography package is unavailable, the function falls back to raw hashlib.sha256.

def derive_key(emb_vec: np.ndarray, key_len_bits: int = 256) -> tuple[bytes, str]

emb_vec

np.ndarray

required

L2-normalised embedding vector, shape (latent_dim,). Should come from NeuralVaultFewShot.forward, or from get_embedding after re-normalising, because get_embedding returns an un-normalised mean.

key_len_bits

int

default:"256"

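Desired key length in bits. Must be a multiple of 8. When the cryptography package is unavailable, the key comes from the hashlib.sha256 fallback instead of HKDF; a single SHA-256 digest is 256 bits long, so lengths other than 256 may not be honoured on that path.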

Returns tuple[bytes, str] — (key_bytes, key_hex). key_bytes is the raw binary key; key_hex is its lowercase hex representation.

from main import derive_key
import numpy as np

emb = np.random.randn(128).astype(np.float32)
emb /= np.linalg.norm(emb)

key_bytes, key_hex = derive_key(emb, key_len_bits=256)
print(key_hex)   # e.g. "3a9f2c..."  (64 hex characters)

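Quantisation to int16 (scale factor 1000) gives each embedding component a resolution of 0.001. Perturbations that leave every quantised component unchanged produce the same key, but a change in even one quantised component produces an entirely different key, because HKDF and SHA-256 do not preserve similarity between inputs. Ensure your embedding is genuinely stable before calling derive_key in a production enrollment flow.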
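The quantise-then-HKDF flow can be sketched with the standard library alone. This restates HKDF-SHA256 as defined in RFC 5869 using hmac and hashlib rather than the cryptography package that derive_key uses, and it assumes no salt (RFC 5869 then uses a zero-filled salt), a truncating float-to-int16 cast, and native byte order for the IKM; none of these details are confirmed by the source. `hkdf_sha256` and `derive_key_sketch` are hypothetical names, and the hashlib fallback path is not shown.

```python
from __future__ import annotations

import hashlib
import hmac

import numpy as np


def hkdf_sha256(ikm: bytes, length: int, info: bytes, salt: bytes | None = None) -> bytes:
    """HKDF-SHA256 (RFC 5869): extract a pseudorandom key, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key is too long for HKDF-SHA256")
    if salt is None:
        salt = b"\x00" * hash_len                            # RFC 5869 default salt
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()       # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                 # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key_sketch(emb_vec: np.ndarray, key_len_bits: int = 256) -> tuple[bytes, str]:
    """Quantise a unit-norm embedding to int16 and feed its bytes to HKDF-SHA256."""
    if key_len_bits % 8 != 0:
        raise ValueError("key_len_bits must be a multiple of 8")
    # Unit-vector components lie in [-1, 1], so x1000 fits comfortably in int16.
    ikm = (np.asarray(emb_vec, dtype=np.float32) * 1000).astype(np.int16).tobytes()
    key = hkdf_sha256(ikm, key_len_bits // 8, info=b"neural-vault-few-shot-v1")
    return key, key.hex()
```

Because the whole quantised vector is hashed, the key is all-or-nothing: identical quantised embeddings give identical keys, and any differing component gives an unrelated key.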

Verification

These functions live in model.py and operate against a pre-enrolled prototype (prototype_vec) that is computed at module load time from 5classpreds.csv. They encapsulate the full normalisation, sequence-building, and embedding pipeline so callers only need to pass raw fMRI feature arrays.

`get_embedding`

Normalises raw fMRI data using the per-feature statistics fitted at enrollment time, chunks the result into non-overlapping sequences of SEQ_CHUNK frames, runs the model, and returns the mean embedding across all chunks.

def get_embedding(raw_data_np: np.ndarray) -> np.ndarray

raw_data_np

np.ndarray

required

Shape (N, N_FEAT) — raw (unnormalised) fMRI feature matrix from a new acquisition session. N must be at least 1; if fewer than SEQ_CHUNK rows are present, the array is tiled to pad the first sequence.

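Returns np.ndarray: shape (latent_dim,). The result is the mean of the per-chunk embeddings and is not re-normalised after averaging. A mean of unit vectors generally has a norm below 1, so re-normalise it before treating a raw dot product as a cosine similarity.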
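The padding, chunking, and averaging steps can be sketched in NumPy. The exact tiling scheme in model.py is not specified beyond "tiled", so repeating rows until one full window exists is an assumption; `chunk_with_padding` and `mean_embedding` are hypothetical names, and the model call itself is left out.

```python
import numpy as np

SEQ_CHUNK = 5  # documented default window length


def chunk_with_padding(data: np.ndarray, seq_len: int = SEQ_CHUNK) -> np.ndarray:
    """Split (N, F) rows into (B, seq_len, F) windows, tiling short inputs first."""
    n = data.shape[0]
    if n == 0:
        raise ValueError("need at least one row")
    if n < seq_len:
        reps = -(-seq_len // n)                       # ceil(seq_len / n)
        data = np.tile(data, (reps, 1))[:seq_len]     # repeat rows to fill one window
    b = data.shape[0] // seq_len                      # trailing partial window is dropped
    return data[: b * seq_len].reshape(b, seq_len, data.shape[1])


def mean_embedding(chunk_embeddings: np.ndarray, renormalise: bool = False) -> np.ndarray:
    """Average per-chunk unit-norm embeddings; optionally project back to unit norm."""
    mean = chunk_embeddings.mean(axis=0)
    if renormalise:
        mean = mean / np.linalg.norm(mean)
    return mean
```

With `renormalise=False` the output corresponds to the documented return value; for two orthogonal unit embeddings the mean has norm √0.5 ≈ 0.707, which is why re-normalisation matters before cosine comparisons.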

`verify`

Computes the cosine similarity between a new fMRI acquisition and the enrolled user prototype. Both the new embedding and the prototype are L2-normalised before comparison, so the cosine similarity reduces to a plain dot product.

def verify(raw_data_np: np.ndarray) -> float

raw_data_np

np.ndarray

required

Shape (N, N_FEAT) raw fMRI feature matrix for the authentication attempt.

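Returns float: cosine similarity in the range [-1, 1]. Compare it against the EER threshold (≈ 0.316 from benchmark results) to accept or reject the attempt. Make sure the threshold is expressed in the same space as the score: vault_scores in the pipeline results and verify_similarity report cosine distances (1 - cosine_similarity), so a threshold taken from distance-based results must be converted with 1 - threshold before it is compared with the similarity returned here.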

from model import verify

# Load your acquisition …
score = verify(new_session_data)
if score > 0.316:
    print("Authenticated")
else:
    print("Rejected")

Data Processing

`load_and_prepare_data`

Reads 5classpreds.csv from the current working directory, maps the five stimulus-class video labels to integers (0–4), pivots the long-format prediction table into a (timestep × feature) matrix, replaces any NaN / ±Inf with safe defaults, and returns a StandardScaler-normalised feature matrix alongside the integer label vector.

def load_and_prepare_data() -> tuple[np.ndarray, np.ndarray]

Parameters — none. Returns tuple[np.ndarray, np.ndarray]

Component	Shape	Description
`X_scaled`	`(N, N_FEAT)`	z-score standardised feature matrix
`y`	`(N,)`	integer class labels, dtype `int64`

Raises FileNotFoundError if 5classpreds.csv is not found in the current directory.

from main import load_and_prepare_data

X_scaled, y = load_and_prepare_data()
print(X_scaled.shape, y.shape)   # e.g. (120, 512) (120,)

`build_sequences`

Splits a normalised feature array into non-overlapping temporal windows of seq_len frames. Trailing rows that do not fill a complete window are discarded. This is the sequence builder used by model.py during both training and verification.

def build_sequences(data: np.ndarray, seq_len: int = SEQ_CHUNK) -> torch.Tensor

data

np.ndarray

required

Shape (N, N_FEAT) normalised feature array. Rows are ordered chronologically.

seq_len

int

default:"5"

Temporal window size in frames. Matches the SEQ_CHUNK constant in model.py (default 5).

Returns torch.Tensor — shape (B, seq_len, N_FEAT) where B = floor(N / seq_len).
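The windowing rule reduces to a single reshape. The NumPy sketch below (hypothetical name `build_sequences_np`) returns an array; the documented function returns a torch.Tensor, which could be obtained from this result with torch.from_numpy.

```python
import numpy as np


def build_sequences_np(data: np.ndarray, seq_len: int = 5) -> np.ndarray:
    """Non-overlapping (B, seq_len, F) windows; leftover trailing rows are dropped."""
    b = data.shape[0] // seq_len                      # B = floor(N / seq_len)
    return data[: b * seq_len].reshape(b, seq_len, data.shape[1])


windows = build_sequences_np(np.zeros((12, 4)))       # 12 rows -> 2 windows, 2 rows dropped
```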

`build_sequence_tensor`

Wraps a feature matrix as single-frame sequences for use with the main.py training loop. Each row becomes a (1, N_FEAT) sequence tensor rather than a chunked temporal window.

def build_sequence_tensor(X: np.ndarray) -> torch.Tensor

X

np.ndarray

required

Shape (N, N_FEAT) feature matrix, dtype float32.

Returns torch.Tensor — shape (N, 1, N_FEAT), dtype float32.

`class_prototypes`

Computes the mean embedding for each class, producing a prototype matrix used for few-shot nearest-centroid classification and vault-key derivation.

def class_prototypes(embeddings: np.ndarray, labels: np.ndarray) -> np.ndarray

embeddings

np.ndarray

required

Shape (N, latent_dim) — L2-normalised embeddings for all samples.

labels

np.ndarray

required

Shape (N,) — integer class labels aligned row-by-row with embeddings. Classes must be contiguous integers starting at 0 up to N_CLASSES - 1.

Returns np.ndarray — shape (N_CLASSES, latent_dim). Row c is the mean embedding of all samples with labels == c.
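A per-class mean can be written in a few lines of NumPy. This sketch (hypothetical name `class_prototypes_np`) assumes contiguous labels 0..N_CLASSES - 1, as the documentation requires; a missing class would yield a NaN row. Note that, like get_embedding, the mean of unit vectors is generally not unit-norm.

```python
import numpy as np


def class_prototypes_np(embeddings: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Row c is the mean embedding of all samples whose label equals c."""
    n_classes = int(labels.max()) + 1
    return np.stack([embeddings[labels == c].mean(axis=0) for c in range(n_classes)])
```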

Evaluation

`evaluate_fewshot`

Runs nearest-centroid few-shot inference on a support/query split. Prototypes are computed from (X_train, y_train) and each query in X_test is assigned the class whose prototype is closest under squared Euclidean distance.

def evaluate_fewshot(
    model: NeuralVaultFewShot,
    X_train: np.ndarray,
    y_train: np.ndarray,
    X_test: np.ndarray,
    y_test: np.ndarray,
) -> tuple[np.ndarray, np.ndarray]

model

NeuralVaultFewShot

required

A trained NeuralVaultFewShot instance. The function calls model.eval() internally and wraps inference in torch.no_grad().

X_train

np.ndarray

required

Support set features, shape (n_shot * N_CLASSES, N_FEAT).

y_train

np.ndarray

required

Support set labels, shape (n_shot * N_CLASSES,).

X_test

np.ndarray

required

Query set features, shape (n_query * N_CLASSES, N_FEAT).

y_test

np.ndarray

required

Query set ground-truth labels, shape (n_query * N_CLASSES,). Used externally for metric computation — not consumed inside this function.

Returns tuple[np.ndarray, np.ndarray]

Component	Shape	Description
`preds`	`(n_query * N_CLASSES,)`	Predicted class indices
`probs`	`(n_query * N_CLASSES, N_CLASSES)`	Softmax of negative distances, usable as class probabilities for ROC-AUC
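The nearest-centroid step that produces `preds` and `probs` can be sketched in NumPy on embeddings that have already been computed. The max-subtraction before exponentiation is a standard stability trick and an assumption about the implementation; `nearest_centroid` is a hypothetical name.

```python
import numpy as np


def nearest_centroid(query: np.ndarray, prototypes: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Assign each query to its closest prototype; softmax(-d) gives class probabilities."""
    d = ((query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)   # (Q, C) squared distances
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)        # keep exp() in a safe range
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    preds = d.argmin(axis=1)                            # same as probs.argmax(axis=1)
    return preds, probs
```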

`evaluate_keygen_method`

Benchmarks a key-generation function against all five stimulus classes by comparing keys derived from class prototypes against per-sample keys using Hamming distance. Produces arrays of genuine and impostor Hamming distances suitable for EER and d-prime computation.

def evaluate_keygen_method(
    X: np.ndarray,
    y: np.ndarray,
    keygen_fn: callable,
) -> tuple[np.ndarray, np.ndarray]

X

np.ndarray

required

Standardised feature matrix, shape (N, N_FEAT).

y

np.ndarray

required

Integer class labels, shape (N,).

keygen_fn

callable

required

Function mapping a feature vector (np.ndarray of shape (N_FEAT,)) to a binary key (np.ndarray of uint8 0/1 values or raw bytes). Built-in options include SHA256, HMAC-SHA256, and BioHashing.

Returns tuple[np.ndarray, np.ndarray] — (genuine_dists, impostor_dists). Values are Hamming distances (fractions in [0, 1]). Ideal genuine distances are near 0; ideal impostor distances are near 0.5.
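The fractional Hamming distance used for these comparisons can be sketched as the share of differing bit positions between two equal-length 0/1 arrays. `fractional_hamming` is a hypothetical name; the benchmark's own helper may differ in how it handles key formats.

```python
import numpy as np


def fractional_hamming(bits_a: np.ndarray, bits_b: np.ndarray) -> float:
    """Fraction of positions at which two equal-length 0/1 bit arrays differ."""
    if bits_a.shape != bits_b.shape:
        raise ValueError("keys must have the same number of bits")
    return float(np.mean(bits_a != bits_b))
```

Independent random keys disagree on about half of their bits, which is why 0.5 is the ideal impostor distance.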

`verify_similarity`

Computes cosine distances between per-class prototype embeddings and all sample embeddings, splitting results into genuine-match and impostor-match arrays.

def verify_similarity(
    embeddings: np.ndarray,
    prototypes: np.ndarray,
    labels: np.ndarray,
) -> tuple[np.ndarray, np.ndarray]

embeddings

np.ndarray

required

Shape (N, latent_dim).

prototypes

np.ndarray

required

Shape (N_CLASSES, latent_dim) — output of class_prototypes.

labels

np.ndarray

required

Shape (N,) — integer class labels aligned with embeddings.

Returns tuple[np.ndarray, np.ndarray]: (genuine, impostor). Values are cosine distances (1 - cosine_similarity), so smaller values indicate greater similarity.
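A NumPy sketch of the genuine/impostor split: each sample's distance to its own class prototype is genuine, and its distances to every other prototype are impostor. Re-normalising both inputs is an assumption made here because class means are not unit-norm; `split_cosine_distances` is a hypothetical name.

```python
import numpy as np


def split_cosine_distances(embeddings, prototypes, labels):
    """Return (genuine, impostor) cosine distances between samples and class prototypes."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    dist = 1.0 - e @ p.T                                   # (N, C) cosine distances
    own = np.zeros(dist.shape, dtype=bool)
    own[np.arange(len(labels)), labels] = True             # each sample's own class
    return dist[own], dist[~own]
```

With N samples and C classes this yields N genuine and N × (C - 1) impostor distances.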

`euclidean_dist`

Computes the full pairwise squared Euclidean distance matrix between two sets of embedding vectors using broadcasting. Used internally by evaluate_fewshot for nearest-centroid assignment.

def euclidean_dist(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor

x

torch.Tensor

required

Shape (N, D) — query embeddings.

y

torch.Tensor

required

Shape (M, D) — prototype embeddings.

Returns torch.Tensor — shape (N, M) pairwise squared Euclidean distances. Raises ValueError if x.size(1) != y.size(1).
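The broadcasting pattern is the same in NumPy and PyTorch: insert singleton axes so that every query is subtracted from every prototype, then sum the squares over the feature axis. This NumPy sketch (hypothetical name `euclidean_dist_np`) mirrors the documented ValueError; note that it materialises an (N, M, D) intermediate.

```python
import numpy as np


def euclidean_dist_np(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """(N, D) and (M, D) inputs -> (N, M) squared Euclidean distances via broadcasting."""
    if x.shape[1] != y.shape[1]:
        raise ValueError(f"dimension mismatch: {x.shape[1]} != {y.shape[1]}")
    diff = x[:, None, :] - y[None, :, :]        # (N, M, D)
    return (diff ** 2).sum(axis=-1)
```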

Benchmark Utilities

`compute_d_prime`

Computes d-prime (d′), the signal-detection-theory measure of separability between genuine and impostor score distributions.

def compute_d_prime(genuine: np.ndarray, impostor: np.ndarray) -> float

genuine

np.ndarray

required

1-D array of genuine match scores or distances.

impostor

np.ndarray

required

1-D array of impostor scores or distances.

Returns float — d-prime value. A small epsilon (1e-9) is added to the pooled variance denominator to guard against division by zero.
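A sketch using the common definition d′ = |μ_g − μ_i| / sqrt((σ_g² + σ_i²) / 2). The absolute value, population variances (ddof=0), and the placement of the epsilon inside the square root are assumptions; compute_d_prime may differ in those details. `d_prime` is a hypothetical name.

```python
import numpy as np


def d_prime(genuine: np.ndarray, impostor: np.ndarray, eps: float = 1e-9) -> float:
    """Separation of two score distributions in units of their pooled standard deviation."""
    mu_g, mu_i = np.mean(genuine), np.mean(impostor)
    pooled_var = 0.5 * (np.var(genuine) + np.var(impostor))   # population variances
    return float(abs(mu_g - mu_i) / np.sqrt(pooled_var + eps))
```

For example, genuine scores [1, 3] and impostor scores [5, 7] have means 2 and 6 and variances 1, giving d′ ≈ 4.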

`compute_eer`

Computes the Equal Error Rate by finding the threshold where the false positive rate equals the false negative rate. Uses Brent’s root-finding method (scipy.optimize.brentq) on a linear interpolation of the ROC curve for numerical precision.

def compute_eer(genuine: np.ndarray, impostor: np.ndarray) -> float

genuine

np.ndarray

required

1-D array of genuine match scores.

impostor

np.ndarray

required

1-D array of impostor scores.

Returns float — EER as a percentage in the range [0, 100]. Returns 50.0 if either input array is empty.
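As a cross-check, the EER can be approximated without SciPy by sweeping every observed score as a threshold and picking the one where the two error rates are closest. This is a simpler substitute for the brentq interpolation used by compute_eer, so results can differ slightly on small samples. It follows the labelling of compute_roc_from_scores (larger values mean "impostor", as with cosine distances); `eer_percent` is a hypothetical name.

```python
import numpy as np


def eer_percent(genuine, impostor) -> float:
    """Approximate EER (%) by a threshold sweep; larger scores indicate an impostor."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    if genuine.size == 0 or impostor.size == 0:
        return 50.0
    best_gap, best_eer = np.inf, 0.5
    for t in np.unique(np.concatenate([genuine, impostor])):
        fpr = np.mean(genuine >= t)     # genuine samples flagged as impostors
        fnr = np.mean(impostor < t)     # impostors accepted as genuine
        if abs(fpr - fnr) < best_gap:
            best_gap, best_eer = abs(fpr - fnr), 0.5 * (fpr + fnr)
    return float(100.0 * best_eer)
```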

`compute_roc_from_scores`

Constructs a combined label and score vector from genuine/impostor arrays and delegates to sklearn.metrics.roc_curve.

def compute_roc_from_scores(
    genuine: np.ndarray,
    impostor: np.ndarray,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]

genuine

np.ndarray

required

1-D array of genuine match scores (labelled 0 internally).

impostor

np.ndarray

required

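1-D array of impostor scores (labelled 1 internally). Because sklearn.metrics.roc_curve treats label 1 as the positive class and larger scores as more positive, both inputs should be distance-like, with larger values indicating an impostor.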

Returns tuple[np.ndarray, np.ndarray, np.ndarray] — (fpr, tpr, thresholds) as returned by sklearn.metrics.roc_curve.

`compute_key_entropy_balance`

Measures the cryptographic quality of a matrix of binary keys by computing per-bit Shannon entropy and a balance score.

def compute_key_entropy_balance(bit_matrix: np.ndarray) -> tuple[float, float]

bit_matrix

np.ndarray

required

Shape (N, key_bits) binary key matrix (uint8 values 0 or 1). A 1-D array is treated as a single key.

Returns tuple[float, float]

Component	Range	Ideal
`entropy`	`[0, 1]` bits/bit	`1.0` — every bit position is equally likely to be 0 or 1
`balance`	`[0, 1]`	`1.0` — perfect 50/50 distribution across all bit positions
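One way to compute both quantities: estimate P(bit = 1) at each position across keys, average the binary Shannon entropy of those probabilities, and score balance by how far each probability sits from 0.5. The source does not give the exact balance formula, so `1 - 2 * mean|p - 0.5|` is an assumption chosen to match the documented range and ideal; `entropy_and_balance` is a hypothetical name.

```python
import numpy as np


def entropy_and_balance(bit_matrix: np.ndarray) -> tuple[float, float]:
    """Mean per-bit Shannon entropy and a 0..1 balance score for (N, key_bits) 0/1 keys."""
    bits = np.atleast_2d(np.asarray(bit_matrix, dtype=np.float64))   # 1-D input -> one key
    p = bits.mean(axis=0)                          # P(bit == 1) at each position
    q = 1.0 - p
    with np.errstate(divide="ignore", invalid="ignore"):
        # 0 * log2(0) is taken as 0, the usual convention for entropy.
        h = -(np.where(p > 0, p * np.log2(p), 0.0) + np.where(q > 0, q * np.log2(q), 0.0))
    balance = 1.0 - 2.0 * np.mean(np.abs(p - 0.5))   # 1 at a 50/50 split, 0 if every bit is constant
    return float(np.mean(h)), float(balance)
```

A single key always scores 0 on both measures, since each position has only one observed value; meaningful numbers need many keys.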

`raw_key_to_bitarray`

Converts a key in any of the common Neural Vault formats to a flat uint8 bit array of 0/1 values.

def raw_key_to_bitarray(raw_key: bytes | str | np.ndarray) -> np.ndarray

raw_key

bytes | str | np.ndarray

required

Key in one of three formats: raw bytes/bytearray, a lowercase hex str, or an np.ndarray that is either already a packed byte array or a pre-unpacked bit array.

Returns np.ndarray — flat uint8 array of 0/1 bit values. For a 256-bit key this has length 256.
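A sketch of the three-way conversion using np.unpackbits (most significant bit first). Telling a packed byte array from an unpacked bit array is ambiguous; treating any array whose values are all 0 or 1 as already unpacked is an assumed heuristic and would misread a packed key made only of 0x00 and 0x01 bytes. `to_bits` is a hypothetical name.

```python
import numpy as np


def to_bits(raw_key) -> np.ndarray:
    """Flatten bytes, a hex string, or a packed/unpacked ndarray into a uint8 0/1 array."""
    if isinstance(raw_key, (bytes, bytearray)):
        packed = np.frombuffer(bytes(raw_key), dtype=np.uint8)
    elif isinstance(raw_key, str):
        packed = np.frombuffer(bytes.fromhex(raw_key), dtype=np.uint8)
    else:
        arr = np.asarray(raw_key).astype(np.uint8).ravel()
        if arr.size and arr.max() <= 1:
            return arr              # heuristic: already unpacked 0/1 bits
        packed = arr
    return np.unpackbits(packed)    # 8 bits per byte, MSB first
```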

`add_gaussian_noise`

Injects Additive White Gaussian Noise (AWGN) at a specified signal-to-noise ratio to simulate scanner noise degradation.

def add_gaussian_noise(x: np.ndarray, snr_db: float) -> np.ndarray

x

np.ndarray

required

Input signal array of any shape. NaN values are replaced with zero before noise is added.

snr_db

float

required

Target SNR in decibels. Higher values mean less noise. The benchmark suite tests [30, 20, 15, 10, 5, 0] dB.

Returns np.ndarray — noisy signal, same shape and dtype as input.
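A sketch of AWGN at a target SNR: measure signal power as the mean square of the (NaN-cleaned) input, divide by 10^(snr_db / 10) to get the noise power, and draw zero-mean Gaussian noise with that variance. Measuring power over the whole array is an assumption; the `rng` argument and the name `awgn` are hypothetical and added only for reproducibility.

```python
import numpy as np


def awgn(x: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Add white Gaussian noise so that signal power / noise power = 10^(snr_db / 10)."""
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(x)
    clean = np.where(np.isnan(arr), 0.0, arr).astype(np.float64)   # NaN -> 0 before noising
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return (clean + noise).astype(arr.dtype)                       # keep the input dtype
```

At 0 dB the noise power equals the signal power; each 10 dB step up reduces the noise power tenfold.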

`add_motion_artifacts`

Simulates electrode displacement and motion artifacts by randomly corrupting a fraction of signal values with draws from N(0, 3.0).

def add_motion_artifacts(x: np.ndarray, prob: float) -> np.ndarray

x

np.ndarray

required

Input signal array of any shape.

prob

float

required

Fraction of values to corrupt, in the range [0.0, 1.0]. For example 0.10 corrupts 10% of values. The benchmark suite tests [0.0, 0.05, 0.10, 0.15, 0.20, 0.30].

Returns np.ndarray — corrupted signal, same shape as input.
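A sketch of the corruption step, assuming each value is independently replaced (not offset) with probability prob, and reading N(0, 3.0) as standard deviation 3.0; the source does not settle either point. The `rng` argument and the name `motion_artifacts` are hypothetical, and the input array is left unmodified.

```python
import numpy as np


def motion_artifacts(x: np.ndarray, prob: float, rng=None) -> np.ndarray:
    """Replace roughly a `prob` fraction of entries with draws from N(0, 3.0)."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(x, dtype=np.float64, copy=True)            # never modify the caller's array
    mask = rng.random(out.shape) < prob                       # each entry corrupted independently
    out[mask] = rng.normal(0.0, 3.0, size=int(mask.sum()))    # scale 3.0 treated as std dev
    return out
```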

Pipeline

`run_integrated_pipeline()`

Executes the complete Neural Vault benchmark pipeline end-to-end: data loading, Transformer training (100 epochs, AdamW, triplet loss), few-shot evaluation over 40 episodes, multi-method key generation benchmarking (SHA256, HMAC, BioHashing, Neural), Neural Vault prototype verification, AWGN noise robustness sweep, and motion-artifact robustness sweep. Saves all numerical results to benchmark/results/keygen_benchmark_results.json.

def run_integrated_pipeline() -> dict

Returns dict — results dictionary with the following top-level keys:

Key	Type	Description
`timestamp`	`str`	ISO-8601 datetime of the run
`baseline_metrics`	`dict`	Per-method d-prime, EER, and (for NeuralVault) ROC-AUC and threshold
`neural_metrics`	`dict`	Few-shot accuracy, F1, ROC-AUC aggregated over 40 episodes
`noise_tests`	`dict`	Per-method EER at each SNR level
`artifact_tests`	`dict`	Per-method EER at each artifact probability
`vault_scores`	`dict`	Raw genuine/impostor cosine distances and EER threshold
`vault_prototype_keys`	`list[str]`	Hex-encoded 256-bit prototype keys for each class

`generate_visualizations(results)`

Renders a 3×2 matplotlib dashboard from the pipeline results and saves it to benchmark/visualizations/neuralvault_extended_dashboard.png at 200 DPI. Panels include: d-prime bar chart, EER bar chart, SNR robustness curves, few-shot metric bars, training loss curve, and a PCA-projected embedding scatter plot.

def generate_visualizations(results: dict) -> None

results

dict

required

The dictionary returned by run_integrated_pipeline(). The _plot_data key must be present and contain loss_history, embedding_2d, and embedding_labels.

`generate_reports(results)`

Writes two files to benchmark/reports/: a machine-readable benchmark_summary.json identifying the highest-separability and lowest-error-rate methods, and a human-readable BENCHMARK_REPORT.md with tables for classification metrics, d-prime rankings, EER rankings, and per-method AWGN robustness curves.

def generate_reports(results: dict) -> None

results

dict

required

The dictionary returned by run_integrated_pipeline().

`setup_environment()`

Creates the three output directories required by the pipeline — benchmark/results, benchmark/visualizations, and benchmark/reports — using Path.mkdir(parents=True, exist_ok=True). Safe to call repeatedly.

def setup_environment() -> None
