256-bit Key Derivation from fMRI Embeddings via HKDF-SHA256

Once the Transformer encoder produces an L2-normalized embedding for a user, Neural Vault converts it into a deterministic 256-bit cryptographic key. The derivation pipeline has three stages: quantization of the continuous-valued embedding to a reproducible integer representation, serialization to raw bytes that serve as HKDF input keying material, and key derivation via HKDF-SHA256 with a domain-specific info string. The same quantized embedding always produces the same key: identity is the entropy source, not randomness.

`derive_key` — main.py

The production version in main.py gracefully handles environments where the cryptography package is unavailable, falling back to a raw hashlib.sha256 digest:

def derive_key(emb_vec: np.ndarray, key_len_bits: int = 256):
    """Derives a stable key from a normalized embedding."""
    quantised = (emb_vec * 1000).astype(np.int16)
    ikm = quantised.tobytes()
    if has_cryptography and HKDF is not None and hashes is not None:
        kdf = HKDF(
            algorithm=hashes.SHA256(),
            length=key_len_bits // 8,
            salt=None,
            info=b"neural-vault-few-shot-v1"
        )
        key_bytes = kdf.derive(ikm)
    else:
        key_bytes = hashlib.sha256(ikm).digest()
        if key_len_bits != 256:
            key_bytes = hashlib.sha256(key_bytes + key_bytes).digest()[: key_len_bits // 8]
    return key_bytes, key_bytes.hex()

`derive_key` — model.py

The model.py variant assumes the cryptography package is always present (it is listed as a required dependency) and omits the fallback branch:

def derive_key(emb_vec: np.ndarray, key_len_bits: int = 256) -> tuple[bytes, str]:
    """
    Derives a cryptographic key from a stable embedding point.
    """
    # Quantise to int16 bytes
    quantised  = (emb_vec * 1000).astype(np.int16)
    ikm        = quantised.tobytes()

    kdf  = HKDF(algorithm=hashes.SHA256(),
                 length=key_len_bits // 8,
                 salt=None,
                 info=b"neural-vault-few-shot-v1")
    key_bytes = kdf.derive(ikm)
    return key_bytes, key_bytes.hex()

Both implementations are semantically identical when the cryptography package is installed.

Step-by-Step Derivation

Step 1 — Quantize to int16

quantised = (emb_vec * 1000).astype(np.int16)

The embedding vector produced by NeuralVaultFewShot.forward is L2-normalized and lies on the unit hypersphere, so each component is a float in [-1.0, 1.0]. Multiplying by 1000 maps this range to [-1000, 1000], well within the int16 range of [-32768, 32767]. The scale factor of 1000 keeps three decimal places of precision, enough to distinguish embeddings while discarding floating-point noise below 0.001 that would otherwise destabilize key reproduction across inference runs. Note that astype(np.int16) truncates toward zero rather than rounding, so a component that sits close to a step boundary can still land on a different integer between runs.
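To see how this quantization behaves on concrete numbers, here is a minimal sketch on hand-picked values (illustrative only, not taken from the model). It shows that noise below one step of 0.001 normally leaves the integers unchanged, but that the same noise next to a step boundary changes the result, because the cast truncates toward zero.

```python
import numpy as np

# Illustrative components only; a real embedding comes from the encoder.
emb = np.array([0.1234, -0.5678, 0.9995], dtype=np.float64)

# Same quantization as derive_key: scale by 1000, then truncate toward zero.
quantised = (emb * 1000).astype(np.int16)

# Noise well below one step (0.001) that does not cross a step boundary.
noisy = emb + np.array([0.0003, -0.0001, 0.0002])
quantised_noisy = (noisy * 1000).astype(np.int16)

# The same noise level applied next to a boundary changes the integer.
edge = np.array([0.1239])
edge_q = (edge * 1000).astype(np.int16)
edge_noisy_q = ((edge + 0.0003) * 1000).astype(np.int16)

print(quantised.tolist(), quantised_noisy.tolist())  # both [123, -567, 999]
print(edge_q.tolist(), edge_noisy_q.tolist())        # [123] [124]
```

Any single integer that differs changes the IKM and therefore the whole key, which is why boundary cases matter.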

Step 2 — Serialize to bytes (IKM)

ikm = quantised.tobytes()

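np.ndarray.tobytes() serializes the int16 array to a raw byte sequence in C (row-major) element order, using the platform's native byte order. For a latent_dim=128 embedding, this produces 128 × 2 = 256 bytes of Input Keying Material (IKM) passed into HKDF. Because the byte order is native, the same embedding yields the same IKM only on machines that share endianness.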
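Since `tobytes()` follows the host byte order, one way to make the IKM portable is to pin the dtype's endianness before serializing. The sketch below assumes little-endian as the canonical layout; that choice, and the stand-in input vector, are assumptions for illustration and not something the source code does.

```python
import numpy as np

latent_dim = 128  # matches the latent_dim quoted above

# Stand-in quantized vector; real values come from Step 1.
quantised = np.arange(-64, 64, dtype=np.int16)

# Native byte order, as in derive_key.
ikm_native = quantised.tobytes()

# Explicit little-endian int16 ("<i2"): identical bytes on every platform.
ikm_le = quantised.astype("<i2").tobytes()

# Round trip to confirm nothing is lost in serialization.
restored = np.frombuffer(ikm_le, dtype="<i2")

print(len(ikm_native), len(ikm_le))  # prints: 256 256
```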

Step 3 — HKDF-SHA256

kdf = HKDF(
    algorithm=hashes.SHA256(),
    length=32,          # 256 bits ÷ 8
    salt=None,
    info=b"neural-vault-few-shot-v1"
)
key_bytes = kdf.derive(ikm)

HKDF (RFC 5869) performs two operations internally:

Extract — HMAC-SHA256(salt, IKM) compresses the input bytes into a pseudorandom key. With salt=None, the RFC specifies a zero-filled salt of HashLen bytes.
Expand — iteratively applies HMAC-SHA256 with the info context string to produce the requested output length.

The info parameter b"neural-vault-few-shot-v1" domain-separates this derivation from any other HKDF usage in the same application. If the system is updated with a new model version, bumping the info string to b"neural-vault-few-shot-v2" produces completely different keys from the same embeddings, preventing cross-version key collisions.
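The extract and expand phases can be written out with the standard library alone. The following is a minimal sketch of RFC 5869 built on `hmac` and `hashlib`; the names `hkdf_extract`, `hkdf_expand`, and `hkdf_sha256` are illustrative and belong neither to Neural Vault nor to the cryptography package, and the all-zero IKM is a placeholder. It also shows the effect of changing the info string.

```python
import hashlib
import hmac
from typing import Optional

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_extract(salt: Optional[bytes], ikm: bytes) -> bytes:
    """Extract: PRK = HMAC-SHA256(salt, IKM). A missing salt becomes HashLen zero bytes."""
    if not salt:
        salt = b"\x00" * HASH_LEN
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand: T(i) = HMAC-SHA256(PRK, T(i-1) || info || i), concatenated and truncated."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested length too large for HKDF-SHA256")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32,
                salt: Optional[bytes] = None) -> bytes:
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)


ikm = bytes(256)  # placeholder IKM, same size as a 128-dim int16 embedding
key_v1 = hkdf_sha256(ikm, b"neural-vault-few-shot-v1")
key_v2 = hkdf_sha256(ikm, b"neural-vault-few-shot-v2")

print(len(key_v1), key_v1 != key_v2)  # prints: 32 True
```

Only the info string differs between the two calls, yet the derived keys are unrelated, which is the domain separation described above.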

Step 4 — Fallback (no `cryptography`)

key_bytes = hashlib.sha256(ikm).digest()
if key_len_bits != 256:
    key_bytes = hashlib.sha256(key_bytes + key_bytes).digest()[: key_len_bits // 8]

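When the cryptography package is absent, a single SHA-256 hash of the IKM is used. This loses HKDF’s domain separation and extract-then-expand structure but retains the core determinism property for environments where installing additional dependencies is not possible. Keys produced by the fallback differ from the HKDF keys for the same embedding, so the two paths are not interchangeable. The re-hash branch can only shorten the output: SHA-256 yields 32 bytes, so a key_len_bits value above 256 still returns a 32-byte key.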
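One property of the fallback worth checking in isolation is its output length. The sketch below copies the fallback logic into a standalone helper named `fallback_key` (a name chosen here for illustration) and inspects the result for several requested sizes; the IKM is a placeholder.

```python
import hashlib


def fallback_key(ikm: bytes, key_len_bits: int = 256) -> bytes:
    """Standalone copy of the no-cryptography branch of derive_key."""
    key_bytes = hashlib.sha256(ikm).digest()
    if key_len_bits != 256:
        # Re-hash, then slice; the slice cannot exceed the 32-byte digest.
        key_bytes = hashlib.sha256(key_bytes + key_bytes).digest()[: key_len_bits // 8]
    return key_bytes


ikm = bytes(range(256))  # placeholder IKM

key_128 = fallback_key(ikm, 128)
key_256 = fallback_key(ikm, 256)
key_512 = fallback_key(ikm, 512)

print(len(key_128), len(key_256), len(key_512))  # prints: 16 32 32
```

A caller that needs more than 256 bits therefore has to rely on the HKDF path.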

Class Prototype Computation

Keys are derived from class prototype vectors rather than from individual sample embeddings. A prototype is the centroid of all embeddings belonging to one class:

def class_prototypes(embeddings, labels):
    return np.vstack([
        embeddings[labels == c].mean(axis=0)
        for c in range(N_CLASSES)
    ])

Averaging over multiple samples smooths out per-session embedding drift caused by fMRI noise, producing a more stable IKM than any single sample would provide. The enrollment step derives one key per class:

prototype_keys = [derive_key(proto)[1] for proto in prototypes]
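For reference, a self-contained version of the prototype step on synthetic data might look like the following. The random embeddings, the array shapes, and the `n_classes` parameter (standing in for the module-level `N_CLASSES`) are assumptions for illustration.

```python
import numpy as np


def class_prototypes(embeddings: np.ndarray, labels: np.ndarray,
                     n_classes: int) -> np.ndarray:
    """Return one centroid per class, stacked as (n_classes, latent_dim)."""
    return np.vstack([
        embeddings[labels == c].mean(axis=0)
        for c in range(n_classes)
    ])


rng = np.random.default_rng(0)

# Synthetic stand-in for encoder output: 20 samples, 128 dims, unit norm.
embeddings = rng.normal(size=(20, 128))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
labels = np.repeat(np.arange(2), 10)  # 10 samples for each of 2 classes

prototypes = class_prototypes(embeddings, labels, n_classes=2)

# The mean of distinct unit vectors is shorter than unit length.
proto_norms = np.linalg.norm(prototypes, axis=1)
print(prototypes.shape)  # prints: (2, 128)
```

Because a centroid of unit vectors has norm below 1, its components are smaller than those of a single embedding; whether the prototypes are re-normalized before quantization is not stated here.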

Benchmark Prototype Keys

Running the full pipeline on 5classpreds.csv produces the following 256-bit prototype keys (hex-encoded):

Class	Key (hex)
0	`1eaa2877f253315cecb77828e30f707d8fe381b07ba83851725c84a2e240c69b`
1	`5f4da41c3524a593bd6a9cec804a75e2167b33aa324c1d73daa2479adbd2eef4`

These keys are deterministic: re-running the pipeline with the same data and the same random seed produces identical values. Any change in the input embeddings — including model retraining with a different seed — will produce entirely different keys.

Avalanche Effect

A cryptographic key derivation function should exhibit the avalanche property: a small change to the input should flip approximately 50% of the output bits. Neural Vault’s derivation chain shows this behaviour for any perturbation large enough to change at least one quantized component, because HKDF is built on HMAC-SHA256, which has strong avalanche characteristics. From the model.py benchmark section:

def bit_diff(a: bytes, b: bytes) -> float:
    return (int.from_bytes(a, 'big') ^ int.from_bytes(b, 'big')).bit_count() / (len(a) * 8)

perturb = prototype_vec.copy()
perturb[0] += 0.05
avalanche_key, _ = derive_key(perturb)
avalanche_pct = bit_diff(base_key_bytes, avalanche_key) * 100

Adding 0.05 to a single component of the prototype vector (a perturbation spanning 50 quantization steps, where one step is 0.001) causes approximately 50% of the 256 output bits to flip. This is consistent with the key output having no exploitable correlation with small embedding perturbations.
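A self-contained way to reproduce this kind of measurement without the trained model is sketched below. It uses a synthetic vector and plain SHA-256 as a stand-in for the hashing stage (both assumptions; `toy_key` is a hypothetical helper), and adds a perturbation smaller than one quantization step for contrast.

```python
import hashlib

import numpy as np


def bit_diff(a: bytes, b: bytes) -> float:
    """Fraction of differing bits between two equal-length byte strings."""
    xor = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(xor).count("1") / (len(a) * 8)


def toy_key(vec: np.ndarray) -> bytes:
    """Quantize as in derive_key, then hash (SHA-256 stand-in for HKDF)."""
    return hashlib.sha256((vec * 1000).astype(np.int16).tobytes()).digest()


# Synthetic vector whose components sit mid-step (0.0885 * 1000 = 88.5),
# so a tiny nudge cannot cross a quantization boundary.
base = np.full(128, 0.0885)

large = base.copy()
large[0] += 0.05    # 50 quantization steps

small = base.copy()
small[0] += 0.0002  # less than one step, stays in the same bin

large_pct = bit_diff(toy_key(base), toy_key(large)) * 100
small_pct = bit_diff(toy_key(base), toy_key(small)) * 100

print(f"large: {large_pct:.1f}%  small: {small_pct:.1f}%")
```

The large perturbation should land near 50%, while the sub-step perturbation leaves the quantized input, and so the key, unchanged.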

Keys are deterministic: the same embedding vector always produces the same 256-bit key. Any drift that changes even one quantized component of the embedding (for example, drift caused by fMRI scanner noise, session-to-session physiological variation, or model weight changes after retraining) will produce a completely different key. Always run thresholded cosine similarity verification against the enrolled prototype before deriving a key. Only proceed to key derivation if the similarity score exceeds the equal error rate (EER) threshold.
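The verify-then-derive order could be enforced with a small wrapper. In this sketch, `cosine_similarity`, `derive_if_verified`, `stand_in_derive`, and the threshold value 0.9 are all hypothetical; the real threshold comes from benchmarking. It also assumes the key is derived from the stored prototype once the probe has been accepted, which is one possible reading of the flow rather than confirmed behaviour.

```python
import hashlib
from typing import Callable, Optional

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def derive_if_verified(probe: np.ndarray, prototype: np.ndarray, threshold: float,
                       derive_fn: Callable[[np.ndarray], bytes]) -> Optional[bytes]:
    """Return a key only when the probe's similarity exceeds the threshold."""
    if cosine_similarity(probe, prototype) <= threshold:
        return None  # verification failed: never reach key derivation
    return derive_fn(prototype)


def stand_in_derive(vec: np.ndarray) -> bytes:
    """Placeholder for derive_key: quantize, serialize, hash."""
    return hashlib.sha256((vec * 1000).astype(np.int16).tobytes()).digest()


prototype = np.zeros(128)
prototype[0] = 1.0

genuine = prototype + 0.01   # close to the prototype
impostor = np.zeros(128)
impostor[1] = 1.0            # orthogonal to the prototype

THRESHOLD = 0.9              # hypothetical value, not the measured EER threshold
genuine_key = derive_if_verified(genuine, prototype, THRESHOLD, stand_in_derive)
impostor_key = derive_if_verified(impostor, prototype, THRESHOLD, stand_in_derive)

print(genuine_key is not None, impostor_key is None)  # prints: True True
```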
