An autoencoder is a neural network that learns to compress its input into a lower-dimensional representation and then reconstruct it as accurately as possible. The first half of the network, the encoder, maps the original input down to a compact hidden state called the bottleneck. The second half, the decoder, maps that bottleneck representation back to the original space. By forcing information through a narrower channel, the network must learn the most important structure in the data. This example uses Neural Network Framework to build a binary autoencoder that compresses all 32 possible 5-bit sequences through a 3-neuron bottleneck and reconstructs them using binary cross-entropy loss.
Network Architecture
The autoencoder is a symmetric 3-layer network:

| Layer | Type | Neurons | Activation | Role |
|---|---|---|---|---|
| InputLayer(5, 'tanh') | Input | 5 | Tanh | Receives 5-bit input |
| HiddenLayer(3, 'tanh') | Hidden | 3 | Tanh | Bottleneck encoder |
| OutputLayer(5, 'sigmoid', 'bincrossentropy') | Output | 5 | Sigmoid | Reconstructed output |
The hidden layer has only 3 neurons while the input and output each have 5. This bottleneck forces the network to learn a compressed 3-dimensional encoding of every 5-bit pattern. It cannot simply copy the input — it must discover a compact internal representation that captures all 32 distinct patterns.
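To make the shapes concrete, here is a minimal standalone NumPy sketch of a 5-3-5 forward pass. It does not use the framework's classes and ignores whatever InputLayer does with its activation argument; the weight layout (one row per receiving neuron), the fixed seed, and the names `W_enc`, `W_dec`, and `forward` are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Assumed layout: one row per neuron in the receiving layer.
W_enc, b_enc = rng.normal(size=(3, 5)), rng.normal(size=3)  # 5 inputs -> 3 code units
W_dec, b_dec = rng.normal(size=(5, 3)), rng.normal(size=5)  # 3 code units -> 5 outputs


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x):
    """Return the 3-value bottleneck code and the 5-value reconstruction."""
    code = np.tanh(W_enc @ x + b_enc)       # encoder: values in [-1, 1]
    recon = sigmoid(W_dec @ code + b_dec)   # decoder: values in (0, 1)
    return code, recon


x = np.array([1, 0, 1, 1, 0], dtype=float)
code, recon = forward(x)
n_params = W_enc.size + b_enc.size + W_dec.size + b_dec.size
print(code.shape, recon.shape, n_params)  # prints: (3,) (5,) 38
```

Under these assumptions the whole network has only 38 trainable parameters, and every 5-bit input is summarized by three real numbers before it is decoded.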
Why Binary Cross-Entropy?
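The target values are binary (0 or 1), and the output layer uses sigmoid activation to produce values in the range (0, 1). Binary cross-entropy loss is the natural choice here: it treats each output bit as an independent Bernoulli variable and penalizes confident wrong predictions heavily. MSE would also work, but it typically converges more slowly on binary reconstruction tasks: its gradient with respect to the pre-activation carries the sigmoid derivative, which shrinks toward zero as the output saturates near 0 or 1, even when the prediction is wrong. With cross-entropy that factor cancels, leaving a gradient proportional to the prediction error.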
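The difference between the two losses can be checked numerically. The sketch below is standalone NumPy, not framework code; the helper names and the example logit of -6 are illustrative assumptions. It compares the gradient with respect to the pre-activation for a single output bit that is confidently wrong.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_grad_wrt_logit(z, y):
    # d/dz of -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(z) simplifies to p - y.
    return sigmoid(z) - y


def mse_grad_wrt_logit(z, y):
    # d/dz of 0.5*(p - y)^2 keeps the sigmoid derivative p*(1 - p) as a factor.
    p = sigmoid(z)
    return (p - y) * p * (1.0 - p)


z, y = -6.0, 1.0  # the unit outputs about 0.0025 while the target is 1
g_bce = bce_grad_wrt_logit(z, y)
g_mse = mse_grad_wrt_logit(z, y)
print(f"BCE gradient: {g_bce:.4f}, MSE gradient: {g_mse:.4f}")
```

For this saturated, wrong prediction the cross-entropy gradient is close to -1, while the MSE gradient is only a few thousandths in magnitude, so the MSE update barely corrects the error.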
Training Data
The dataset is every possible 5-bit combination: all 32 sequences from [0,0,0,0,0] to [1,1,1,1,1]. Because the autoencoder's target is identical to its input, y is simply a copy of x. Training on the complete binary space means there is no held-out data: the bottleneck must encode all 32 patterns at once rather than generalize to unseen ones.
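The example spells the 32 rows out by hand, but the same dataset can be generated programmatically. A minimal sketch using the standard library's `itertools.product` (the variable name `n_bits` is an assumption for illustration):

```python
import itertools

import numpy as np

n_bits = 5

# product([0, 1], repeat=5) yields tuples in counting order:
# (0,0,0,0,0), (0,0,0,0,1), ..., (1,1,1,1,1)
x = np.array(list(itertools.product([0, 1], repeat=n_bits)))
y = x.copy()  # the autoencoder's target is its own input

print(x.shape)  # prints: (32, 5)
```

This produces the rows in the same order as the hand-written arrays, and changing `n_bits` is enough to experiment with longer sequences.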
Create the three-layer autoencoder and initialize weights and biases with 'normal_random'. Both set_weights and set_biases must be called on every HiddenLayer and OutputLayer; omitting either leaves the corresponding attribute as None, which causes a TypeError during the forward pass.

```python
from ANN import *
import numpy as np  # explicit import; the arrays below use np directly
import matplotlib.pyplot as plt

eta = 0.001  # learning rate used by the training loop

# 5 inputs -> 3-neuron bottleneck -> 5 outputs
i = InputLayer(5, "tanh")

h1 = HiddenLayer(3, "tanh")
h1.attach_after(i)
h1.set_weights("normal_random")
h1.set_biases("normal_random")

o = OutputLayer(5, "sigmoid", "bincrossentropy")
o.attach_after(h1)
o.set_weights("normal_random")
o.set_biases("normal_random")

ANN = [i, h1, o]  # layers in forward order
```
The original autoencoder.py source calls set_weights("random") and omits set_biases entirely. "random" is not a recognized method in ANN.py; the valid options are 'normal_random', 'uniform_random', 'xavier', 'he', 'lecun', and 'one'. Passing "random" matches no branch in set_weights, leaving W as None. Omitting set_biases leaves Bias as None. Both cause a TypeError when forward() computes np.dot(W, ...) + Bias. The corrected code above uses 'normal_random' for both.

Define all 32 binary sequences. Both x and y are the same array: the autoencoder's job is to reconstruct its own input.

```python
x = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
])

y = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1]
])
```
This example defines its own local gradient_descent function (not the library's gradient_descent_epoch). The local version performs online updates (one weight update per sample per epoch) for 15,000 epochs, without accuracy tracking. The loss recorded and printed each epoch is the binary cross-entropy for the last sample processed, not an average over the dataset.

```python
def gradient_descent(ANN, x, y, epochs):
    loss = []
    for j in range(0, epochs):
        for k in range(0, len(x)):
            # Load one sample and its target
            ANN[0].put_values(x[k])
            ANN[len(ANN)-1].set_actual(y[k])
            # Forward pass through every layer
            for layer in ANN:
                layer.forward()
            # Backward pass from the output layer down to the first hidden layer
            for i in range(len(ANN)-1, 0, -1):
                ANN[i].backward()
            # Online update: adjust weights and biases after each sample
            for i in range(1, len(ANN)):
                ANN[i].W -= eta * ANN[i].dLdW
                ANN[i].Bias -= eta * ANN[i].dLda.reshape(1, -1)
        # Loss of the last sample seen in this epoch
        loss.append(ANN[len(ANN)-1].loss())
        print(f"epoch: {j}, Loss: {ANN[len(ANN)-1].loss()}")
    return ANN, loss

ANN, loss = gradient_descent(ANN, x, y, 15000)

plt.plot(loss)
plt.show()
```
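To show what a single online update computes, here is a standalone NumPy sketch of one step for a 5-3-5 network with tanh hidden units, sigmoid outputs, and binary cross-entropy. It is not the framework's implementation: the parameter dictionary, the helper names, the weight layout, and the fixed seed are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
eta = 0.001

# Assumed layout: one row per neuron in the receiving layer.
params = {
    "W1": rng.normal(size=(3, 5)), "b1": rng.normal(size=3),
    "W2": rng.normal(size=(5, 3)), "b2": rng.normal(size=5),
}


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(p, x):
    h = np.tanh(p["W1"] @ x + p["b1"])
    out = sigmoid(p["W2"] @ h + p["b2"])
    return h, out


def bce(out, y, eps=1e-12):
    out = np.clip(out, eps, 1 - eps)  # guard against log(0)
    return float(-np.sum(y * np.log(out) + (1 - y) * np.log(1 - out)))


def online_step(p, x, y, eta):
    """One gradient step on a single sample, updating p in place."""
    h, out = forward(p, x)
    delta2 = out - y                               # dL/dz at the output (sigmoid + BCE)
    delta1 = (p["W2"].T @ delta2) * (1 - h ** 2)   # backpropagate through tanh
    p["W2"] -= eta * np.outer(delta2, h)
    p["b2"] -= eta * delta2
    p["W1"] -= eta * np.outer(delta1, x)
    p["b1"] -= eta * delta1


x = np.array([1, 0, 1, 1, 0], dtype=float)
loss_before = bce(forward(params, x)[1], x)
online_step(params, x, x, eta)  # target equals input for an autoencoder
loss_after = bce(forward(params, x)[1], x)
print(loss_before, loss_after)
```

With a step this small, the loss on the sample just used should drop slightly. That per-sample loss is the kind of value the training loop records once per epoch, which is why the plotted curve can look noisier than a dataset-wide average would.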
The learning rate eta = 0.001 is intentionally conservative. Binary cross-entropy gradients can be steep early in training, and a rate that is too high can cause the loss to oscillate or diverge.

After training, run each input through the network and collect both the reconstructed output (ANN[2].activations) and the hidden bottleneck representation (ANN[1].activations). Then plot all three side by side for every sample.

```python
def visualize_binary_sequence(sequence, title, ax, length):
    # `length` is the number of rows in the plotted column
    # (renamed from `len` to avoid shadowing the builtin)
    ax.set_title(title)
    im = ax.imshow(sequence.reshape(length, -1), cmap='binary')
    plt.colorbar(im, ax=ax)
    im.set_clim(0, 1)

outputs = []
hiddens = []

# Forward pass only: collect the reconstruction and bottleneck code per sample
for j in range(0, len(x)):
    ANN[0].put_values(x[j])
    for layer in ANN:
        layer.forward()
    output = ANN[len(ANN)-1].output()
    outputs.append(output)
    hidden = ANN[1].activations
    hiddens.append(hidden)

num_samples = len(x)
fig, axs = plt.subplots(num_samples, 3, figsize=(12, 4 * num_samples))

# One row per sample: input, reconstruction, hidden code
for j in range(num_samples):
    visualize_binary_sequence(x[j], f"Input {j} - Actual: {y[j]}", axs[j][0], ANN[0].length)
    visualize_binary_sequence(outputs[j], "Predicted Output", axs[j][1], ANN[2].length)
    visualize_binary_sequence(hiddens[j], "Hidden Layer", axs[j][2], ANN[1].length)

plt.tight_layout()
plt.show()
```
Each row shows the original 5-bit vector (left), the reconstructed output (center), and the 3-neuron hidden layer activation — the learned compressed code (right).
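The plots give a visual check. Since the training loop tracks no accuracy, a numeric check can be added by thresholding each output at 0.5 and comparing against the targets. The helper below is a hypothetical standalone sketch, not part of the framework; it assumes each collected output can be flattened to one value per bit, and it is demonstrated on made-up values rather than real network outputs.

```python
import numpy as np


def reconstruction_accuracy(outputs, targets, threshold=0.5):
    """Return (fraction of correct bits, fraction of fully correct patterns)."""
    targets = np.asarray(targets)
    # Flatten each output to one row so (n, 5) and (n, 5, 1) inputs both work.
    outputs = np.asarray(outputs, dtype=float).reshape(len(targets), -1)
    bits = (outputs >= threshold).astype(int)
    bit_acc = float(np.mean(bits == targets))
    pattern_acc = float(np.mean(np.all(bits == targets, axis=1)))
    return bit_acc, pattern_acc


# Made-up values for illustration: the second pattern has one wrong bit.
demo_targets = np.array([[0, 1, 1, 0, 1],
                         [1, 0, 0, 1, 0]])
demo_outputs = np.array([[0.1, 0.9, 0.8, 0.2, 0.7],
                         [0.9, 0.2, 0.6, 0.8, 0.1]])

bit_acc, pattern_acc = reconstruction_accuracy(demo_outputs, demo_targets)
print(bit_acc, pattern_acc)  # prints: 0.9 0.5
```

Applied to the collected `outputs` list and `y`, a pattern accuracy of 1.0 would mean the 3-neuron code distinguishes all 32 inputs.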
The weight matrices of the hidden and output layers are the learned encoder and decoder, respectively. Plotting them as heatmaps reveals which input bits each bottleneck neuron responds to, and how the decoder maps bottleneck values back to output bits.
```python
plt.imshow(ANN[1].W, cmap='cividis')
plt.title("Encoder Weight Matrix")
plt.show()

plt.imshow(ANN[2].W, cmap='cividis')
plt.colorbar()
plt.title("Decoder Weight Matrix")
plt.show()
```
Image Autoencoder Variant
The same architecture extends naturally to grayscale images. The autoencoder_images.py variant loads a JPEG, resizes it to 20×20 pixels, and flattens it to a 400-element vector. The network dimensions scale accordingly. Unlike the binary autoencoder, this variant calls both set_weights and set_biases explicitly, using 'normal_random' for all layers:
autoencoder_images.py
The image autoencoder uses MSE loss and a same-size hidden layer (no compression); its purpose is pixel-level reconstruction fidelity rather than dimensionality reduction. To add compression, reduce the number of HiddenLayer neurons below x.shape[1].
Full Source
autoencoder.py