MNIST is the standard benchmark for handwritten digit recognition: 60 000 training images and 10 000 test images, each a 28×28 grayscale scan of a digit from 0 to 9. This walkthrough uses Neural Network Framework to build a 4-layer feedforward network that classifies these digits using ReLU hidden layers, a softmax output, and categorical cross-entropy loss, all running entirely on NumPy. TensorFlow is used only as a convenient data loader; it plays no part in the forward pass, backpropagation, or weight updates.
TensorFlow is only required to download and load the MNIST dataset via `tf.keras.datasets.mnist`. Once the images and labels are loaded into NumPy arrays, every subsequent operation (forward pass, backward pass, gradient updates) runs on pure NumPy through Neural Network Framework.

Network Architecture
The network has four layers:

| Layer | Type | Neurons | Activation | Init method |
|---|---|---|---|---|
| `InputLayer(784, 'none')` | Input | 784 | None (linear passthrough) | n/a |
| `HiddenLayer(10, 'relu')` | Hidden | 10 | ReLU | uniform_random |
| `HiddenLayer(10, 'relu')` | Hidden | 10 | ReLU | uniform_random |
| `OutputLayer(10, 'softmax', 'crossentropy')` | Output | 10 | Softmax | uniform_random |
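
To make the layer sizes concrete, here is a minimal NumPy sketch of one forward pass through a 784→10→10→10 network. It is an illustration under stated assumptions, not the framework's implementation: `forward`, `sizes`, and the weight layout (one matrix of shape `(inputs, outputs)` per connection) are hypothetical names and choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer widths from the table: input, two hidden layers, output.
sizes = [784, 10, 10, 10]

# Hypothetical parameter layout: one (inputs, outputs) weight matrix and one
# bias vector per connection, drawn uniformly from [0, 1).
weights = [rng.uniform(0.0, 1.0, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.uniform(0.0, 1.0, n) for n in sizes[1:]]


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtract the maximum before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()


def forward(x):
    """Propagate one flattened image through the network."""
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ w + b)
    return softmax(a @ weights[-1] + biases[-1])


x = rng.uniform(0.0, 1.0, 784)  # stand-in for one normalized image
probs = forward(x)
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(probs.shape, n_params)
```

The parameter count works out to (784·10 + 10) + (10·10 + 10) + (10·10 + 10) = 8 070, which shows how small this network is.
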
Training Configuration
| Parameter | Value |
|---|---|
| Training samples | 500 (first 500 of 60 000) |
| Test samples | 100 |
| Epochs | 1 000 |
| Learning rate (eta) | 0.0001 |
| Optimizer | Vanilla gradient descent (online, per-sample) |
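
Online gradient descent updates the parameters after every individual sample rather than after a full pass over the data. The sketch below shows that update rule on a single softmax layer with cross-entropy loss, using toy data and a larger learning rate than the table lists. `sgd_step`, `W`, `b`, and the toy labelling rule are hypothetical; the framework's own backward pass is not reproduced here.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()


def sgd_step(W, b, x, y, eta):
    """One per-sample update for a softmax layer with cross-entropy loss."""
    p = softmax(x @ W + b)
    loss = -np.log(p[np.argmax(y)])
    grad_z = p - y                      # gradient w.r.t. the pre-activations
    W -= eta * np.outer(x, grad_z)      # in-place weight update
    b -= eta * grad_z                   # in-place bias update
    return loss


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (20, 4))          # 20 toy samples, 4 features
classes = np.argmax(X[:, :3], axis=1)       # toy rule: largest of the first 3 features
Y = np.eye(3)[classes]                      # one-hot labels, 3 classes
W = np.zeros((4, 3))
b = np.zeros(3)

epoch_losses = []
for epoch in range(200):
    total = 0.0
    for x, y in zip(X, Y):                  # parameters change after every sample
        total += sgd_step(W, b, x, y, eta=0.5)
    epoch_losses.append(total / len(X))

print(epoch_losses[0], epoch_losses[-1])
```

The mean loss of the last epoch should be lower than that of the first, since the toy labels are linearly separable.
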
Use `tf.keras.datasets.mnist` to download the dataset. Normalize pixel values from [0, 255] to [0.0, 1.0] by dividing by 255. Slice the first 500 training images, flatten each 28×28 image into a 784-element vector, and one-hot encode all labels into 10-class vectors using `to_categorical`.

```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import tensorflow as tf
from ANN import *

# TensorFlow is used only to download and load the dataset.
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Keep the first 500 training samples.
train_images = train_images[:500]
train_labels = train_labels[:500]

# Scale pixel values from [0, 255] to [0.0, 1.0].
train_images = train_images / 255.0
test_images = test_images / 255.0

# Images keep their native 28x28 resolution; no resizing is applied.
train_images_resized = train_images
test_images_resized = test_images

# Flatten each 28x28 image into a 784-element vector.
train_images = train_images_resized.reshape(train_images.shape[0], -1)
test_images = test_images_resized.reshape(test_images.shape[0], -1)

# One-hot encode the labels into 10-class vectors.
train_labels = tf.keras.utils.to_categorical(train_labels, 10)
test_labels = tf.keras.utils.to_categorical(test_labels, 10)
```
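
To see what the preprocessing does without downloading the dataset, the same three steps can be sketched with NumPy alone on synthetic data. The arrays below are random stand-ins rather than MNIST images, and `one_hot` is a hypothetical helper that mirrors the effect of `to_categorical` for integer labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 5 "images" of 28x28 uint8 pixels and 5 integer labels.
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 1, 3])

# 1. Scale pixel values from [0, 255] to [0.0, 1.0].
scaled = images / 255.0

# 2. Flatten each 28x28 image into a 784-element row vector.
flat = scaled.reshape(scaled.shape[0], -1)


# 3. One-hot encode the labels into 10-class vectors.
def one_hot(y, num_classes):
    """Row i of the result has a 1 at column y[i] and 0 elsewhere."""
    return np.eye(num_classes)[y]


encoded = one_hot(labels, 10)
print(flat.shape, encoded.shape)
print(encoded[0])
```

Indexing the identity matrix with the label array selects one row per label, which is a compact way to build one-hot vectors when the labels are already integers in `0..num_classes-1`.
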
After reshaping, `train_images` has shape (500, 784) and `train_labels` has shape (500, 10). The test set remains the full 10 000 images, but only the first 100 are used for evaluation.

Create each layer in order and chain them with `attach_after`. Use `uniform_random` initialization for weights and biases throughout. This keeps all initial values in [0, 1), so with non-negative pixel inputs no ReLU unit starts with a negative pre-activation.

```python
eta = 0.0001  # learning rate

# Input layer: one unit per pixel (784), no activation.
i = InputLayer(train_images.shape[1], "none")

# First hidden layer: 10 ReLU units.
h1 = HiddenLayer(10, "relu")
h1.attach_after(i)
h1.set_weights("uniform_random")
h1.set_biases("uniform_random")

# Second hidden layer: 10 ReLU units.
h2 = HiddenLayer(10, "relu")
h2.attach_after(h1)
h2.set_weights("uniform_random")
h2.set_biases("uniform_random")

# Output layer: one softmax unit per class (10), cross-entropy loss.
o = OutputLayer(train_labels.shape[1], "softmax", "crossentropy")
o.attach_after(h2)
o.set_weights("uniform_random")
o.set_biases("uniform_random")

# The network is an ordered list of layers.
ANN = [i, h1, h2, o]
```
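
The effect of a uniform [0, 1) initialization can be checked with a small NumPy sketch. It assumes, as described above, that `uniform_random` draws values from [0, 1). `init_uniform` is a hypothetical helper written for this example, not the framework's function.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_uniform(n_in, n_out):
    """Hypothetical stand-in: weights and biases drawn uniformly from [0, 1)."""
    return rng.uniform(0.0, 1.0, (n_in, n_out)), rng.uniform(0.0, 1.0, n_out)


W1, b1 = init_uniform(784, 10)
x = rng.uniform(0.0, 1.0, 784)     # stand-in for a normalized image
z1 = x @ W1 + b1                   # pre-activations of the first hidden layer
a1 = np.maximum(z1, 0.0)           # ReLU

print(z1.min() >= 0.0, np.array_equal(a1, z1))
```

Because every term in the weighted sum is non-negative, ReLU passes each pre-activation through unchanged at initialization. The pre-activations are also large, since each one sums hundreds of non-negative products.
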
`train_images.shape[1]` evaluates to 784 and `train_labels.shape[1]` evaluates to 10. Using shape attributes instead of hard-coded literals makes the architecture adapt automatically if you change the image resolution or number of classes.

Call the built-in `gradient_descent_epoch` function from `ANN.py`. Unlike the local `gradient_descent` helper used in the XOR and autoencoder examples, this function also tracks per-epoch accuracy by comparing the argmax of the network output against the true label index.

```python
ANN, loss = gradient_descent_epoch(ANN, train_images, train_labels, eta, 1000)
plt.plot(loss)
plt.show()
```
Each epoch prints the cross-entropy loss for the last sample and the classification accuracy over all 500 training samples:
```text
epoch: 0, Loss: 2.302..., Accuracy: 10.0
epoch: 1, Loss: 2.301..., Accuracy: 12.0
...
epoch: 999, Loss: 1.847..., Accuracy: 54.0
```
Accuracy climbs gradually from near-random (~10%) as the network learns to distinguish digit features in the high-dimensional pixel space.
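
The first loss values in the log sit near 2.302, which is numerically what categorical cross-entropy gives for a prediction that assigns equal probability to all 10 classes, since ln(10) ≈ 2.3026. A short sketch of that arithmetic, using a hypothetical `cross_entropy` helper:

```python
import numpy as np


def cross_entropy(p, y):
    """Categorical cross-entropy for one sample; y is a one-hot vector."""
    return -np.sum(y * np.log(p))


uniform = np.full(10, 0.1)          # equal probability for every digit
target = np.eye(10)[7]              # one-hot label for the digit 7

print(round(cross_entropy(uniform, target), 4))   # 2.3026
```

A loss below this value means the network assigns more than 10% probability to the correct class for that sample.
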
The `test_network` function runs inference on a specified number of test images. For each sample it prints the predicted and actual digit label, then reports overall accuracy at the end.

```python
def test_network(ANN, x_test, y_test, num_samples=20):
    correct_predictions = 0
    for i in range(num_samples):
        input_data = x_test[i]
        actual_label = np.argmax(y_test[i])

        # Forward pass: load the sample into the input layer, then propagate.
        ANN[0].put_values(input_data)
        for layer in ANN:
            layer.forward()
        output = ANN[-1].output()

        predicted_label = np.argmax(output)
        if predicted_label == actual_label:
            correct_predictions += 1
        print(f"Sample {i + 1}: Predicted Label - {predicted_label}, Actual Label - {actual_label}")

    accuracy = correct_predictions / num_samples * 100.0
    print(f"\nAccuracy on {num_samples} test samples: {accuracy:.2f}%")


test_network(ANN, test_images, test_labels, num_samples=100)
```
```text
Sample 1: Predicted Label - 7, Actual Label - 7
Sample 2: Predicted Label - 2, Actual Label - 2
Sample 3: Predicted Label - 1, Actual Label - 1
...
Accuracy on 100 test samples: 52.00%
```
Accuracy on the test set after 1 000 epochs with 500 training samples is modest, typically in the 40–60% range. This is expected: the network is deliberately small (10 neurons per hidden layer) and trained on less than 1% of the available data. Increasing the number of training samples, the number of epochs, or the hidden layer width should improve accuracy.
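
The accuracy figure is the share of samples whose highest-scoring output index equals the index of the 1 in the one-hot label. Given a matrix of network outputs, that comparison can be sketched in vectorized NumPy. `accuracy_percent` and the toy arrays are hypothetical and stand in for real network outputs.

```python
import numpy as np


def accuracy_percent(outputs, one_hot_labels):
    """Percentage of rows where argmax(output) matches argmax(label)."""
    predicted = np.argmax(outputs, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual) * 100.0)


# Toy softmax outputs for 4 samples over 3 classes.
outputs = np.array([
    [0.7, 0.2, 0.1],   # predicts class 0
    [0.1, 0.8, 0.1],   # predicts class 1
    [0.3, 0.3, 0.4],   # predicts class 2
    [0.6, 0.3, 0.1],   # predicts class 0
])
labels = np.eye(3)[[0, 1, 2, 1]]    # true classes: 0, 1, 2, 1

print(accuracy_percent(outputs, labels))   # 75.0
```

Three of the four toy predictions match their labels, so the function reports 75.0.
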
Full Source
mnist.py