MNIST Digit Classification with Neural Network Framework

MNIST is the standard benchmark for handwritten digit recognition: 60 000 training images and 10 000 test images, each a 28×28 grayscale image of a digit from 0 to 9. This walkthrough uses Neural Network Framework to build a 4-layer feedforward network (an input layer, two ReLU hidden layers, and a softmax output layer) trained with categorical cross-entropy loss, with all network computation running on NumPy.

TensorFlow is only required to download and load the MNIST dataset via tf.keras.datasets.mnist. Once the images and labels are loaded into NumPy arrays, every subsequent operation — forward pass, backward pass, gradient updates — runs on pure NumPy through Neural Network Framework.

Network Architecture

The network has four layers:

Layer	Type	Neurons	Activation	Init method
`InputLayer(784, 'none')`	Input	784	None (linear passthrough)	—
`HiddenLayer(10, 'relu')`	Hidden	10	ReLU	`uniform_random`
`HiddenLayer(10, 'relu')`	Hidden	10	ReLU	`uniform_random`
`OutputLayer(10, 'softmax', 'crossentropy')`	Output	10	Softmax	`uniform_random`

Each 28×28 image is flattened to a 784-element vector before being fed into the input layer. The two hidden layers use ReLU to introduce non-linearity without the saturation issues of sigmoid. The output layer applies softmax across 10 neurons (one per digit class) so that the outputs form a valid probability distribution. Categorical cross-entropy measures how far the predicted distribution is from the one-hot ground truth label.
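The activation and loss functions named above can be sketched in plain NumPy. This is an illustrative sketch only, not the framework's internal code: the function names `relu`, `softmax`, and `categorical_crossentropy` and the `eps` guard are assumptions made for this example.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative values become 0, positive values pass through."""
    return np.maximum(0.0, z)

def softmax(z):
    """Softmax over a 1-D vector of logits, shifted by the max for numerical stability."""
    exp = np.exp(z - np.max(z))
    return exp / exp.sum()

def categorical_crossentropy(probs, one_hot, eps=1e-12):
    """Cross-entropy between predicted probabilities and a one-hot target.
    eps (an assumption of this sketch) guards against log(0)."""
    return -np.sum(one_hot * np.log(probs + eps))

# A network with no preference between classes outputs a uniform distribution
logits = np.zeros(10)
probs = softmax(logits)

# One-hot target for the digit 3
target = np.eye(10)[3]
loss = categorical_crossentropy(probs, target)

print(probs.sum())  # the ten probabilities sum to 1
print(loss)         # approximately ln(10)
```

A uniform prediction over 10 classes gives a loss of ln(10) ≈ 2.303, which is a useful reference point: a loss near that value means the network is doing no better than guessing.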

Training Configuration

Parameter	Value
Training samples	500 (first 500 of 60 000)
Test samples	100
Epochs	1 000
Learning rate (`eta`)	0.0001
Optimizer	Vanilla gradient descent (online, per-sample)

Training for 1 000 epochs over 500 samples performs 500 000 forward/backward passes in Python. On a CPU without vectorized batching this may take several minutes. The small learning rate 0.0001 is necessary to keep training stable with uniform random initialization and softmax/cross-entropy.
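The arithmetic behind that cost estimate can be checked with a few lines of Python. The parameter count assumes fully connected layers with one bias per neuron, which matches the `set_weights` and `set_biases` calls used later but is not confirmed against the framework's internals.

```python
# Layer sizes from the architecture table: input, hidden, hidden, output
layer_sizes = [784, 10, 10, 10]
n_samples = 500
n_epochs = 1000

# Online (per-sample) training does one forward/backward pass per sample per epoch
total_passes = n_samples * n_epochs

# Assumption: dense layers, so each layer has fan_in * fan_out weights plus fan_out biases
n_parameters = sum(
    fan_in * fan_out + fan_out
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:])
)

print(total_passes)  # 500000
print(n_parameters)  # 8070
```

Under that assumption almost all of the parameters (7 850 of 8 070) sit in the first hidden layer, because it connects to all 784 input pixels.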

Load and preprocess MNIST data

Use tf.keras.datasets.mnist to download the dataset, then keep the first 500 training images and labels. Normalize pixel values from [0, 255] to [0.0, 1.0] by dividing by 255, flatten each 28×28 image into a 784-element vector, and one-hot encode the labels into 10-class vectors with tf.keras.utils.to_categorical.

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image  # only used by the optional resize helper in the full source
import tensorflow as tf
from ANN import *

# Download MNIST and keep the first 500 training samples
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images[:500]
train_labels = train_labels[:500]

# Scale pixel values from [0, 255] to [0.0, 1.0]
train_images = train_images / 255.0
test_images = test_images / 255.0

# No resizing is applied; the images stay at 28x28
train_images_resized = train_images
test_images_resized = test_images

# Flatten each 28x28 image into a 784-element row vector
train_images = train_images_resized.reshape(train_images.shape[0], -1)
test_images = test_images_resized.reshape(test_images.shape[0], -1)

# One-hot encode the integer labels into 10-class vectors
train_labels = tf.keras.utils.to_categorical(train_labels, 10)
test_labels = tf.keras.utils.to_categorical(test_labels, 10)

After reshaping, train_images has shape (500, 784) and train_labels has shape (500, 10). The test set remains the full 10 000 images but only the first 100 are used for evaluation.
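To see what these preprocessing steps do to the array shapes without downloading anything, the same transformations can be reproduced on synthetic data. This sketch uses random arrays as a stand-in for MNIST and `np.eye` indexing as a NumPy-only substitute for to_categorical; the variable names `images`, `labels`, `x`, and `y` are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST: 500 uint8 images of 28x28 and integer labels 0-9
images = rng.integers(0, 256, size=(500, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=500)

# Normalize to [0.0, 1.0], then flatten each image into a 784-element row
x = images / 255.0
x = x.reshape(x.shape[0], -1)

# One-hot encode: row k of the 10x10 identity matrix is the one-hot vector for class k
y = np.eye(10)[labels]

print(x.shape)  # (500, 784)
print(y.shape)  # (500, 10)
```

Taking `np.argmax` along each row of `y` recovers the original integer labels, which is how the evaluation code later converts one-hot vectors back to digits.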

Build the 4-layer network

Create each layer in order and chain them with attach_after. Use uniform_random initialization for weights and biases throughout. This keeps all initial values non-negative and in [0, 1), so with pixel inputs in [0, 1] every ReLU pre-activation starts non-negative and no unit begins in the zero-gradient region.

eta = 0.0001

i = InputLayer(train_images.shape[1], "none")

h1 = HiddenLayer(10, "relu")
h1.attach_after(i)
h1.set_weights("uniform_random")
h1.set_biases("uniform_random")

h2 = HiddenLayer(10, "relu")
h2.attach_after(h1)
h2.set_weights("uniform_random")
h2.set_biases("uniform_random")

o = OutputLayer(train_labels.shape[1], "softmax", "crossentropy")
o.attach_after(h2)
o.set_weights("uniform_random")
o.set_biases("uniform_random")

ANN = [i, h1, h2, o]

train_images.shape[1] evaluates to 784 and train_labels.shape[1] evaluates to 10. Using shape attributes instead of hard-coded literals makes the architecture automatically adapt if you change the image resolution or number of classes.
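The effect of an all-non-negative initialization on the first hidden layer can be illustrated with NumPy. This sketch assumes that uniform_random samples from [0, 1) as described above; it uses NumPy's own generator rather than the framework's initializer, and `W`, `b`, and `x` are names chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: uniform_random behaves like sampling uniformly from [0, 1)
W = rng.random((10, 784))  # weights of a 10-neuron layer fed by 784 inputs
b = rng.random(10)         # one bias per neuron
x = rng.random(784)        # stand-in for a normalized, flattened image

# Pre-activation and ReLU output of the first hidden layer
z = W @ x + b
a = np.maximum(0.0, z)

print(z.min() >= 0.0)        # True: every term in the sum is non-negative
print(np.array_equal(a, z))  # True: ReLU acts as the identity at initialization
```

Because no pre-activation is negative, every hidden unit passes a gradient on the first update. The same property also means the pre-activations grow with the number of inputs, which is one plausible reason a small learning rate is used here.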

Train with gradient_descent_epoch

Call the built-in gradient_descent_epoch function from ANN.py. Unlike the local gradient_descent helper used in the XOR and autoencoder examples, this function also tracks per-epoch accuracy by comparing argmax of the network output against the true label index.

ANN, loss = gradient_descent_epoch(ANN, train_images, train_labels, eta, 1000)
plt.plot(loss)
plt.show()

Each epoch prints the cross-entropy loss for the last sample and the classification accuracy over all 500 training samples:

epoch: 0, Loss: 2.302..., Accuracy: 10.0
epoch: 1, Loss: 2.301..., Accuracy: 12.0
...
epoch: 999, Loss: 1.847..., Accuracy: 54.0

Accuracy climbs gradually from near-random (~10%) as the network learns to distinguish digit features in the high-dimensional pixel space.
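The accuracy figure compares the argmax of each network output with the index of the true label. A minimal NumPy sketch of that comparison follows; `accuracy_percent` is a hypothetical helper written for this example, not a function from ANN.py.

```python
import numpy as np

def accuracy_percent(outputs, one_hot_labels):
    """Percentage of rows where the highest-scoring class matches the one-hot label."""
    predicted = np.argmax(outputs, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual) * 100.0)

# Four toy predictions over three classes
outputs = np.array([
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.5, 0.3, 0.2],  # predicts class 0
    [0.2, 0.2, 0.6],  # predicts class 2
    [0.4, 0.5, 0.1],  # predicts class 1
])
labels = np.eye(3)[[1, 0, 2, 0]]  # true classes: 1, 0, 2, 0

print(accuracy_percent(outputs, labels))  # 75.0
```

With 10 balanced classes, random guessing scores about 10%, which matches the starting accuracy in the log above.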

Evaluate on test samples

The test_network function runs inference on a specified number of test images. For each sample it prints the predicted and actual digit label, then reports overall accuracy at the end.

def test_network(ANN, x_test, y_test, num_samples=20):
    """Run inference on the first num_samples test images and print the accuracy."""
    correct_predictions = 0
    for i in range(num_samples):
        input_data = x_test[i]
        # Labels are one-hot, so argmax recovers the digit index
        actual_label = np.argmax(y_test[i])

        # Load the image into the input layer, then propagate layer by layer
        ANN[0].put_values(input_data)
        for layer in ANN:
            layer.forward()

        # The predicted digit is the class with the highest softmax output
        output = ANN[-1].output()
        predicted_label = np.argmax(output)

        if predicted_label == actual_label:
            correct_predictions += 1

        print(f"Sample {i + 1}: Predicted Label - {predicted_label}, Actual Label - {actual_label}")

    accuracy = correct_predictions / num_samples * 100.0
    print(f"\nAccuracy on {num_samples} test samples: {accuracy:.2f}%")

test_network(ANN, test_images, test_labels, num_samples=100)

Sample output:

Sample 1: Predicted Label - 7, Actual Label - 7
Sample 2: Predicted Label - 2, Actual Label - 2
Sample 3: Predicted Label - 1, Actual Label - 1
...
Accuracy on 100 test samples: 52.00%

Accuracy on the test set after 1 000 epochs with 500 training samples is modest, typically in the 40–60% range. This is expected: the network is deliberately small (10 neurons per hidden layer) and trained on less than 1% of the available data (500 of 60 000 images). Increasing the number of training samples, the number of epochs, or the hidden layer width should improve accuracy, at the cost of longer training time.

Full Source

mnist.py

# Imports
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import tensorflow as tf
from ANN import *

mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images[:500]
train_labels = train_labels[:500]


train_images = train_images / 255.0
test_images = test_images / 255.0

# Optional helper, not called in this example: downsample each image to 20x20
def resize_images(images):
    resized_images = []
    for img in images:
        pil_img = Image.fromarray(img)
        resized_img = pil_img.resize((20, 20), Image.BILINEAR)
        resized_images.append(np.array(resized_img))
    return np.array(resized_images)

# Resizing is skipped here, so the full 28x28 images are used
train_images_resized = train_images
test_images_resized = test_images


train_images = train_images_resized.reshape(train_images.shape[0], -1)
test_images = test_images_resized.reshape(test_images.shape[0], -1)

train_labels = tf.keras.utils.to_categorical(train_labels, 10)
test_labels = tf.keras.utils.to_categorical(test_labels, 10)


eta = 0.0001

i = InputLayer(train_images.shape[1], "none")

h1 = HiddenLayer(10, "relu")
h1.attach_after(i)
h1.set_weights("uniform_random")
h1.set_biases("uniform_random")

h2 = HiddenLayer(10, "relu")
h2.attach_after(h1)
h2.set_weights("uniform_random")
h2.set_biases("uniform_random")

o = OutputLayer(train_labels.shape[1], "softmax", "crossentropy")
o.attach_after(h2)
o.set_weights("uniform_random")
o.set_biases("uniform_random")

ANN = [i, h1, h2, o]


ANN, loss = gradient_descent_epoch(ANN, train_images, train_labels, eta, 1000)
plt.plot(loss)
plt.show()

def test_network(ANN, x_test, y_test, num_samples=20):
    correct_predictions = 0
    for i in range(num_samples):
        input_data = x_test[i]
        actual_label = np.argmax(y_test[i])

        ANN[0].put_values(input_data)
        for layer in ANN:
            layer.forward()

        output = ANN[-1].output()
        predicted_label = np.argmax(output)

        if predicted_label == actual_label:
            correct_predictions += 1

        print(f"Sample {i + 1}: Predicted Label - {predicted_label}, Actual Label - {actual_label}")

    accuracy = correct_predictions / num_samples * 100.0
    print(f"\nAccuracy on {num_samples} test samples: {accuracy:.2f}%")

test_network(ANN, test_images, test_labels, num_samples=100)
