Training Neural Network Framework with Gradient Descent

Neural Network Framework provides two top-level training functions that implement stochastic gradient descent over a list of layer objects. Both functions handle the full training loop — feeding samples forward, propagating gradients backward, and nudging every weight and bias in the direction that reduces loss. You choose between them based on whether you want to train for a fixed number of passes over your data or stop automatically once the loss crosses a target threshold.

Training Functions

gradient_descent_epoch
gradient_descent_threshold

gradient_descent_epoch runs the training loop for an exact number of epochs. After every epoch it prints the current loss and the classification accuracy (percentage of samples whose argmax prediction matched the true label), then appends the loss to a list that is returned alongside the trained network.
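The accuracy figure described above can be reproduced outside the training loop. The following is a minimal sketch, assuming predictions and one-hot labels are available as NumPy arrays; `argmax_accuracy` is a hypothetical helper written for illustration and is not part of the framework.

```python
import numpy as np

def argmax_accuracy(y_pred, y_true):
    """Percentage of rows whose argmax prediction matches the argmax of the label."""
    # Hypothetical helper for illustration; not a framework function.
    pred_class = np.argmax(y_pred, axis=1)
    true_class = np.argmax(y_true, axis=1)
    return 100.0 * np.mean(pred_class == true_class)

# Four samples, three classes; true classes are 0, 1, 2, 1.
y_true = np.eye(3)[[0, 1, 2, 1]]
y_pred = np.array([
    [0.7, 0.2, 0.1],  # argmax 0, correct
    [0.1, 0.8, 0.1],  # argmax 1, correct
    [0.3, 0.4, 0.3],  # argmax 1, wrong (true class is 2)
    [0.2, 0.5, 0.3],  # argmax 1, correct
])

print(argmax_accuracy(y_pred, y_true))  # 75.0
```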

Parameters

ANN

list

required

An ordered Python list of layer objects representing the full network, starting with an InputLayer and ending with an OutputLayer. For example: [InputLayer(...), HiddenLayer(...), OutputLayer(...)]. Layers must already have weights and biases initialized before calling this function.

x

np.ndarray

required

Input samples with shape (n_samples, n_features). Each row is one training example whose feature count must match the size of the InputLayer.

y

np.ndarray

required

Target labels or values with shape (n_samples, n_outputs). Each row is the expected output for the corresponding row in x. For classification tasks these are typically one-hot encoded vectors.

eta

float

required

The learning rate — a scalar that scales how large each weight update step is. Smaller values train more slowly but more stably; larger values train faster but risk divergence.

epochs

int

required

The number of full passes over the entire training dataset. In each epoch every sample in x is processed exactly once.
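The shape requirements listed in the parameters above can be checked before training starts. This is a rough sketch, not framework code: `one_hot` and `check_training_shapes` are hypothetical helpers, and the feature and output counts are passed in by hand rather than read from the layer objects.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Turn integer class labels into one-hot rows of length n_classes."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(n_classes)[labels]

def check_training_shapes(x, y, n_features, n_outputs):
    """Raise ValueError if x or y does not have the documented 2-D shape."""
    if x.ndim != 2 or x.shape[1] != n_features:
        raise ValueError(f"x must have shape (n_samples, {n_features}), got {x.shape}")
    if y.ndim != 2 or y.shape[1] != n_outputs:
        raise ValueError(f"y must have shape (n_samples, {n_outputs}), got {y.shape}")
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must contain the same number of rows")

x_train = np.random.default_rng(0).random((4, 4))   # 4 samples, 4 features
y_train = one_hot([0, 2, 1, 2], n_classes=3)        # 4 samples, 3 outputs

check_training_shapes(x_train, y_train, n_features=4, n_outputs=3)
print(y_train.shape)  # (4, 3)
```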

Returns

A tuple (ANN, loss):

ANN — the same list passed in, with all weights and biases updated in-place after training.
loss — a Python list of floats, one entry per epoch, recording the loss on the last sample of that epoch.

How it works

For each epoch the function iterates over every sample, runs a forward pass through all layers, computes the backward pass in reverse layer order, then immediately applies the weight update rule:

ANN[i].W    -= eta * ANN[i].dLdW
ANN[i].Bias -= eta * ANN[i].dLda.reshape(1, -1)

After all samples have been processed the epoch loss and accuracy are printed and the loss is recorded.
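The per-sample loop described above can be sketched in plain NumPy. This is not the framework's implementation: `TinyLinearLayer` and `sgd_epochs` are hypothetical stand-ins, the layer is a single linear map with a squared-error loss, and only the attribute names `W`, `Bias`, `dLdW` and `dLda` are taken from the update rule.

```python
import numpy as np

class TinyLinearLayer:
    """Hypothetical one-layer stand-in; not a framework class."""

    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(scale=0.1, size=(n_in, n_out))
        self.Bias = np.zeros((1, n_out))

    def forward(self, x_row):
        self.x = x_row.reshape(1, -1)
        self.a = self.x @ self.W + self.Bias
        return self.a

    def backward(self, y_row):
        # Squared-error loss L = 0.5 * sum((a - y)^2), so dL/da = a - y.
        self.dLda = (self.a - y_row.reshape(1, -1)).ravel()
        self.dLdW = self.x.T @ self.dLda.reshape(1, -1)

    def loss(self, y_row):
        return 0.5 * float(np.sum((self.a - y_row.reshape(1, -1)) ** 2))

def sgd_epochs(layer, x, y, eta, epochs):
    """Update after every sample; record the last-sample loss once per epoch."""
    losses = []
    for _ in range(epochs):
        for x_row, y_row in zip(x, y):
            layer.forward(x_row)
            layer.backward(y_row)
            layer.W -= eta * layer.dLdW
            layer.Bias -= eta * layer.dLda.reshape(1, -1)
        losses.append(layer.loss(y_row))
    return layer, losses

rng = np.random.default_rng(0)
x = rng.random((50, 2))
true_W = np.array([[2.0], [-1.0]])
y = x @ true_W + 0.5                      # noiseless linear targets

layer, losses = sgd_epochs(TinyLinearLayer(2, 1, rng), x, y, eta=0.1, epochs=200)
print(np.round(layer.W.ravel(), 3), np.round(layer.Bias.ravel(), 3))
```

Because the update is applied after every sample rather than once per epoch, the order of rows in x affects the path the weights take.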

Example

import numpy as np
from ANN import InputLayer, HiddenLayer, OutputLayer, gradient_descent_epoch

# Build network
l0 = InputLayer(4)
l1 = HiddenLayer(8, actfn="relu")
l2 = OutputLayer(3, outputfn="softmax", lossfn="crossentropy")

l1.attach_after(l0)
l2.attach_after(l1)

l1.set_weights(method="he")
l1.set_biases(method="zeros")
l2.set_weights(method="he")
l2.set_biases(method="zeros")

ANN = [l0, l1, l2]

# Toy dataset (one-hot labels)
x_train = np.random.rand(100, 4)
y_train = np.eye(3)[np.random.choice(3, 100)]

# Train for 50 epochs
ANN, loss = gradient_descent_epoch(ANN, x_train, y_train, eta=0.01, epochs=50)

Console output looks like:

epoch: 0, Loss: 1.0986, Accuracy: 34.0
epoch: 1, Loss: 1.0842, Accuracy: 41.0
...
epoch: 49, Loss: 0.7213, Accuracy: 68.0

gradient_descent_threshold runs the training loop until the loss on the last processed sample falls to or below a target threshold, instead of running for a fixed number of epochs. Before entering the loop, the function performs an initial forward pass on the first sample (x[0], y[0]) so that ANN[-1].loss() can be evaluated for the while condition immediately. It also applies a simple early-stopping rule: once at least three epoch-loss values have been recorded, the loop breaks if the latest loss is greater than the previous one.

Parameters

ANN

list

required

An ordered Python list of layer objects representing the full network, starting with an InputLayer and ending with an OutputLayer. Layers must already have weights and biases initialized before calling this function.

x

np.ndarray

required

Input samples with shape (n_samples, n_features).

y

np.ndarray

required

Target labels or values with shape (n_samples, n_outputs).

eta

float

required

The learning rate scalar applied to every weight and bias gradient update.

thresh

float

required

The loss threshold that terminates training. The loop continues while ANN[-1].loss() > thresh. Once the loss on the last sample falls at or below this value the loop exits cleanly.

Returns

A tuple (ANN, loss):

ANN — the trained network with updated weights.
loss — a list of per-epoch loss values recorded up to the stopping point.

Early stopping

In addition to the threshold check, the function monitors the loss list after every epoch. Once at least three epoch-loss values have been recorded (len(loss) > 2), it halts immediately if the latest loss is greater than the previous one:

if len(loss) > 2 and loss[-1] > loss[-2]:
    break

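The length check only delays the comparison until three epoch-loss values exist; it does not average over several epochs. From the third recorded epoch onward, a single epoch whose loss is higher than the previous one ends training, even if the increase is just noise in the last-sample loss. The intent is to avoid wasted computation once the optimizer has overshot a minimum and the loss is climbing.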
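The behaviour of this guard is easiest to see on a few hand-written loss lists. The sketch below wraps the condition quoted above in a hypothetical function, `should_stop_early`, purely for illustration.

```python
def should_stop_early(loss):
    """Same condition as the guard: three or more values, and the last one went up."""
    # Hypothetical wrapper for illustration; not a framework function.
    return len(loss) > 2 and loss[-1] > loss[-2]

print(should_stop_early([0.9, 1.0]))        # False: only two values recorded
print(should_stop_early([0.9, 0.8, 0.7]))   # False: loss is still decreasing
print(should_stop_early([0.9, 0.8, 0.85]))  # True: one uptick is enough
```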

gradient_descent_threshold performs an initial forward pass on x[0] / y[0] before the while loop so the loss threshold can be checked before any gradient updates. It will also stop training early if the loss increases between consecutive epochs (after at least three epochs have been recorded), even if the threshold has not yet been reached. Always inspect the returned loss list to verify why training stopped.
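One way to inspect the returned list is a small classifier over its last entries. This is a sketch under two assumptions that the text above does not confirm: that the recorded epoch loss is the same value the while condition compares against thresh, and that an empty list is returned when the initial loss is already at or below thresh. `stop_reason` is a hypothetical helper, not part of the framework.

```python
def stop_reason(loss, thresh):
    """Best-effort guess at why training ended, based only on the loss list."""
    # Hypothetical helper; the return strings are made up for this example.
    if not loss:
        return "no_epochs"            # assumed: loop body never ran
    if len(loss) > 2 and loss[-1] > loss[-2]:
        return "loss_increased"       # early-stopping guard fired
    if loss[-1] <= thresh:
        return "threshold_reached"    # while condition became false
    return "unknown"

print(stop_reason([0.9, 0.5, 0.2, 0.04], thresh=0.05))  # threshold_reached
print(stop_reason([0.9, 0.5, 0.6], thresh=0.05))        # loss_increased
```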

Example

import numpy as np
from ANN import InputLayer, HiddenLayer, OutputLayer, gradient_descent_threshold

# Build network
l0 = InputLayer(2)
l1 = HiddenLayer(4, actfn="sigmoid")
l2 = OutputLayer(1, outputfn="sigmoid", lossfn="bincrossentropy")

l1.attach_after(l0)
l2.attach_after(l1)

l1.set_weights(method="xavier")
l1.set_biases(method="zeros")
l2.set_weights(method="xavier")
l2.set_biases(method="zeros")

ANN = [l0, l1, l2]

# XOR-style binary dataset
x_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_train = np.array([[0], [1], [1], [0]], dtype=float)

# Train until loss drops below 0.05
ANN, loss = gradient_descent_threshold(ANN, x_train, y_train, eta=0.1, thresh=0.05)

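# Assumption: the list may be empty if the initial loss was already at or below thresh.
if loss:
    print(f"Stopped after {len(loss)} epochs. Final loss: {loss[-1]:.4f}")
else:
    print("No epochs were recorded.")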

Weight Update Rule

Both training functions apply the same parameter update rule after each sample’s backward pass. Weights are shifted opposite to the gradient of the loss with respect to those weights, scaled by the learning rate:

ANN[i].W    -= eta * ANN[i].dLdW
ANN[i].Bias -= eta * ANN[i].dLda.reshape(1, -1)

dLdW holds the gradient of the loss with respect to the weight matrix of layer i, and dLda holds the gradient with respect to the pre-activation values (which is also the gradient with respect to the bias vector after reshaping). Both are computed and stored on each layer object during the backward pass. See Backpropagation for a detailed explanation of how these gradients are derived.
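The claim that dLda doubles as the bias gradient can be checked numerically. The sketch below assumes a linear pre-activation a = xW + b and a squared-error loss, which may differ from the activation and loss functions the framework actually uses; all names are local to the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((1, 3))          # one sample, three features
y = rng.random((1, 2))          # two outputs
W = rng.normal(size=(3, 2))
b = np.zeros((1, 2))

def loss_fn(W_, b_):
    a = x @ W_ + b_             # pre-activation
    return 0.5 * np.sum((a - y) ** 2)

# Analytic gradients for this loss.
a = x @ W + b
dLda = (a - y).ravel()              # gradient w.r.t. the pre-activation
dLdW = x.T @ dLda.reshape(1, -1)    # gradient w.r.t. the weight matrix
dLdb = dLda.reshape(1, -1)          # bias gradient is dLda, reshaped

def numeric_grad(f, param, eps=1e-6):
    """Central finite differences, one entry at a time."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        step = np.zeros_like(param)
        step[idx] = eps
        grad[idx] = (f(param + step) - f(param - step)) / (2 * eps)
    return grad

num_dLdW = numeric_grad(lambda W_: loss_fn(W_, b), W)
num_dLdb = numeric_grad(lambda b_: loss_fn(W, b_), b)

print(np.allclose(dLdW, num_dLdW, atol=1e-6))  # True
print(np.allclose(dLdb, num_dLdb, atol=1e-6))  # True
```

The bias enters the pre-activation additively, so the derivative of a with respect to b is the identity and the chain rule leaves dLda unchanged apart from its shape.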

Choosing the learning rate is often the most impactful hyperparameter decision when training. Start with a value in the range 0.001 to 0.01. If the loss oscillates wildly or increases, reduce eta by a factor of 10. If the loss decreases very slowly, try increasing it. For networks using relu activations, He initialization paired with eta=0.001 is a reasonable starting point.
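The effect of eta on stability can be seen on a toy problem. The sketch below minimises f(w) = w² with plain gradient descent; it is unrelated to the framework's code, and the specific eta values that are slow or unstable here do not transfer to a real network.

```python
def run_gd(eta, steps=20, w0=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2 * w."""
    w = w0
    for _ in range(steps):
        w -= eta * 2.0 * w
    return w

for eta in (0.001, 0.1, 1.1):
    # Distance from the minimum at w = 0 after 20 steps.
    print(f"eta={eta}: |w| = {abs(run_gd(eta)):.4f}")
```

With eta=0.001 the weight barely moves, with eta=0.1 it approaches the minimum quickly, and with eta=1.1 each step overshoots by more than the last, so the distance grows.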

Plotting the Loss Curve

Both functions return a loss list that you can plot directly to visualise how the training progressed over time.

import matplotlib.pyplot as plt

ANN, loss = gradient_descent_epoch(ANN, x_train, y_train, eta=0.01, epochs=100)

plt.plot(loss)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss')
plt.show()

A smoothly decreasing curve indicates healthy training. A flat curve suggests the learning rate is too small or the network architecture needs revision. A curve that spikes upward after initially decreasing is a sign to lower eta or inspect your data for outliers.
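Because each recorded value is the loss on a single sample (the last one of the epoch), the raw curve can be jittery even when training is healthy. A moving average is one way to make the trend easier to read; `moving_average` below is a hypothetical helper, and the window size is an arbitrary choice.

```python
import numpy as np

def moving_average(loss, window=5):
    """Average each run of `window` consecutive loss values."""
    loss = np.asarray(loss, dtype=float)
    if window < 1 or window > len(loss):
        raise ValueError("window must be between 1 and len(loss)")
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely.
    return np.convolve(loss, kernel, mode="valid")

noisy = [1.0, 0.8, 0.9, 0.6, 0.7, 0.4, 0.5]
smooth = moving_average(noisy, window=3)
print(len(smooth))  # 5
```

The smoothed array is shorter than the input by window - 1 entries, so its x-axis no longer lines up one-to-one with epoch numbers.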
