Neural Network Framework provides two top-level training functions that implement stochastic gradient descent over a list of layer objects. Both functions handle the full training loop — feeding samples forward, propagating gradients backward, and nudging every weight and bias in the direction that reduces loss. You choose between them based on whether you want to train for a fixed number of passes over your data or stop automatically once the loss crosses a target threshold.
Training Functions
gradient_descent_epoch runs the training loop for an exact number of epochs. After every epoch it prints the current loss and the classification accuracy (percentage of samples whose argmax prediction matched the true label), then appends the loss to a list that is returned alongside the trained network.
Parameters
ANN: An ordered Python list of layer objects representing the full network, starting with an InputLayer and ending with an OutputLayer. For example: [InputLayer(...), HiddenLayer(...), OutputLayer(...)]. Layers must already have weights and biases initialized before calling this function.
x: Input samples with shape (n_samples, n_features). Each row is one training example whose feature count must match the size of the InputLayer.
y: Target labels or values with shape (n_samples, n_outputs). Each row is the expected output for the corresponding row in x. For classification tasks these are typically one-hot encoded vectors.
eta: The learning rate, a scalar that scales how large each weight update step is. Smaller values train more slowly but more stably; larger values train faster but risk divergence.
epochs: The number of full passes over the entire training dataset. In each epoch every sample in x is processed exactly once.
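The one-hot target format and the argmax-based accuracy described above can be illustrated with plain NumPy. This is a minimal sketch: the helper names `one_hot` and `argmax_accuracy` and the `preds` array are made up for illustration and are not part of the framework.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Turn integer class labels into rows of a one-hot matrix."""
    return np.eye(n_classes)[np.asarray(labels)]

def argmax_accuracy(predictions, targets):
    """Percentage of rows whose argmax prediction equals the argmax target."""
    hits = np.argmax(predictions, axis=1) == np.argmax(targets, axis=1)
    return 100.0 * hits.mean()

labels = [0, 2, 1, 2]
y = one_hot(labels, n_classes=3)        # shape (n_samples, n_outputs) = (4, 3)

# Hypothetical network outputs, one row per sample.
preds = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.5, 0.4, 0.1],      # argmax is class 0, true class is 1
                  [0.2, 0.2, 0.6]])

accuracy = argmax_accuracy(preds, y)
print(y.shape, accuracy)
```

Three of the four rows match their label, so the accuracy here is 75.0.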
Returns
A tuple (ANN, loss):
ANN — the same list passed in, with all weights and biases updated in-place after training.
loss — a Python list of floats, one entry per epoch, recording the loss on the last sample of that epoch.
How it works
For each epoch the function iterates over every sample, runs a forward pass through all layers, computes the backward pass in reverse layer order, then immediately applies the weight update rule:
ANN[i].W -= eta * ANN[i].dLdW
ANN[i].Bias -= eta * ANN[i].dLda.reshape(1, -1)
After all samples have been processed the epoch loss and accuracy are printed and the loss is recorded.
Example
import numpy as np
from ANN import InputLayer, HiddenLayer, OutputLayer, gradient_descent_epoch
# Build network
l0 = InputLayer(4)
l1 = HiddenLayer(8, actfn="relu")
l2 = OutputLayer(3, outputfn="softmax", lossfn="crossentropy")
l1.attach_after(l0)
l2.attach_after(l1)
l1.set_weights(method="he")
l1.set_biases(method="zeros")
l2.set_weights(method="he")
l2.set_biases(method="zeros")
ANN = [l0, l1, l2]
# Toy dataset (one-hot labels)
x_train = np.random.rand(100, 4)
y_train = np.eye(3)[np.random.choice(3, 100)]
# Train for 50 epochs
ANN, loss = gradient_descent_epoch(ANN, x_train, y_train, eta=0.01, epochs=50)
Console output looks like:
epoch: 0, Loss: 1.0986, Accuracy: 34.0
epoch: 1, Loss: 1.0842, Accuracy: 41.0
...
epoch: 49, Loss: 0.7213, Accuracy: 68.0
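The per-sample loop described under "How it works" can be sketched without the framework. The example below is an assumption-laden illustration, not the library's implementation: `TinySoftmaxLayer` and `sgd_epochs` are hypothetical names, and the `forward` and `backward` method signatures are invented. Only the attribute names `W`, `Bias`, `dLdW`, `dLda` and the update rule are taken from this page.

```python
import numpy as np

class TinySoftmaxLayer:
    """Hypothetical stand-in for a trainable layer (not the framework's class)."""

    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(0.0, 0.1, size=(n_in, n_out))
        self.Bias = np.zeros((1, n_out))

    def forward(self, x_row, y_row):
        self.x = x_row.reshape(1, -1)
        self.y = y_row.reshape(1, -1)
        z = self.x @ self.W + self.Bias
        e = np.exp(z - z.max())             # shift for numerical stability
        self.out = e / e.sum()

    def backward(self):
        # For softmax with cross-entropy, dL/d(pre-activation) = out - y.
        delta = self.out - self.y
        self.dLda = delta.ravel()
        self.dLdW = self.x.T @ delta

    def loss(self):
        return float(-np.sum(self.y * np.log(self.out + 1e-12)))

def sgd_epochs(layers, x, y, eta, epochs):
    """Per-sample SGD: forward, backward in reverse order, immediate update."""
    history = []
    for _ in range(epochs):
        for x_row, y_row in zip(x, y):
            for layer in layers:
                layer.forward(x_row, y_row)
            for layer in reversed(layers):
                layer.backward()
            for layer in layers:
                layer.W -= eta * layer.dLdW
                layer.Bias -= eta * layer.dLda.reshape(1, -1)
        history.append(layers[-1].loss())   # loss on the epoch's last sample
    return layers, history

# Toy, nearly separable data: noisy one-hot features with matching labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=60)
x = np.eye(3)[labels] + 0.1 * rng.normal(size=(60, 3))
y = np.eye(3)[labels]

layers, history = sgd_epochs([TinySoftmaxLayer(3, 3, rng)], x, y, eta=0.1, epochs=20)
print(history[0], history[-1])
```

Because the recorded value is the loss on a single sample rather than an average, the history of a real run may be noisier than this toy case suggests.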
gradient_descent_threshold runs the training loop indefinitely until the loss on the last processed sample drops below a target threshold value. Before entering the loop the function performs an initial forward pass on the first sample (x[0], y[0]) so that ANN[-1].loss() can be evaluated for the while condition immediately. It also incorporates a simple early-stopping rule: once at least three epoch-loss values have been recorded, if the latest loss is greater than the previous one the loop breaks immediately.
Parameters
ANN: An ordered Python list of layer objects representing the full network, starting with an InputLayer and ending with an OutputLayer. Layers must already have weights and biases initialized before calling this function.
x: Input samples with shape (n_samples, n_features).
y: Target labels or values with shape (n_samples, n_outputs).
eta: The learning rate scalar applied to every weight and bias gradient update.
thresh: The loss threshold that terminates training. The loop continues while ANN[-1].loss() > thresh. Once the loss on the last sample falls to or below this value the loop exits cleanly.
Returns
A tuple (ANN, loss):
ANN — the trained network with updated weights.
loss — a list of per-epoch loss values recorded up to the stopping point.
Early stopping
In addition to the threshold check, the function monitors the loss list after every epoch. Once at least three epoch-loss values have been recorded (len(loss) > 2), it halts immediately if the latest loss is greater than the previous one:
if len(loss) > 2 and loss[-1] > loss[-2]:
    break
Because the guard needs three recorded values, an increase between the first and second epoch cannot trigger it. From the third epoch onward, a single increase between consecutive epochs is enough to stop training. This prevents wasted computation when the optimizer has already passed a local minimum and the loss is climbing.
gradient_descent_threshold performs an initial forward pass on x[0] / y[0] before the while loop so the loss threshold can be checked before any gradient updates. It will also stop training early if the loss increases between consecutive epochs (after at least three epochs have been recorded), even if the threshold has not yet been reached. Always inspect the returned loss list to verify why training stopped.
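How the threshold check and the early-stopping guard interact can be replayed over a made-up sequence of losses. This is a minimal sketch under stated assumptions: `replay_stopping_rules` is a hypothetical helper, `initial_loss` stands in for the loss from the initial forward pass, and `epoch_losses` stands in for the values each epoch would record. No network is trained.

```python
def replay_stopping_rules(initial_loss, epoch_losses, thresh):
    """Apply the two stopping rules to a precomputed list of epoch losses."""
    loss = []
    current = initial_loss                  # role of the initial forward pass
    values = iter(epoch_losses)
    while current > thresh:
        try:
            current = next(values)          # one simulated epoch
        except StopIteration:
            return loss, "ran out of simulated epochs"
        loss.append(current)
        # Early-stopping guard quoted in the text above.
        if len(loss) > 2 and loss[-1] > loss[-2]:
            return loss, "early stop"
    return loss, "threshold reached"

# Loss falls steadily and crosses the threshold.
hist_a, why_a = replay_stopping_rules(1.0, [0.8, 0.5, 0.3, 0.04, 0.01], thresh=0.05)

# The rise from 0.8 to 0.9 is ignored (only two values recorded);
# the later rise from 0.7 to 0.75 stops the loop.
hist_b, why_b = replay_stopping_rules(1.0, [0.8, 0.9, 0.7, 0.75, 0.2], thresh=0.05)

print(why_a, hist_a)
print(why_b, hist_b)
```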
Example
import numpy as np
from ANN import InputLayer, HiddenLayer, OutputLayer, gradient_descent_threshold
# Build network
l0 = InputLayer(2)
l1 = HiddenLayer(4, actfn="sigmoid")
l2 = OutputLayer(1, outputfn="sigmoid", lossfn="bincrossentropy")
l1.attach_after(l0)
l2.attach_after(l1)
l1.set_weights(method="xavier")
l1.set_biases(method="zeros")
l2.set_weights(method="xavier")
l2.set_biases(method="zeros")
ANN = [l0, l1, l2]
# XOR-style binary dataset
x_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_train = np.array([[0], [1], [1], [0]], dtype=float)
# Train until loss drops below 0.05
ANN, loss = gradient_descent_threshold(ANN, x_train, y_train, eta=0.1, thresh=0.05)
print(f"Stopped after {len(loss)} epochs. Final loss: {loss[-1]:.4f}")
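To check why a run ended, the returned list can be inspected after the fact. The helper below is a hypothetical sketch, not part of the framework. It assumes the list is empty when the initial loss is already at or below thresh, in which case indexing loss[-1] as in the print statement above would raise an IndexError.

```python
def stop_reason(loss, thresh):
    """Guess why training stopped from the returned per-epoch loss list."""
    if not loss:
        # Assumption: no epoch ran because the initial loss was already low enough.
        return "no epochs run"
    if len(loss) > 2 and loss[-1] > loss[-2]:
        return "early stop"
    if loss[-1] <= thresh:
        return "threshold reached"
    return "unknown"

print(stop_reason([0.8, 0.5, 0.04], thresh=0.05))
print(stop_reason([0.8, 0.5, 0.6], thresh=0.05))
print(stop_reason([], thresh=0.05))
```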
Weight Update Rule
Both training functions apply the same parameter update rule after each sample’s backward pass. Weights are shifted opposite to the gradient of the loss with respect to those weights, scaled by the learning rate:
ANN[i].W -= eta * ANN[i].dLdW
ANN[i].Bias -= eta * ANN[i].dLda.reshape(1, -1)
dLdW holds the gradient of the loss with respect to the weight matrix of layer i, and dLda holds the gradient with respect to the pre-activation values (which is also the gradient with respect to the bias vector after reshaping). Both are computed and stored on each layer object during the backward pass. See Backpropagation for a detailed explanation of how these gradients are derived.
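A small numeric example may make the role of reshape(1, -1) clearer. The shapes below are assumptions chosen for illustration (weights as (n_inputs, n_units), bias as a (1, n_units) row); the framework's actual shapes may differ.

```python
import numpy as np

eta = 0.1
W = np.ones((3, 2))                 # assumed shape: (n_inputs, n_units)
Bias = np.zeros((1, 2))             # assumed shape: (1, n_units)
dLdW = np.full((3, 2), 0.5)         # made-up gradient values
dLda = np.array([[0.2], [-0.4]])    # made-up gradient stored as a column, shape (2, 1)

# Without the reshape, broadcasting (1, 2) against (2, 1) yields a (2, 2) array,
# which no longer matches the bias shape.
unshaped = Bias - eta * dLda

# The update rule from the text: the reshape turns the column into a (1, 2) row.
W -= eta * dLdW
Bias -= eta * dLda.reshape(1, -1)

print(unshaped.shape, Bias)
```

After the update every weight is 0.95 and the bias row is [[-0.02, 0.04]], each moved opposite to the sign of its gradient.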
Choosing a good learning rate is the single most impactful hyperparameter decision when training. Start with a value in the range 0.001 – 0.01. If loss oscillates wildly or increases, reduce eta by a factor of 10. If loss decreases very slowly and training feels sluggish, try increasing it. For networks using relu activations, He initialization paired with eta=0.001 is a reliable baseline.
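The effect of the learning rate can be seen on a one-parameter toy problem. The sketch below runs plain gradient descent on f(w) = w**2; the function `descend` and the chosen rates are illustrative only, and the rate at which a real network diverges will differ.

```python
def descend(eta, steps=20, w0=1.0):
    """Plain gradient descent on f(w) = w**2, whose gradient is 2*w."""
    w = w0
    for _ in range(steps):
        w -= eta * 2.0 * w
    return w * w                     # final loss

# Small rates converge slowly, a moderate rate converges quickly,
# and a rate that is too large makes the loss grow.
results = {eta: descend(eta) for eta in (0.001, 0.01, 0.1, 1.1)}
for eta, final_loss in results.items():
    print(f"eta={eta}: final loss {final_loss:.6f}")
```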
Plotting the Loss Curve
Both functions return a loss list that you can plot directly to visualise how the training progressed over time.
import matplotlib.pyplot as plt
ANN, loss = gradient_descent_epoch(ANN, x_train, y_train, eta=0.01, epochs=100)
plt.plot(loss)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss')
plt.show()
A smoothly decreasing curve indicates healthy training. A flat curve suggests the learning rate is too small or the network architecture needs revision. A curve that spikes upward after initially decreasing is a sign to lower eta or inspect your data for outliers.
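Because each entry in the list is the loss on the last sample of an epoch, the raw curve can be jagged. One option is to smooth it before judging the trend. The `moving_average` helper below is a hypothetical sketch built on numpy.convolve, not something the framework provides.

```python
import numpy as np

def moving_average(loss, window=5):
    """Average each run of `window` consecutive loss values."""
    values = np.asarray(loss, dtype=float)
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(loss)")
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fully overlaps the data.
    return np.convolve(values, kernel, mode="valid")

smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=2)
print(smoothed)
```

The result has len(loss) - window + 1 entries and can be passed to plt.plot in place of the raw list.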