Neural Network Framework provides two top-level training functions that implement stochastic gradient descent over a list of layer objects. Both run the full training loop: they feed each sample forward, propagate gradients backward, and adjust every weight and bias in the direction that reduces the loss. Choose between them based on whether you want to train for a fixed number of passes over your data or stop automatically once the loss crosses a target threshold.

Training Functions

gradient_descent_epoch runs the training loop for an exact number of epochs. After every epoch it prints the current loss and the classification accuracy (percentage of samples whose argmax prediction matched the true label), then appends the loss to a list that is returned alongside the trained network.
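The accuracy figure described above can be reproduced outside the training loop. The following is a minimal sketch, not the framework's own code: the helper name argmax_accuracy and the sample arrays are hypothetical, and it assumes one-hot targets.

```python
import numpy as np

def argmax_accuracy(predictions, targets):
    """Percentage of rows whose argmax prediction matches the argmax of the one-hot target."""
    predicted_classes = np.argmax(predictions, axis=1)
    true_classes = np.argmax(targets, axis=1)
    return 100.0 * np.mean(predicted_classes == true_classes)

# Hypothetical network outputs for four samples and three classes
predictions = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
targets = np.eye(3)[[0, 1, 2, 1]]  # one-hot labels for classes 0, 1, 2, 1

accuracy = argmax_accuracy(predictions, targets)
print(f"Accuracy: {accuracy}")  # three of the four rows match
```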

Parameters

ANN
list
required
An ordered Python list of layer objects representing the full network, starting with an InputLayer and ending with an OutputLayer. For example: [InputLayer(...), HiddenLayer(...), OutputLayer(...)]. Layers must already have weights and biases initialized before calling this function.
x
np.ndarray
required
Input samples with shape (n_samples, n_features). Each row is one training example whose feature count must match the size of the InputLayer.
y
np.ndarray
required
Target labels or values with shape (n_samples, n_outputs). Each row is the expected output for the corresponding row in x. For classification tasks these are typically one-hot encoded vectors.
eta
float
required
The learning rate: a scalar that scales the size of each weight update step. Smaller values train more slowly but more stably; larger values train faster but risk divergence.
epochs
int
required
The number of full passes over the entire training dataset. In each epoch every sample in x is processed exactly once.
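Before training, it can help to confirm that x and y match the shapes listed above. The sketch below is an assumption-laden illustration: check_training_inputs is a hypothetical helper that is not part of the framework, and the one-hot encoding uses the same np.eye indexing idiom as the example further down.

```python
import numpy as np

def check_training_inputs(x, y, n_features, n_outputs):
    """Hypothetical helper: raise ValueError if x or y does not match the documented shapes."""
    if x.ndim != 2 or x.shape[1] != n_features:
        raise ValueError(f"x must have shape (n_samples, {n_features}), got {x.shape}")
    if y.ndim != 2 or y.shape[1] != n_outputs:
        raise ValueError(f"y must have shape (n_samples, {n_outputs}), got {y.shape}")
    if x.shape[0] != y.shape[0]:
        raise ValueError(f"x has {x.shape[0]} rows but y has {y.shape[0]}")

# Turn integer class labels into one-hot rows of shape (n_samples, n_outputs)
labels = np.array([0, 2, 1, 2])
y = np.eye(3)[labels]
x = np.zeros((4, 4))  # placeholder features: 4 samples, 4 features

check_training_inputs(x, y, n_features=4, n_outputs=3)  # passes silently
```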

Returns

A tuple (ANN, loss):
  • ANN — the same list passed in, with all weights and biases updated in-place after training.
  • loss — a Python list of floats, one entry per epoch, recording the loss on the last sample of that epoch.

How it works

For each epoch the function iterates over every sample, runs a forward pass through all layers, computes the backward pass in reverse layer order, then immediately applies the weight update rule:
ANN[i].W    -= eta * ANN[i].dLdW
ANN[i].Bias -= eta * ANN[i].dLda.reshape(1, -1)
After all samples have been processed, the loss (measured on the last sample of the epoch) and the accuracy are printed, and that loss value is appended to the returned list.
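The per-sample loop described above can be sketched in plain NumPy. This is an illustration under stated assumptions rather than the framework's implementation: it uses a single softmax layer with cross-entropy loss instead of a list of layer objects, and the names softmax, sgd_epochs, W, and bias are hypothetical stand-ins.

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax for a single row of pre-activations."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

def sgd_epochs(W, bias, x, y, eta, epochs):
    """Per-sample SGD on one softmax layer; records the last sample's loss each epoch."""
    loss_history = []
    for epoch in range(epochs):
        for sample, target in zip(x, y):
            inp = sample.reshape(1, -1)            # (1, n_features)
            a = inp @ W + bias                     # pre-activation, (1, n_outputs)
            out = softmax(a)                       # forward pass
            sample_loss = -np.sum(target * np.log(out + 1e-12))
            dLda = out - target.reshape(1, -1)     # softmax + cross-entropy gradient
            dLdW = inp.T @ dLda                    # gradient w.r.t. the weights
            W -= eta * dLdW                        # same update form as the rule above
            bias -= eta * dLda.reshape(1, -1)
        loss_history.append(float(sample_loss))    # loss on the last sample of the epoch
    return W, bias, loss_history

# Toy, trivially separable data: each one-hot input maps to the same one-hot class
x = np.eye(3)
y = np.eye(3)
W = np.zeros((3, 3))
bias = np.zeros((1, 3))

W, bias, loss_history = sgd_epochs(W, bias, x, y, eta=0.5, epochs=50)
print(loss_history[0], loss_history[-1])
```

Because the parameters are updated after every sample, the recorded value reflects only the final sample of each epoch, so it can be noisier than an average over the whole dataset.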

Example

import numpy as np
from ANN import InputLayer, HiddenLayer, OutputLayer, gradient_descent_epoch

# Build network
l0 = InputLayer(4)
l1 = HiddenLayer(8, actfn="relu")
l2 = OutputLayer(3, outputfn="softmax", lossfn="crossentropy")

l1.attach_after(l0)
l2.attach_after(l1)

l1.set_weights(method="he")
l1.set_biases(method="zeros")
l2.set_weights(method="he")
l2.set_biases(method="zeros")

ANN = [l0, l1, l2]

# Toy dataset (one-hot labels)
x_train = np.random.rand(100, 4)
y_train = np.eye(3)[np.random.choice(3, 100)]

# Train for 50 epochs
ANN, loss = gradient_descent_epoch(ANN, x_train, y_train, eta=0.01, epochs=50)
Console output looks like:
epoch: 0, Loss: 1.0986, Accuracy: 34.0
epoch: 1, Loss: 1.0842, Accuracy: 41.0
...
epoch: 49, Loss: 0.7213, Accuracy: 68.0

Weight Update Rule

Both training functions apply the same parameter update rule after each sample’s backward pass. Weights are shifted opposite to the gradient of the loss with respect to those weights, scaled by the learning rate:
ANN[i].W    -= eta * ANN[i].dLdW
ANN[i].Bias -= eta * ANN[i].dLda.reshape(1, -1)
dLdW holds the gradient of the loss with respect to the weight matrix of layer i, and dLda holds the gradient with respect to the pre-activation values (which is also the gradient with respect to the bias vector after reshaping). Both are computed and stored on each layer object during the backward pass. See Backpropagation for a detailed explanation of how these gradients are derived.
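The claim that dLda doubles as the bias gradient follows from the pre-activation being a = input @ W + Bias, so each bias entry shifts its pre-activation one-for-one. A finite-difference check can illustrate this. The sketch below assumes a single softmax layer with cross-entropy loss, and all names (inp, W, bias, forward_loss) are hypothetical rather than taken from the framework.

```python
import numpy as np

rng = np.random.default_rng(0)
inp = rng.normal(size=(1, 4))        # one sample with 4 features
W = rng.normal(size=(4, 3))          # weights for 3 output units
bias = np.zeros((1, 3))
target = np.array([[0.0, 1.0, 0.0]])

def forward_loss(bias_value):
    """Return (cross-entropy loss, softmax output) for the given bias."""
    a = inp @ W + bias_value
    out = np.exp(a - a.max())
    out /= out.sum()
    return -np.sum(target * np.log(out)), out

# Analytic gradient w.r.t. the pre-activation for softmax + cross-entropy
_, out = forward_loss(bias)
dLda = out - target

# Central finite differences w.r.t. each bias entry
eps = 1e-6
numeric = np.zeros_like(bias)
for j in range(bias.shape[1]):
    step = np.zeros_like(bias)
    step[0, j] = eps
    numeric[0, j] = (forward_loss(bias + step)[0] - forward_loss(bias - step)[0]) / (2 * eps)

matches = np.allclose(numeric, dLda.reshape(1, -1), atol=1e-6)
print(matches)
```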
Choosing a good learning rate is the single most impactful hyperparameter decision when training. Start with a value in the range 0.001 to 0.01. If the loss oscillates wildly or increases, reduce eta by a factor of 10. If the loss decreases very slowly, try increasing it. For networks using relu activations, He initialization paired with eta=0.001 is a reliable baseline.
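The effect of eta is easiest to see on a toy problem. The sketch below is not framework code: it minimizes the one-dimensional function f(w) = w**2 with plain gradient descent, and the helper name descend is hypothetical. A very small eta barely moves, a moderate one converges, and an overly large one diverges.

```python
def descend(eta, steps=20, w=1.0):
    """Gradient descent on f(w) = w**2; returns the loss before each step."""
    history = []
    for _ in range(steps):
        history.append(w ** 2)
        w -= eta * 2 * w      # gradient of w**2 is 2*w
    return history

too_small = descend(eta=0.001)   # loss shrinks, but very slowly
moderate = descend(eta=0.1)      # loss drops quickly toward zero
too_large = descend(eta=1.1)     # each step overshoots, so loss grows

for name, history in [("0.001", too_small), ("0.1", moderate), ("1.1", too_large)]:
    print(f"eta={name}: first loss {history[0]:.4f}, last loss {history[-1]:.4f}")
```

The thresholds on this toy function do not transfer to a real network; the point is only the qualitative pattern of slow progress, convergence, and divergence.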

Plotting the Loss Curve

Both functions return a loss list that you can plot directly to visualise how the training progressed over time.
import matplotlib.pyplot as plt

ANN, loss = gradient_descent_epoch(ANN, x_train, y_train, eta=0.01, epochs=100)

plt.plot(loss)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss')
plt.show()
A smoothly decreasing curve indicates healthy training. A flat curve suggests the learning rate is too small or the network architecture needs revision. A curve that spikes upward after initially decreasing is a sign to lower eta or inspect your data for outliers.
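Because each recorded value is the loss on a single sample, the raw curve can be jagged even when training is healthy. One option is to smooth it with a moving average before judging the trend. The following is a minimal sketch: moving_average is a hypothetical helper, and the loss values are made up for illustration.

```python
import numpy as np

def moving_average(values, window=5):
    """Average each run of `window` consecutive values; output is shorter by window - 1."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Made-up per-epoch losses that trend downward with some noise
loss = [1.10, 1.30, 0.95, 1.05, 0.80, 0.90, 0.70, 0.75, 0.60, 0.65]

smoothed = moving_average(loss, window=3)
print(smoothed)
```

The smoothed array can be passed to plt.plot in place of the raw list.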
