A loss function measures how far the network’s predictions are from the ground-truth labels during training. Neural Network Framework provides three built-in loss functions, selected via the lossfn argument of OutputLayer. The chosen loss determines both the scalar value returned by OutputLayer.loss() and the derivative used as the starting signal for backpropagation. Picking the right loss, and pairing it with the correct output activation, is essential for stable, meaningful training.
Always pair your loss function with the appropriate output activation, as listed in the table below. Mismatched pairings will produce incorrect gradients without raising an error.
| Loss key | Output activation | Task |
|---|---|---|
| 'MSE' | 'none' or 'sigmoid' | Regression |
| 'bincrossentropy' | 'sigmoid' | Binary classification |
| 'crossentropy' | 'softmax' | Multi-class classification |
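Because a mismatched pairing fails silently, it can help to check the pairing yourself before building the network. The following is a minimal sketch, not part of the framework: `VALID_PAIRINGS` and `check_pairing` are hypothetical names, and the lookup simply mirrors the table above.

```python
# Hypothetical helper, not provided by the framework.
# The mapping mirrors the loss/activation table in this page.
VALID_PAIRINGS = {
    "MSE": {"none", "sigmoid"},
    "bincrossentropy": {"sigmoid"},
    "crossentropy": {"softmax"},
}


def check_pairing(lossfn: str, activation: str) -> None:
    """Raise ValueError if the loss key and output activation are not a listed pair."""
    if lossfn not in VALID_PAIRINGS:
        raise ValueError(f"unknown loss key: {lossfn!r}")
    if activation not in VALID_PAIRINGS[lossfn]:
        expected = ", ".join(sorted(VALID_PAIRINGS[lossfn]))
        raise ValueError(
            f"loss {lossfn!r} expects output activation {expected}; got {activation!r}"
        )


check_pairing("bincrossentropy", "sigmoid")  # listed pair, returns silently
```

Calling `check_pairing("crossentropy", "sigmoid")` would raise a `ValueError` instead of letting training proceed with incorrect gradients.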
Mean Squared Error (MSE)
MSE is the classic regression loss. It penalises large deviations quadratically, making it sensitive to outliers.
Key: 'MSE'
Constructor usage
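The loss for a single sample is

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

where n is the number of output neurons, ŷ_i is the predicted value, and y_i is the ground truth.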
OutputLayer.loss() return value: A single Python float — the average squared error across all output neurons for the current sample.
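The average squared error described above can be sketched in a few lines of plain Python. This is an illustration that is independent of the framework: `mse_loss` is a hypothetical name, and the framework's internal implementation may differ.

```python
def mse_loss(predicted, actual):
    """Average squared error over the output neurons of one sample."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    n = len(predicted)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n


# Two output neurons: errors are -0.5 and 0.0, so the loss is 0.25 / 2.
print(mse_loss([0.5, 2.0], [1.0, 2.0]))  # 0.125
```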
Derivative used in backprop
The derivative is a scalar gradient accumulated over all outputs.
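One reading consistent with a scalar that is summed over the outputs is (2/n) Σ (ŷ_i − y_i), the term-by-term derivative of the average squared error added up across neurons. The sketch below assumes that form; the exact scaling and sign convention used by the framework are not confirmed here, and `mse_derivative` is a hypothetical name.

```python
def mse_derivative(predicted, actual):
    """Scalar MSE gradient: per-neuron derivatives 2 * (p - a) / n, summed.

    Assumption: the framework may scale or sign this differently.
    """
    n = len(predicted)
    return 2.0 * sum(p - a for p, a in zip(predicted, actual)) / n


# Errors are -0.5 and 0.0, so the summed gradient is 2 * (-0.5) / 2.
print(mse_derivative([0.5, 2.0], [1.0, 2.0]))  # -0.5
```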
The MSE derivative returns a scalar (summed over outputs), unlike the vector derivatives of the cross-entropy losses. This is intentional for the current implementation and works correctly when combined with the chain-rule computation in OutputLayer.backward().
Binary Cross-Entropy
Binary Cross-Entropy (BCE) is the standard loss for binary classification problems, where the output layer has a single neuron with a sigmoid activation representing the probability of the positive class.
Key: 'bincrossentropy'
Constructor usage
An epsilon = 1e-10 clamp is applied to ŷ_i before taking logarithms to prevent log(0).
OutputLayer.loss() return value: A single float — the average BCE over all output neurons.
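To make the clamp and the averaging concrete, here is a minimal sketch using the standard BCE form, −(1/n) Σ [y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i)]. It is not the framework's code: `bce_loss` is a hypothetical name, and clamping ŷ_i on both sides (to [ε, 1 − ε]) is an assumption about how the epsilon guard is applied.

```python
import math

EPSILON = 1e-10  # value stated in this page


def bce_loss(predicted, actual, eps=EPSILON):
    """Average binary cross-entropy over the output neurons of one sample."""
    n = len(predicted)
    total = 0.0
    for p, a in zip(predicted, actual):
        # Assumption: clamp on both sides so neither log receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += a * math.log(p) + (1.0 - a) * math.log(1.0 - p)
    return -total / n


print(bce_loss([0.5], [1.0]))  # log(2), roughly 0.693
print(bce_loss([0.0], [1.0]))  # large but finite thanks to the clamp
```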
Derivative used in backprop
The derivative is a vector of shape (n, 1), with one entry per output neuron.
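A sketch of such a per-neuron gradient, assuming the derivative of the averaged BCE with respect to each ŷ_i, namely (1/n) · (−y_i/ŷ_i + (1 − y_i)/(1 − ŷ_i)). The 1/n factor and the two-sided clamp are assumptions, `bce_derivative` is a hypothetical name, and the (n, 1) shape is represented here as a list of single-element lists.

```python
def bce_derivative(predicted, actual, eps=1e-10):
    """Per-neuron BCE gradient as an (n, 1) nested list.

    Assumption: includes the 1/n factor of the averaged loss and clamps
    each prediction to [eps, 1 - eps] to avoid division by zero.
    """
    n = len(predicted)
    grad = []
    for p, a in zip(predicted, actual):
        p = min(max(p, eps), 1.0 - eps)
        grad.append([(-(a / p) + (1.0 - a) / (1.0 - p)) / n])
    return grad


# One neuron predicting 0.5 for a positive label: -1 / 0.5 = -2.
print(bce_derivative([0.5], [1.0]))  # [[-2.0]]
```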
The epsilon = 1e-10 floor in both the loss and derivative guards against division by zero when a neuron’s output saturates to exactly 0 or 1. This can occur with Sigmoid when pre-activations are very large in magnitude.
Categorical Cross-Entropy
Categorical Cross-Entropy is designed for multi-class classification problems where labels are one-hot encoded (exactly one element of the actual array equals 1, all others are 0).
Key: 'crossentropy'
Constructor usage
The loss for a single sample is

$$L = -\log\left(\hat{y}_c\right)$$

where c is the index of the true class (the index where actual[c] == 1).
OutputLayer.loss() return value: A single float — the negative log-probability assigned to the true class.
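The negative log-probability of the true class can be sketched as follows. This is an illustration rather than the framework's code: `cce_loss` is a hypothetical name, labels are assumed to be a one-hot Python list, and no epsilon guard is applied because this page does not describe one for this loss.

```python
import math


def cce_loss(predicted, actual):
    """Negative log of the probability assigned to the true class."""
    c = actual.index(1)  # position of the single 1 in the one-hot label
    return -math.log(predicted[c])


# The true class (index 1) received probability 0.5, so the loss is log(2).
print(cce_loss([0.25, 0.5, 0.25], [0, 1, 0]))  # roughly 0.693
```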
Derivative used in backprop
For all classes other than the true class the gradient is zero; only the entry for the true class c is non-zero.
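Differentiating −log(ŷ_c) with respect to ŷ_c gives −1/ŷ_c, which suggests the sketch below. It assumes the gradient is taken with respect to the softmax outputs and that no epsilon is added; `cce_derivative` is a hypothetical name, and the (n, 1) shape is represented as a list of single-element lists.

```python
def cce_derivative(predicted, actual):
    """(n, 1) gradient: zero everywhere except -1 / p at the true class.

    Assumption: gradient with respect to the softmax outputs, no epsilon guard.
    """
    c = actual.index(1)  # index of the true class in the one-hot label
    grad = [[0.0] for _ in predicted]
    grad[c][0] = -1.0 / predicted[c]
    return grad


print(cce_derivative([0.25, 0.5, 0.25], [0, 1, 0]))  # [[0.0], [-2.0], [0.0]]
```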
Comparison Summary
MSE
Task: Regression
Output activation: 'none'
Gradient shape: scalar
Sensitive to outliers: yes (quadratic penalty)

Binary Cross-Entropy
Task: Binary classification
Output activation: 'sigmoid'
Gradient shape: (n, 1) vector
Epsilon clipping: yes (1e-10)

Categorical Cross-Entropy
Task: Multi-class classification
Output activation: 'softmax'
Gradient shape: (n, 1) vector
Requires one-hot labels: yes