The XOR problem is one of the most iconic benchmarks in neural network history. A single-layer perceptron cannot solve it because XOR is not linearly separable: no single straight line can correctly divide the four input combinations into the right classes. Adding one hidden layer gives the network the expressive power it needs to learn the non-linear decision boundary, making XOR an ideal first test for any feedforward architecture. This walkthrough uses Neural Network Framework to build, train, and evaluate a minimal 3-layer network that learns XOR from scratch using vanilla gradient descent.
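To see why one hidden layer is enough, the sketch below wires up a 2-2-1 network by hand in plain NumPy, without the framework. The weights are hand-picked for illustration rather than trained, and the names xor_net, W_hidden, and W_out are made up for this example: one hidden unit behaves roughly like OR, the other roughly like AND, and the linear output subtracts the two.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked (not trained) parameters, chosen only for illustration.
# Column 0 feeds an OR-like hidden unit, column 1 an AND-like hidden unit.
W_hidden = np.array([[20.0, 20.0],
                     [20.0, 20.0]])
b_hidden = np.array([-10.0, -30.0])
W_out = np.array([[1.0], [-1.0]])   # output = OR-like unit minus AND-like unit
b_out = np.array([0.0])

def xor_net(x):
    """Forward pass: sigmoid hidden layer followed by a linear output."""
    hidden = sigmoid(x @ W_hidden + b_hidden)
    return hidden @ W_out + b_out

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
outputs = xor_net(inputs)
for row, out in zip(inputs, outputs):
    print(row, round(float(out[0]), 3))
```

No single linear unit can reproduce this mapping, which is why the hidden layer matters; training has to discover a comparable pair of hidden features on its own.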
Network Architecture
The network has three layers arranged in a chain:

| Layer | Type | Neurons | Activation |
|---|---|---|---|
| InputLayer(2, 'sigmoid') | Input | 2 | Sigmoid |
| HiddenLayer(2, 'sigmoid') | Hidden | 2 | Sigmoid |
| OutputLayer(1, 'none', 'MSE') | Output | 1 | None (linear) |
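As a quick sanity check on the table, the following sketch counts the trainable parameters of a 2-2-1 network. It assumes a (n_in, n_out) weight layout and a (1, n_out) bias layout; ANN.py may store its matrices differently, so treat the shapes as illustrative.

```python
layer_sizes = [2, 2, 1]  # input, hidden, and output neuron counts from the table

# Assumed layout: weights are (n_in, n_out), biases are (1, n_out).
shapes = [
    ((n_in, n_out), (1, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

# Every weight and every bias entry is one trainable parameter.
n_params = sum(w[0] * w[1] + b[1] for w, b in shapes)

print(shapes)
print(n_params)
```

Under these assumptions the hidden layer contributes 6 parameters and the output layer 3, so the whole network has only 9 values to learn.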
Training Data
The full XOR truth table serves as the training set, covering all four possible input combinations and their expected outputs:

| Input A | Input B | XOR Output |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
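The truth table translates directly into two NumPy arrays. The sketch below uses the names x and y that the training code in this walkthrough expects; the float dtype is a choice made here, not something the source specifies.

```python
import numpy as np

# One row per truth-table entry: columns are Input A and Input B.
x = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]], dtype=float)

# One target per row, kept as a column so each y[k] has shape (1,).
y = np.array([[0],
              [1],
              [1],
              [0]], dtype=float)

print(x.shape, y.shape)
```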
Create an InputLayer with 2 neurons and sigmoid activation, then attach a HiddenLayer with 2 neurons and sigmoid activation. Attach the OutputLayer with 1 neuron, linear activation, and MSE loss. Initialize both weights and biases with 'normal_random' (standard normal distribution) for the hidden and output layers. Collect the layers in a list to form the ANN pipeline.

```python
from ANN import *

eta = 0.1  # learning rate used by the training loop

# Input layer: 2 neurons, sigmoid activation
i = InputLayer(2, "sigmoid")

# Hidden layer: 2 neurons, sigmoid activation, chained after the input layer
h1 = HiddenLayer(2, "sigmoid")
h1.attach_after(i)
h1.set_weights("normal_random")
h1.set_biases("normal_random")

# Output layer: 1 neuron, linear activation, MSE loss
o = OutputLayer(1, "none", "MSE")
o.attach_after(h1)
o.set_weights("normal_random")
o.set_biases("normal_random")

# Ordered list of layers that the training loop walks through
ANN = [i, h1, o]
```
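Because set_weights and set_biases take the initializer as a plain string, a misspelled name is easy to miss. The sketch below is a hypothetical guard, not part of ANN.py: checked_initializer and VALID_INITIALIZERS are names invented for this example, and the set contains the initializer names this page lists as valid for the library.

```python
# Initializer names this page lists as recognized by ANN.py.
VALID_INITIALIZERS = {"normal_random", "uniform_random", "xavier", "he", "lecun", "one"}

def checked_initializer(name):
    """Return name unchanged if it is a listed initializer, else raise ValueError."""
    if name not in VALID_INITIALIZERS:
        raise ValueError(
            f"unknown initializer {name!r}; expected one of {sorted(VALID_INITIALIZERS)}"
        )
    return name

# Hypothetical usage with the layers built above:
#   h1.set_weights(checked_initializer("normal_random"))
print(checked_initializer("normal_random"))
```

Failing at the call site with a clear message is easier to debug than a TypeError raised later inside the forward pass.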
The original Xor.py source calls set_weights("random") for both layers and omits set_biases entirely. "random" is not a recognized method in ANN.py; the valid options are 'normal_random', 'uniform_random', 'xavier', 'he', 'lecun', and 'one'. Passing "random" matches no branch, leaving W as None. Similarly, skipping set_biases leaves Bias as None. Both conditions cause a TypeError during the forward pass when NumPy tries to compute np.dot(None, ...) + None. The corrected code above uses 'normal_random' for both weights and biases.

attach_after sets the previous pointer on the new layer and the next pointer on the preceding one. Both are required for the forward and backward passes to chain correctly.

Define the four XOR input-output pairs as NumPy arrays. The inputs x are shape (4, 2) and the targets y are shape (4, 1).

This example defines its own local gradient_descent function rather than using the library's gradient_descent_epoch. The local version is simpler because it does not track per-epoch accuracy. It iterates for a fixed number of epochs, and each epoch loops over every sample, running a full forward pass through ANN, then a backward pass from the last layer to the first, and finally updating every weight matrix W and bias vector Bias by the gradient scaled with the learning rate eta.

```python
def gradient_descent(ANN, x, y, epochs):
    loss = []
    for j in range(0, epochs):
        for k in range(0, len(x)):
            # Load one sample and its target into the network
            ANN[0].put_values(x[k])
            ANN[len(ANN)-1].set_actual(y[k])

            # Forward pass through every layer, in order
            for layer in ANN:
                layer.forward()

            # Backward pass from the output layer down to the first hidden layer
            for i in range(len(ANN)-1, 0, -1):
                ANN[i].backward()

            # Update weights and biases of every non-input layer (eta is the global learning rate)
            for i in range(1, len(ANN)):
                ANN[i].W -= eta * ANN[i].dLdW
                ANN[i].Bias -= eta * ANN[i].dLda.reshape(1, -1)

        # Record and report the output layer's loss once per epoch
        loss.append(ANN[len(ANN)-1].loss())
        print(f"epoch: {j}, Loss: {ANN[len(ANN)-1].loss()}")
    return ANN, loss

ANN, loss = gradient_descent(ANN, x, y, 50000)
```
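To make the per-sample update concrete, here is a self-contained NumPy sketch of one forward pass, backward pass, and parameter update for a 2-2-1 network with a sigmoid hidden layer and a linear output. It does not use the library. The function names, the (n_in, n_out) weight layout, and the plain mean-squared-error definition (no 1/2 factor) are assumptions made for the illustration, so ANN.py may compute dLdW and dLda differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    """x has shape (1, 2); returns hidden activations (1, 2) and output (1, 1)."""
    W1, b1, W2, b2 = params
    hidden = sigmoid(x @ W1 + b1)
    output = hidden @ W2 + b2          # linear output, no activation
    return hidden, output

def mse(params, x, y):
    _, output = forward(params, x)
    return float(np.mean((output - y) ** 2))

def gradients(params, x, y):
    """Backpropagate the MSE loss for a single sample."""
    W1, b1, W2, b2 = params
    hidden, output = forward(params, x)
    d_output = 2.0 * (output - y) / output.size                  # dL/d(output)
    d_W2 = hidden.T @ d_output
    d_b2 = d_output
    d_hidden_pre = (d_output @ W2.T) * hidden * (1.0 - hidden)   # chain through sigmoid
    d_W1 = x.T @ d_hidden_pre
    d_b1 = d_hidden_pre
    return [d_W1, d_b1, d_W2, d_b2]

def sgd_step(params, grads, eta):
    """Move every parameter against its gradient, scaled by the learning rate."""
    return [p - eta * g for p, g in zip(params, grads)]

# Standard-normal initial parameters, drawn with a fixed seed for repeatability.
rng = np.random.default_rng(0)
params = [rng.standard_normal((2, 2)), rng.standard_normal((1, 2)),
          rng.standard_normal((2, 1)), rng.standard_normal((1, 1))]
sample_x = np.array([[0.0, 1.0]])
sample_y = np.array([[1.0]])

grads = gradients(params, sample_x, sample_y)
new_params = sgd_step(params, grads, eta=0.1)
print(mse(params, sample_x, sample_y), mse(new_params, sample_x, sample_y))
```

Each gradient has the same shape as the parameter it updates, which is what lets the update be a single in-place subtraction per layer.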
The training loop runs for 50 000 epochs. With a learning rate of 0.1 and only four samples, this typically converges well within that budget.

A healthy XOR loss curve drops steeply in the first few thousand epochs and then flattens out close to zero. If the curve plateaus at a high value, re-run the script: random weight initialization can occasionally start in a poor basin.
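The loss list returned by the training function can also be checked programmatically instead of by eye. The helper below is a hypothetical sketch: diagnose_loss and its thresholds (target, window, min_drop) are illustrative defaults picked for this example, not values taken from the library.

```python
def diagnose_loss(history, target=1e-3, window=1000, min_drop=1e-6):
    """Classify a loss history as 'converged', 'stalled', or 'improving'."""
    if not history:
        raise ValueError("history is empty")
    # Final loss is already small enough: treat the run as converged.
    if history[-1] <= target:
        return "converged"
    # Otherwise look at how much the loss fell over the most recent window.
    recent = history[-window:]
    if len(recent) >= 2 and recent[0] - recent[-1] < min_drop:
        return "stalled"      # flat at a high value: a re-run is worth trying
    return "improving"

print(diagnose_loss([0.5, 0.1, 0.0001]))
print(diagnose_loss([0.25] * 2000))
```

A "stalled" result corresponds to the high plateau described above, where restarting with fresh random weights is the usual remedy.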
Feed each of the four inputs back through the trained network and print the predicted output alongside the known ground truth:
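A minimal sketch of such an evaluation loop follows. This page does not show how to read the output layer's value back from the library, so the loop takes a hypothetical predict callable that wraps the forward pass. The stand_in_predict function is only a placeholder that returns the exact XOR value so the example runs on its own; replace it with a wrapper around the trained network.

```python
import numpy as np

def evaluate(predict, x, y):
    """Print and return (input, prediction, target) for every sample."""
    rows = []
    for inputs, target in zip(x, y):
        prediction = float(predict(inputs))
        expected = float(target[0])
        rows.append((inputs.tolist(), prediction, expected))
        print(f"input: {inputs.tolist()}  predicted: {prediction:.4f}  expected: {expected:.0f}")
    return rows

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Placeholder for a trained network (assumption): returns the exact XOR value.
def stand_in_predict(inputs):
    return float(inputs[0] != inputs[1])

rows = evaluate(stand_in_predict, x, y)
```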
Expected Results
After approximately 50 000 epochs the network should produce outputs very close to the ground truth: 0, 1, 1, and 0 for the inputs (0, 0), (0, 1), (1, 0), and (1, 1).
Full Source
xor.py