Weight and bias initialisation is one of the most consequential decisions when building a neural network. Poor initialisation can cause gradients to vanish or explode during the very first backpropagation step, leaving the network unable to learn. Neural Network Framework exposes a set of principled initialisation strategies through the set_weights(method) and set_biases(method) methods available on HiddenLayer and OutputLayer. This page lists every supported method string, the formula or distribution it uses, and guidance on when to apply it.
InputLayer does not own a weight matrix or bias vector and therefore has no set_weights or set_biases method. Calling either on an InputLayer instance will raise an AttributeError.
Weight Initialisation Methods
Call layer.set_weights(method) after layer.attach_after(prev_layer) has been called. The matrix shape is always (self.length, self.previous.length), that is, (n_out, n_in).
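The call order and resulting shape can be illustrated with a small stand-in class. This is only a sketch: SketchLayer is a hypothetical class written for this example, not the framework's HiddenLayer, and the RuntimeError it raises is an assumption about how a missing attach_after call might surface.

```python
import random


class SketchLayer:
    """Hypothetical stand-in for a layer; not the framework's own class."""

    def __init__(self, length):
        self.length = length   # number of neurons in this layer (n_out)
        self.previous = None   # filled in by attach_after
        self.weights = None

    def attach_after(self, prev_layer):
        # Remember the upstream layer so n_in is known later.
        self.previous = prev_layer

    def set_weights(self, method):
        if self.previous is None:
            raise RuntimeError("call attach_after(prev_layer) before set_weights")
        if method != "normal_random":
            raise ValueError(f"method not covered by this sketch: {method!r}")
        n_out, n_in = self.length, self.previous.length
        # One row per neuron in this layer, one column per upstream neuron.
        self.weights = [[random.gauss(0.0, 1.0) for _ in range(n_in)]
                        for _ in range(n_out)]


inputs = SketchLayer(4)
hidden = SketchLayer(3)
hidden.attach_after(inputs)
hidden.set_weights("normal_random")

shape = (len(hidden.weights), len(hidden.weights[0]))
print(shape)  # (3, 4), i.e. (n_out, n_in)
```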
'normal_random' — Standard Normal
Draws each weight from a standard normal distribution with mean 0 and variance 1.
When to use: A quick baseline. Not recommended for deep networks: with unit-variance weights, the variance of each pre-activation grows in proportion to n_in, making deep layers prone to exploding activations.
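The growth of variance with n_in can be checked empirically. The sketch below assumes independent unit-variance inputs and weights, which is an idealisation; the function name preactivation_variance is hypothetical and the code uses only the standard library rather than the framework.

```python
import random
import statistics


def preactivation_variance(n_in, trials=2000, seed=0):
    """Estimate Var(sum_i w_i * x_i) with w_i and x_i drawn from N(0, 1)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        z = sum(rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) for _ in range(n_in))
        samples.append(z)
    return statistics.pvariance(samples)


variance_10 = preactivation_variance(10)
variance_100 = preactivation_variance(100)

# Under these assumptions each estimate should land close to n_in itself.
print("n_in=10  variance ~", round(variance_10, 1))
print("n_in=100 variance ~", round(variance_100, 1))
```

A tenfold wider input layer gives roughly a tenfold larger pre-activation variance, which is the effect the scaled methods below are meant to counter.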
'uniform_random' — Uniform [0, 1)
Draws each weight uniformly from the half-open interval [0, 1).
When to use: Rarely preferred over 'normal_random'; note that all weights are non-negative, which can bias every neuron in the same direction.
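A quick check of the one-sided range, as a sketch using the standard library's random.random(), which also samples from [0, 1). The matrix size is arbitrary.

```python
import random

rng = random.Random(0)
n_out, n_in = 20, 50

# Uniform draws on [0, 1): never negative, never equal to 1.
weights = [[rng.random() for _ in range(n_in)] for _ in range(n_out)]

flat = [w for row in weights for w in row]
smallest, largest = min(flat), max(flat)
mean = sum(flat) / len(flat)

print(smallest >= 0.0, largest < 1.0)
print("mean is near 0.5 rather than 0:", round(mean, 2))
```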
'xavier' — Xavier / Glorot Normal
Scales a standard-normal draw by 1 / sqrt(n_out), where self.length is n_out. The aim is to keep activation variances roughly constant across layers for Sigmoid and Tanh activations.
When to use: 'sigmoid' and 'tanh' hidden layers. The standard Glorot formula scales by sqrt(2 / (n_in + n_out)); this implementation uses 1 / sqrt(n_out), which coincides with the standard value when n_in = n_out and differs otherwise.
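The two scale factors can be compared directly. This sketch only evaluates the formulas quoted above; the function names are hypothetical and nothing here calls the framework.

```python
import math


def page_xavier_scale(n_out):
    # Scale described on this page: 1 / sqrt(n_out).
    return 1.0 / math.sqrt(n_out)


def standard_glorot_scale(n_in, n_out):
    # Standard Glorot normal scale: sqrt(2 / (n_in + n_out)).
    return math.sqrt(2.0 / (n_in + n_out))


square = (page_xavier_scale(64), standard_glorot_scale(64, 64))
wide_in = (page_xavier_scale(64), standard_glorot_scale(256, 64))
narrow_in = (page_xavier_scale(64), standard_glorot_scale(16, 64))

print("n_in=64,  n_out=64:", square)     # both 0.125
print("n_in=256, n_out=64:", wide_in)    # page value is larger
print("n_in=16,  n_out=64:", narrow_in)  # page value is smaller
```

The factors agree exactly for square layers (n_in = n_out) and drift apart as the layer becomes more rectangular.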
'he' — He / Kaiming Normal
Scales a standard-normal draw by a factor derived from the layer dimensions. Designed specifically for ReLU-family activations, where roughly half the neurons are zeroed out on average.
The exact formula differs between layer types:
HiddenLayer: divides by the product of input and output sizes, giving sqrt(2 / (n_in * n_out)).
OutputLayer: divides only by the input size.
When to use: 'relu' hidden and output layers. The canonical He formula is sqrt(2 / n_in); HiddenLayer uses the more conservative sqrt(2 / (n_in * n_out)), which produces smaller initial weights.
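To see how much smaller the HiddenLayer factor is, the sketch below evaluates both formulas quoted above for one layer size. The function names are hypothetical, and the weight matrix at the end is only an illustration of scaling a standard-normal draw, not the framework's code.

```python
import math
import random


def canonical_he_scale(n_in):
    # Canonical He scale: sqrt(2 / n_in).
    return math.sqrt(2.0 / n_in)


def hidden_layer_he_scale(n_in, n_out):
    # HiddenLayer variant described on this page: sqrt(2 / (n_in * n_out)).
    return math.sqrt(2.0 / (n_in * n_out))


n_in, n_out = 128, 64
canonical = canonical_he_scale(n_in)
conservative = hidden_layer_he_scale(n_in, n_out)
ratio = conservative / canonical   # equals 1 / sqrt(n_out)

print(canonical, conservative, ratio)  # 0.125 0.015625 0.125

# Scaling a standard-normal draw by the chosen factor.
rng = random.Random(0)
weights = [[rng.gauss(0.0, 1.0) * conservative for _ in range(n_in)]
           for _ in range(n_out)]
```

For this layer the HiddenLayer factor is 1 / sqrt(n_out) = 1/8 of the canonical one, so wider layers start with proportionally smaller weights.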
'lecun' — LeCun Uniform
Draws weights uniformly from [−limit, limit] where limit = sqrt(1 / n_in).
When to use: Networks with 'tanh' activations; historically recommended for LeNet-style architectures. Also a reasonable choice for 'sigmoid'.
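A minimal sketch of the bounded uniform draw described above, using only the standard library. The function name lecun_uniform and its seed parameter are hypothetical and chosen for this example.

```python
import math
import random


def lecun_uniform(n_out, n_in, seed=None):
    """Return an n_out x n_in matrix drawn uniformly from [-limit, limit]."""
    rng = random.Random(seed)
    limit = math.sqrt(1.0 / n_in)
    return [[rng.uniform(-limit, limit) for _ in range(n_in)]
            for _ in range(n_out)]


n_out, n_in = 3, 25
limit = math.sqrt(1.0 / n_in)   # about 0.2 when n_in = 25
weights = lecun_uniform(n_out, n_in, seed=0)

within_bounds = all(abs(w) <= limit for row in weights for w in row)
print(within_bounds)  # True
```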
'one' — All Ones
Fills the entire weight matrix with 1.0.
When to use: Debugging and unit tests only. A constant weight matrix means every neuron in the layer computes the same pre-activation (given equal biases) and receives identical gradients, so the neurons can never diverge from one another and symmetry is never broken.
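The symmetry problem is easy to reproduce by hand. In this sketch the helper preactivations is hypothetical and biases are left out, so each neuron's pre-activation is just the weighted sum of the inputs.

```python
def preactivations(weights, inputs):
    """Compute one weighted sum per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


n_out, n_in = 3, 4
weights = [[1.0] * n_in for _ in range(n_out)]   # the 'one' initialisation
inputs = [0.5, -1.0, 2.0, 0.25]

z = preactivations(weights, inputs)
print(z)  # [1.75, 1.75, 1.75]
```

Every neuron sees the plain sum of the inputs, so all three outputs are identical and nothing distinguishes one neuron from another.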
Bias Initialisation Methods
Call layer.set_biases(method) to allocate and fill the bias vector of shape (1, n). All methods are available on both HiddenLayer and OutputLayer.
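The (1, n) row shape can be pictured with a small helper. This is a sketch under stated assumptions: make_biases is a hypothetical function, not the framework's set_biases, and it covers only three method strings whose values are spelled out on this page.

```python
import random


def make_biases(method, n, seed=None):
    """Return a 1 x n bias row as a list containing one list of n floats."""
    rng = random.Random(seed)
    if method == "zeros":
        row = [0.0] * n
    elif method == "constant":
        row = [0.1] * n                          # constant value given on this page
    elif method == "uniform_random":
        row = [rng.random() for _ in range(n)]   # uniform on [0, 1)
    else:
        raise ValueError(f"method not covered by this sketch: {method!r}")
    return [row]


biases = make_biases("constant", 4)
bias_shape = (len(biases), len(biases[0]))
print(bias_shape)  # (1, 4)
print(biases)      # [[0.1, 0.1, 0.1, 0.1]]
```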
'normal_random'
Standard normal draw.
'uniform_random'
Uniform on [0, 1).
'zeros'
All biases start at zero.
'constant'
Constant 0.1. Can help activate ReLU neurons during the initial forward passes.
'xavier'
Uses sqrt(1 / n). Matches the spirit of Xavier weight initialisation.
'lecun'
Uses sqrt(1 / n). Same formula as 'xavier' biases in this implementation.
'he'
Uses sqrt(1 / n). Pairs naturally with 'he' weight initialisation.
Recommendations by Activation
| Activation | Recommended weights | Recommended biases | Notes |
|---|---|---|---|
| 'relu' | 'he' | 'zeros' or 'constant' | 'constant' (0.1) can prevent dead neurons at start |
| 'sigmoid' | 'xavier' | 'zeros' | Keeps activations near the linear region initially |
| 'tanh' | 'xavier' or 'lecun' | 'zeros' | Both are appropriate; 'lecun' is the original recommendation |
| 'none' (linear) | 'normal_random' or 'xavier' | 'zeros' | Variance control matters less for linear activations |
| 'softmax' | 'xavier' | 'zeros' | Softmax output layer; keep weights small |
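The table can also be kept as a lookup so that model-building code picks a consistent pair of method strings. This is a sketch only: RECOMMENDED_INIT and recommended_init are hypothetical names, and taking the first listed option is a choice made for this example rather than something the framework prescribes.

```python
# Recommendations copied from the table above, keyed by activation string.
RECOMMENDED_INIT = {
    "relu": {"weights": ["he"], "biases": ["zeros", "constant"]},
    "sigmoid": {"weights": ["xavier"], "biases": ["zeros"]},
    "tanh": {"weights": ["xavier", "lecun"], "biases": ["zeros"]},
    "none": {"weights": ["normal_random", "xavier"], "biases": ["zeros"]},
    "softmax": {"weights": ["xavier"], "biases": ["zeros"]},
}


def recommended_init(activation):
    """Return (weight_method, bias_method), using the first listed option."""
    try:
        entry = RECOMMENDED_INIT[activation]
    except KeyError:
        raise ValueError(f"no recommendation for activation {activation!r}") from None
    return entry["weights"][0], entry["biases"][0]


relu_choice = recommended_init("relu")
tanh_choice = recommended_init("tanh")
print(relu_choice)  # ('he', 'zeros')
print(tanh_choice)  # ('xavier', 'zeros')
```

The returned strings are the ones that would be passed to layer.set_weights(method) and layer.set_biases(method).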