Activation functions introduce the nonlinearity that lets a neural network learn complex mappings. Neural Network Framework ships five built-in activation functions, each identified by a short string key that you pass to a layer constructor. Every hidden layer and the output layer accept an activation string; InputLayer also accepts one if you want to transform raw features before they enter the network. This page documents each function, its derivative as used during backpropagation, and guidance on where to use it.
Quick-Reference Table
| Activation | Key | Typical use |
|---|---|---|
| Sigmoid | 'sigmoid' | Binary output neuron; shallow hidden layers |
| ReLU | 'relu' | General-purpose hidden layers |
| Tanh | 'tanh' | Hidden layers; zero-centred alternative to Sigmoid |
| Softmax | 'softmax' | Multi-class output layer only |
| Linear (identity) | 'none' | Regression output; pass-through input layer |
Sigmoid
Sigmoid squashes any real number into the range (0, 1), making it a natural choice for output neurons that represent probabilities.
Constructor usage
Inputs are clipped to [−500, 500] before exponentiation to prevent overflow.
Sigmoid suffers from the vanishing gradient problem in deep networks — the derivative is at most 0.25 and shrinks toward zero in the tails. Prefer ReLU for hidden layers in deep architectures.
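The clipping and the 0.25 bound on the derivative can be illustrated with a minimal pure-Python sketch. This is not the framework's own code: the function names `sigmoid` and `sigmoid_derivative` are hypothetical, and only the [−500, 500] clipping range is taken from this page.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid with the input clipped to [-500, 500] (illustrative helper)."""
    # Clipping keeps math.exp(-x) well inside the range of a 64-bit float.
    x = max(-500.0, min(500.0, x))
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """Derivative s * (1 - s), which reaches its maximum of 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
print(sigmoid(-1000.0))         # tiny positive value, no OverflowError
print(sigmoid_derivative(10.0)) # already close to zero in the tail
```

Without the clip, `math.exp(1000.0)` would raise `OverflowError` for the input `-1000.0`.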
ReLU
Rectified Linear Unit passes positive values unchanged and zeros out negatives. It is the most widely used hidden-layer activation because it rarely saturates and produces sparse activations.
Constructor usage
The dying ReLU problem occurs when neurons receive only negative inputs and permanently output zero. Using He initialisation (set_weights('he')) and a moderate learning rate helps avoid this.
Tanh
Hyperbolic tangent maps inputs to (−1, 1) and is zero-centred, which can make gradient updates more symmetric than Sigmoid.
Constructor usage
Like Sigmoid, Tanh can produce vanishing gradients in very deep networks. Its zero-centred output can improve convergence speed compared to Sigmoid in practice.
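As a rough illustration of the zero-centred output and the shrinking gradient, the sketch below uses Python's standard `math.tanh` and the identity tanh'(x) = 1 − tanh²(x). The helper name `tanh_derivative` is hypothetical and is not taken from the framework.

```python
import math

def tanh_derivative(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2 (illustrative helper)."""
    t = math.tanh(x)
    return 1.0 - t * t

# Outputs are symmetric around zero; the gradient shrinks as |x| grows.
for x in (-2.0, 0.0, 2.0, 5.0):
    print(x, math.tanh(x), tanh_derivative(x))
```

The derivative is 1 at the origin, four times Sigmoid's peak of 0.25, which is one reason Tanh can converge faster in shallow networks.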
Softmax
Softmax converts a vector of raw scores into a valid probability distribution in which all outputs sum to 1.0. It is designed exclusively for the output layer of multi-class classification problems.
Constructor usage
Subtracting max(x) before exponentiation is a standard numerical stability trick.
For n outputs, the Softmax derivative is an n × n matrix (the Jacobian).
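The max-subtraction trick and the n × n derivative can be sketched in plain Python as follows. The names `softmax` and `softmax_jacobian` are hypothetical, and the Jacobian uses the textbook form J[i][j] = p_i (δ_ij − p_j); how the framework itself stores or applies this matrix is not shown on this page.

```python
import math

def softmax(scores):
    """Softmax with the max-subtraction stability trick (illustrative helper)."""
    m = max(scores)
    # Shifting by the maximum makes every exponent <= 0, so exp cannot overflow.
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(scores):
    """n x n matrix with entries J[i][j] = p_i * (delta_ij - p_j)."""
    p = softmax(scores)
    n = len(p)
    return [
        [p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
        for i in range(n)
    ]

# A naive math.exp(1000.0) would raise OverflowError; the shifted version does not.
probs = softmax([1000.0, 1001.0, 1002.0])
print(probs, sum(probs))

jac = softmax_jacobian([1.0, 2.0, 3.0])
print(len(jac), len(jac[0]))  # 3 3
```

Because Softmax is invariant to adding a constant to every score, the shifted computation returns the same probabilities as the unshifted formula would in exact arithmetic.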
Linear (Identity)
The 'none' key selects the identity function, which passes the pre-activation value through unchanged. It is the default for all layer types.
Constructor usage
Choosing the Right Activation
Hidden Layers
ReLU is the default choice for most architectures — fast to compute and avoids saturation.
Use Tanh when you need zero-centred activations (e.g., RNNs or shallow networks).
Use Sigmoid rarely in hidden layers; mainly useful for historical compatibility.
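The saturation argument behind these recommendations can be checked numerically. The sketch below compares the derivatives of the three hidden-layer candidates as the input grows. All function names are hypothetical, and defining the ReLU derivative as 0 at exactly x = 0 is an assumed convention rather than something this page specifies.

```python
import math

def relu(x: float) -> float:
    """ReLU: positive values pass through, negatives become zero."""
    return max(0.0, x)

def relu_derivative(x: float) -> float:
    """1 for positive inputs, 0 otherwise (value at x = 0 is an assumed convention)."""
    return 1.0 if x > 0 else 0.0

def sigmoid_derivative(x: float) -> float:
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def tanh_derivative(x: float) -> float:
    t = math.tanh(x)
    return 1.0 - t * t

# For growing positive inputs, ReLU's gradient stays at 1 while the
# Sigmoid and Tanh gradients decay toward zero.
for x in (0.5, 5.0, 10.0):
    print(x, relu_derivative(x), sigmoid_derivative(x), tanh_derivative(x))
```

The same sketch also shows the other side of the trade-off: `relu_derivative` is 0 for every negative input, which is the mechanism behind dying ReLU neurons.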
Output Layer
Softmax for multi-class classification (pair with 'crossentropy').
Sigmoid for binary classification (pair with 'bincrossentropy').
Linear ('none') for regression tasks (pair with 'MSE').