The leaky ReLU derivative module computes the derivative of the leaky ReLU activation function during backpropagation. This module multiplies upstream gradients by the local activation derivative, implementing the chain rule for gradient flow through the network.
Architecture
The module follows the standard parent-child hierarchy:
- leaky_relu_derivative_parent: Top-level module instantiating two child modules
- leaky_relu_derivative_child: Processing unit computing derivative for one column
Module ports
leaky_relu_derivative_parent
System clock signal
Active-high reset signal
Leak factor (α) used in forward pass, shared across both columns
Valid signal for column 1 input
Valid signal for column 2 input
Upstream gradient for column 1
Upstream gradient for column 2
Cached forward pass activation (H) for column 1
Cached forward pass activation (H) for column 2
Computed gradient for column 1
Computed gradient for column 2
Valid signal for column 1 output
Valid signal for column 2 output
leaky_relu_derivative_child
System clock signal
Active-high reset signal
Input valid signal
Upstream gradient (∂L/∂H)
Leak factor (α)
Forward pass activation value (H) for determining derivative
Output gradient (∂L/∂Z)
Output valid signal
Derivative function
The derivative of leaky ReLU is:
f’(Z) = 1 if Z >= 0, α if Z < 0
The module applies the chain rule:
∂L/∂Z = ∂L/∂H × f’(Z)
where:
- ∂L/∂H is the upstream gradient (from the next layer)
- f’(Z) is the activation derivative
- ∂L/∂Z is the gradient to propagate to the previous layer
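As a floating-point reference for the chain rule described here, the following Python sketch may help. The function names are hypothetical and do not appear in the hardware source; the real module works on Q8.8 values and tests H rather than Z.

```python
def leaky_relu_derivative(z: float, alpha: float) -> float:
    """Local derivative f'(Z): 1 for Z >= 0, alpha for Z < 0."""
    return 1.0 if z >= 0 else alpha


def backprop_through_leaky_relu(upstream: float, z: float, alpha: float) -> float:
    """Chain rule: dL/dZ = dL/dH * f'(Z)."""
    return upstream * leaky_relu_derivative(z, alpha)


# Non-negative input: the gradient passes through unchanged.
print(backprop_through_leaky_relu(0.5, 2.0, 0.1))
# Negative input: the gradient is scaled by the leak factor.
print(backprop_through_leaky_relu(0.5, -2.0, 0.1))
```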
Operation
Algorithm
The derivative module determines the activation derivative based on the sign of the cached forward pass activation (H):
- Check forward pass value: Examine sign of lr_d_H_data_in
- Conditional gradient computation:
  - If H >= 0: Derivative is 1, pass gradient through unchanged: output = input
  - If H < 0: Derivative is α, scale gradient: output = input × α
- Register output: On clock edge, output the computed gradient with valid signal
Pipeline stages
- Sign detection: Check if cached activation H is non-negative (combinational)
- Conditional computation:
  - Non-negative path: Direct assignment (no operation)
  - Negative path: Fixed-point multiply using fxp_mul
- Registered output: Result and valid signal latched on clock edge
Why use H instead of Z?
The module uses the activated value H rather than the pre-activation Z to determine the derivative:
- For standard leaky ReLU with a positive leak factor α: sign(H) = sign(Z), so either works
- Using H is convenient because it’s already available from the forward pass
- H values are cached in the VPU during the transition pathway
- This avoids needing to cache additional pre-activation values
See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/leaky_relu_derivative_child.sv:31 for the implementation.
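The sign argument can be checked numerically. The sketch below is a floating-point illustration with hypothetical helper names, assuming α > 0. It does not model quantization, so whether very small negative values keep their sign after a Q8.8 multiply depends on how fxp_mul rounds.

```python
def leaky_relu(z: float, alpha: float) -> float:
    """Forward pass: H = Z for Z >= 0, alpha * Z otherwise."""
    return z if z >= 0 else alpha * z


def derivative_from(value: float, alpha: float) -> float:
    """Pick the derivative from the sign of either Z or H."""
    return 1.0 if value >= 0 else alpha


def same_derivative(z: float, alpha: float) -> bool:
    """True when testing H gives the same derivative as testing Z."""
    h = leaky_relu(z, alpha)
    return derivative_from(h, alpha) == derivative_from(z, alpha)


alpha = 0.1
samples = [-3.0, -0.5, 0.0, 0.25, 4.0]
print(all(same_derivative(z, alpha) for z in samples))
```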
Fixed-point arithmetic
The module uses 16-bit signed fixed-point (Q8.8 format):
- Multiplication: When H < 0, fxp_mul computes gradient × leak_factor
  - See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/fixedpoint.sv:278
  - Handles binary point alignment
  - Detects overflow conditions
- Pass-through: When H >= 0, gradient passes unchanged (derivative = 1)
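To make the Q8.8 arithmetic concrete, here is a minimal Python sketch of a 16-bit fixed-point multiply. It is not a port of fxp_mul: the helper names are hypothetical, and the truncating shift, the wrap-around result, and the overflow flag are assumptions made for illustration. The actual module may round or saturate differently.

```python
from typing import Tuple

FRAC_BITS = 8                                  # Q8.8: 8 fractional bits
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def to_q88(x: float) -> int:
    """Quantize a real number to a signed Q8.8 integer (truncates toward zero)."""
    raw = int(x * (1 << FRAC_BITS))
    if not INT16_MIN <= raw <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in Q8.8")
    return raw


def from_q88(raw: int) -> float:
    """Convert a signed Q8.8 integer back to a real number."""
    return raw / (1 << FRAC_BITS)


def q88_mul(a: int, b: int) -> Tuple[int, bool]:
    """Multiply two Q8.8 values; returns (16-bit signed result, overflow flag)."""
    product = a * b                            # 16 fractional bits after the multiply
    shifted = product >> FRAC_BITS             # realign the binary point (assumed truncation)
    overflow = not INT16_MIN <= shifted <= INT16_MAX
    wrapped = ((shifted + (1 << 15)) & 0xFFFF) - (1 << 15)  # keep the low 16 bits, signed
    return wrapped, overflow


result, overflow = q88_mul(to_q88(0.5), to_q88(0.1))
print(result, from_q88(result), overflow)
```

With these assumptions, 0.5 × 0.1 yields the raw value 12 (0.046875) rather than exactly 0.05, because both 0.1 and the product are quantized to multiples of 1/256.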
Integration with VPU
The leaky ReLU derivative module is active during transition and backward pass pathways:
- Pathway 1111 (transition): systolic → bias → leaky_relu → loss → leaky_relu_derivative → output
- Pathway 0001 (backward): systolic → leaky_relu_derivative → output
The module is enabled when vpu_data_pathway[0] is set to 1:
Transition pathway (1111)
- Loss module gradients route to derivative inputs
- Cached H values (from leaky ReLU forward pass) route to H inputs
- Leak factor provided from unified buffer
- Outputs route to final VPU output (back to unified buffer)
Backward pathway (0001)
- Systolic array outputs (upstream gradients) route to derivative inputs
- H values provided from unified buffer (pre-cached from forward pass)
- Leak factor provided from unified buffer
- Outputs route to final VPU output for further backpropagation
See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/vpu.sv:304-328 for the derivative routing logic.
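The routing rules for the two pathways can be summarized in a small Python sketch. Only bit 0 (the derivative enable) is stated in the text; the assignment of bits 3 to 1 to bias, leaky_relu, and loss is an assumption inferred from the 1111 and 0001 pathway listings, and the function names are hypothetical.

```python
from typing import Dict, List

# Assumed bit order, most significant bit first. Only bit 0 is confirmed by the text.
STAGES_MSB_FIRST = ["bias", "leaky_relu", "loss", "leaky_relu_derivative"]


def active_stages(pathway: int) -> List[str]:
    """Return the VPU stages enabled by a 4-bit pathway code, in data-flow order."""
    stages = []
    for i, name in enumerate(STAGES_MSB_FIRST):
        bit = 3 - i
        if (pathway >> bit) & 1:
            stages.append(name)
    return stages


def derivative_sources(pathway: int) -> Dict[str, str]:
    """Where the derivative module's inputs come from, per the two documented pathways."""
    if not pathway & 0b0001:
        return {}  # derivative module not in the pathway
    if pathway == 0b1111:
        return {"gradient": "loss module", "H": "VPU cache", "leak": "unified buffer"}
    if pathway == 0b0001:
        return {"gradient": "systolic array", "H": "unified buffer", "leak": "unified buffer"}
    raise ValueError("routing for this pathway is not described in the text")


print(active_stages(0b1111))
print(derivative_sources(0b0001))
```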
Data flow
Transition phase
Backward phase
H value caching
The VPU includes special logic for caching H values:
- During transition pathway (1111): H values from leaky ReLU are cached in internal registers
- Cache update: See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/vpu.sv:282-285
- Cache usage: Cached values route to derivative module during transition
- For subsequent backward passes: H values are loaded from unified buffer (pre-stored during forward pass)
Implementation details
- Latency: 1 clock cycle (registered output)
- Throughput: 2 gradients per cycle
- Sign check: Uses MSB of H value (sign bit)
- Multiplication: Only performed for negative activations
- Reset behavior: Outputs and valid signals cleared to zero
- Valid signal: Propagated from input to output with one cycle delay
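The timing behavior listed above can be mimicked with a small behavioral model in Python. This is a sketch and not the RTL: the class and method names are hypothetical, the multiply uses an assumed truncating shift, and clearing the data output when the input is not valid is an assumption the text does not confirm. Each call to clock() stands for one rising clock edge.

```python
FRAC_BITS = 8  # Q8.8: 8 fractional bits


def q88_mul(a: int, b: int) -> int:
    """Q8.8 multiply with a truncating shift (assumed; fxp_mul may differ)."""
    return (a * b) >> FRAC_BITS


class DerivativeChild:
    """One column: registered output, one cycle of latency."""

    def __init__(self) -> None:
        self.data_out = 0
        self.valid_out = False

    def clock(self, rst, valid_in, grad_in, h_in, leak) -> None:
        if rst:
            # Reset clears both the output and the valid flag.
            self.data_out, self.valid_out = 0, False
            return
        self.valid_out = valid_in
        if valid_in:
            # h_in < 0 is equivalent to testing the sign bit of a 16-bit value.
            self.data_out = q88_mul(grad_in, leak) if h_in < 0 else grad_in
        else:
            self.data_out = 0  # assumption: data is cleared when input is not valid


class DerivativeParent:
    """Two columns sharing one leak factor: two gradients per cycle."""

    def __init__(self) -> None:
        self.columns = (DerivativeChild(), DerivativeChild())

    def clock(self, rst, valids, grads, hs, leak) -> None:
        for child, valid, grad, h in zip(self.columns, valids, grads, hs):
            child.clock(rst, valid, grad, h, leak)

    def outputs(self):
        return [(c.data_out, c.valid_out) for c in self.columns]


parent = DerivativeParent()
print(parent.outputs())  # before any clock edge
parent.clock(False, (True, True), (128, 128), (256, -256), 25)
print(parent.outputs())  # one edge later: both columns valid
parent.clock(True, (True, True), (128, 128), (256, -256), 25)
print(parent.outputs())  # reset clears outputs and valid flags
```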
Gradient flow example
Consider a batch element where:
- Upstream gradient: ∂L/∂H = 0.5 (0x0080 in Q8.8)
- Cached activation: H = -0.2 (0xFFCD in Q8.8, approximately -0.199)
- Leak factor: α = 0.1 (0x0019 in Q8.8, approximately 0.098)
The module then:
- Check H: H < 0, so use scaled path
- Multiply: 0.5 × 0.1 = 0.05
- Output: ∂L/∂Z = 0.05 (0x000C in Q8.8, approximately 0.047)
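The Q8.8 encodings in this example can be reproduced with a few lines of Python. The helpers are hypothetical, and the sketch assumes quantization that truncates toward zero and a multiply that shifts right by 8 bits. Under those assumptions, -0.2 encodes as 0xFFCD and the result as 0x000C.

```python
FRAC_BITS = 8  # Q8.8: 8 fractional bits


def to_q88(x: float) -> int:
    """Quantize to a signed Q8.8 integer, truncating toward zero (assumed)."""
    return int(x * (1 << FRAC_BITS))


def as_hex16(raw: int) -> str:
    """Show a signed value as its 16-bit two's-complement bit pattern."""
    return f"0x{raw & 0xFFFF:04X}"


grad = to_q88(0.5)    # upstream gradient
h = to_q88(-0.2)      # cached activation
alpha = to_q88(0.1)   # leak factor

# H is negative, so the gradient is scaled by the leak factor.
out = (grad * alpha) >> FRAC_BITS if h < 0 else grad

for name, raw in [("grad", grad), ("H", h), ("alpha", alpha), ("out", out)]:
    print(name, as_hex16(raw), raw / (1 << FRAC_BITS))
```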
Source files
- Parent module: https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/leaky_relu_derivative_parent.sv
- Child module: https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/leaky_relu_derivative_child.sv