The bias module adds learned bias terms to matrix multiplication results from the systolic array. It operates as part of the VPU’s forward pass pathway, implementing the bias addition step in neural network layer computation.
Architecture
The bias module consists of a parent-child hierarchy:
- bias_parent: Top-level module instantiating two bias_child modules for parallel column processing
- bias_child: Individual processing unit handling bias addition for one feature column
Module ports
bias_parent
System clock signal
Active-high reset signal
Bias scalar for column 1, fetched from unified buffer
Bias scalar for column 2, fetched from unified buffer
Data input from systolic array for column 1
Data input from systolic array for column 2
Valid signal for column 1 data from systolic array
Valid signal for column 2 data from systolic array
Pre-activation output (Z) for column 1
Pre-activation output (Z) for column 2
Valid signal for column 1 output
Valid signal for column 2 output
bias_child
System clock signal
Active-high reset signal
Bias scalar value from unified buffer
Data from systolic array
Valid signal from systolic array
Pre-activation output after bias addition
Output valid signal
Operation
The bias module performs fixed-point addition:
Z = X·W + b
Where:
- X·W is the matrix multiplication result from the systolic array
- b is the bias term stored in the unified buffer
- Z is the pre-activation output
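As a small worked example of Z = X·W + b, the sketch below adds one bias scalar per feature column to a few rows of matrix multiplication results. It is a numeric illustration only: the function name `add_bias` and the sample values are hypothetical and are not taken from the RTL, and it uses ordinary floating-point numbers rather than the hardware's fixed-point format.

```python
# Illustrative model only; names and values are hypothetical, not from the RTL.

def add_bias(xw_rows, bias):
    """Add one bias scalar per column to each row of systolic-array results."""
    return [[xw + b for xw, b in zip(row, bias)] for row in xw_rows]

# Three rows of X·W results for two feature columns.
xw = [[1.5, -2.0], [0.25, 4.0], [-3.0, 0.5]]
b = [0.5, -1.0]  # one bias scalar per column, shared by every row

z = add_bias(xw, b)
print(z)
```

Each column keeps its own bias scalar for every row, which mirrors the one-bias-per-column arrangement of the two bias_child instances.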
Pipeline stages
- Combinational addition: The fxp_add module performs fixed-point addition of systolic array output and bias scalar
- Registered output: On the next clock cycle, if the input valid signal is high, the result is registered and the output valid signal is asserted
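The two stages above can be sketched as a cycle-level behavioral model of a single bias column. This is a Python approximation, not the SystemVerilog source: the class and argument names are hypothetical, plain integers stand in for raw fixed-point values, and the behavior when the input valid signal is low (valid deasserted, data cleared) is an assumption, since the text only describes the valid-high case.

```python
class BiasChildModel:
    """Cycle-level sketch of one bias column (hypothetical names)."""

    def __init__(self):
        self.z_out = 0          # registered pre-activation output
        self.valid_out = False  # registered output valid

    def tick(self, rst, bias, sys_data, sys_valid):
        """Model one rising clock edge."""
        # Stage 1: combinational addition (stands in for fxp_add).
        total = sys_data + bias

        # Stage 2: registered output.
        if rst:
            self.z_out, self.valid_out = 0, False
        elif sys_valid:
            self.z_out, self.valid_out = total, True
        else:
            # Assumption: with no valid input, valid drops and data clears.
            self.z_out, self.valid_out = 0, False
        return self.z_out, self.valid_out


child = BiasChildModel()
after_reset = child.tick(rst=True, bias=0, sys_data=0, sys_valid=False)
after_valid = child.tick(rst=False, bias=256, sys_data=512, sys_valid=True)
after_idle = child.tick(rst=False, bias=256, sys_data=100, sys_valid=False)
```

The sum is computed on every call, but it only reaches the output register when the valid signal is high, which is what gives the module its one-cycle latency.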
Fixed-point arithmetic
The bias module uses 16-bit signed fixed-point representation (Q8.8 format: 8 integer bits, 8 fractional bits). The fxp_add module handles:
- Proper alignment of binary points
- Overflow detection
- Rounding according to configured parameters
See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/fixedpoint.sv:110 for the fxp_add implementation.
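To make the Q8.8 format concrete, the following sketch converts real numbers to raw 16-bit values and adds them with an overflow flag. It is not a port of fxp_add: the helper names are hypothetical, and the choice to wrap around in two's complement on overflow is an assumption made for illustration (the actual module may saturate or behave differently depending on its parameters). Because both operands share the same Q8.8 format, their binary points are already aligned and the raw integers can be added directly.

```python
FRAC_BITS = 8                    # Q8.8: 8 fractional bits
WIDTH = 16                       # total signed width
Q_MIN = -(1 << (WIDTH - 1))      # -32768
Q_MAX = (1 << (WIDTH - 1)) - 1   # 32767


def to_q8_8(x: float) -> int:
    """Quantize a real number to a raw Q8.8 integer, clamped to 16 bits."""
    raw = round(x * (1 << FRAC_BITS))  # Python's round (ties to even)
    return max(Q_MIN, min(Q_MAX, raw))


def from_q8_8(raw: int) -> float:
    """Convert a raw Q8.8 integer back to a real number."""
    return raw / (1 << FRAC_BITS)


def q8_8_add(a: int, b: int):
    """Return (sum, overflow) for two raw Q8.8 values.

    Assumption: on overflow the result wraps in two's complement.
    """
    total = a + b
    overflow = total < Q_MIN or total > Q_MAX
    wrapped = ((total - Q_MIN) % (1 << WIDTH)) + Q_MIN
    return wrapped, overflow


z_raw, z_ovf = q8_8_add(to_q8_8(1.5), to_q8_8(0.25))
print(from_q8_8(z_raw), z_ovf)

big_raw, big_ovf = q8_8_add(to_q8_8(127.0), to_q8_8(1.0))
print(from_q8_8(big_raw), big_ovf)
```

The second addition exceeds the largest representable value (just under 128), so the flag is set and, under the wraparound assumption, the result lands at the negative end of the range.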
Integration with VPU
The bias module is activated during the VPU’s forward pass pathway. The VPU data pathway control bits determine routing:
- Pathway 1100 (forward pass): systolic → bias → leaky_relu → output
- Pathway 1111 (transition): systolic → bias → leaky_relu → loss → leaky_relu_derivative → output
When vpu_data_pathway[3] is set to 1, the VPU routes:
- Systolic array outputs to bias module inputs
- Bias scalars from unified buffer to bias module
- Bias module outputs to the next stage (leaky ReLU)
See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/vpu.sv:213-224 for the bias routing logic.
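The pathway decoding can be illustrated with a short sketch. Only the meaning of bit 3 (bias enable) is stated in the text; the mapping of bits 2, 1, and 0 to leaky_relu, loss, and leaky_relu_derivative is an inference from the two pathways listed above and should be checked against vpu.sv. The function names are hypothetical.

```python
# Bit-to-stage mapping, MSB (bit 3) first.
# Only "bit 3 = bias" comes from the text; the rest is inferred.
STAGES = ["bias", "leaky_relu", "loss", "leaky_relu_derivative"]


def bias_enabled(vpu_data_pathway: int) -> bool:
    """True when bit 3 of the 4-bit pathway field is set."""
    return bool((vpu_data_pathway >> 3) & 1)


def active_route(vpu_data_pathway: int) -> list:
    """List the stages a value passes through for a given pathway code."""
    route = ["systolic"]
    for i, stage in enumerate(STAGES):
        bit = 3 - i
        if (vpu_data_pathway >> bit) & 1:
            route.append(stage)
    route.append("output")
    return route


print(active_route(0b1100))
print(active_route(0b1111))
```

Under this inferred mapping, both documented pathway codes reproduce the routes given above, and any code with bit 3 cleared bypasses the bias module.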
Data flow
Implementation details
- Latency: 1 clock cycle (registered output)
- Throughput: 2 values per cycle (dual column processing)
- Bias update frequency: Bias values remain constant for an entire layer and are updated only between layers
- Reset behavior: On reset, output data and valid signals are cleared to zero
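The latency, throughput, and reset properties listed above can be seen together in a small two-column trace. This is a behavioral sketch with hypothetical names, using plain integers as raw values; as before, clearing the output when the input valid signal is low is an assumption. Each trace entry holds the registered outputs after one clock edge, so a value presented in a given cycle appears at the output one cycle later.

```python
def simulate_bias_parent(bias, stimulus):
    """Trace both column outputs over time.

    bias:     (b1, b2), held constant for the whole layer.
    stimulus: one tuple per cycle: (rst, (data1, valid1), (data2, valid2)).
    Returns one entry per clock edge: ((z1, valid1), (z2, valid2)).
    """
    trace = []
    for rst, *columns in stimulus:
        regs = []
        for (data, valid), b in zip(columns, bias):
            if rst:
                regs.append((0, False))        # reset clears data and valid
            elif valid:
                regs.append((data + b, True))  # registered sum
            else:
                regs.append((0, False))        # assumption: idle clears output
        trace.append(tuple(regs))
    return trace


layer_bias = (256, -128)
stimulus = [
    (True,  (0, False),  (0, False)),   # reset cycle
    (False, (512, True), (256, True)),  # both columns valid: 2 values per cycle
    (False, (128, True), (0, False)),   # only column 1 valid
]
trace = simulate_bias_parent(layer_bias, stimulus)
for cycle, outputs in enumerate(trace):
    print(cycle, outputs)
```

The two columns are independent, so one column can produce a valid result while the other stays idle in the same cycle.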
Source files
- Parent module: https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/bias_parent.sv
- Child module: https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/bias_child.sv