The Vector Processing Unit (VPU) contains four pipelined processing modules that can be selectively activated using the 4-bit vpu_data_pathway field.
VPU pipeline modules
The VPU consists of four sequential modules:
- Bias addition - Adds bias vectors to systolic array outputs
- Leaky ReLU - Applies activation function with configurable leak factor
- MSE loss - Computes mean squared error against target values
- Leaky ReLU derivative - Computes gradient of activation function
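As a rough software model of the four stages listed above, the sketch below implements each one as a plain Python function over lists of floats. The function names, the default leak factor of 0.01, and the choice to have the loss stage emit the MSE gradient 2(H - Y)/N rather than a scalar loss are assumptions made for illustration. The hardware may differ in number format and in how it normalizes the loss.

```python
# Illustrative software model of the four VPU stages (not the RTL).
# All names and the default leak factor are assumptions for this sketch.

def bias_add(z, b):
    """Add a bias vector element-wise to a systolic array output row."""
    return [zi + bi for zi, bi in zip(z, b)]

def leaky_relu(x, leak=0.01):
    """Pass non-negative values through; scale negative values by `leak`."""
    return [xi if xi >= 0 else leak * xi for xi in x]

def mse_loss_grad(h, y):
    """Gradient of mean squared error with respect to h: 2 * (h - y) / N."""
    n = len(h)
    return [2.0 * (hi - yi) / n for hi, yi in zip(h, y)]

def leaky_relu_derivative(grad, h, leak=0.01):
    """Scale the upstream gradient by the activation slope (1 or `leak`)."""
    return [g if hi >= 0 else leak * g for g, hi in zip(grad, h)]
```

Chaining bias_add and leaky_relu corresponds to a forward pass through one layer, while leaky_relu_derivative on its own corresponds to the backward step.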
Pathway configurations
The 4-bit vpu_data_pathway field controls which modules are active:
Forward pass - Layer 1
- Systolic array output (Z1) enters VPU
- Bias module adds B1 vector
- Leaky ReLU applies activation
- Result (H1) exits VPU
Forward pass - Output layer with loss
- Systolic array output (Z2) enters VPU
- Bias module adds B2 vector
- Leaky ReLU applies activation (H2)
- MSE loss computes error against target Y
- Result (dL/dZ2) exits VPU
The source comments describe this pathway as the “transition pathway from forward pass to backward pass” because it both completes the forward computation and produces the first gradient.
Backward pass - Activation derivative
- Upstream gradient (dL/dZ_next) enters VPU
- Leaky ReLU derivative module multiplies by activation gradient
- Result (dL/dZ) exits VPU
Gradient computation - Bypass mode
- Systolic array output passes directly through VPU
- No processing applied
- Raw systolic output exits VPU
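The four configurations above can be read as a bit-mask decode. The sketch below assumes the most significant bit enables bias addition and the least significant bit enables the Leaky ReLU derivative. That ordering matches the 0b1100, 0b1111, and 0b0001 values used on this page, but the exact bit assignment and the names STAGES and decode_pathway are assumptions and are not taken from the RTL.

```python
# Hypothetical decode of the 4-bit vpu_data_pathway field.
# Assumed bit order (MSB to LSB): bias, Leaky ReLU, MSE loss, Leaky ReLU derivative.
STAGES = ("bias", "leaky_relu", "mse_loss", "leaky_relu_derivative")

def decode_pathway(pathway):
    """Return the names of the stages enabled by a 4-bit pathway value."""
    if not 0 <= pathway <= 0b1111:
        raise ValueError("vpu_data_pathway is a 4-bit field")
    # Bit 3 maps to STAGES[0], bit 0 maps to STAGES[3].
    return [name for i, name in enumerate(STAGES) if pathway & (0b1000 >> i)]

for value in (0b1100, 0b1111, 0b0001, 0b0000):
    print(f"0b{value:04b} -> {decode_pathway(value) or ['bypass']}")
```

Under this assumed bit order, 0b1111 also enables the derivative stage, which would be consistent with that pathway emitting dL/dZ2 rather than the gradient with respect to H2.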
Pointer routing coordination
The VPU pathway configuration must be coordinated with ub_ptr_sel to route the correct data to each module:
| Pathway | Module needing data | ub_ptr_sel | Data source |
|---|---|---|---|
| 0b1100 | Bias addition | 010 | Bias vector from UB |
| 0b1111 | Bias addition | 010 | Bias vector from UB |
| 0b1111 | MSE loss | 011 | Target values (Y) from UB |
| 0b0001 | Leaky ReLU derivative | 100 | Activation values (H) from UB |
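One way to keep the pathway and the pointer selection in step is a lookup keyed by pathway value. The sketch below mirrors the table above. The names UB_PTR_FOR_PATHWAY and required_pointers are hypothetical, and the ub_ptr_sel codes are kept as binary strings exactly as the table lists them.

```python
# Hypothetical lookup mirroring the routing table: for each pathway value,
# the (module, ub_ptr_sel, data source) entries that must be set up.
# 0b0000 (bypass) is omitted because the table lists no module input for it.
UB_PTR_FOR_PATHWAY = {
    0b1100: [("bias", "010", "bias vector from UB")],
    0b1111: [("bias", "010", "bias vector from UB"),
             ("mse_loss", "011", "target values (Y) from UB")],
    0b0001: [("leaky_relu_derivative", "100", "H values from UB")],
}

def required_pointers(pathway):
    """Return the ub_ptr_sel codes that a pathway needs, in table order."""
    try:
        entries = UB_PTR_FOR_PATHWAY[pathway]
    except KeyError:
        raise ValueError(f"no routing listed for pathway 0b{pathway:04b}") from None
    return [ptr for _module, ptr, _source in entries]
```

A test bench could call required_pointers before starting a pass and check that each listed pointer has been configured, failing early instead of feeding a module the wrong operand.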
Example: Forward pass configuration
From test_tpu.py:184-203, loading inputs and computing the first layer:
Example: Backward pass configuration
From test_tpu.py:322-349, computing gradients for layer 1:
Gradient descent data routing
During weight updates, the VPU uses additional pointer selections. These run with vpu_data_pathway = 0b0000 (bypass mode), since gradient descent happens after the main VPU pipeline.