The Vector Processing Unit (VPU) is a configurable pipeline that applies activation functions, computes loss derivatives, and calculates activation derivatives. It supports three data pathways for forward pass, transition, and backward pass operations.
Module declaration
Input ports
Systolic array inputs
| Port | Width | Description |
|---|---|---|
| `vpu_data_in_1` | signed [15:0] | Data input from systolic array column 1 |
| `vpu_data_in_2` | signed [15:0] | Data input from systolic array column 2 |
| `vpu_valid_in_1` | 1 | Valid signal for column 1 |
| `vpu_valid_in_2` | 1 | Valid signal for column 2 |
Unified buffer inputs
| Port | Width | Description |
|---|---|---|
| `bias_scalar_in_1` | signed [15:0] | Bias value for column 1 |
| `bias_scalar_in_2` | signed [15:0] | Bias value for column 2 |
| `lr_leak_factor_in` | signed [15:0] | Leak factor α for leaky ReLU (Q8.8 format) |
| `Y_in_1` | signed [15:0] | Ground truth label for loss computation (column 1) |
| `Y_in_2` | signed [15:0] | Ground truth label for loss computation (column 2) |
| `inv_batch_size_times_two_in` | signed [15:0] | Scaling factor: 1/(batch_size × 2) |
| `H_in_1` | signed [15:0] | Cached activation value for derivative (column 1) |
| `H_in_2` | signed [15:0] | Cached activation value for derivative (column 2) |
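The Q8.8 format noted for `lr_leak_factor_in` stores a signed 16-bit value with 8 fractional bits, so the real value is the raw integer divided by 256. The following is a minimal Python sketch of that conversion, useful when preparing test stimulus. The helper names `to_q8_8` and `from_q8_8` are hypothetical, and the rounding and saturation behavior shown here is an assumption rather than something taken from the RTL. The table states Q8.8 only for the leak factor; whether the other 16-bit ports use the same format is not stated here.

```python
FRAC_BITS = 8                      # Q8.8: 8 fractional bits
INT16_MIN = -(1 << 15)             # most negative signed 16-bit value
INT16_MAX = (1 << 15) - 1          # most positive signed 16-bit value


def to_q8_8(x: float) -> int:
    """Convert a real number to a signed 16-bit Q8.8 integer.

    Assumption: round to nearest (Python's built-in round) and saturate
    on overflow. The hardware may truncate or wrap instead.
    """
    scaled = round(x * (1 << FRAC_BITS))
    return max(INT16_MIN, min(INT16_MAX, scaled))


def from_q8_8(raw: int) -> float:
    """Convert a signed 16-bit Q8.8 integer back to a real number."""
    return raw / (1 << FRAC_BITS)


if __name__ == "__main__":
    leak = to_q8_8(0.01)
    print(leak, from_q8_8(leak))
```

With this sketch, a leak factor of 0.01 is stored as the raw value 3 and reads back as roughly 0.0117, which shows the resolution limit of 1/256.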
Control signal
| Port | Width | Description |
|---|---|---|
| `vpu_data_pathway` | [3:0] | Module enable bits: `[bias \| leaky_relu \| loss \| leaky_relu_derivative]` |
Output ports
| Port | Width | Description |
|---|---|---|
| `vpu_data_out_1` | signed [15:0] | Processed data output for column 1 |
| `vpu_data_out_2` | signed [15:0] | Processed data output for column 2 |
| `vpu_valid_out_1` | 1 | Valid signal for column 1 output |
| `vpu_valid_out_2` | 1 | Valid signal for column 2 output |
Architecture
Module pipeline
The VPU consists of four processing stages:
- Bias (`bias_parent`): Adds bias to input values
- Leaky ReLU (`leaky_relu_parent`): Applies leaky ReLU activation
- Loss (`loss_parent`): Computes loss derivative (∂L/∂H)
- Leaky ReLU Derivative (`leaky_relu_derivative_parent`): Computes activation derivative
Data pathways
The `vpu_data_pathway` signal configures the active modules:
- `vpu_data_pathway[3]`: Bias module enable (1 = enabled)
- `vpu_data_pathway[2]`: Leaky ReLU module enable (1 = enabled)
- `vpu_data_pathway[1]`: Loss module enable (1 = enabled)
- `vpu_data_pathway[0]`: Leaky ReLU derivative module enable (1 = enabled)
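To make the bit assignment concrete, here is a small Python sketch that unpacks a 4-bit pathway value into its four enables. It assumes the bit order given in the control signal table, with bias as the most significant bit and the leaky ReLU derivative as the least significant bit. `PathwayEnables` and `decode_pathway` are hypothetical names used for illustration only.

```python
from typing import NamedTuple


class PathwayEnables(NamedTuple):
    """Per-module enable flags (hypothetical helper type)."""
    bias: bool
    leaky_relu: bool
    loss: bool
    leaky_relu_derivative: bool


def decode_pathway(pathway: int) -> PathwayEnables:
    """Split a 4-bit vpu_data_pathway value into module enables.

    Assumed bit order: [3]=bias, [2]=leaky_relu, [1]=loss,
    [0]=leaky_relu_derivative.
    """
    if not 0 <= pathway <= 0b1111:
        raise ValueError("vpu_data_pathway is a 4-bit value")
    return PathwayEnables(
        bias=bool(pathway & 0b1000),
        leaky_relu=bool(pathway & 0b0100),
        loss=bool(pathway & 0b0010),
        leaky_relu_derivative=bool(pathway & 0b0001),
    )


if __name__ == "__main__":
    print(decode_pathway(0b1100))
```

A helper like this can make testbench logs easier to read than raw 4-bit values.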
Operation modes
Forward pass pathway (4'b1100)
H = LeakyReLU(Z) where Z = X + b
Use case: Hidden layer activations during forward propagation
Transition pathway (4'b1111)
∂L/∂Z = (H - Y) / (batch_size × 2) ⊙ LeakyReLU'(H)
Use case: Final layer computation that transitions from forward to backward pass. The H matrix is cached internally for use in the derivative calculation.
Backward pass pathway (4'b0001)
∂L/∂Z = ∂L/∂H ⊙ LeakyReLU'(H)
Use case: Hidden layer gradients during backpropagation. H values come from H_in_* ports.
Inactive mode (4'b0000)
All modules bypassed, no processing occurs.
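The operation modes above can be sketched as a floating-point reference model for a single column. This is not bit-accurate and is not derived from the RTL: it ignores fixed-point rounding, valid signals, and latency. Several points are assumptions, labeled in the comments: the derivative at H = 0 uses slope α, a disabled stage passes its input through unchanged, and the derivative stage takes H from the internal cache whenever the loss stage is enabled. The function `vpu_model` and its parameter names are hypothetical, and `loss_scale` stands for whatever value is driven on `inv_batch_size_times_two_in`.

```python
def leaky_relu(z: float, alpha: float) -> float:
    """Leaky ReLU: identity for positive inputs, slope alpha otherwise."""
    return z if z > 0 else alpha * z


def leaky_relu_derivative(h: float, alpha: float) -> float:
    """Derivative evaluated on the activation H.

    Assumption: the slope at h == 0 is alpha.
    """
    return 1.0 if h > 0 else alpha


def vpu_model(pathway: int, data_in: float, *, bias: float = 0.0,
              alpha: float = 0.0, y: float = 0.0,
              loss_scale: float = 0.0, h_in: float = 0.0) -> float:
    """Floating-point sketch of one VPU column.

    Assumption: a disabled stage passes its input through unchanged.
    """
    value = data_in
    h = h_in                                   # backward pass: H from H_in_*
    if pathway & 0b1000:                       # bias stage
        value = value + bias
    if pathway & 0b0100:                       # leaky ReLU stage
        value = leaky_relu(value, alpha)
    if pathway & 0b0010:                       # loss stage
        h = value                              # assumed: H cached internally
        value = (value - y) * loss_scale
    if pathway & 0b0001:                       # leaky ReLU derivative stage
        value = value * leaky_relu_derivative(h, alpha)
    return value


if __name__ == "__main__":
    print(vpu_model(0b1100, -2.0, bias=1.0, alpha=0.5))             # forward
    print(vpu_model(0b1111, 2.0, bias=1.0, alpha=0.5,
                    y=1.0, loss_scale=0.25))                        # transition
    print(vpu_model(0b0001, 4.0, alpha=0.5, h_in=-1.0))             # backward
```

A model of this kind can serve as a golden reference in a testbench, provided its outputs are quantized to the hardware's number format before comparison.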
Activation caching
The VPU includes an internal cache for H matrices (activation outputs). See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/vpu.sv, lines 333-348.
Signal routing logic
The VPU uses combinational logic to route signals through the enabled modules. See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/vpu.sv, lines 187-330.
Timing behavior
Latency per module:
- Bias: 1 clock cycle latency
- Leaky ReLU: 1 clock cycle latency
- Loss: 1 clock cycle latency
- Leaky ReLU Derivative: 1 clock cycle latency

Latency per pathway:
- Forward pass: 2 cycles (bias + leaky ReLU)
- Transition: 4 cycles (all modules)
- Backward pass: 1 cycle (leaky ReLU derivative only)
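The pathway latencies listed above equal the number of enabled modules, since each module adds one clock cycle. A minimal Python sketch of that relationship follows. The helper name `pathway_latency` is hypothetical, and the zero-cycle result for the inactive mode is an assumption, since the latency of `4'b0000` is not stated here.

```python
MODULE_LATENCY_CYCLES = 1  # each enabled module adds one clock cycle


def pathway_latency(pathway: int) -> int:
    """Return the expected latency in cycles for a 4-bit pathway value.

    Counts the enabled module bits. Assumption: disabled modules add no
    latency, so the inactive mode (0b0000) yields 0.
    """
    enabled_modules = bin(pathway & 0b1111).count("1")
    return enabled_modules * MODULE_LATENCY_CYCLES


if __name__ == "__main__":
    for name, code in [("forward", 0b1100), ("transition", 0b1111),
                       ("backward", 0b0001)]:
        print(name, pathway_latency(code))
```

In a testbench, a value like this can set how many clock edges to wait between driving a valid input and sampling the corresponding output.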
Example instantiation
See https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/tpu.sv, lines 157-184.
Related modules
- TPU - Top-level integration
- Unified Buffer - Data source and destination
- Bias parent (https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/bias_parent.sv)
- Leaky ReLU parent (https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/leaky_relu_parent.sv)
- Loss parent (https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/loss_parent.sv)
- Leaky ReLU derivative parent (https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/src/leaky_relu_derivative_parent.sv)
Testing
See test files:
- https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/test/dump_vpu.sv - Waveform dump configuration
- https://github.com/tiny-tpu-v2/tiny-tpu/blob/main/test/test_vpu.py - Python test suite