BYOL (Bootstrap Your Own Latent) is the second primary SSL objective in SSRL-ECG. Unlike SimCLR, BYOL requires no negative samples. Instead, an online network is trained to predict the representations produced by a momentum-updated target network, and this asymmetric setup avoids representational collapse. BYOL classes are defined in
ssrl_ecg/train_ssl_byol.py and are composed with the ECGEncoder1DCNN backbone from ssrl_ecg.models.cnn.
BYOLProjector
BYOLProjector is the projection MLP that maps pooled encoder representations to a latent space shared by the online and target networks. It adds a BatchNorm1d between the two linear layers, which stabilises BYOL training by normalising the intermediate activations.
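A minimal sketch of a projector with this shape is below. It assumes the layer order Linear → BatchNorm1d → ReLU → Linear; the ReLU, the class name ProjectorSketch, the hypothetical hidden_dim parameter, and the 512-wide hidden layer are assumptions, not details taken from ssrl_ecg/train_ssl_byol.py.

```python
import torch
import torch.nn as nn


class ProjectorSketch(nn.Module):
    """Hypothetical stand-in for BYOLProjector: Linear -> BatchNorm1d -> ReLU -> Linear."""

    def __init__(self, in_features: int = 256, hidden_dim: int = 512, out_dim: int = 256):
        super().__init__()
        # hidden_dim is a hypothetical parameter name; the ReLU is an assumption.
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden_dim),
            nn.BatchNorm1d(hidden_dim),  # normalises the intermediate activations
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: pooled encoder features, shape [batch, in_features]
        return self.net(h)


proj = ProjectorSketch()
z = proj(torch.randn(8, 256))  # BatchNorm1d in training mode needs batch > 1
print(z.shape)  # torch.Size([8, 256])
```

Because BatchNorm1d computes statistics across the batch in training mode, the projector must be called with more than one sample per batch during pretraining.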
Constructor Parameters
- in_features: dimension of the encoder output after global average pooling. For ECGEncoder1DCNN with the default width=64, this is 256.
- Hidden layer width: width of the hidden projection layer.
- out_dim: dimension of the projected output vector.
Forward
- Input: pooled encoder features of shape [batch, in_features].
- Returns: projected representation of shape [batch, out_dim].

BYOLPredictor
BYOLPredictor sits on top of the online projector and predicts the stop-gradient target projections. Its architecture is identical to BYOLProjector. Together with the slowly updated EMA target, this asymmetry (the target network has no predictor) is what prevents representational collapse without requiring negatives.
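The asymmetry can be sketched as follows: the online branch applies its projector and then the predictor, while the target branch applies only its projector, inside torch.no_grad(). The mlp helper, the 512-wide hidden layer, and the random tensors standing in for pooled encoder features are hypothetical.

```python
import copy

import torch
import torch.nn as nn


def mlp(in_features: int, hidden_dim: int, out_dim: int) -> nn.Sequential:
    # Assumed layout shared by projector and predictor: Linear -> BN -> ReLU -> Linear
    return nn.Sequential(
        nn.Linear(in_features, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim),
    )


projection_dim = 256
online_projector = mlp(256, 512, projection_dim)
# Predictor input and output both equal projection_dim, so its output
# can be regressed directly onto the target projection.
online_predictor = mlp(projection_dim, 512, projection_dim)
target_projector = copy.deepcopy(online_projector)  # no predictor on the target side

h_online = torch.randn(8, 256)  # stand-in for pooled features from the online encoder
h_target = torch.randn(8, 256)  # stand-in for pooled features from the target encoder

p = online_predictor(online_projector(h_online))  # online: projector -> predictor
with torch.no_grad():
    z = target_projector(h_target)  # target: projector only, stop-gradient

print(p.shape, z.shape, p.requires_grad, z.requires_grad)
# torch.Size([8, 256]) torch.Size([8, 256]) True False
```

Only p carries gradients, so the regression loss can update the online branch but never pulls the target toward the prediction.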
Constructor Parameters
- in_features: dimension of the online projector output (projection_dim, default 256).
- Hidden layer width: width of the hidden predictor layer.
- out_dim: output dimension. Must match BYOLProjector.out_dim so that the L2 regression loss can compare predictions with target projections.

Forward
- Input: online projected features of shape [batch, in_features].
- Returns: predicted target projection of shape [batch, out_dim].

BYOLModel
BYOLModel is the full BYOL system. It maintains five networks:
| Network | Gradient | Role |
|---|---|---|
| encoder | ✅ trained | Online backbone |
| online_projector | ✅ trained | Online projector |
| online_predictor | ✅ trained | Online predictor |
| target_encoder | ❌ EMA only | Momentum backbone |
| target_projector | ❌ EMA only | Momentum projector |
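A self-contained sketch of how these five networks might fit together. TinyEncoder is a hypothetical stand-in for ECGEncoder1DCNN (a single Conv1d plus global average pooling), the 12-lead, 1000-sample input is an assumption, and BYOLModelSketch is not the SSRL-ECG class. Its forward returns the four tensors in the order the outputs are listed under Forward.

```python
import copy

import torch
import torch.nn as nn


def mlp(in_features: int, hidden_dim: int, out_dim: int) -> nn.Sequential:
    # Assumed projector/predictor layout: Linear -> BatchNorm1d -> ReLU -> Linear
    return nn.Sequential(
        nn.Linear(in_features, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim),
    )


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for ECGEncoder1DCNN: one Conv1d plus global average pooling."""

    def __init__(self, channels: int = 12, feat_dim: int = 256):
        super().__init__()
        self.conv = nn.Conv1d(channels, feat_dim, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # [batch, channels, time] -> [batch, feat_dim]
        return self.conv(x).mean(dim=-1)


class BYOLModelSketch(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int = 256,
                 projection_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        # Online networks (trained by gradient descent)
        self.encoder = encoder
        self.online_projector = mlp(feat_dim, hidden_dim, projection_dim)
        self.online_predictor = mlp(projection_dim, hidden_dim, projection_dim)
        # Target networks: deep copies with identical initial weights, updated only by EMA
        self.target_encoder = copy.deepcopy(encoder)
        self.target_projector = copy.deepcopy(self.online_projector)
        for p in [*self.target_encoder.parameters(), *self.target_projector.parameters()]:
            p.requires_grad_(False)

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        p1 = self.online_predictor(self.online_projector(self.encoder(x1)))
        p2 = self.online_predictor(self.online_projector(self.encoder(x2)))
        with torch.no_grad():  # stop-gradient targets
            z1 = self.target_projector(self.target_encoder(x1))
            z2 = self.target_projector(self.target_encoder(x2))
        return p1, z1, p2, z2


model = BYOLModelSketch(TinyEncoder())
x1, x2 = torch.randn(4, 12, 1000), torch.randn(4, 12, 1000)
p1, z1, p2, z2 = model(x1, x2)
print(p1.shape, z1.shape, z1.requires_grad)  # torch.Size([4, 256]) torch.Size([4, 256]) False
```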
Constructor Parameters
- encoder: an ECGEncoder1DCNN instance used as the online backbone. A deep copy is made automatically for the target encoder, initialised with identical weights.
- projection_dim: output dimension of both BYOLProjector and BYOLPredictor (default 256).
- Hidden dimension: hidden dimension used in both the projector and predictor MLPs.
Forward
Runs both views through the online network and (inside torch.no_grad()) through the target network to produce stop-gradient targets.
Inputs:
- First augmented ECG view, shape [batch, channels, time].
- Second augmented ECG view, shape [batch, channels, time].

Outputs:
- Online predictor output for view 1, shape [batch, projection_dim].
- Target projector output for view 1 (stop-gradient), shape [batch, projection_dim].
- Online predictor output for view 2, shape [batch, projection_dim].
- Target projector output for view 2 (stop-gradient), shape [batch, projection_dim].

Momentum Encoder Update
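After each optimiser step, the target networks must be refreshed with an exponential moving average (EMA) of the online weights, applied parameter by parameter: target = tau * target + (1 - tau) * online. tau controls how slowly the target network tracks the online network. Values close to 1.0 produce a very stable target and are critical for BYOL's collapse-free behaviour.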
| Tau | Effect |
|---|---|
| 0.996 | Faster adaptation; useful for short training runs |
| 0.999 | Default; good balance for 20–100 epoch training |
| 0.9999 | Very slow adaptation; suitable for large-scale pretraining |
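A minimal sketch of the EMA refresh, assuming it is applied to each online/target pair (encoder to target_encoder, online_projector to target_projector) after optimizer.step(). The function name ema_update is hypothetical.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def ema_update(online: nn.Module, target: nn.Module, tau: float = 0.999) -> None:
    """target <- tau * target + (1 - tau) * online, parameter by parameter."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.mul_(tau).add_(p_online, alpha=1.0 - tau)


# Toy check: online weights all 1, target weights all 0.
online = nn.Linear(4, 4)
target = nn.Linear(4, 4)
nn.init.ones_(online.weight)
nn.init.zeros_(target.weight)
ema_update(online, target, tau=0.99)
print(target.weight[0, 0].item())  # ≈ 0.01
```

As a rule of thumb, an EMA with coefficient tau averages over roughly 1 / (1 - tau) recent steps, so the three values in the table correspond to horizons of about 250, 1,000, and 10,000 optimiser steps. This sketch updates parameters only; BatchNorm running statistics are buffers and are left untouched here.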
BYOL Loss Function
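The BYOL objective is a symmetric L2 regression loss between predictions and stop-gradient targets. pred and target are L2-normalised before the dot product, so the per-pair loss is the squared Euclidean distance between unit vectors: loss = 2 - 2 * cosine_similarity(pred, target), which ranges from 0 (same direction) to 4 (opposite directions). The symmetric form uses each view as both the predictor source and the prediction target: the view-1 prediction is regressed onto the view-2 target, the view-2 prediction onto the view-1 target, and the two terms are combined.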
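A minimal sketch of this loss, assuming the two symmetric terms are summed as in the original BYOL formulation and that each term is averaged over the batch. byol_loss and symmetric_byol_loss are hypothetical names, not necessarily those used in ssrl_ecg/train_ssl_byol.py.

```python
import torch
import torch.nn.functional as F


def byol_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Per-sample 2 - 2 * cos(pred, target), averaged over the batch."""
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target.detach(), dim=-1)  # stop-gradient on the target
    return (2 - 2 * (pred * target).sum(dim=-1)).mean()


def symmetric_byol_loss(p1, z1, p2, z2):
    # View-1 prediction regresses onto the view-2 target, and vice versa.
    return byol_loss(p1, z2) + byol_loss(p2, z1)


a = torch.tensor([[1.0, 0.0]])
print(byol_loss(a, a).item())                           # 0.0 (same direction)
print(byol_loss(a, torch.tensor([[0.0, 1.0]])).item())  # 2.0 (orthogonal)
print(byol_loss(a, -a).item())                          # 4.0 (opposite)
```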
Full Training Loop
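A compact, self-contained sketch of one way to wire the pieces above into a training loop. Everything here is a stand-in: the Conv1d encoder replaces ECGEncoder1DCNN, and the augment function, the Adam optimiser with lr=3e-4, the batch size of 16, and the random signals tensor are assumptions rather than the settings used by the SSRL-ECG script.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


def mlp(i: int, h: int, o: int) -> nn.Sequential:
    # Assumed projector/predictor layout: Linear -> BatchNorm1d -> ReLU -> Linear
    return nn.Sequential(nn.Linear(i, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Linear(h, o))


def augment(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical ECG augmentation: random per-sample amplitude scaling plus Gaussian noise.
    scale = torch.empty(x.size(0), 1, 1).uniform_(0.8, 1.2)
    return x * scale + 0.05 * torch.randn_like(x)


def byol_loss(p: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    p, z = F.normalize(p, dim=-1), F.normalize(z.detach(), dim=-1)
    return (2 - 2 * (p * z).sum(dim=-1)).mean()


torch.manual_seed(0)
# Stand-in encoder (hypothetical): Conv1d + global average pooling -> [batch, 256]
encoder = nn.Sequential(nn.Conv1d(12, 256, 7, padding=3), nn.AdaptiveAvgPool1d(1), nn.Flatten())
projector, predictor = mlp(256, 512, 256), mlp(256, 512, 256)
target_encoder, target_projector = copy.deepcopy(encoder), copy.deepcopy(projector)
for p in [*target_encoder.parameters(), *target_projector.parameters()]:
    p.requires_grad_(False)

online_params = [*encoder.parameters(), *projector.parameters(), *predictor.parameters()]
optimizer = torch.optim.Adam(online_params, lr=3e-4)
tau = 0.999
signals = torch.randn(64, 12, 500)  # placeholder data: 64 synthetic 12-lead recordings

for epoch in range(2):
    for batch in signals.split(16):
        x1, x2 = augment(batch), augment(batch)  # two random views of the same batch
        p1 = predictor(projector(encoder(x1)))
        p2 = predictor(projector(encoder(x2)))
        with torch.no_grad():  # stop-gradient targets
            z1 = target_projector(target_encoder(x1))
            z2 = target_projector(target_encoder(x2))
        loss = byol_loss(p1, z2) + byol_loss(p2, z1)  # symmetric loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # EMA refresh of the target networks after every optimiser step
        with torch.no_grad():
            for online, target in ((encoder, target_encoder), (projector, target_projector)):
                for po, pt in zip(online.parameters(), target.parameters()):
                    pt.mul_(tau).add_(po, alpha=1 - tau)
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

The order inside the inner loop matters: the EMA refresh reads the online weights only after optimizer.step() has updated them, which matches the requirement that targets are refreshed after each optimiser step.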
BYOL vs SimCLR
| Property | BYOL | SimCLR |
|---|---|---|
| Negative samples | Not required | Required (large batch) |
| Minimum batch size | ~32 | ~256 |
| Target network | EMA momentum copy | None |
| Loss | L2 regression | NT-Xent cross-entropy |
| Collapse prevention | Predictor asymmetry + EMA | Hard negatives |
| Typical pretraining epochs | 20–200 | 200–1000 |