Get Started with SSRL-ECG: End-to-End Pipeline Guide

This guide walks you through the complete SSRL-ECG workflow, from a freshly installed package to a fully evaluated model. You will pretrain an ECG encoder with SimCLR using domain-adaptive augmentations, fine-tune a classification head on 10% of the PTB-XL labeled data, and evaluate performance on the held-out test fold. Run end to end, the pipeline should reproduce the reported result of AUROC 0.8717 and macro-F1 0.6448.

Install the package

If you have not already installed SSRL-ECG, set up a virtual environment and run the editable install from the repository root.

python -m venv .venv
source .venv/bin/activate   # Windows: .\.venv\Scripts\Activate.ps1
pip install -e .

Confirm the install succeeded:

python -c "import ssrl_ecg; print('OK')"

See the Installation guide for full details, including dataset folder setup and GPU verification.

Prepare the PTB-XL dataset

Download PTB-XL from PhysioNet and extract it so the layout matches the expected structure. All training scripts default to data/PTB-XL as the data root.

data/
└── PTB-XL/
    ├── ptbxl_database.csv
    ├── scp_statements.csv
    ├── records100/
    └── records500/
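Before running any training script, you can check the layout yourself. The snippet below is a minimal sketch, not part of SSRL-ECG: the helper name missing_ptbxl_entries is hypothetical, and the list of entries is copied from the tree above.

```python
from pathlib import Path

# Entries copied from the expected layout; this helper is illustrative
# and is not an SSRL-ECG function.
EXPECTED_ENTRIES = (
    "ptbxl_database.csv",
    "scp_statements.csv",
    "records100",
    "records500",
)


def missing_ptbxl_entries(root):
    """Return the expected files or folders that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


missing = missing_ptbxl_entries("data/PTB-XL")
if missing:
    print("Missing under data/PTB-XL:", ", ".join(missing))
else:
    print("PTB-XL layout looks complete")
```

An empty list means all four entries exist. It does not confirm that the WFDB record files inside records100/ and records500/ are readable.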

Verify the dataset loads correctly by printing summary statistics:

python -m ssrl_ecg.analyze_datasets --ptbxl-root data/PTB-XL

You should see the five-class distribution: NORM (9,514), MI (5,469), STTC (5,235), CD (4,898), and HYP (2,649). This output confirms that ptbxl_database.csv and the WFDB record files are in place.

PTB-XL is split into 10 folds. SSRL-ECG uses folds 1–8 for training (17,489 samples), fold 9 for validation (2,154 samples), and fold 10 for final testing (2,194 samples). make_default_splits() applies this split automatically, so no manual configuration is needed.
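The fold-based split can be pictured with a few lines of plain Python. This is a sketch of the idea only and may differ from what make_default_splits() does internally. The function name split_by_fold is hypothetical. In PTB-XL the fold number of each record is stored in the strat_fold column of ptbxl_database.csv.

```python
def split_by_fold(folds, train_folds=range(1, 9), val_fold=9, test_fold=10):
    """Partition record indices by their PTB-XL fold number (1 to 10)."""
    train_set = set(train_folds)
    split = {"train": [], "val": [], "test": []}
    for idx, fold in enumerate(folds):
        if fold in train_set:
            split["train"].append(idx)
        elif fold == val_fold:
            split["val"].append(idx)
        elif fold == test_fold:
            split["test"].append(idx)
        else:
            raise ValueError(f"unexpected fold {fold!r} at index {idx}")
    return split


# Toy fold assignments standing in for the strat_fold column.
toy_folds = [1, 9, 10, 3, 8, 10, 9, 2]
toy_split = split_by_fold(toy_folds)
print({name: len(indices) for name, indices in toy_split.items()})
# {'train': 4, 'val': 2, 'test': 2}
```

Because the split depends only on the fold number, the test fold stays fixed across seeds and runs.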

Pretrain the encoder with SimCLR

Run SimCLR pretraining on the full unlabeled training set. The encoder learns cardiovascular representations from pairs of augmented views without any labels.

python -m ssrl_ecg.train_ssl_simclr \
  --data-root data/PTB-XL \
  --epochs 20 \
  --batch-size 128 \
  --temperature 0.07 \
  --seed 42 \
  --out checkpoints/ssl_simclr_enhanced.pt

The script prints a per-epoch progress bar with the NT-Xent loss. On completion it saves only the encoder weights (not the projection head) to checkpoints/ssl_simclr_enhanced.pt.

[SIMCLR TRAINING]
  Encoder: ECGEncoder1DCNN(width=64)
  Projection dim: 128
  Temperature: 0.07
  Batch size: 128

SimCLR Epoch 1/20: 100%|██████████| loss=3.2541
SimCLR Epoch 2/20: 100%|██████████| loss=2.9873
...
SimCLR Epoch 20/20: 100%|██████████| loss=1.8204

Saved SimCLR encoder checkpoint to: checkpoints/ssl_simclr_enhanced.pt
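To make the logged loss easier to interpret, here is a dependency-free sketch of the NT-Xent objective. It is illustrative only and is not the SSRL-ECG training code, which operates on batched tensors. The function names are hypothetical, and the sketch assumes non-zero embeddings where view_a[i] and view_b[i] come from the same ECG.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def nt_xent_loss(view_a, view_b, temperature=0.07):
    """Mean NT-Xent loss over 2N embeddings; view_a[i] pairs with view_b[i]."""
    n = len(view_a)
    z = [l2_normalize(v) for v in list(view_a) + list(view_b)]
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        # Cosine similarity to every other embedding, scaled by temperature.
        logits = {
            k: sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
            if k != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits.values())
        log_denominator = peak + math.log(
            sum(math.exp(v - peak) for v in logits.values())
        )
        total += log_denominator - logits[positive]
    return total / (2 * n)


aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned pairs: {aligned:.6f}, mismatched pairs: {swapped:.4f}")
```

When every similarity is equal, the loss is log(2N − 1), which is about 5.54 for a batch of 128. A loss falling below that level indicates the encoder is pulling positive pairs together.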

SimCLR is the recommended framework. It achieves 0.8717 AUROC, higher than both BYOL (0.8565) and the supervised baseline (0.8606). To experiment with BYOL instead, replace the command with:

python -m ssrl_ecg.train_ssl_byol --data-root data/PTB-XL --epochs 30 --batch-size 256 --seed 42 --out checkpoints/ssl_byol_enhanced.pt

If you encounter a CUDA out of memory error, reduce the batch size: --batch-size 64. The default of 128 is tuned for a GPU with ≥8 GB VRAM.

Fine-tune on 10% labeled data

Load the pretrained encoder and attach a linear classification head. Only 10% of the PTB-XL training labels (≈1,747 samples) are used, simulating a real-world low-annotation scenario.

python -m ssrl_ecg.train_finetune \
  --data-root data/PTB-XL \
  --ssl-checkpoint checkpoints/ssl_simclr_enhanced.pt \
  --epochs 20 \
  --batch-size 64 \
  --label-fraction 0.1 \
  --seed 42 \
  --out checkpoints/finetuned.pt
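The effect of --label-fraction together with --seed can be sketched as a seeded random subset of the training indices. This is an assumption about the behaviour, not the script's actual code: the real implementation may, for example, stratify by class. The helper name sample_label_subset is hypothetical.

```python
import random


def sample_label_subset(train_indices, label_fraction=0.1, seed=42):
    """Pick a reproducible random subset of training indices to keep labels for."""
    if not 0.0 < label_fraction <= 1.0:
        raise ValueError("label_fraction must be in (0, 1]")
    indices = list(train_indices)
    if not indices:
        return []
    keep = max(1, round(len(indices) * label_fraction))
    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    return sorted(rng.sample(indices, keep))


subset = sample_label_subset(range(1000), label_fraction=0.1, seed=42)
print(len(subset))  # 100
```

Fixing the seed keeps the labeled subset identical across runs, so differences between experiments come from the model and not from which samples were labeled.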

The fine-tuner reports per-epoch validation metrics and saves the checkpoint with the best macro-F1.

[LOSS CONFIGURATION]
  Loss function: focal
  Balancing strategy: oversample

FineTune Epoch 1/20: 100%|██████████| loss=0.4821
{'epoch': 1, 'auroc': 0.7934, 'f1_macro': 0.4211, ...}
...
FineTune Epoch 20/20: 100%|██████████| loss=0.1903
{'epoch': 20, 'auroc': 0.8717, 'f1_macro': 0.6448, ...}

Saved best fine-tuned checkpoint to: checkpoints/finetuned.pt

By default, fine-tuning uses Focal Loss (--loss focal, α=0.25, γ=2.0) combined with oversampling (--balance-strategy oversample) to handle the 3.32× class imbalance in PTB-XL. You can also pass --freeze-encoder to freeze the pretrained encoder and train only the classification head (linear probing).
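To show what α and γ do, the sketch below computes focal loss for individual logits in plain Python. It assumes independent sigmoid outputs per class (a multi-label setup), which is an assumption about the classification head and not something this guide states. The function names are hypothetical and this is not the SSRL-ECG loss implementation.

```python
import math


def binary_focal_loss(logit, target, alpha=0.25, gamma=2.0):
    """Focal loss for a single logit and a 0/1 target."""
    # Numerically stable sigmoid.
    if logit >= 0:
        p = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        p = e / (1.0 + e)
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = max(p_t, 1e-12)  # guard against log(0)
    # (1 - p_t) ** gamma shrinks the loss of confidently correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Average the per-class focal loss over all classes of one sample."""
    losses = [
        binary_focal_loss(z, t, alpha, gamma) for z, t in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


easy = binary_focal_loss(4.0, 1)   # confident and correct
hard = binary_focal_loss(-4.0, 1)  # confident and wrong
print(f"easy example: {easy:.2e}, hard example: {hard:.4f}")
```

With γ = 0 and α = 0.5 the expression reduces to half the usual binary cross-entropy, which makes it clear that γ is what shifts the training signal toward hard examples.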

Evaluate on the held-out test set

Run the evaluation script against PTB-XL fold 10 (the held-out test set that was never seen during training or validation).

python -m ssrl_ecg.evaluate \
  --data-root data/PTB-XL \
  --checkpoint checkpoints/finetuned.pt

Expected output metrics for the SimCLR fine-tuned model:

{
  'auroc':       0.8717,
  'f1_macro':    0.6448,
  'sensitivity': 0.6831,
  'specificity': 0.9411
}

These figures match the results reported in the README multi-seed validation: AUROC 0.8717 ± 0.0032 across 10 random seeds.
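For reference, the threshold-based metrics in that dictionary can be derived from per-class confusion counts. The sketch below assumes binarized multi-label predictions and macro averaging over classes for all three metrics. Whether the evaluate script averages sensitivity and specificity this way is an assumption, and AUROC is omitted because it needs the raw scores. The function names are hypothetical.

```python
def per_class_counts(y_true, y_pred, class_index):
    """Count TP, FP, FN, TN for one class across all samples."""
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        t, p = true_row[class_index], pred_row[class_index]
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 when a class has no relevant samples."""
    return numerator / denominator if denominator else 0.0


def macro_metrics(y_true, y_pred):
    """Macro-averaged F1, sensitivity, and specificity over all classes."""
    n_classes = len(y_true[0])
    f1, sensitivity, specificity = [], [], []
    for c in range(n_classes):
        tp, fp, fn, tn = per_class_counts(y_true, y_pred, c)
        f1.append(safe_div(2 * tp, 2 * tp + fp + fn))
        sensitivity.append(safe_div(tp, tp + fn))
        specificity.append(safe_div(tn, tn + fp))
    return {
        "f1_macro": sum(f1) / n_classes,
        "sensitivity": sum(sensitivity) / n_classes,
        "specificity": sum(specificity) / n_classes,
    }


# Toy labels for four samples and two classes.
y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
y_pred = [[1, 0], [0, 1], [0, 1], [1, 0]]
toy_metrics = macro_metrics(y_true, y_pred)
print(toy_metrics)
# {'f1_macro': 0.75, 'sensitivity': 0.75, 'specificity': 0.75}
```

Macro averaging weights every class equally, so a rare class such as HYP influences the score as much as NORM does.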

The evaluate script also supports robustness testing. Pass --noise-std 0.05 to add Gaussian noise or --mask-ratio 0.1 to mask 10% of the signal, simulating electrode artifacts and dropped leads.
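The two perturbations can be illustrated on a single-lead signal stored as a list of floats. This is a sketch under stated assumptions and not the evaluate script's code: it masks one contiguous window with zeros, whereas the real script may mask differently, and both helper names are hypothetical.

```python
import random


def add_gaussian_noise(signal, noise_std=0.05, seed=0):
    """Add zero-mean Gaussian noise with the given standard deviation."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def mask_contiguous_segment(signal, mask_ratio=0.1, seed=0):
    """Zero out one contiguous window covering mask_ratio of the samples."""
    rng = random.Random(seed)
    n = len(signal)
    width = int(n * mask_ratio)
    if width == 0:
        return list(signal)
    start = rng.randrange(n - width + 1)
    return [0.0 if start <= i < start + width else x for i, x in enumerate(signal)]


clean = [1.0] * 1000  # toy signal; real PTB-XL records have 12 leads
noisy = add_gaussian_noise(clean, noise_std=0.05)
masked = mask_contiguous_segment(clean, mask_ratio=0.1)
print(masked.count(0.0))  # 100
```

Comparing metrics on the clean and perturbed inputs shows how much the fine-tuned model depends on signal quality.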

What’s Next

Now that you have a working end-to-end pipeline, explore the individual components in depth.

SSL Pretraining

Deep dive into SimCLR and BYOL training options, augmentation configuration, and checkpoint formats.

Fine-Tuning

Configure label fractions, loss functions, balancing strategies, and encoder freezing for your dataset.

Evaluation

Run per-class metric breakdowns, multi-seed validation, and robustness tests under noise and masking.
