This guide walks you through the complete SSRL-ECG workflow, from a freshly installed package to a fully evaluated model. You will pretrain an ECG encoder with SimCLR using domain-adaptive augmentations, fine-tune a classification head on 10% of the PTB-XL labeled data, and evaluate performance on the held-out test fold. The full pipeline is intended to reproduce the reported result of AUROC 0.8717 and macro-F1 0.6448.
1. Install the package

If you have not already installed SSRL-ECG, set up a virtual environment and run the editable install from the repository root.
python -m venv .venv
source .venv/bin/activate   # Windows: .\.venv\Scripts\Activate.ps1
pip install -e .
Confirm the install succeeded:
python -c "import ssrl_ecg; print('OK')"
See the Installation guide for full details, including dataset folder setup and GPU verification.
2. Prepare the PTB-XL dataset

Download PTB-XL from PhysioNet and extract it so the layout matches the expected structure. All training scripts default to data/PTB-XL as the data root.
data/
└── PTB-XL/
    ├── ptbxl_database.csv
    ├── scp_statements.csv
    ├── records100/
    └── records500/
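A quick way to catch a misplaced download is to check for the four entries in the tree above before launching any training script. The following is a minimal sketch using only the standard library; the helper name `missing_ptbxl_entries` is hypothetical and is not part of SSRL-ECG.

```python
from pathlib import Path

# Entries copied from the layout shown above.
EXPECTED_ENTRIES = (
    "ptbxl_database.csv",
    "scp_statements.csv",
    "records100",
    "records500",
)


def missing_ptbxl_entries(root):
    """Return the expected files/folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# data/PTB-XL is the default data root mentioned above.
missing = missing_ptbxl_entries("data/PTB-XL")
if missing:
    print("Missing entries:", ", ".join(missing))
else:
    print("PTB-XL layout looks complete")
```

An empty result only shows that the top-level entries exist; it does not validate the individual WFDB records.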
Verify the dataset loads correctly by printing summary statistics:
python -m ssrl_ecg.analyze_datasets --ptbxl-root data/PTB-XL
You should see the five-class distribution — NORM (9,514), MI (5,469), STTC (5,235), CD (4,898), HYP (2,649) — confirming that ptbxl_database.csv and the WFDB record files are in place.
PTB-XL is split into 10 folds. SSRL-ECG uses folds 1–8 for training (17,489 samples), fold 9 for validation (2,154 samples), and fold 10 for final testing (2,194 samples). This split is applied automatically by make_default_splits() — no manual configuration needed.
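The fold assignment described above can be illustrated with a short sketch. It assumes the standard PhysioNet columns `ecg_id` and `strat_fold` in ptbxl_database.csv; the function `split_by_fold` is hypothetical and is not necessarily how make_default_splits() is implemented.

```python
import csv
import io

TRAIN_FOLDS = set(range(1, 9))  # folds 1-8
VAL_FOLD = 9
TEST_FOLD = 10


def split_by_fold(rows):
    """Group record IDs into train/val/test using the `strat_fold` column."""
    splits = {"train": [], "val": [], "test": []}
    for row in rows:
        fold = int(row["strat_fold"])
        if fold in TRAIN_FOLDS:
            splits["train"].append(row["ecg_id"])
        elif fold == VAL_FOLD:
            splits["val"].append(row["ecg_id"])
        elif fold == TEST_FOLD:
            splits["test"].append(row["ecg_id"])
    return splits


# Tiny in-memory stand-in for ptbxl_database.csv (synthetic rows, not real data).
demo_csv = io.StringIO("ecg_id,strat_fold\n1,3\n2,9\n3,10\n4,8\n")
demo_splits = split_by_fold(csv.DictReader(demo_csv))
print(demo_splits)
```

To try it on the real file, pass `csv.DictReader(open("data/PTB-XL/ptbxl_database.csv", newline=""))` instead of the in-memory demo.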
3. Pretrain the encoder with SimCLR

Run SimCLR pretraining on the full unlabeled training set. The encoder learns cardiovascular representations from pairs of augmented views without any labels.
python -m ssrl_ecg.train_ssl_simclr \
  --data-root data/PTB-XL \
  --epochs 20 \
  --batch-size 128 \
  --temperature 0.07 \
  --seed 42 \
  --out checkpoints/ssl_simclr_enhanced.pt
The script prints a per-epoch progress bar with the NT-Xent loss. On completion it saves only the encoder weights (not the projection head) to checkpoints/ssl_simclr_enhanced.pt.
[SIMCLR TRAINING]
  Encoder: ECGEncoder1DCNN(width=64)
  Projection dim: 128
  Temperature: 0.07
  Batch size: 128

SimCLR Epoch 1/20: 100%|██████████| loss=3.2541
SimCLR Epoch 2/20: 100%|██████████| loss=2.9873
...
SimCLR Epoch 20/20: 100%|██████████| loss=1.8204

Saved SimCLR encoder checkpoint to: checkpoints/ssl_simclr_enhanced.pt
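For readers unfamiliar with the NT-Xent loss shown in the log, here is a minimal pure-Python sketch of the standard formulation at temperature 0.07. It is written for readability on tiny inputs and is not the package's implementation; the function names are hypothetical.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def nt_xent_loss(view_a, view_b, temperature=0.07):
    """NT-Xent loss for N positive pairs given as two lists of embeddings."""
    z = [_normalize(v) for v in list(view_a) + list(view_b)]
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # the other view of the same sample
        # Temperature-scaled cosine similarity to every embedding except i itself.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
            if j != i
        }
        # Numerically stable log-sum-exp over all candidates.
        top = max(sims.values())
        log_denominator = top + math.log(
            sum(math.exp(s - top) for s in sims.values())
        )
        total += log_denominator - sims[positive]
    return total / (2 * n)


# Matching views give a loss near zero; mismatched views give a large loss.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

When every embedding is identical, the loss equals ln(2N - 1), which is a useful reference point for what an untrained encoder looks like at a given batch size.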
SimCLR is the recommended framework: it reaches 0.8717 AUROC, higher than BYOL (0.8565) and the supervised baseline (0.8606). To experiment with BYOL instead, replace the command with:
python -m ssrl_ecg.train_ssl_byol --data-root data/PTB-XL --epochs 30 --batch-size 256 --seed 42 --out checkpoints/ssl_byol_enhanced.pt
If you encounter a CUDA out-of-memory error, reduce the batch size, for example --batch-size 64. The default of 128 is tuned for a GPU with ≥8 GB VRAM.
4. Fine-tune on 10% labeled data

Load the pretrained encoder and attach a linear classification head. Only 10% of the PTB-XL training labels (≈1,747 samples) are used, simulating a real-world low-annotation scenario.
python -m ssrl_ecg.train_finetune \
  --data-root data/PTB-XL \
  --ssl-checkpoint checkpoints/ssl_simclr_enhanced.pt \
  --epochs 20 \
  --batch-size 64 \
  --label-fraction 0.1 \
  --seed 42 \
  --out checkpoints/finetuned.pt
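To make the effect of --label-fraction and --seed concrete, the sketch below draws a reproducible 10% subset from a list of training IDs. It is an illustration only: the helper `sample_labeled_subset` is hypothetical, and the actual script may select its subset differently (for example, with class stratification).

```python
import random


def sample_labeled_subset(record_ids, label_fraction=0.1, seed=42):
    """Pick a reproducible random subset of training IDs to keep labels for."""
    ids = sorted(record_ids)  # fixed order so the seed fully determines the draw
    keep = max(1, round(len(ids) * label_fraction))
    return sorted(random.Random(seed).sample(ids, keep))


# Synthetic IDs for demonstration, not real PTB-XL record numbers.
demo_ids = list(range(1, 1001))
subset = sample_labeled_subset(demo_ids)
print(len(subset), subset[:5])
```

Re-running with the same seed returns the same IDs, which is what makes low-label experiments comparable across runs.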
The fine-tuner reports per-epoch validation metrics and saves the checkpoint with the best macro-F1.
[LOSS CONFIGURATION]
  Loss function: focal
  Balancing strategy: oversample

FineTune Epoch 1/20: 100%|██████████| loss=0.4821
{'epoch': 1, 'auroc': 0.7934, 'f1_macro': 0.4211, ...}
...
FineTune Epoch 20/20: 100%|██████████| loss=0.1903
{'epoch': 20, 'auroc': 0.8717, 'f1_macro': 0.6448, ...}

Saved best fine-tuned checkpoint to: checkpoints/finetuned.pt
By default, fine-tuning uses Focal Loss (--loss focal, α=0.25, γ=2.0) combined with oversampling (--balance-strategy oversample) to handle the 3.32× class imbalance in PTB-XL. You can also pass --freeze-encoder to freeze the pretrained encoder and train only the classification head (linear probing).
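The sketch below shows the standard binary focal loss with the α=0.25 and γ=2.0 values quoted above, so you can see how it down-weights easy examples. Applying it per class to sigmoid outputs is an assumption here; the package's own loss code may differ in detail.

```python
import math


def binary_focal_loss(prob, target, alpha=0.25, gamma=2.0):
    """Focal loss for one predicted probability and a 0/1 target."""
    eps = 1e-7
    prob = min(max(prob, eps), 1.0 - eps)            # avoid log(0)
    p_t = prob if target == 1 else 1.0 - prob        # probability of the true outcome
    alpha_t = alpha if target == 1 else 1.0 - alpha  # class-dependent weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident and correct
hard = binary_focal_loss(0.30, 1)  # positive label, low predicted probability
print(f"easy={easy:.6f} hard={hard:.6f}")
```

With γ=0 and α=0.5 the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check.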
5. Evaluate on the held-out test set

Run the evaluation script against PTB-XL fold 10 (the held-out test set that was never seen during training or validation).
python -m ssrl_ecg.evaluate \
  --data-root data/PTB-XL \
  --checkpoint checkpoints/finetuned.pt
Expected output metrics for the SimCLR fine-tuned model:
{
  'auroc':       0.8717,
  'f1_macro':    0.6448,
  'sensitivity': 0.6831,
  'specificity': 0.9411
}
These figures are consistent with the multi-seed validation reported in the README: AUROC 0.8717 ± 0.0032 across 10 random seeds.
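As a reference for how sensitivity and specificity are defined, here is a small sketch that macro-averages both over classes from binary label matrices. It assumes predictions have already been thresholded to 0/1 and that the reported values are macro averages; neither detail is stated in this guide.

```python
def macro_sensitivity_specificity(y_true, y_pred):
    """Macro-average sensitivity and specificity over 0/1 label columns."""
    n_classes = len(y_true[0])
    sens, spec = [], []
    for c in range(n_classes):
        pairs = [(t[c], p[c]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        tn = sum(1 for t, p in pairs if t == 0 and p == 0)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        sens.append(tp / (tp + fn) if tp + fn else 0.0)  # true positive rate
        spec.append(tn / (tn + fp) if tn + fp else 0.0)  # true negative rate
    return sum(sens) / n_classes, sum(spec) / n_classes


# Toy two-class example (synthetic labels, not PTB-XL results).
toy_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_pred = [[1, 0], [0, 0], [1, 1], [1, 0]]
print(macro_sensitivity_specificity(toy_true, toy_pred))  # (0.75, 0.75)
```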
The evaluate script also supports robustness testing. Pass --noise-std 0.05 to add Gaussian noise or --mask-ratio 0.1 to mask 10% of the signal, simulating electrode artifacts and dropped leads.
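The two perturbations can be pictured with a minimal sketch on a single-lead signal stored as a list of floats. Both helpers are hypothetical: how the evaluate script actually applies noise and masking (per lead, contiguous or scattered, zero fill or otherwise) is not specified here, so the contiguous zero-filled window below is an assumption.

```python
import random


def add_gaussian_noise(signal, noise_std=0.05, seed=0):
    """Add zero-mean Gaussian noise with the given standard deviation."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def mask_segment(signal, mask_ratio=0.1, seed=0):
    """Zero out one contiguous window covering `mask_ratio` of the samples."""
    rng = random.Random(seed)
    width = int(len(signal) * mask_ratio)
    start = rng.randint(0, len(signal) - width)
    return [0.0 if start <= i < start + width else x for i, x in enumerate(signal)]


# Synthetic flat signal, 1,000 samples long.
clean = [1.0] * 1000
noisy = add_gaussian_noise(clean)
masked = mask_segment(clean)
print(sum(1 for x in masked if x == 0.0))  # 100 samples masked
```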

What’s Next

Now that you have a working end-to-end pipeline, explore the individual components in depth.

SSL Pretraining

Deep dive into SimCLR and BYOL training options, augmentation configuration, and checkpoint formats.

Fine-Tuning

Configure label fractions, loss functions, balancing strategies, and encoder freezing for your dataset.

Evaluation

Run per-class metric breakdowns, multi-seed validation, and robustness tests under noise and masking.
