This guide walks you through the complete SSRL-ECG workflow from a freshly installed package to a fully evaluated model. You will pretrain an ECG encoder with SimCLR using domain-adaptive augmentations, fine-tune a classification head on 10% of the PTB-XL labeled data, and evaluate performance on the held-out test fold. The entire pipeline reproduces the reported AUROC 0.8717 / F1 0.6448 result.
Install the package
If you have not already installed SSRL-ECG, set up a virtual environment and run the editable install from the repository root.
Confirm the install succeeded:
See the Installation guide for full details, including dataset folder setup and GPU verification.
Prepare the PTB-XL dataset
Download PTB-XL from PhysioNet and extract it so the layout matches the expected structure. All training scripts default to data/PTB-XL as the data root.
Verify the dataset loads correctly by printing summary statistics:
You should see the five-class distribution: NORM (9,514), MI (5,469), STTC (5,235), CD (4,898), HYP (2,649). These counts confirm that ptbxl_database.csv and the WFDB record files are in place.
PTB-XL is split into 10 folds. SSRL-ECG uses folds 1–8 for training (17,489 samples), fold 9 for validation (2,154 samples), and fold 10 for final testing (2,194 samples). This split is applied automatically by make_default_splits(), so no manual configuration is needed.
Pretrain the encoder with SimCLR
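SimCLR treats the two augmented views of each ECG record as a positive pair and every other view in the same batch as a negative, and scores them with the NT-Xent (normalized temperature-scaled cross-entropy) loss. The sketch below is a minimal NumPy illustration of that standard loss, not SSRL-ECG's own code: the function name `nt_xent_loss`, the `(N, D)` input shapes, and the default `temperature=0.5` are all hypothetical.

```python
import numpy as np


def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent loss for a batch of N positive pairs (z1[i], z2[i]).

    z1, z2: arrays of shape (N, D) holding projection-head outputs for the two
    augmented views of the same N ECG records (hypothetical interface).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)                # (2N, D): all views in one batch
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # L2-normalize so dot product = cosine
    sim = z @ z.T / temperature                         # (2N, 2N) temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)                      # a view is never its own negative
    # The positive for row i is the other view of the same record.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row (log-sum-exp trick).
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos_idx] - log_denom
    return float(-log_prob_pos.mean())
```

As a sanity check, nearly identical views give a lower loss than unrelated random vectors, for which the loss sits roughly at log(2N − 1), the value of a uniform guess among the 2N − 1 candidates.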
Run SimCLR pretraining on the full unlabeled training set. The encoder learns cardiovascular representations from pairs of augmented views without using any labels.
The script prints a per-epoch progress bar with the NT-Xent loss. On completion, it saves only the encoder weights (not the projection head) to checkpoints/ssl_simclr_enhanced.pt.
Fine-tune on 10% labeled data
Load the pretrained encoder and attach a linear classification head. Only 10% of the PTB-XL training labels (≈1,747 samples) are used, simulating a real-world low-annotation scenario.
The fine-tuner reports per-epoch validation metrics and saves the checkpoint with the best macro-F1.
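Fine-tuning defaults to focal loss with α=0.25 and γ=2.0. Focal loss multiplies each entry's cross-entropy by α_t(1 − p_t)^γ, where p_t is the predicted probability of the true label, so confidently classified examples contribute little and harder examples carry more of the loss. This NumPy sketch assumes a multi-label sigmoid head over the PTB-XL superclasses (the sigmoid form in which α=0.25, γ=2.0 are the usual defaults); the function name `sigmoid_focal_loss` is hypothetical and the repository's own implementation may differ.

```python
import numpy as np


def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0, eps=1e-12):
    """Mean binary focal loss over all (sample, class) entries.

    logits, targets: arrays of shape (N, C); targets hold 0/1 labels, so one
    formula covers multi-label superclass targets (an assumption).
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    p = 1.0 / (1.0 + np.exp(-logits))                      # sigmoid probabilities
    p_t = np.where(targets == 1, p, 1.0 - p)               # probability assigned to the true label
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)   # weight for positives vs. negatives
    # (1 - p_t) ** gamma shrinks the loss of easy, well-classified entries.
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, eps, 1.0))
    return float(loss.mean())
```

Setting `gamma=0` and `alpha=0.5` reduces the expression to half of ordinary binary cross-entropy, which is a convenient check when comparing against another implementation.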
By default, fine-tuning uses Focal Loss (--loss focal, α=0.25, γ=2.0) combined with oversampling (--balance-strategy oversample) to handle the 3.32× class imbalance in PTB-XL. You can also pass --freeze-encoder to freeze the pretrained encoder and train only the classification head (linear probing).
Evaluate on the held-out test set
Run the evaluation script against PTB-XL fold 10 (the held-out test set that was never seen during training or validation).
Expected output metrics for the SimCLR fine-tuned model:
These figures match the results reported in the README multi-seed validation: AUROC 0.8717 ± 0.0032 across 10 random seeds.
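The two headline metrics can be computed per class and then averaged (macro averaging). This NumPy sketch assumes `probs` and `targets` are `(N, C)` arrays of sigmoid outputs and 0/1 labels for the test fold, and uses a hypothetical fixed decision threshold of 0.5 for F1; SSRL-ECG's evaluation script may use a different threshold or library. AUROC is computed directly as the fraction of positive/negative pairs in which the positive scores higher, with ties counted as half.

```python
import numpy as np


def binary_auroc(scores, labels):
    """AUROC = P(random positive scores above random negative); ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def macro_metrics(probs, targets, threshold=0.5):
    """Macro AUROC and macro F1 over the class columns of (N, C) arrays."""
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets).astype(int)
    preds = (probs >= threshold).astype(int)  # hypothetical fixed threshold
    aurocs, f1s = [], []
    for c in range(probs.shape[1]):
        aurocs.append(binary_auroc(probs[:, c], targets[:, c]))
        tp = int(np.sum((preds[:, c] == 1) & (targets[:, c] == 1)))
        fp = int(np.sum((preds[:, c] == 1) & (targets[:, c] == 0)))
        fn = int(np.sum((preds[:, c] == 0) & (targets[:, c] == 1)))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(aurocs)), float(np.mean(f1s))
```

The pairwise comparison uses memory proportional to positives × negatives, which is manageable at the size of fold 10 (2,194 samples); for much larger sets, a rank-based formula or `sklearn.metrics.roc_auc_score` is the usual choice. Repeating the evaluation per seed and reporting the mean and standard deviation of the macro AUROC gives a summary in the same form as the multi-seed figure above.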
What’s Next
Now that you have a working end-to-end pipeline, explore the individual components in depth.
SSL Pretraining
Deep dive into SimCLR and BYOL training options, augmentation configuration, and checkpoint formats.
Fine-Tuning
Configure label fractions, loss functions, balancing strategies, and encoder freezing for your dataset.
Evaluation
Run per-class metric breakdowns, multi-seed validation, and robustness tests under noise and masking.