Annotating ECG recordings for cardiovascular disease requires a cardiologist to review each 10-second trace and assign one or more of the five diagnostic superclasses. At scale this is expensive, time-consuming, and subject to inter-annotator variability. The label scarcity benchmark in SSRL-ECG quantifies exactly how much SSL pretraining helps when only a small fraction of those annotations is available at fine-tuning time.
The Label Scarcity Problem in Clinical ECG Annotation
The PTB-XL training set contains 17,489 labelled ECG recordings. In practice, a new clinical deployment might start with far fewer confirmed labels, perhaps a few hundred from a local institution. The critical question is whether an encoder pretrained on the unlabelled signal structure (via SimCLR or BYOL) can compensate for the lack of ground-truth annotations during supervised fine-tuning. The benchmark sweeps label fractions from 1% to 100%, corresponding to as few as ~175 samples up to the full 17,489. At 10% (1,747 samples), SimCLR fine-tuning already achieves an AUROC of 0.8717, matching the supervised baseline trained on the complete labelled set.
10% of the 17,489-sample training set equals 1,747 labelled samples, spread across five cardiovascular classes with natural class imbalance (NORM is 3.32× more frequent than HYP).
LabelScarcityBenchmark Class
LabelScarcityBenchmark orchestrates three parallel training tracks at each label fraction:
- Supervised baseline: an ImprovedECGClassifier trained from scratch using only the available labels, with no pretraining.
- SSL fine-tuned: the pretrained SSL encoder with all weights unfrozen during fine-tuning on the labelled subset.
- SSL frozen: the pretrained SSL encoder with the encoder weights frozen; only the classification head is trained (linear probing).
Results are written to label_scarcity_results/label_scarcity_benchmark.json.
Constructor Parameters
- data_root (default data/PTB-XL): Root directory of the PTB-XL dataset. Passed to load_ptbxl_metadata and PTBXLRecordDataset.
- checkpoint_dir (default checkpoints): Directory containing pretrained SSL checkpoints. The benchmark looks for ssl_masked.pt by default. If the checkpoint is absent, only the supervised track runs.
- results_dir (default label_scarcity_results): Output directory for the JSON results file. Created automatically if it does not exist.
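The fallback behaviour described above (run only the supervised track when the checkpoint is missing, and create the output directory on demand) can be sketched with the standard library alone. This is an illustration, not the project's code: the function name `plan_tracks` and the track labels are hypothetical, and only the file name ssl_masked.pt and the directory names come from the text.

```python
from pathlib import Path
import tempfile


def plan_tracks(checkpoint_dir, results_dir, checkpoint_name="ssl_masked.pt"):
    """Hypothetical helper: decide which training tracks can run.

    Creates results_dir if needed and returns the track names. The two SSL
    tracks are only scheduled when the pretrained checkpoint file exists.
    """
    Path(results_dir).mkdir(parents=True, exist_ok=True)
    tracks = ["supervised"]
    if (Path(checkpoint_dir) / checkpoint_name).is_file():
        tracks += ["ssl_finetuned", "ssl_frozen"]
    return tracks


# Demonstrate both cases inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp) / "checkpoints"
    ckpt_dir.mkdir()
    out_dir = Path(tmp) / "label_scarcity_results"

    without_ckpt = plan_tracks(ckpt_dir, out_dir)   # checkpoint missing
    (ckpt_dir / "ssl_masked.pt").touch()            # simulate a saved checkpoint
    with_ckpt = plan_tracks(ckpt_dir, out_dir)      # checkpoint present
    results_dir_created = out_dir.is_dir()

print(without_ckpt)
print(with_ckpt)
```

Checking for the checkpoint up front, rather than failing in the middle of a sweep, keeps the supervised numbers usable even when no pretraining run has been completed yet.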
run_label_scarcity_benchmark() Parameters
- Label fractions (default [0.05, 0.1, 0.25, 1.0]): List of label fractions to sweep. Each value is the proportion of the training set used, e.g. 0.1 = 10% = 1,747 samples.
- Seeds (default [42, 52, 62]): Random seeds for reproducible sampling of labelled indices. Mean and standard deviation are reported across seeds.
- Epochs (default 30): Maximum training epochs per run. Early stopping with a patience of 10 epochs is applied automatically.
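The sketch below shows how patience-based early stopping and the per-seed summary might fit together. It rests on several assumptions: the monitored quantity is taken to be a validation score where higher is better, the standard deviation is the sample standard deviation, and the names `EarlyStopper`, `epochs_run`, and `summarise_across_seeds` are hypothetical. The scores are made-up toy values, not benchmark results.

```python
from statistics import mean, stdev


class EarlyStopper:
    """Stop once the validation score has not improved for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_score):
        """Record one epoch's score; return True when training should stop."""
        if val_score > self.best:
            self.best = val_score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_curve, max_epochs=30, patience=10):
    """Number of epochs actually trained for a given validation curve."""
    stopper = EarlyStopper(patience)
    for epoch, score in enumerate(val_curve[:max_epochs], start=1):
        if stopper.step(score):
            return epoch
    return min(len(val_curve), max_epochs)


def summarise_across_seeds(score_by_seed):
    """Mean and sample standard deviation of one metric across seeds."""
    values = list(score_by_seed.values())
    return mean(values), stdev(values)


# Toy curves: one plateaus after epoch 2, one keeps improving.
plateau_epochs = epochs_run([0.50, 0.60] + [0.55] * 20)
improving_epochs = epochs_run([i / 100 for i in range(40)])

# Toy per-seed scores keyed by the default seeds.
avg, spread = summarise_across_seeds({42: 0.80, 52: 0.82, 62: 0.84})
```

With a patience of 10, the plateauing run stops 10 epochs after its last improvement, while the improving run is cut off by the 30-epoch cap.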
Running the Benchmark
The benchmark is integrated into the run_experiments.py script at the project root, which runs all phases in sequence.
Run the label scarcity benchmark
Sweep label fractions and compare SSL vs supervised across three seeds. This uses the defaults: data_root=data/PTB-XL, checkpoint_dir=checkpoints, results_dir=label_scarcity_results, fractions [0.05, 0.1, 0.25, 1.0], seeds [42, 52, 62], epochs 30.
Why SSL Gains Are Largest at Low Label Fractions
At high label fractions (e.g. 100%), a supervised model trained from scratch has enough data to learn a good representation on its own. The SSL advantage narrows but remains positive. At very low label fractions (1–5%), a supervised model struggles to capture the temporal structure of ECG waveforms from a few hundred examples. The SSL encoder, pretrained on tens of thousands of unlabelled signals, already encodes heartbeat morphology, frequency bands, and inter-channel correlations, so the fine-tuning stage only needs to attach a linear head on top.
The finetune_ssl method supports an optional freeze_encoder=True flag for linear probing, which is faster but slightly weaker than full fine-tuning at low label fractions.
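The difference between linear probing and full fine-tuning comes down to which parameters receive gradients. The PyTorch sketch below illustrates that idea only; it is not the project's finetune_ssl implementation. `TinyEncoder` and `build_classifier` are hypothetical stand-ins, and the input shape (12 leads, 1,000 time steps) and the multi-label BCE loss are assumptions.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained SSL encoder."""

    def __init__(self, in_channels=12, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, feat_dim, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time -> (batch, feat_dim, 1)
            nn.Flatten(),             # -> (batch, feat_dim)
        )

    def forward(self, x):
        return self.net(x)


def build_classifier(encoder, feat_dim=32, num_classes=5, freeze_encoder=False):
    """Attach a linear head; optionally freeze the encoder (linear probing)."""
    if freeze_encoder:
        for param in encoder.parameters():
            param.requires_grad = False
    return nn.Sequential(encoder, nn.Linear(feat_dim, num_classes))


def count_trainable(model):
    """Number of scalar parameters that will receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


probe = build_classifier(TinyEncoder(), freeze_encoder=True)
full = build_classifier(TinyEncoder(), freeze_encoder=False)
probe_trainable = count_trainable(probe)
full_trainable = count_trainable(full)

# One optimisation step on random data, updating only the trainable parameters.
optimizer = torch.optim.Adam(
    [p for p in probe.parameters() if p.requires_grad], lr=1e-3
)
signals = torch.randn(4, 12, 1000)                # assumed (batch, leads, time)
targets = torch.randint(0, 2, (4, 5)).float()     # assumed multi-label targets
logits = probe(signals)
loss = nn.BCEWithLogitsLoss()(logits, targets)
loss.backward()
optimizer.step()

# Frozen encoder parameters never accumulate gradients.
encoder_has_grads = any(p.grad is not None for p in probe[0].parameters())
```

Because the frozen variant optimises only the head, each step is cheaper, which is consistent with linear probing being the faster of the two options.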
Data Split Details
- Training pool: 17,489 samples (PTB-XL folds 1–8). Label fraction is applied here via stratified sampling with sample_labelled_indices.
- Validation set: 2,154 samples (PTB-XL fold 9). Used for early stopping and learning rate scheduling during all training runs.
- Test set: 2,194 samples (PTB-XL fold 10). Held out completely; used only for final metric computation via evaluate_model.
- Label fraction reference: 1% ≈ 175 samples · 5% ≈ 875 samples · 10% ≈ 1,747 samples · 25% ≈ 4,372 samples · 100% = 17,489 samples
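Stratified sampling of a labelled subset can be sketched as follows. This is a simplified illustration rather than the project's sample_labelled_indices: it assumes each recording has a single class label (the real labels can contain several superclasses per recording), the function name `sample_stratified` is hypothetical, and the toy class counts are invented to mirror the 3.32× NORM-to-HYP ratio.

```python
import random
from collections import defaultdict


def sample_stratified(labels, fraction, seed):
    """Pick about `fraction` of the indices from each class, reproducibly.

    labels: one class name per sample (single-label simplification).
    Returns a sorted list of selected sample indices.
    """
    rng = random.Random(seed)  # local RNG so the seed fully determines the draw
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    chosen = []
    for label in sorted(by_class):  # fixed class order keeps runs reproducible
        members = by_class[label]
        k = max(1, round(fraction * len(members)))  # keep at least one per class
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


# Toy label list: indices 0-331 are NORM, 332-431 are HYP, 432-631 are MI.
toy_labels = ["NORM"] * 332 + ["HYP"] * 100 + ["MI"] * 200

subset = sample_stratified(toy_labels, fraction=0.1, seed=42)
repeat = sample_stratified(toy_labels, fraction=0.1, seed=42)
norm_count = sum(1 for i in subset if toy_labels[i] == "NORM")
hyp_count = sum(1 for i in subset if toy_labels[i] == "HYP")
```

Sampling per class preserves the class imbalance of the full pool in every subset, so a 1% subset still contains examples of the rarest class.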