Annotating ECG recordings for cardiovascular disease requires a cardiologist to review each 10-second trace and assign one or more of the five diagnostic superclasses. At scale this is expensive, time-consuming, and subject to inter-annotator variability. The label scarcity benchmark in SSRL-ECG quantifies exactly how much SSL pretraining helps when only a small fraction of those annotations is available at fine-tuning time.

The Label Scarcity Problem in Clinical ECG Annotation

The PTB-XL training set contains 17,489 labelled ECG recordings. In practice, a new clinical deployment might start with far fewer confirmed labels — perhaps a few hundred from a local institution. The critical question is whether an encoder pretrained on the unlabelled signal structure (via SimCLR or BYOL) can compensate for the lack of ground-truth annotations during supervised fine-tuning. The benchmark sweeps label fractions from 1% to 100%, corresponding to as few as ~175 samples up to the full 17,489. At 10% (1,747 samples), SimCLR fine-tuning already achieves AUROC 0.8717 — matching the supervised baseline trained on the complete labelled set.
At the 10% fraction, roughly 1,747 labelled samples are spread across the five cardiovascular superclasses, which are naturally imbalanced (NORM is 3.32× more frequent than HYP).
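The mapping from label fraction to sample count is simple arithmetic on the 17,489-sample training pool. The sketch below uses a hypothetical helper, `labelled_count`, that is not part of SSRL-ECG and rounds to the nearest whole sample. The real sampler may round per class, so its counts can differ by a sample or two from these nominal values.

```python
# Size of the PTB-XL training pool quoted on this page.
TRAIN_POOL_SIZE = 17_489


def labelled_count(fraction: float, pool_size: int = TRAIN_POOL_SIZE) -> int:
    """Nominal number of labelled samples for a given label fraction.

    Hypothetical helper for illustration; not part of the SSRL-ECG package.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Round to the nearest whole sample; a stratified sampler may round per class.
    return round(fraction * pool_size)


if __name__ == "__main__":
    for fraction in (0.01, 0.05, 0.10, 0.25, 1.0):
        print(f"{fraction:>5.0%} -> {labelled_count(fraction):>6,} samples")
```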

LabelScarcityBenchmark Class

LabelScarcityBenchmark orchestrates three parallel training tracks at each label fraction:
  • Supervised baseline — an ImprovedECGClassifier trained from scratch using only the available labels, with no pretraining.
  • SSL fine-tuned — the pretrained SSL encoder with all weights unfrozen during fine-tuning on the labelled subset.
  • SSL frozen — the pretrained SSL encoder with the encoder weights frozen; only the classification head is trained (linear probing).
Results for each track and seed are written to label_scarcity_results/label_scarcity_benchmark.json.
from pathlib import Path
from ssrl_ecg.label_scarcity_benchmark import LabelScarcityBenchmark

benchmark = LabelScarcityBenchmark(
    data_root=Path("data/PTB-XL"),
    checkpoint_dir=Path("checkpoints"),
    results_dir=Path("label_scarcity_results"),
)

results = benchmark.run_label_scarcity_benchmark(
    label_fractions=[0.01, 0.05, 0.1, 0.25, 1.0],
    seeds=[42, 52, 62],
    epochs=40,
)

Constructor Parameters

  • data_root (Path, required): Root directory of the PTB-XL dataset. Passed to load_ptbxl_metadata and PTBXLRecordDataset.
  • checkpoint_dir (Path, required): Directory containing pretrained SSL checkpoints. The benchmark looks for ssl_masked.pt by default. If the checkpoint is absent, only the supervised track runs.
  • results_dir (Path, required): Output directory for the JSON results file. Created automatically if it does not exist.

run_label_scarcity_benchmark() Parameters

  • label_fractions (list[float], default: [0.01, 0.05, 0.1, 0.25, 1.0]): List of label fractions to sweep. Each value is the proportion of the training set used, e.g. 0.1 = 10% = 1,747 samples.
  • seeds (list[int], default: [42, 52, 62]): Random seeds for reproducible sampling of labelled indices. Mean and standard deviation are reported across seeds.
  • epochs (int, default: 40): Maximum training epochs per run. Early stopping with patience of 10 epochs is applied automatically.
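
The interaction between the epochs cap and the patience of 10 can be illustrated with a small stand-alone sketch. It assumes the monitored quantity is a validation score where higher is better (for example validation AUROC); the page does not say which metric the benchmark tracks, and `early_stopping_epoch` is a hypothetical function, not the project's code.

```python
def early_stopping_epoch(val_scores, patience=10):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    val_scores: per-epoch validation scores, higher is better (assumption).
    Training stops once `patience` epochs pass without a new best score,
    or when the list of epochs is exhausted.
    """
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            # New best: remember it and reset the patience window.
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs.
            return epoch, best_epoch
    return len(val_scores), best_epoch


# Improvement stalls after epoch 3, so training stops 10 epochs later.
scores = [0.70, 0.75, 0.80] + [0.79] * 37  # 40 epochs in total
print(early_stopping_epoch(scores, patience=10))
```

With this logic a run capped at 40 epochs often ends much earlier, which is why the epoch count is described as a maximum.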

Running the Benchmark

The benchmark is integrated into the run_experiments.py script at the project root, which runs all phases in sequence. The same phases can also be run individually:
1. Train the SSL encoder

Pretrain a SimCLR encoder on the full PTB-XL training set (unlabelled).
python -m ssrl_ecg.train_ssl_simclr \
  --data-root data/PTB-XL \
  --epochs 20 \
  --batch-size 128 \
  --out checkpoints/ssl_masked.pt
2. Run the label scarcity benchmark

Sweep label fractions and compare SSL vs supervised across three seeds.
python -m ssrl_ecg.label_scarcity_benchmark
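Run without arguments, the module uses data_root=data/PTB-XL, checkpoint_dir=checkpoints, results_dir=label_scarcity_results, label fractions [0.05, 0.1, 0.25, 1.0], seeds [42, 52, 62], and 30 epochs. These command-line defaults omit the 1% fraction and use fewer epochs than the run_label_scarcity_benchmark() defaults listed above; call the method directly to include the 1% fraction.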
3. Review results

Results are printed to stdout as a summary table and saved to label_scarcity_results/label_scarcity_benchmark.json.
[1.0% Labeled Data]
  supervised           AUROC: 0.7234±0.0187 | F1: 0.3812±0.0241
  ssl_finetuned        AUROC: 0.8051±0.0143 | F1: 0.5234±0.0198
  ssl_frozen           AUROC: 0.7889±0.0156 | F1: 0.4967±0.0211

[10.0% Labeled Data]
  supervised           AUROC: 0.8606±0.0034 | F1: 0.5750±0.0121
  ssl_finetuned        AUROC: 0.8717±0.0032 | F1: 0.6448±0.0181
  ssl_frozen           AUROC: 0.8512±0.0041 | F1: 0.5981±0.0163
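
Each summary line reports a mean and a standard deviation across seeds. The sketch below shows one way such a line could be produced. It is an assumption-laden illustration: the per-seed values are made up for the example (they are not benchmark results), `summarise` is a hypothetical helper, and population standard deviation is assumed, whereas the benchmark may use the sample standard deviation instead.

```python
from statistics import mean, pstdev


def summarise(values, digits=4):
    """Format per-seed metric values as 'mean±std' (population std assumed)."""
    return f"{mean(values):.{digits}f}±{pstdev(values):.{digits}f}"


# Made-up per-seed AUROC values for one track at one label fraction.
auroc_by_seed = {42: 0.870, 52: 0.872, 62: 0.874}

line = f"  ssl_finetuned        AUROC: {summarise(list(auroc_by_seed.values()))}"
print(line)
```

With only three seeds the choice between population and sample standard deviation changes the reported spread noticeably, so it is worth checking which one the code uses before comparing against other work.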

Why SSL Gains Are Largest at Low Label Fractions

At high label fractions (e.g. 100%), a supervised model trained from scratch has enough data to learn a good representation on its own, so the SSL advantage narrows, although it remains positive. At very low label fractions (1–5%), a supervised model struggles to capture the temporal structure of ECG waveforms from a few hundred examples. The SSL encoder, pretrained on the full unlabelled training set, already encodes heartbeat morphology, frequency bands, and inter-channel correlations, so fine-tuning only has to adapt those features and train a classification head on top.
At 1–5% label fractions, expect SSL fine-tuned to outperform supervised by roughly 5–8 AUROC points (0.05–0.08 absolute AUROC). The gap shrinks progressively as more labels are added, with the two tracks nearly converging at the 100% mark.
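The size of the gap can be checked against the sample summary printed earlier on this page. The snippet below copies the AUROC means for the 1% and 10% fractions from that sample output and expresses the difference in points, assuming one point means 0.01 AUROC. `auroc_gap_points` is a hypothetical helper for this illustration.

```python
# AUROC means copied from the sample summary shown on this page.
sample_auroc = {
    0.01: {"supervised": 0.7234, "ssl_finetuned": 0.8051},
    0.10: {"supervised": 0.8606, "ssl_finetuned": 0.8717},
}


def auroc_gap_points(row):
    """SSL fine-tuned minus supervised AUROC, in points (1 point = 0.01 AUROC)."""
    return round((row["ssl_finetuned"] - row["supervised"]) * 100, 2)


for fraction, row in sample_auroc.items():
    print(f"{fraction:.0%} labels: +{auroc_gap_points(row):.2f} AUROC points")
```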
The finetune_ssl method supports an optional freeze_encoder=True flag for linear probing, which is faster but slightly weaker than full fine-tuning at low label fractions:
# Full fine-tuning: all encoder weights updated
model = benchmark.finetune_ssl(ssl_encoder, label_fraction=0.05, freeze_encoder=False)

# Linear probing: only the classification head is trained
model = benchmark.finetune_ssl(ssl_encoder, label_fraction=0.05, freeze_encoder=True)

Data Split Details

Training pool

17,489 samples (PTB-XL folds 1–8). Label fraction is applied here via stratified sampling with sample_labelled_indices.

Validation set

2,154 samples (PTB-XL fold 9). Used for early stopping and learning rate scheduling during all training runs.

Test set

2,194 samples (PTB-XL fold 10). Held out completely; used only for final metric computation via evaluate_model.

Label fraction reference

1% ≈ 175 samples · 5% ≈ 875 samples · 10% ≈ 1,747 samples · 25% ≈ 4,372 samples · 100% = 17,489 samples
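
The stratified sampling applied to the training pool can be sketched as follows. This is a simplified illustration, not the project's sample_labelled_indices: it assumes a single class label per record (PTB-XL records can carry several superclasses), the function name `sample_stratified_indices` is hypothetical, and keeping at least one example per class is a design choice made for this sketch.

```python
import random
from collections import defaultdict


def sample_stratified_indices(labels, fraction, seed):
    """Pick about `fraction` of the indices from each class, reproducibly."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    chosen = []
    for label in sorted(by_class):  # fixed class order keeps results deterministic
        indices = by_class[label]
        # Keep at least one example per class so rare classes survive small fractions.
        k = max(1, round(fraction * len(indices)))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy, imbalanced labels; the real class ratios differ.
toy_labels = ["NORM"] * 300 + ["HYP"] * 100
subset = sample_stratified_indices(toy_labels, fraction=0.1, seed=42)
print(len(subset), sum(toy_labels[i] == "HYP" for i in subset))
```

Sampling per class keeps the class ratio of the labelled subset close to that of the full pool, and seeding a local generator makes each seed in the sweep select the same subset on every run.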
The benchmark can be time-consuming at high epoch counts and many seeds. For a quick exploratory run, use label_fractions=[0.05, 0.1, 1.0], seeds=[42], and epochs=15 to get indicative results in under an hour on a modern GPU.
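
To see why the reduced settings help, it is useful to count training runs. The sketch below assumes one training run per combination of label fraction, seed, and track, and that all three tracks run (that is, the SSL checkpoint is present). `count_training_runs` is a hypothetical helper, and the count ignores differences in per-run cost such as fewer epochs or smaller labelled subsets.

```python
from itertools import product

# The three tracks described on this page.
TRACKS = ("supervised", "ssl_finetuned", "ssl_frozen")


def count_training_runs(label_fractions, seeds, tracks=TRACKS):
    """Number of (fraction, seed, track) combinations the sweep would train."""
    return sum(1 for _ in product(label_fractions, seeds, tracks))


full = count_training_runs([0.01, 0.05, 0.1, 0.25, 1.0], [42, 52, 62])
quick = count_training_runs([0.05, 0.1, 1.0], [42])
print(f"full sweep: {full} runs, quick sweep: {quick} runs")
```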
