SSRL-ECG is a self-supervised representation learning framework for ECG-based cardiovascular disease classification in low-data regimes. It combines domain-adaptive augmentations engineered specifically for cardiac signals with SimCLR and BYOL pretraining frameworks, enabling accurate multi-label classification with as few as 1,747 labeled samples from the PTB-XL dataset.

Installation

Set up your Python environment and install the ssrl-ecg package with all dependencies.

Quickstart

Run SSL pretraining and fine-tuning end-to-end in under 30 minutes.

SSL Concepts

Understand SimCLR, BYOL, and why self-supervised learning excels with limited labels.

API Reference

Explore the full Python API — models, augmentations, losses, and utilities.

What is SSRL-ECG?

Standard supervised learning for ECG diagnosis requires large amounts of labeled data, a bottleneck in clinical settings where expert annotation is expensive and time-consuming. SSRL-ECG addresses this by pretraining a 1D CNN encoder on unlabeled ECGs with a self-supervised objective (SimCLR or BYOL), then fine-tuning on just 10% of the labels. Reported results:

| Method | AUROC | F1 | Sensitivity | Specificity |
| --- | --- | --- | --- | --- |
| Supervised (Focal + Oversample) | 0.8606 | 0.5750 | 0.6772 | 0.9357 |
| SimCLR + Augmentations | 0.8717 | 0.6448 | 0.6831 | 0.9411 |
| BYOL + Augmentations | 0.8565 | 0.6301 | 0.6648 | 0.9278 |
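
To read these results as differences from the supervised baseline, the short Python sketch below subtracts the baseline row from each SSL row. The numbers are copied from the results above; the names `RESULTS` and `deltas_vs_baseline` are illustrative and are not part of the ssrl_ecg package.

```python
# Metrics copied from the results above: (AUROC, F1, sensitivity, specificity).
RESULTS = {
    "Supervised (Focal + Oversample)": (0.8606, 0.5750, 0.6772, 0.9357),
    "SimCLR + Augmentations": (0.8717, 0.6448, 0.6831, 0.9411),
    "BYOL + Augmentations": (0.8565, 0.6301, 0.6648, 0.9278),
}
METRICS = ("AUROC", "F1", "Sensitivity", "Specificity")
BASELINE = "Supervised (Focal + Oversample)"


def deltas_vs_baseline(results, baseline=BASELINE):
    """Return per-metric differences of every non-baseline method from the baseline."""
    base = results[baseline]
    return {
        method: tuple(round(value - ref, 4) for value, ref in zip(values, base))
        for method, values in results.items()
        if method != baseline
    }


if __name__ == "__main__":
    for method, diffs in deltas_vs_baseline(RESULTS).items():
        row = ", ".join(f"{name} {diff:+.4f}" for name, diff in zip(METRICS, diffs))
        print(f"{method}: {row}")
```

On these figures, SimCLR improves on the supervised baseline in all four metrics (AUROC +0.0111, F1 +0.0698), while BYOL improves F1 (+0.0551) but is slightly lower on AUROC (-0.0041), sensitivity, and specificity.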

Key Features

Domain-Adaptive Augmentations

Seven ECG-specific augmentations: frequency warping, medical mixup, bandpass filtering, CutMix, motion artifacts, per-channel noise, and temporal dropout.
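
As an illustration of what two of these augmentations might look like, here is a minimal NumPy sketch of per-channel noise and temporal dropout. It is not the package's `ECGAugmentations` implementation: the function names, the parameter defaults (`max_std`, `max_fraction`), and the `(leads, samples)` array layout are assumptions made for the example.

```python
import numpy as np


def per_channel_noise(ecg, rng, max_std=0.05):
    """Add Gaussian noise whose standard deviation is drawn independently per lead.

    ecg: array of shape (leads, samples). max_std is a hypothetical default.
    """
    n_leads = ecg.shape[0]
    stds = rng.uniform(0.0, max_std, size=(n_leads, 1))  # one noise level per lead
    return ecg + rng.normal(size=ecg.shape) * stds


def temporal_dropout(ecg, rng, max_fraction=0.1):
    """Zero out one contiguous time window across all leads.

    The window is at most max_fraction of the recording (hypothetical default).
    """
    n_samples = ecg.shape[1]
    max_width = max(1, int(max_fraction * n_samples))
    width = int(rng.integers(1, max_width + 1))           # upper bound is exclusive
    start = int(rng.integers(0, n_samples - width + 1))
    out = ecg.copy()                                      # leave the input untouched
    out[:, start:start + width] = 0.0
    return out


def two_views(ecg, rng):
    """Produce two independently augmented views of one ECG, as contrastive pretraining needs."""
    def augment(x):
        return temporal_dropout(per_channel_noise(x, rng), rng)
    return augment(ecg), augment(ecg)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ecg = rng.normal(size=(12, 1000))  # e.g. 12 leads, 10 s at 100 Hz
    view_a, view_b = two_views(ecg, rng)
    print(view_a.shape, view_b.shape)
```

Because each call draws fresh random parameters, the two views differ from each other while still coming from the same underlying recording.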

SimCLR Pretraining

Contrastive learning with NT-Xent loss on unlabeled ECGs. Achieves 0.8717 AUROC after fine-tuning on 10% of the labels.
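
For readers unfamiliar with NT-Xent, the following is a minimal NumPy sketch of the loss as it is usually defined for SimCLR: each embedding must pick out its paired view among all other embeddings in the batch. The function name `nt_xent_loss` and the temperature of 0.5 are assumptions for this example; the package's own loss may differ in details.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent over N positive pairs (z1[i], z2[i]); inputs have shape (N, d).

    temperature=0.5 is a hypothetical default, not a value taken from ssrl_ecg.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm, so dot product = cosine
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity

    # Row i's positive is its other view: i + N for the first half, i - N for the second.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    return float(-log_prob[np.arange(2 * n), positives].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 16))
    print("aligned views:  ", nt_xent_loss(z, z))
    print("unrelated views:", nt_xent_loss(z, rng.normal(size=(8, 16))))
```

The loss is lower when paired views map to nearby embeddings, which is the signal that drives the encoder during pretraining.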

BYOL Pretraining

Momentum-based self-supervised learning as an alternative to contrastive approaches.
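
The "momentum" in BYOL refers to a target network whose weights are an exponential moving average of the online network's weights. A minimal sketch of that update, using plain Python lists in place of tensors, is shown here; the function name `ema_update` and the decay `tau=0.99` are assumptions for illustration and are not taken from the package's `BYOLModel`.

```python
def ema_update(target, online, tau=0.99):
    """Return new target weights: tau * target + (1 - tau) * online, per parameter.

    target, online: dicts mapping parameter names to lists of floats.
    tau is a hypothetical decay; values close to 1 make the target change slowly.
    """
    return {
        name: [tau * t + (1.0 - tau) * o for t, o in zip(target[name], online[name])]
        for name in target
    }


if __name__ == "__main__":
    target = {"conv1.weight": [0.0, 1.0]}
    online = {"conv1.weight": [1.0, 1.0]}
    for step in range(3):
        target = ema_update(target, online, tau=0.9)
        print(step, target["conv1.weight"])
```

Only the online network receives gradients; the target is updated by this averaging step alone, which is why BYOL needs no negative pairs.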

Label Scarcity Benchmark

Systematic comparison of SSL vs. supervised learning across label fractions from 1% to 100%.

Multi-Seed Validation

Statistical robustness via 10 random seeds with 95% confidence intervals.
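
One common way to turn per-seed scores into a 95% confidence interval is a Student's t interval on the mean. The sketch below assumes that approach; the documentation does not state which interval the project actually uses. The scores are placeholder values chosen for the example, not results from SSRL-ECG.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t with 9 degrees of freedom (10 seeds).
T_CRIT_95_DF9 = 2.262


def mean_ci95(scores, t_crit=T_CRIT_95_DF9):
    """Return (mean, lower, upper) for a t-based 95% CI on the mean.

    The default t_crit is only valid for exactly 10 scores; pass another
    critical value for a different number of seeds.
    """
    m = mean(scores)
    half_width = t_crit * stdev(scores) / sqrt(len(scores))
    return m, m - half_width, m + half_width


if __name__ == "__main__":
    # Placeholder scores for illustration only, one per seed.
    placeholder_scores = [0.80, 0.81, 0.79, 0.80, 0.82, 0.78, 0.80, 0.81, 0.79, 0.80]
    m, lo, hi = mean_ci95(placeholder_scores)
    print(f"mean={m:.4f}, 95% CI=[{lo:.4f}, {hi:.4f}]")
```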

Transfer to MIT-BIH

External validation on the MIT-BIH arrhythmia dataset for out-of-distribution robustness.

Getting Started

1. Install the package

Create a virtual environment and install ssrl-ecg in editable mode:
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\Activate.ps1
pip install -e .
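
To confirm that the editable install worked, a quick check from Python is to see whether the `ssrl_ecg` module can be located. This sketch uses only the standard library; the helper name `is_importable` is made up for the example.

```python
from importlib.util import find_spec


def is_importable(module_name):
    """Return True if a top-level module can be found on the current sys.path."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # Expected to print True inside the activated virtual environment after installation.
    print("ssrl_ecg importable:", is_importable("ssrl_ecg"))
```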

2. Prepare PTB-XL data

Download the PTB-XL dataset and place it under data/PTB-XL/ following the expected folder structure.
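
A small sanity check before training can catch a misplaced download. The PTB-XL release ships `ptbxl_database.csv`, `scp_statements.csv`, and the `records100/` and `records500/` directories; the sketch below assumes the package expects them directly under `data/PTB-XL/`, which this page does not confirm. The helper name `missing_ptbxl_entries` is hypothetical.

```python
from pathlib import Path

# Top-level entries of the PTB-XL release. That ssrl_ecg looks for them
# directly under the data root is an assumption.
EXPECTED_ENTRIES = ("ptbxl_database.csv", "scp_statements.csv", "records100", "records500")


def missing_ptbxl_entries(data_root, expected=EXPECTED_ENTRIES):
    """Return the expected files or directories that are absent from data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_ptbxl_entries("data/PTB-XL")
    if missing:
        print("Missing from data/PTB-XL:", ", ".join(missing))
    else:
        print("PTB-XL layout looks complete.")
```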

3. Pretrain with SimCLR

Run self-supervised pretraining on unlabeled ECGs:
python -m ssrl_ecg.train_ssl_simclr \
  --data-root data/PTB-XL \
  --epochs 20 \
  --batch-size 128 \
  --out checkpoints/ssl_simclr.pt

4. Fine-tune on labeled data

Fine-tune the pretrained encoder with 10% labeled samples:
python -m ssrl_ecg.train_finetune \
  --data-root data/PTB-XL \
  --ssl-checkpoint checkpoints/ssl_simclr.pt \
  --label-fraction 0.1 \
  --out checkpoints/finetuned.pt
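
To make the effect of `--label-fraction 0.1` concrete, the sketch below draws a reproducible 10% subset of training record IDs. It assumes simple uniform sampling and a hypothetical pool of 17,470 training records (chosen so that 10% gives 1,747 labeled samples); the package may select or stratify its subset differently.

```python
import random


def sample_label_subset(train_ids, fraction, seed=0):
    """Pick a reproducible uniform random subset of record IDs to keep labels for.

    Uniform sampling is an assumption; multi-label data is often stratified instead.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ids = list(train_ids)
    k = max(1, round(len(ids) * fraction))
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return sorted(rng.sample(ids, k))


if __name__ == "__main__":
    pool = range(17470)  # hypothetical training pool size
    subset = sample_label_subset(pool, fraction=0.1)
    print(len(subset), "labeled records kept")
```

Fixing the seed keeps the labeled subset identical across runs, so differences between methods are not confounded by which records happened to be labeled.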

Project Structure

The ssrl_ecg package lives under src/ssrl_ecg/ and provides both importable Python modules and CLI entry points invokable via python -m ssrl_ecg.<module>.

Training Scripts

train_ssl_simclr, train_ssl_byol, train_supervised, train_finetune

Models

ECGEncoder1DCNN, SimCLRModel, BYOLModel, ResNet1D, ECGClassifier

Augmentations

ECGAugmentations, ContrastiveAugmentationPipeline

Evaluation

evaluate, label_scarcity_benchmark, analyze_datasets
