Troubleshoot Common SSRL-ECG Installation and Runtime Issues

Most problems encountered when setting up or running SSRL-ECG fall into one of a handful of categories: an installation shortcut that skips editable-mode setup, a tensor-shape assumption that differs between a single sample and a full batch, or a GPU configuration that silently falls back to CPU. This page collects every known issue and its verified fix, drawn directly from the project’s source code and README.

There is a critical distinction between these two installation commands:

Command	What it does
`pip install -r requirements.txt`	Installs Python dependencies only. The `ssrl_ecg` package itself is not registered with Python’s import system.
`pip install -e .`	Installs dependencies and registers `ssrl_ecg` as an editable package, making `import ssrl_ecg` work from any working directory.

Always use `pip install -e .`. Installing from `requirements.txt` alone is the root cause of most `ModuleNotFoundError` reports.

Before starting a long training run, verify GPU availability in three seconds:

python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

If this prints CUDA available: False, all training will run on CPU and will be orders of magnitude slower. Resolve the CUDA issue first — see the “Slow training / CPU fallback” entry below.

Common Issues

ModuleNotFoundError: No module named 'ssrl_ecg'

Symptom

ModuleNotFoundError: No module named 'ssrl_ecg'

This error appears when you run any training or evaluation script without first installing the package in editable mode.

Root cause

Running pip install -r requirements.txt installs third-party dependencies (PyTorch, NumPy, etc.) but does not register the local ssrl_ecg package on the Python path.

Fix

From the repository root, run:

pip install -e .

The command is identical on Linux / macOS and Windows (PowerShell).

The -e flag installs the package in editable (development) mode. Python resolves imports directly from the src/ directory, so any local edits take effect immediately without reinstalling.

Verify the fix

python -c "import ssrl_ecg; print('Package found at:', ssrl_ecg.__file__)"
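If you would rather have a check that reports the problem instead of raising, a minimal sketch using the standard library's importlib.util.find_spec could look like the following. The helper name locate_package is hypothetical and is not part of ssrl_ecg.

```python
import importlib.util


def locate_package(name):
    """Return the file or directory a top-level package resolves to, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from the current environment
    if spec.origin:
        return spec.origin  # regular package: usually the path to __init__.py
    # Namespace packages have no origin; fall back to their search locations.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


if __name__ == "__main__":
    path = locate_package("ssrl_ecg")
    print(path or "ssrl_ecg not found: run `pip install -e .` from the repository root")
```

With an editable install, the reported path should point into the repository's src/ directory rather than into site-packages.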

Tensor shape errors in augmentations

Symptom

RuntimeError: Expected 3D input, got 2D input of shape [12, 1000]

or similar shape mismatches when applying ECGAugmentations.

Root cause

ECG data arrives in two shapes depending on context:

Single sample (e.g., during inference or manual testing): (channels, time) — a 2D tensor.
Batch (during training): (batch, channels, time) — a 3D tensor.

Older versions of the augmentation pipeline assumed 3D input and crashed on single samples.

Fix

No code change is required. The current ECGAugmentations class automatically detects input dimensionality and handles both cases:

2D input (channels, time) → internally promoted to (1, channels, time), augmented, then squeezed back to (channels, time).
3D input (batch, channels, time) → processed as-is.

The output shape always matches the input shape.

If you see the error anyway, you are probably running an older installed copy or a stale cached .pyc file rather than the current source. Re-install in editable mode:

pip install -e .  # re-install to pick up latest source
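The 2D/3D handling described in this section can be sketched as follows. This is an illustration of the promote-then-squeeze pattern, not the actual ECGAugmentations code: the names apply_batched and add_noise are hypothetical, and NumPy arrays stand in for PyTorch tensors (the same .ndim, x[None, ...] and out[0] operations also work on tensors).

```python
import numpy as np


def apply_batched(augment, x):
    """Apply a batch-only augmentation to a 2D sample or a 3D batch."""
    if x.ndim == 2:
        # (channels, time) -> (1, channels, time), augment, then drop the batch axis.
        return augment(x[None, ...])[0]
    if x.ndim == 3:
        return augment(x)  # already (batch, channels, time)
    raise ValueError(f"expected 2D or 3D input, got {x.ndim}D with shape {x.shape}")


def add_noise(batch, sigma=0.01, seed=0):
    """Toy augmentation that assumes 3D input: additive Gaussian noise."""
    rng = np.random.default_rng(seed)
    return batch + rng.normal(0.0, sigma, size=batch.shape)


single = np.zeros((12, 1000))    # one 12-lead recording
batch = np.zeros((4, 12, 1000))  # four recordings
print(apply_batched(add_noise, single).shape)
print(apply_batched(add_noise, batch).shape)
```

Keeping the dimensionality check in one wrapper means individual augmentations only ever need to handle the batched case.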

CUDA out of memory (OOM)

Symptom

torch.cuda.OutOfMemoryError: CUDA out of memory.
Tried to allocate ... MiB

Root cause

The default batch sizes (--batch-size 128 for SimCLR, --batch-size 256 for BYOL) are tuned for GPUs with ≥ 16 GB VRAM. Smaller cards will OOM during the forward pass.

Fix

Reduce the batch size. Start at 64 and halve again if necessary:

SimCLR:

python -m ssrl_ecg.train_ssl_simclr \
  --data-root data/PTB-XL \
  --epochs 20 \
  --batch-size 64 \
  --temperature 0.07 \
  --seed 42 \
  --out checkpoints/ssl_simclr_enhanced.pt

BYOL:

python -m ssrl_ecg.train_ssl_byol \
  --data-root data/PTB-XL \
  --epochs 30 \
  --batch-size 64 \
  --momentum-tau 0.99 \
  --seed 42 \
  --out checkpoints/ssl_byol_enhanced.pt

Supervised / Fine-tune:

python -m ssrl_ecg.train_supervised \
  --data-root data/PTB-XL \
  --epochs 30 \
  --batch-size 32 \
  --label-fraction 0.1 \
  --seed 42 \
  --out checkpoints/supervised_focal_oversample.pt

Reducing the batch size below 32 can destabilise SimCLR because each sample then has too few in-batch negative pairs. BYOL does not use negative pairs, but very small batches can still make its training noisier. If 32 still causes OOM, consider gradient accumulation rather than further reducing the logical batch size.

choose_device() also pre-allocates 95 % of available GPU memory and clears the cache before training begins (torch.cuda.empty_cache()), which helps avoid fragmentation-related OOM on repeated short runs in the same process.
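For readers unfamiliar with gradient accumulation, here is a minimal sketch of the usual PyTorch pattern. It is not taken from the SSRL-ECG training scripts: the function name train_epoch_with_accumulation is hypothetical, and the loader is assumed to yield (inputs, targets) pairs as in supervised training. The function relies only on the standard loss.backward(), optimizer.step() and optimizer.zero_grad() calls.

```python
def train_epoch_with_accumulation(model, loader, optimizer, loss_fn, accum_steps):
    """Run one epoch, stepping the optimizer every `accum_steps` micro-batches.

    Returns the number of optimizer steps taken.
    """
    optimizer.zero_grad()
    pending = 0      # micro-batches accumulated since the last optimizer step
    steps_taken = 0
    for inputs, targets in loader:
        loss = loss_fn(model(inputs), targets)
        # Scale so the summed gradients match the mean over the logical batch.
        (loss / accum_steps).backward()
        pending += 1
        if pending == accum_steps:
            optimizer.step()
            optimizer.zero_grad()
            steps_taken += 1
            pending = 0
    if pending:
        # Leftover micro-batches at the end of the epoch: still apply them.
        # Their gradient is scaled by 1/accum_steps, so this step is smaller.
        optimizer.step()
        optimizer.zero_grad()
        steps_taken += 1
    return steps_taken
```

With a micro-batch of 16 and accum_steps=4, the optimizer sees gradients averaged over 64 samples while only 16 are held in GPU memory at once. One caveat for SimCLR: a contrastive loss computed per micro-batch still only sees that micro-batch's negatives, so accumulation smooths the gradient but does not increase the number of negative pairs.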

Slow training / CPU fallback

Symptom

Training appears to run but is extremely slow: epochs take minutes instead of seconds, or choose_device() prints [WARNING] No CUDA-capable GPU detected!.

Diagnosis

Run both checks before starting any long experiment:

# 1. Check GPU utilisation and running processes (one-off snapshot)
nvidia-smi

# 2. Verify PyTorch can see a CUDA device
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

Common causes and fixes

Cause	Fix
CUDA toolkit not installed or wrong version	Install CUDA matching your PyTorch build (`torch.__version__` shows the expected CUDA suffix, e.g., `+cu121`)
PyTorch CPU-only wheel installed	Reinstall with the correct CUDA wheel from pytorch.org
GPU in use by another process	Check `nvidia-smi` for competing processes and free VRAM
Virtual environment mismatch	Ensure the venv with `pip install -e .` is active
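To tell a CPU-only wheel from a CUDA build, you can inspect the suffix of torch.__version__ mentioned in the table. The sketch below only parses the version string, so it runs without PyTorch installed. The helper name describe_torch_build is hypothetical, and the +cpu / +cuXYZ suffix convention is an assumption based on how PyTorch wheels are commonly labelled; builds with no suffix are reported as unknown.

```python
import re


def describe_torch_build(version):
    """Classify a torch.__version__ string by its local build suffix."""
    _, _, suffix = version.partition("+")
    if suffix == "cpu":
        return "CPU-only build"
    # e.g. "cu121" -> major "12", minor "1"; "cu118" -> "11", "8"
    match = re.fullmatch(r"cu(\d+)(\d)", suffix)
    if match:
        return f"CUDA {match.group(1)}.{match.group(2)} build"
    return "no recognisable suffix; check torch.version.cuda instead"


print(describe_torch_build("2.3.1+cu121"))
print(describe_torch_build("2.3.1+cpu"))
```

In a real environment, pass torch.__version__ to the helper. torch.version.cuda is None on CPU-only builds, which gives a second way to confirm the result.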

Expected choose_device() output when GPU is detected

[GPU CONFIGURATION]
  Device: NVIDIA GeForce RTX ...
  GPU Memory: 16.0 GB
  CUDA Version: 12.1
  cuDNN Version: 8902
  PyTorch Version: 2.x.x
  [STATUS] Using GPU mode - optimal for RTX 5070 Ti

If you see the CPU fallback message instead, address the CUDA installation before training.

Dataset FileNotFoundError

Symptom

FileNotFoundError: [Errno 2] No such file or directory: 'data/PTB-XL/ptbxl_database.csv'

Root cause

The data loader expects a specific folder structure under the path passed to --data-root. If the PTB-XL archive was extracted to a different location, or the CSV metadata files are missing, the dataset class cannot initialise.

Required folder layout

data/
└── PTB-XL/
    ├── ptbxl_database.csv        ← required
    ├── scp_statements.csv        ← required
    ├── records100/
    │   ├── 00000/
    │   │   ├── *.hea
    │   │   └── *.dat
    │   └── ...
    └── records500/
        └── ...

Fix

Download PTB-XL from PhysioNet and extract it so that ptbxl_database.csv lives directly inside the folder you pass to --data-root.

Pass the correct path explicitly:

python -m ssrl_ecg.train_ssl_simclr \
  --data-root /path/to/your/PTB-XL \
  --epochs 20

Verify the structure before running training:
```
python -m ssrl_ecg.analyze_datasets --ptbxl-root data/PTB-XL
```
A successful run prints class distribution and split statistics, confirming the dataset is readable.
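If you want a quick check that does not load any data, the required layout shown earlier can be verified with pathlib. This is a sketch, not part of ssrl_ecg: the function name check_ptbxl_layout is hypothetical, and treating "at least one of records100/ or records500/" as sufficient is an assumption, since this page does not say which sampling rate the loader reads.

```python
from pathlib import Path

REQUIRED_FILES = ("ptbxl_database.csv", "scp_statements.csv")
RECORD_DIRS = ("records100", "records500")


def check_ptbxl_layout(data_root):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(data_root)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = [
        f"missing file: {root / name}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]
    # Assumption: one record folder is enough for the check to pass.
    if not any((root / name).is_dir() for name in RECORD_DIRS):
        problems.append(f"no records100/ or records500/ folder under {root}")
    return problems


if __name__ == "__main__":
    for problem in check_ptbxl_layout("data/PTB-XL") or ["layout looks correct"]:
        print(problem)
```

Run it with the same path you intend to pass to --data-root, so that a relative path is resolved from the same working directory.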

Checkpoint loading errors — wrong key

Symptom

RuntimeError: Error(s) in loading state_dict for ECGClassifier:
  Missing key(s) in state_dict: "encoder.conv1.weight", ...

KeyError: 'encoder'

Root causeSSRL-ECG uses two different checkpoint schemas depending on the training stage:

Script	Saved key	Contains
`train_ssl_simclr.py`, `train_ssl_byol.py`	`encoder`	Encoder weights only (no projection head)
`train_supervised.py`, `train_finetune.py`, `train_supervised_multiseed.py`	`model`	Full `ECGClassifier` state dict

Passing an SSL checkpoint to a script that expects a classifier checkpoint (or vice versa) raises a key error.

Fix

Inspect the checkpoint before loading:

import torch
ckpt = torch.load("checkpoints/my_checkpoint.pt", map_location="cpu")
print(ckpt.keys())  # should print dict_keys(['encoder']) or dict_keys(['model'])

Then load with the correct key:

# For SSL encoder checkpoints
encoder.load_state_dict(ckpt["encoder"])

# For classifier checkpoints (supervised / fine-tuned)
classifier.load_state_dict(ckpt["model"])

The fine-tuning script (train_finetune.py) specifically reads the encoder key from the SSL checkpoint and wraps it in a new ECGClassifier — it does not expect a model key at the input stage.
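The key check can be wrapped in a small guard so that a mismatched checkpoint fails with a readable message. The sketch below works on the dictionary returned by torch.load and does not import PyTorch itself. The function name checkpoint_kind and its return labels are hypothetical.

```python
def checkpoint_kind(ckpt):
    """Classify a loaded checkpoint dict by its top-level key."""
    if not isinstance(ckpt, dict):
        raise TypeError(f"expected a dict checkpoint, got {type(ckpt).__name__}")
    if "encoder" in ckpt:
        return "ssl_encoder"  # schema saved by the SimCLR / BYOL scripts
    if "model" in ckpt:
        return "classifier"   # schema saved by the supervised / fine-tune scripts
    raise KeyError(f"neither 'encoder' nor 'model' found; keys are {sorted(ckpt)}")


print(checkpoint_kind({"encoder": {}}))
print(checkpoint_kind({"model": {}}))
```

Calling checkpoint_kind(ckpt) right after torch.load lets a script choose between ckpt["encoder"] and ckpt["model"], or stop early, before load_state_dict reports missing keys.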

Windows vs Linux command syntax

Symptom

Multi-line commands from the README fail on Windows with a parse error, or commands written for PowerShell fail on Linux/macOS.

Root cause

The two platforms use different characters to continue a command across multiple lines:

Platform	Line-continuation character	Example
Linux / macOS (bash)	`\` (backslash)	`python script.py \`
Windows (PowerShell)	` (backtick)	python script.py `

Examples

Linux / macOS (bash):

python -m ssrl_ecg.train_ssl_simclr \
  --data-root data/PTB-XL \
  --epochs 20 \
  --batch-size 128 \
  --temperature 0.07 \
  --seed 42 \
  --out checkpoints/ssl_simclr_enhanced.pt

Windows (PowerShell):

python -m ssrl_ecg.train_ssl_simclr `
  --data-root data/PTB-XL `
  --epochs 20 `
  --batch-size 128 `
  --temperature 0.07 `
  --seed 42 `
  --out checkpoints/ssl_simclr_enhanced.pt

Single line (any platform):

python -m ssrl_ecg.train_ssl_simclr --data-root data/PTB-XL --epochs 20 --batch-size 128 --temperature 0.07 --seed 42 --out checkpoints/ssl_simclr_enhanced.pt

When in doubt, collapse the command to a single line, as in the last example above. Single-line commands work identically on bash, zsh, and PowerShell.
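If you maintain commands for both platforms, the continuation rule from the table can be applied mechanically. The following sketch renders one list of arguments in each style. The name render_command and the style labels are hypothetical, and the helper assumes a non-empty list of already-quoted parts.

```python
# Line-continuation character per shell, as listed in the table above.
CONTINUATION = {"bash": "\\", "powershell": "`"}


def render_command(parts, style="single"):
    """Join command parts on one line, or across lines for the given shell."""
    if style == "single":
        return " ".join(parts)
    marker = CONTINUATION[style]
    head, *rest = parts
    lines = [head] + ["  " + part for part in rest]
    # Every line except the last ends with a space and the continuation marker.
    return f" {marker}\n".join(lines)


parts = ["python -m ssrl_ecg.train_ssl_simclr", "--data-root data/PTB-XL", "--epochs 20"]
print(render_command(parts, "bash"))
print(render_command(parts, "powershell"))
print(render_command(parts))
```

The continuation character must be the last character on its line; trailing whitespace after it breaks the command in both shells.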

Quick-Reference Diagnostic Commands

Check package installation

python -c "import ssrl_ecg; print(ssrl_ecg.__file__)"

If this raises ModuleNotFoundError, run `pip install -e .` from the repository root.

Verify CUDA availability

python -c "import torch; print(torch.cuda.is_available())"

Should print True before starting any GPU training.

Inspect GPU utilisation

nvidia-smi

Confirms the GPU is visible to the OS and shows free VRAM.

Validate dataset layout

python -m ssrl_ecg.analyze_datasets \
  --ptbxl-root data/PTB-XL

Prints class distribution if the folder structure is correct.
