Reproducing deep learning results for medical signal processing requires deliberate control over every source of randomness. SSRL-ECG provides first-class tooling for this: a unified set_seed() helper, a canonical list of ten evaluation seeds, confidence-interval computation from multi-seed runs, and statistical significance scripts. Together these let you replicate the published SimCLR AUROC of 0.8717 ± 0.0032 down to the fourth decimal place.
Seeding Strategy
All randomness in the pipeline flows through a single function, set_seed, defined in ssrl_ecg/utils.py. Calling it once before any data loading or model initialization ensures that Python’s built-in RNG, NumPy, and PyTorch (CPU and all CUDA devices) are synchronised to the same seed.
| RNG | Call |
|---|---|
| Python stdlib | random.seed(seed) |
| NumPy | np.random.seed(seed) |
| PyTorch CPU | torch.manual_seed(seed) |
| PyTorch CUDA (all GPUs) | torch.cuda.manual_seed_all(seed) |
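As a rough illustration, the four calls in the table could be wrapped in a helper like the one below. This is a minimal sketch and not the actual contents of ssrl_ecg/utils.py, which may differ. The import guards are an assumption added only so the sketch runs in an environment without NumPy or PyTorch; the real pipeline requires both.

```python
import random

# Assumption for this sketch only: tolerate missing NumPy / PyTorch so the
# example runs anywhere. The real pipeline depends on both packages.
try:
    import numpy as np
except ImportError:
    np = None

try:
    import torch
except ImportError:
    torch = None


def set_seed(seed: int) -> None:
    """Seed every RNG listed in the table above."""
    random.seed(seed)                     # Python stdlib
    if np is not None:
        np.random.seed(seed)              # NumPy global RNG
    if torch is not None:
        torch.manual_seed(seed)           # PyTorch CPU
        torch.cuda.manual_seed_all(seed)  # all GPUs; silently ignored without CUDA


# Re-seeding with the same value replays the same random stream.
set_seed(42)
first_draw = random.random()
set_seed(42)
second_draw = random.random()
```

After the two calls, first_draw and second_draw hold the same value, which is the property every training run relies on.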
Even with set_seed() called, SSRL-ECG configures cuDNN with
torch.backends.cudnn.benchmark = True and
torch.backends.cudnn.deterministic = False inside choose_device().
Benchmark mode selects the fastest convolution algorithm for your
hardware, and that choice may vary between runs, so results can differ at
the last decimal place across machines or GPU generations. To achieve
bit-for-bit reproducibility, set torch.backends.cudnn.deterministic = True
and torch.backends.cudnn.benchmark = False yourself after calling
choose_device(). Expect a moderate throughput reduction (~10–20 %) when
deterministic mode is enabled.
Multi-Seed Validation
A single-seed result can be misleading due to lucky or unlucky weight initialisation. SSRL-ECG validates all published numbers across ten fixed seeds to produce stable confidence intervals.
Running the Multi-Seed SimCLR Experiment
The experiment can be launched on Linux, macOS, or Windows (PowerShell). The script calls set_seed(seed) at the top of every training run, saves a per-seed checkpoint under checkpoints/multiseed_<loss>_<strategy>_seed<NNN>.pt, and writes aggregate statistics to results/phase2_multiseed_results.json. Pass --quick to run a fast 3-seed, 5-epoch smoke test instead of the full 10-seed, 30-epoch run.
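The per-seed loop described above can be sketched roughly as follows. Everything apart from the checkpoint naming pattern, the results path, and the quick/full settings is hypothetical: the SEEDS placeholder does not reproduce the project's canonical seed list, train_one_seed stands in for the real training routine, the JSON field names are invented for illustration, and the zero-padded three-digit seed is only one reading of <NNN>.

```python
import json
import random
import statistics
from pathlib import Path

# Placeholder only: the canonical ten seeds are defined by the project.
SEEDS = list(range(10))


def checkpoint_path(loss: str, strategy: str, seed: int) -> Path:
    # Assumes <NNN> means a zero-padded three-digit seed.
    return Path("checkpoints") / f"multiseed_{loss}_{strategy}_seed{seed:03d}.pt"


def run_multiseed(train_one_seed, loss, strategy, quick=False,
                  out_path=Path("results/phase2_multiseed_results.json")):
    """Run one training per seed and write aggregate statistics to JSON."""
    seeds = SEEDS[:3] if quick else SEEDS
    epochs = 5 if quick else 30
    per_seed = {}
    for seed in seeds:
        random.seed(seed)  # stand-in for set_seed(seed)
        # train_one_seed is a hypothetical callable returning a test AUROC.
        per_seed[str(seed)] = train_one_seed(
            seed=seed, epochs=epochs,
            checkpoint=checkpoint_path(loss, strategy, seed),
        )
    scores = list(per_seed.values())
    summary = {
        "n_seeds": len(seeds),
        "epochs": epochs,
        "per_seed_auroc": per_seed,
        "auroc_mean": statistics.mean(scores),
        "auroc_std": statistics.stdev(scores),  # sample standard deviation
    }
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(summary, indent=2))
    return summary
```

Passing the training routine in as a callable keeps the loop independent of the model code, so the same skeleton serves both the quick smoke test and the full run.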
Published Multi-Seed Results
| Metric | Mean ± Std | 95 % CI |
|---|---|---|
| AUROC (macro) | 0.8717 ± 0.0032 | 0.8671 – 0.8763 |
| F1 (macro) | 0.6448 ± 0.0181 | — |
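For readers who want to aggregate their own per-seed scores, the sketch below shows one common recipe: the sample mean, the sample standard deviation, and a Student's t interval on the mean. This page does not state which interval formula the project uses, so the choice of a t interval is an assumption and the numbers it produces need not match the table. The summarize name is hypothetical.

```python
import math
import statistics

# Two-sided 95 % critical value of Student's t with 9 degrees of freedom,
# i.e. the value that applies to exactly ten seeds.
T_CRIT_95_DF9 = 2.262


def summarize(scores):
    """Return (mean, std, (ci_low, ci_high)) for ten per-seed scores."""
    if len(scores) != 10:
        raise ValueError("the hard-coded critical value assumes 10 seeds")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    half_width = T_CRIT_95_DF9 * std / math.sqrt(len(scores))
    return mean, std, (mean - half_width, mean + half_width)
```

Call summarize() with the ten AUROC values from a multi-seed run; for a different number of seeds the critical value has to be replaced accordingly.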
Checkpoint Saving and Loading
SSL Pretraining — saves `encoder` key
The SimCLR and BYOL pretraining scripts save only the encoder weights,
under the encoder key, so that the projection head (used exclusively
during contrastive training) is not bundled with the checkpoint. To reload
for fine-tuning, load the state stored under that key into a freshly
constructed encoder.
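A minimal sketch of the two checkpoint layouts used in this section (the encoder key here, and the model key written by the supervised and fine-tune scripts) is shown below. The tiny networks are hypothetical stand-ins for the project's encoder and ECGClassifier, the 12-channel input and 5 outputs are illustrative assumptions, and the file names are placeholders.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn


def make_encoder() -> nn.Module:
    # Hypothetical stand-in for the project's ECG encoder.
    return nn.Sequential(
        nn.Conv1d(12, 8, kernel_size=7, padding=3),
        nn.ReLU(),
        nn.AdaptiveAvgPool1d(1),
        nn.Flatten(),
    )


def make_classifier() -> nn.Module:
    # Hypothetical stand-in for ECGClassifier: encoder plus linear head.
    return nn.Sequential(make_encoder(), nn.Linear(8, 5))


workdir = Path(tempfile.mkdtemp())
encoder, classifier = make_encoder(), make_classifier()

# SSL pretraining layout: encoder weights only, no projection head.
torch.save({"encoder": encoder.state_dict()}, workdir / "ssl.pt")
# Supervised / fine-tune layout: the full classifier state.
torch.save({"model": classifier.state_dict()}, workdir / "clf.pt")

# Reload for fine-tuning: restore the encoder from the "encoder" key.
fresh_encoder = make_encoder()
ssl_ckpt = torch.load(workdir / "ssl.pt", map_location="cpu")
fresh_encoder.load_state_dict(ssl_ckpt["encoder"])

# Reload for evaluation: restore the classifier from the "model" key.
fresh_classifier = make_classifier()
clf_ckpt = torch.load(workdir / "clf.pt", map_location="cpu")
fresh_classifier.load_state_dict(clf_ckpt["model"])
fresh_classifier.eval()
```

Storing state dicts under a named key, rather than pickling whole modules, keeps checkpoints loadable even if the surrounding training code changes.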
Supervised / Fine-tune — saves `model` key
The supervised baseline and the linear-probing fine-tune scripts save the
full ECGClassifier state under the model key. To reload for evaluation,
load the state stored under that key into an ECGClassifier instance.
Statistical Significance Testing
After collecting multi-seed results, use scripts/statistical_tests.py to compare methods and compute effect sizes. The script accepts --results-dir, --baseline (supervised or ssl), --alpha, and --output-dir, then initialises a StatisticalTester instance ready for comparison calls.
The StatisticalTester class exposes a compare_methods() helper that, for each metric:
- Runs a Shapiro-Wilk normality test on each group of scores.
- Selects a paired t-test when both groups are normally distributed, or a Mann-Whitney U test otherwise.
- Computes Cohen’s d effect size (or rank-biserial correlation for the non-parametric path).
- Returns a structured result dict; call create_comparison_plots() to write the significance-comparison plot to --output-dir.
Paired t-test
Used when both distributions pass Shapiro-Wilk (p > α). Reports t-statistic, p-value, Cohen’s d, and the mean difference with 95 % CI.
Mann-Whitney U
Used as the non-parametric fallback. Reports U-statistic, p-value, rank-biserial r, and median difference.
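The test-selection logic described above can be approximated with SciPy as follows. This is a sketch under stated assumptions, not the StatisticalTester implementation: compare_scores and its result keys are hypothetical, the pooled-standard-deviation form of Cohen's d is one of several variants, and the sign convention for the rank-biserial correlation is chosen so that positive values mean the first group tends to score higher.

```python
import numpy as np
from scipy import stats


def compare_scores(a, b, alpha=0.05):
    """Compare two sets of per-seed scores, ordered by the same seeds."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Shapiro-Wilk on each group decides which test to run.
    both_normal = (stats.shapiro(a).pvalue > alpha
                   and stats.shapiro(b).pvalue > alpha)

    if both_normal:
        res = stats.ttest_rel(a, b)  # paired t-test
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        effect = (a.mean() - b.mean()) / pooled_sd  # Cohen's d (pooled SD)
        test_name, effect_name = "paired t-test", "cohens_d"
    else:
        res = stats.mannwhitneyu(a, b, alternative="two-sided")
        # Rank-biserial correlation from the U statistic of the first group.
        effect = 2 * res.statistic / (len(a) * len(b)) - 1
        test_name, effect_name = "mann-whitney u", "rank_biserial_r"

    return {
        "test": test_name,
        "statistic": float(res.statistic),
        "p_value": float(res.pvalue),
        "effect_name": effect_name,
        "effect_size": float(effect),
        "significant": bool(res.pvalue < alpha),
    }
```

Note that the two branches treat the data differently: the paired t-test matches scores seed by seed, whereas Mann-Whitney U treats the groups as independent samples.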
Ablation Experiments
Retrain with Enhanced Augmentations
scripts/retrain_with_enhanced_augmentations.py orchestrates a four-step pipeline in a single invocation: BYOL pretraining, SimCLR pretraining, BYOL fine-tune, and SimCLR fine-tune.
Analyse Retraining Strategy
scripts/analyze_retraining_strategy.py compares epoch counts and augmentation sets without running full training. It prints the recommended command for each experiment and writes a structured results/retraining_recommendations.json. The analysis can be run for BYOL only, for the supervised model only, or for all experiments.
What does retraining_recommendations.json contain?
The file has three top-level keys:
- epochs_recommendation: per-model advice (e.g., increase BYOL from 20 to 30 epochs, reduce supervised from 30 to 20 to avoid overfitting).
- augmentation_recommendation: lists the basic augmentations to replace and the domain-adaptive augmentations to add, with an expected +2–5 % AUROC improvement.
- experiments_to_run: an array of objects, each with name, command, checkpoint path, and expected_improvement.
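To inspect the recommendations programmatically, something along these lines should work. It is a sketch that assumes the structure described above; the list_experiments helper is hypothetical, and the exact field name holding the checkpoint path is assumed to be checkpoint, so missing fields are read defensively with .get().

```python
import json
from pathlib import Path


def list_experiments(path="results/retraining_recommendations.json"):
    """Return (name, command, checkpoint, expected_improvement) tuples."""
    data = json.loads(Path(path).read_text())
    rows = []
    for exp in data.get("experiments_to_run", []):
        rows.append((
            exp.get("name"),
            exp.get("command"),
            exp.get("checkpoint"),  # assumed key name for the checkpoint path
            exp.get("expected_improvement"),
        ))
    return rows
```

Each returned command can then be reviewed or copied into a shell before launching the corresponding experiment.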
End-to-End Reproducibility Checklist
Run statistical tests on collected results
Initialise a StatisticalTester with your chosen significance level and output directory. Load your per-seed metric arrays and call tester.compare_methods() to run the Shapiro-Wilk normality checks and automatically select between a paired t-test (normal data) and a Mann-Whitney U test (non-normal data), then write the formatted report and comparison plot to --output-dir.