Snakemake Execution Profiles: Local, HPC, and Cloud

A Snakemake profile is a directory containing a config.yaml that pre-configures the executor, resource defaults, and scheduler behaviour for a specific compute environment. Instead of passing dozens of flags on the command line, you select a profile with --profile profile/<name> and all settings load automatically. The BDB-Genomics ATAC-seq pipeline ships eight profiles covering every major execution target, from a laptop to multi-cloud Kubernetes clusters.

Every profile’s config keys override or extend Snakemake’s built-in defaults. Per-rule resource declarations in config.yaml (the pipeline config, not the profile config) always take precedence over a profile’s default-resources block, which acts only as a fallback for rules that do not declare their own requirements. A profile’s set-resources block works the other way round: it overrides per-rule values, as the low_resource profile does.
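As a quick way to see which profiles a checkout offers, the following sketch lists every subdirectory that contains a config.yaml. The helper name list_profiles is hypothetical and not part of the pipeline; it assumes the profiles live under profile/ as in the commands on this page.

```shell
# List the profiles under a root directory (default: profile/).
# A subdirectory counts as a profile only if it contains a config.yaml.
list_profiles() {
    local root="${1:-profile}" dir
    for dir in "$root"/*/; do
        [ -f "${dir}config.yaml" ] && basename "$dir"
    done
    return 0
}

# Example (from the repository root):
#   list_profiles
```

Any name it prints can be passed to snakemake as --profile profile/<name>.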

local

The local profile is the recommended starting point for development, debugging, and single-machine production runs.

snakemake --profile profile/local

# profile/local/config.yaml
use-conda: true
jobs: 8
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 0

default-resources:
  mem_mb: 4000
  time: 60
  threads: 1

jobs: 8

Up to 8 jobs (rule instances) run in parallel. Tune this to match your machine’s logical CPU count.

keep-going: true

Independent branches of the DAG continue even if one rule fails, maximising throughput on partial failures.

rerun-incomplete: true

Output files from interrupted rules are automatically re-generated on the next run.

restart-times: 0

Failed rules are not automatically retried locally. Investigate logs before re-running.
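To match jobs to the machine without editing the profile, the value can be overridden on the command line, since command-line flags take precedence over profile settings. The sketch below detects the logical CPU count and prints the resulting command rather than running it; the function names are illustrative only.

```shell
# Detect the logical CPU count; fall back to 1 if no suitable tool is found.
detect_cpus() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif command -v getconf >/dev/null 2>&1; then
        getconf _NPROCESSORS_ONLN
    else
        echo 1
    fi
}

# Build (but do not run) a Snakemake command whose job count matches the CPU count.
# An explicit count can be passed as the first argument.
local_run_command() {
    local cpus="${1:-$(detect_cpus)}"
    printf 'snakemake --profile profile/local --jobs %s\n' "$cpus"
}

local_run_command
```

On memory-bound workloads a lower value than the CPU count may be the safer choice.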

slurm

The SLURM profile submits every rule as an independent batch job via the native snakemake-executor-plugin-slurm.

snakemake --profile profile/slurm

# profile/slurm/config.yaml
executor: slurm
use-conda: true
jobs: 100
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1

latency-wait: 60

default-resources:
  mem_mb: 4000
  time: 60
  threads: 1
  slurm_partition: "standard"
  slurm_account: "bdb_genomics"

executor (string, default: "slurm")

Selects the SLURM executor plugin. Requires snakemake-executor-plugin-slurm to be installed in the Snakemake environment.

jobs (integer, default: 100)

Maximum number of concurrently queued or running SLURM jobs. Set this to stay within your cluster’s fair-share policy.

latency-wait (integer, default: 60)

Seconds Snakemake waits for output files to appear on shared storage after a job completes. Increase this if your cluster uses a high-latency network filesystem (e.g., Lustre over WAN).

default-resources.slurm_partition (string, default: "standard")

Default SLURM partition for all jobs. Override per-rule in config.yaml resources if needed.

default-resources.slurm_account (string, default: "bdb_genomics")

SLURM account to charge compute time against. Change this to your institutional account name.

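restart-times: 1 allows each SLURM job one automatic retry on failure, which is useful for transient problems such as scheduler preemption. Do not set this higher without also checking latency-wait: when output files appear late on NFS, Snakemake can report a finished job as failed, and every extra retry repeats that false failure.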
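Before the first submission it can help to confirm that the pieces this profile depends on are present. The following is a minimal sketch, not part of the pipeline: the function names are hypothetical, and it assumes that python3 on PATH is the interpreter of the Snakemake environment and that the plugin is importable as snakemake_executor_plugin_slurm.

```shell
# Succeeds only if the named command is on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "missing command: $1" >&2
        return 1
    }
}

# Succeeds only if the named Python module can be imported.
# Assumption: python3 on PATH belongs to the Snakemake environment.
require_py_module() {
    python3 -c "import $1" >/dev/null 2>&1 || {
        echo "missing Python module: $1" >&2
        return 1
    }
}

# Check the SLURM profile's prerequisites before submitting any jobs.
slurm_preflight() {
    require_cmd snakemake &&
    require_cmd sbatch &&
    require_py_module snakemake_executor_plugin_slurm
}

# Example (run on a cluster login node):
#   slurm_preflight && snakemake --profile profile/slurm
```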

low_resource

Designed for workstations with ≤ 8 GB RAM and ≤ 4 CPU cores. This profile caps memory and thread allocations for every named rule via Snakemake’s set-resources option, preventing out-of-memory kills on constrained hardware.

snakemake --profile profile/low_resource

# profile/low_resource/config.yaml
use-conda: true
jobs: 2
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 0
latency-wait: 30

set-resources:
  bowtie2_align:
    mem_mb: 4000
    threads: 2
  samtools_sort:
    mem_mb: 3000
    threads: 2
  samtools_markdup:
    mem_mb: 4000
    threads: 2
  tn5_shift:
    mem_mb: 3000
    threads: 2
  macs2_peak_calling:
    mem_mb: 4000
    threads: 2
  tss_enrichment:
    mem_mb: 4000
    threads: 2
  heatmap:
    mem_mb: 4000
    threads: 2
  peak_annotation:
    mem_mb: 4000
    threads: 2
  motif_analysis:
    mem_mb: 4000
    threads: 2
  differential_accessibility:
    mem_mb: 4000
    threads: 2
  chromvar_analysis:
    mem_mb: 4000
    threads: 2
  footprinting:
    mem_mb: 4000
    threads: 2
  tobias_atacorrect:
    mem_mb: 4000
    threads: 2
  tobias_score_bigwig:
    mem_mb: 4000
    threads: 2
  tobias_bindetect:
    mem_mb: 4000
    threads: 2
  # ... (all rules explicitly capped — see profile file for the complete list)

default-resources:
  mem_mb: 2000
  time: 120
  threads: 1

Rules that declare mem_mb: 16000 in config.yaml (e.g., bowtie2, macs2, heatmap) will be overridden to 4 GB by this profile. Alignment of large genomes may fail or produce incomplete results. This profile is intended for testing and development only.

The profile sets jobs: 2 so that no more than two jobs run at once, which keeps the combined memory request of concurrent jobs at or below 8 GB (2 × 4000 MB). The default-resources fallback (mem_mb: 2000, threads: 1) applies to any rule not listed explicitly in set-resources.
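The memory budget behind that choice is simple arithmetic: the worst case is the job count multiplied by the largest per-rule cap. The sketch below uses hypothetical helper names and assumes the machine's usable RAM is known in MB; it checks whether a given jobs value fits.

```shell
# Worst-case memory request in MB: every concurrent job uses its full cap.
peak_mem_mb() {
    echo $(( $1 * $2 ))
}

# Succeeds if <jobs> concurrent jobs capped at <cap_mb> fit into <total_mb> of RAM.
fits_in_ram() {
    local jobs="$1" cap_mb="$2" total_mb="$3"
    [ "$(peak_mem_mb "$jobs" "$cap_mb")" -le "$total_mb" ]
}

# Two jobs at the profile's 4000 MB cap on an 8 GB (8192 MB) machine.
if fits_in_ram 2 4000 8192; then
    echo "jobs: 2 fits"
fi
```

With these numbers a third concurrent job would raise the worst case to 12000 MB, which no longer fits.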

test

The test profile is used in CI pipelines and for verifying the installation on synthetic or downsampled data. It loads its own config_test.yaml, which relaxes QC thresholds for synthetic reads.

snakemake --profile profile/test

# profile/test/config.yaml
use-conda: true
jobs: 4
printshellcmds: true
show-failed-logs: true
rerun-incomplete: true
restart-times: 0
configfile: "profile/test/config_test.yaml"

default-resources:
  mem_mb: 2000
  time: 30
  threads: 2

The configfile: "profile/test/config_test.yaml" directive automatically overlays the test-specific overrides (such as relaxed qc_gate thresholds) on top of the main config.yaml. You do not need to pass a second --configfile argument when using this profile.

aws

Runs the pipeline on AWS via the Tibanna executor plugin, which launches jobs on EC2 instances orchestrated by AWS Step Functions, with intermediate files stored in Amazon S3.

snakemake --profile profile/aws

# profile/aws/config.yaml
executor: tibanna
use-conda: true
use-singularity: true
jobs: 50
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1
latency-wait: 120

default-resources:
  mem_mb: 4000
  time: 60
  threads: 1

tibanna-sfn: "tibanna_unicorn_atacseq"
default-remote-prefix: "YOUR_S3_BUCKET/atacseq-pipeline"
default-remote-provider: "S3"

Install the executor plugin

pip install snakemake-executor-plugin-tibanna tibanna

Configure AWS credentials

aws configure
# or export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY

Deploy Tibanna unicorn (one-time)

tibanna deploy_unicorn -g atacseq -b YOUR_S3_BUCKET

Create an S3 bucket for intermediate storage

aws s3 mb s3://your-bucket-name

Update the profile config

Replace YOUR_S3_BUCKET in profile/aws/config.yaml with your actual bucket name.

Run the pipeline

snakemake --profile profile/aws

use-singularity: true is required because Tibanna jobs run in container-isolated environments. Conda alone is insufficient for container-native execution.
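The "Update the profile config" step can be scripted. The following sketch replaces a placeholder token in a profile file; fill_placeholder is a hypothetical helper and my-atacseq-bucket is an example name. It assumes the replacement value contains no |, & or backslash characters, which is the case for valid S3 bucket names. The same approach works for the placeholders in the gcp, azure, and kubernetes profiles.

```shell
# Replace every occurrence of a placeholder token in a file, in place.
# A temporary file is used instead of `sed -i`, whose syntax differs
# between GNU and BSD sed.
fill_placeholder() {
    local file="$1" placeholder="$2" value="$3" tmp
    tmp="$(mktemp)" || return 1
    # '|' is the sed delimiter so that values containing '/' need no escaping.
    if sed "s|${placeholder}|${value}|g" "$file" > "$tmp"; then
        # Write back with cat so the original file keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example (bucket name is illustrative):
#   fill_placeholder profile/aws/config.yaml YOUR_S3_BUCKET my-atacseq-bucket
```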

gcp

Runs the pipeline on Google Cloud Life Sciences, with intermediate files in Google Cloud Storage.

snakemake --profile profile/gcp

# profile/gcp/config.yaml
executor: google-lifesciences
use-conda: true
use-singularity: true
jobs: 50
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1
latency-wait: 120

default-resources:
  mem_mb: 4000
  time: 60
  threads: 1
  machine_type: "n1-standard-4"

google-lifesciences-project: "YOUR_GCP_PROJECT_ID"
google-lifesciences-region: "us-central1"
default-remote-prefix: "YOUR_GCS_BUCKET/atacseq-pipeline"
default-remote-provider: "GS"

Install the executor plugin

pip install snakemake-executor-plugin-google-lifesciences

Authenticate with Google Cloud

gcloud auth application-default login

Create a GCS bucket

gsutil mb gs://your-bucket-name

Update the profile config

Replace YOUR_GCP_PROJECT_ID and YOUR_GCS_BUCKET in profile/gcp/config.yaml.

Run the pipeline

snakemake --profile profile/gcp

default-resources.machine_type (string, default: "n1-standard-4")

Default GCE machine type. The n1-standard-4 provides 4 vCPUs and 15 GB RAM. Increase to n1-highmem-8 for memory-intensive rules like ArchR.

google-lifesciences-region (string, default: "us-central1")

GCP region where Life Sciences pipelines execute. Choose a region close to your GCS bucket to minimise egress costs.

azure

Runs the pipeline on Azure Batch, with intermediate files in Azure Blob Storage.

snakemake --profile profile/azure

# profile/azure/config.yaml
executor: azure-batch
use-conda: true
use-singularity: true
jobs: 50
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1
latency-wait: 120

default-resources:
  mem_mb: 4000
  time: 60
  threads: 1

az-batch-account-url: "https://YOUR_BATCH_ACCOUNT.eastus.batch.azure.com"
default-remote-prefix: "YOUR_BLOB_CONTAINER/atacseq-pipeline"
default-remote-provider: "AzBlob"

Install the executor plugin

pip install snakemake-executor-plugin-azure-batch

Authenticate with Azure

az login

Create a Batch account and storage account

az batch account create \
  --name atacseqbatch \
  --resource-group YOUR_RG \
  --location eastus

az storage account create \
  --name atacseqstorage \
  --resource-group YOUR_RG \
  --location eastus

Create a Blob container

az storage container create \
  --name atacseq-pipeline \
  --account-name atacseqstorage

Update the profile config

Replace YOUR_BATCH_ACCOUNT and YOUR_BLOB_CONTAINER in profile/azure/config.yaml.

Run the pipeline

snakemake --profile profile/azure

kubernetes

Runs each rule as a Kubernetes Pod on any conformant cluster (GKE, EKS, AKS, or local Minikube).

snakemake --profile profile/kubernetes

# profile/kubernetes/config.yaml
executor: kubernetes
use-conda: true
use-singularity: true
jobs: 50
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1
latency-wait: 120

default-resources:
  mem_mb: 4000
  time: 60
  threads: 1

kubernetes-namespace: "default"
default-remote-prefix: "YOUR_BUCKET/atacseq-pipeline"
default-remote-provider: "GS"       # or "S3" for AWS, "AzBlob" for Azure

Install the executor plugin

pip install snakemake-executor-plugin-kubernetes

Provision a cluster (example: GKE)

gcloud container clusters create atacseq-cluster --num-nodes=2
# EKS: eksctl create cluster --name atacseq-cluster --nodes=2
# Local: minikube start --memory=8192 --cpus=4

Verify kubectl context

kubectl config current-context

Create a cloud storage bucket and update the profile config

Replace YOUR_BUCKET and set the correct default-remote-provider (GS, S3, or AzBlob) in profile/kubernetes/config.yaml.

Run the pipeline

snakemake --profile profile/kubernetes

Always delete your cluster after the run to avoid ongoing infrastructure costs. For GKE: gcloud container clusters delete atacseq-cluster. For EKS: eksctl delete cluster --name atacseq-cluster.
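One way to make the cleanup hard to forget is to wrap the run so that the deletion happens whether the pipeline succeeds or fails. This is a sketch with hypothetical function names, using the GKE example cluster from above. It covers normal exits and failures, not interruption by a signal such as Ctrl-C, and it assumes a default zone or region is configured for gcloud.

```shell
# Run a command, then always run a cleanup function,
# and return the exit status of the original command.
run_then_cleanup() {
    local cleanup="$1" status=0
    shift
    "$@" || status=$?
    "$cleanup"
    return "$status"
}

# Cleanup for the GKE example; --quiet skips the interactive confirmation prompt.
delete_gke_cluster() {
    gcloud container clusters delete atacseq-cluster --quiet
}

# Example:
#   run_then_cleanup delete_gke_cluster snakemake --profile profile/kubernetes
```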

Profile Comparison

local

jobs: 8 · executor: default (process fork) · restart-times: 0 · Best for: single-machine runs and development.

slurm

jobs: 100 · executor: slurm · restart-times: 1 · latency-wait: 60 s · Best for: university HPC clusters.

low_resource

jobs: 2 · executor: default · All rules capped at ≤ 4 GB RAM. Best for: laptops and constrained VMs.

test

jobs: 4 · configfile: config_test.yaml · Best for: CI pipelines and installation checks.

aws

jobs: 50 · executor: tibanna · provider: S3 · Best for: AWS scale-out via Tibanna.

gcp

jobs: 50 · executor: google-lifesciences · provider: GS · Best for: Google Cloud Life Sciences.

azure

jobs: 50 · executor: azure-batch · provider: AzBlob · Best for: Azure Batch compute pools.

kubernetes

jobs: 50 · executor: kubernetes · provider: GS/S3/AzBlob · Best for: container-native, multi-cloud.
