Snakemake profiles are directories containing a config.yaml that pre-configure the executor, resource defaults, and scheduler behaviour for a specific compute environment. Instead of passing dozens of flags on the command line, you select a profile with --profile profile/<name> and all settings load automatically. The BDB-Genomics ATAC-seq pipeline ships eight profiles covering every major execution target, from a laptop to multi-cloud Kubernetes clusters.
Every profile’s config keys override or extend Snakemake’s built-in defaults. Per-rule resource declarations in config.yaml (the pipeline config, not the profile config) always take precedence over a profile’s default-resources block, which acts only as a fallback for rules that do not declare their own requirements.
local
The local profile is the recommended starting point for development, debugging, and single-machine production runs.
jobs: 8
Up to 8 rules execute in parallel. Tune this to match your machine’s logical CPU count.
keep-going: true
Independent branches of the DAG continue even if one rule fails, maximising throughput on partial failures.
rerun-incomplete: true
Output files from interrupted rules are automatically re-generated on the next run.
restart-times: 0
Failed rules are not automatically retried locally. Investigate logs before re-running.
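As a rough sketch, the four settings above would sit together in the profile's config.yaml along these lines. It is reconstructed from the values described in this section, and the file path is inferred from the profile/<name> convention; the file shipped with the pipeline may contain additional keys.

```yaml
# profile/local/config.yaml (sketch; the shipped file may define more keys)
jobs: 8                 # up to 8 jobs run in parallel
keep-going: true        # independent DAG branches continue after a failure
rerun-incomplete: true  # regenerate outputs of interrupted rules
restart-times: 0        # no automatic retries on a local machine
```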
slurm
The SLURM profile submits every rule as an independent batch job via the native snakemake-executor-plugin-slurm.
executor: slurm
Selects the SLURM executor plugin. Requires snakemake-executor-plugin-slurm to be installed in the Snakemake environment.
jobs: 100
Maximum number of concurrently queued or running SLURM jobs. Set this to stay within your cluster’s fair-share policy.
latency-wait: 60
Seconds Snakemake waits for output files to appear on shared storage after a job completes. Increase this if your cluster uses a high-latency network filesystem (e.g., Lustre over WAN).
Default SLURM partition for all jobs. Override per-rule in config.yaml resources if needed.
SLURM account to charge compute time against. Change this to your institutional account name.
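Taken together, the SLURM settings in this section might look like the sketch below. The jobs, latency-wait, and restart-times values are the ones this page gives for the profile. The partition and account values are placeholders, and expressing them through the slurm_partition and slurm_account resources of snakemake-executor-plugin-slurm is an assumption about how the shipped profile is written.

```yaml
# profile/slurm/config.yaml (sketch, not the shipped file; path assumed)
executor: slurm
jobs: 100          # cap on concurrently queued or running SLURM jobs
latency-wait: 60   # seconds to wait for outputs on shared storage
restart-times: 1   # one automatic retry per failed job
default-resources:
  slurm_partition: "YOUR_PARTITION"  # placeholder, replace with a real partition
  slurm_account: "YOUR_ACCOUNT"      # placeholder, replace with your account
```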
restart-times: 1 allows each SLURM job one automatic retry on failure, which is useful for transient scheduler preemptions. Do not set this higher without also checking latency-wait, as retries on NFS can cause false failures.
low_resource
Designed for workstations with ≤ 8 GB RAM and ≤ 4 CPU cores. This profile caps memory and thread allocations for every named rule via Snakemake’s set-resources directive, preventing out-of-memory kills on constrained hardware.
Concurrency is limited to jobs: 2 so that parallel jobs cannot simultaneously saturate available RAM. The default-resources fallback (mem_mb: 2000, threads: 1) applies to any rule not listed explicitly in set-resources.
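The relationship between the fallback and the per-rule caps can be sketched as below. The jobs value and the default-resources values come from the description above; the rule name example_rule and its 4000 MB cap are hypothetical stand-ins for the pipeline's real rule names and limits, and thread caps are not shown.

```yaml
# profile/low_resource/config.yaml (sketch, not the shipped file; path assumed)
jobs: 2
default-resources:   # fallback for rules without an explicit entry below
  mem_mb: 2000
  threads: 1         # included because the text lists it as part of the fallback
set-resources:
  example_rule:      # hypothetical rule name
    mem_mb: 4000     # hypothetical per-rule memory cap
```

A rule listed under set-resources uses its own cap, while every other rule falls back to the default-resources values.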
test
The test profile is used in CI pipelines and for verifying the installation on synthetic or downsampled data. It loads its own config_test.yaml which relaxes QC thresholds for synthetic reads.
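A minimal sketch of how the test profile might point Snakemake at the relaxed configuration is shown below. The jobs value and the configfile name are taken from the profile comparison on this page; the location of config_test.yaml relative to the working directory is an assumption.

```yaml
# profile/test/config.yaml (sketch, not the shipped file; path assumed)
jobs: 4
configfile: config_test.yaml  # assumed to resolve from the working directory
```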
aws
Runs the pipeline on AWS Batch via the Tibanna executor plugin, with intermediate files stored in Amazon S3.
Update the profile config
Replace YOUR_S3_BUCKET in profile/aws/config.yaml with your actual bucket name.
use-singularity: true is required because AWS Batch jobs run in container-isolated environments. Conda alone is insufficient for container-native execution.
gcp
Runs the pipeline on Google Cloud Life Sciences, with intermediate files in Google Cloud Storage.
Update the profile config
Replace YOUR_GCP_PROJECT_ID and YOUR_GCS_BUCKET in profile/gcp/config.yaml.
Default GCE machine type. The n1-standard-4 provides 4 vCPUs and 15 GB RAM. Increase to n1-highmem-8 for memory-intensive rules like ArchR.
GCP region where Life Sciences pipelines execute. Choose a region close to your GCS bucket to minimise egress costs.
azure
Runs the pipeline on Azure Batch, with intermediate files in Azure Blob Storage.
Update the profile config
Replace YOUR_BATCH_ACCOUNT and YOUR_BLOB_CONTAINER in profile/azure/config.yaml.
kubernetes
Runs each rule as a Kubernetes Pod on any conformant cluster (GKE, EKS, AKS, or local Minikube).
Create a cloud storage bucket and update the profile config
Replace YOUR_BUCKET and set the correct default-remote-provider (GS, S3, or AzBlob) in profile/kubernetes/config.yaml.
Profile Comparison
local
jobs: 8 · executor: default (process fork) · restart-times: 0 · Best for: single-machine runs and development.
slurm
jobs: 100 · executor: slurm · restart-times: 1 · latency-wait: 60 s · Best for: university HPC clusters.
low_resource
jobs: 2 · executor: default · All rules capped at ≤ 4 GB RAM · Best for: laptops and constrained VMs.
test
jobs: 4 · configfile: config_test.yaml · Best for: CI pipelines and installation checks.
aws
jobs: 50 · executor: tibanna · provider: S3 · Best for: AWS Batch scale-out.
gcp
jobs: 50 · executor: google-lifesciences · provider: GS · Best for: Google Cloud Life Sciences.
azure
jobs: 50 · executor: azure-batch · provider: AzBlob · Best for: Azure Batch compute pools.
kubernetes
jobs: 50 · executor: kubernetes · provider: GS/S3/AzBlob · Best for: container-native, multi-cloud.