The BDB-Genomics ATAC-seq pipeline ships four cloud-ready Snakemake profiles that map rule execution to managed compute services on AWS, GCP, Azure, and Kubernetes. Each profile installs the appropriate Snakemake executor plugin, routes intermediate files through cloud object storage, and allows Snakemake’s per-rule resource blocks to drive instance selection automatically. All profiles share the same config.yaml and sample sheet format as local or SLURM execution — switching targets is a single --profile flag change.
Cloud execution incurs real costs. Review your provider’s pricing before submitting large runs. Always delete compute resources and object storage after the pipeline completes to stop accruing charges.
AWS Batch + Tibanna
The AWS profile routes jobs through Tibanna, a workflow manager that uses AWS Step Functions and Lambda to launch an EC2 instance sized to each job's Snakemake resource request.
Profile
# profile/aws/config.yaml
executor: tibanna
use-conda: true
use-singularity: true
jobs: 50
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1
latency-wait: 120
default-resources:
  mem_mb: 4000
  time: 60
  threads: 1
# Tibanna settings (replace with your values)
tibanna-sfn: "tibanna_unicorn_atacseq"
default-remote-prefix: "YOUR_S3_BUCKET/atacseq-pipeline"
default-remote-provider: "S3"
Prerequisites
Install the executor plugin and Tibanna
pip install snakemake-executor-plugin-tibanna
pip install tibanna
Configure AWS credentials
aws configure
# Enter: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, region, output format
Alternatively, set environment variables directly:
export AWS_ACCESS_KEY_ID=your_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_key
Create an S3 bucket
aws s3 mb s3://your-bucket-name
Deploy the Tibanna unicorn (one-time setup per AWS account)
tibanna deploy_unicorn -g atacseq -b your-bucket-name
Update the profile with your bucket name
Edit profile/aws/config.yaml and replace YOUR_S3_BUCKET with the bucket you created:
tibanna-sfn: "tibanna_unicorn_atacseq"
default-remote-prefix: "your-bucket-name/atacseq-pipeline"
default-remote-provider: "S3"
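Before submitting a paid run, it can help to confirm that no placeholder was left in the profile. The following is a minimal sketch, not part of the pipeline: `find_placeholders` is a hypothetical helper, and it assumes placeholders follow the `YOUR_*` naming used in the shipped profiles.

```python
import re

# Assumption: unreplaced placeholders look like YOUR_S3_BUCKET or
# YOUR_GCP_PROJECT_ID, as in the shipped profile files.
PLACEHOLDER = re.compile(r"YOUR_[A-Z0-9_]+")


def find_placeholders(profile_text: str) -> list[tuple[int, str]]:
    """Return (line_number, placeholder) pairs for every unreplaced placeholder."""
    hits = []
    for lineno, line in enumerate(profile_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

A typical call would be `find_placeholders(pathlib.Path("profile/aws/config.yaml").read_text())`; an empty list suggests every placeholder has been replaced.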
Running on AWS
snakemake --profile profile/aws
For scATAC mode:
ATAC_MODE=scatac snakemake --profile profile/aws
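The `ATAC_MODE=scatac` prefix sets the variable only for that single command. When launching from a script, the same effect can be sketched in Python as below; `build_invocation` is a hypothetical helper name, and the sketch only assembles the command and environment without claiming anything about how the pipeline reads `ATAC_MODE`.

```python
import os


def build_invocation(profile_dir, atac_mode=None, base_env=None):
    """Return (command, environment) for one Snakemake run with the given profile."""
    # Copy the environment so the caller's own environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    if atac_mode is not None:
        env["ATAC_MODE"] = atac_mode
    cmd = ["snakemake", "--profile", profile_dir]
    return cmd, env
```

The result can be passed to `subprocess.run(cmd, env=env, check=True)` to start the run.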
Google Cloud Life Sciences
The GCP profile uses the Google Life Sciences API (formerly Pipelines API) executor to run each Snakemake rule as a containerised virtual machine job on Google Compute Engine.
Profile
# profile/gcp/config.yaml
executor: google-lifesciences
use-conda: true
use-singularity: true
jobs: 50
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1
latency-wait: 120
default-resources:
  mem_mb: 4000
  time: 60
  threads: 1
  machine_type: "n1-standard-4"
# Google Cloud settings (replace with your values)
google-lifesciences-project: "YOUR_GCP_PROJECT_ID"
google-lifesciences-region: "us-central1"
default-remote-prefix: "YOUR_GCS_BUCKET/atacseq-pipeline"
default-remote-provider: "GS"
Prerequisites
Install the executor plugin
pip install snakemake-executor-plugin-google-lifesciences
Authenticate with Google Cloud
gcloud auth application-default login
Create a Cloud Storage bucket
gsutil mb gs://your-bucket-name
Update the profile with your project and bucket
Edit profile/gcp/config.yaml and replace the placeholder values:
google-lifesciences-project: "my-gcp-project-id"
google-lifesciences-region: "us-central1"
default-remote-prefix: "your-bucket-name/atacseq-pipeline"
default-remote-provider: "GS"
Running on GCP
snakemake --profile profile/gcp
New Google Cloud accounts receive $300 of free credit valid for 90 days. The always-free tier includes one f1-micro instance, but this is insufficient for ATAC-seq alignment. Use the free credit for initial testing, then upgrade to a paid account for production runs.
Azure Batch
The Azure profile submits Snakemake rules as Azure Batch tasks backed by a managed pool of virtual machines. Intermediate files are stored in Azure Blob Storage.
Profile
# profile/azure/config.yaml
executor: azure-batch
use-conda: true
use-singularity: true
jobs: 50
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1
latency-wait: 120
default-resources:
  mem_mb: 4000
  time: 60
  threads: 1
# Azure Batch settings (replace with your values)
az-batch-account-url: "https://YOUR_BATCH_ACCOUNT.eastus.batch.azure.com"
default-remote-prefix: "YOUR_BLOB_CONTAINER/atacseq-pipeline"
default-remote-provider: "AzBlob"
Prerequisites
Install the executor plugin
pip install snakemake-executor-plugin-azure-batch
Create a Batch account and a storage account
az batch account create \
--name atacseqbatch \
--resource-group YOUR_RG \
--location eastus
az storage account create \
--name atacseqstorage \
--resource-group YOUR_RG \
--location eastus
az storage container create \
--name atacseq-pipeline \
--account-name atacseqstorage
Update the profile with your account details
Edit profile/azure/config.yaml and replace the placeholder values:
az-batch-account-url: "https://atacseqbatch.eastus.batch.azure.com"
default-remote-prefix: "atacseq-pipeline/atacseq-pipeline"
default-remote-provider: "AzBlob"
Running on Azure
snakemake --profile profile/azure
Azure offers $200 of free credit for new accounts (30-day validity). Delete Batch pools and storage containers after the run to avoid ongoing charges.
Kubernetes
The Kubernetes profile uses the Snakemake Kubernetes executor plugin to schedule each Snakemake rule as a Kubernetes Pod. This profile works with any Kubernetes distribution — GKE, EKS, AKS, or a local Minikube cluster.
Profile
# profile/kubernetes/config.yaml
executor: kubernetes
use-conda: true
use-singularity: true
jobs: 50
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1
latency-wait: 120
default-resources:
  mem_mb: 4000
  time: 60
  threads: 1
# Kubernetes settings (replace with your values)
kubernetes-namespace: "default"
default-remote-prefix: "YOUR_BUCKET/atacseq-pipeline"
default-remote-provider: "GS" # or "S3" for AWS, "AzBlob" for Azure
Prerequisites
Install the executor plugin
pip install snakemake-executor-plugin-kubernetes
Create or connect to a Kubernetes cluster
GKE (Google)
gcloud container clusters create atacseq-cluster --num-nodes=2
EKS (AWS)
eksctl create cluster --name atacseq-cluster --nodes=2
Minikube (local)
minikube start --memory=8192 --cpus=4
Verify that kubectl points at the intended cluster
kubectl config current-context
Create a cloud storage bucket and update the profile
The Kubernetes executor uses cloud object storage for intermediate files. Set default-remote-prefix and default-remote-provider to match your provider:
# For GCS:
default-remote-prefix: "your-bucket/atacseq-pipeline"
default-remote-provider: "GS"
# For S3:
default-remote-prefix: "your-bucket/atacseq-pipeline"
default-remote-provider: "S3"
# For Azure Blob:
default-remote-prefix: "your-container/atacseq-pipeline"
default-remote-provider: "AzBlob"
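The three variants above differ only in the provider code and the bucket or container name. As a rough sketch, the mapping could be derived from a bucket URL as follows. `storage_settings` is a hypothetical helper, and the `gs://`, `s3://`, and `az://` schemes are assumptions made for this example; only the provider codes `GS`, `S3`, and `AzBlob` come from the profiles.

```python
# Provider codes as used in the profiles; the URL schemes are illustrative assumptions.
SCHEME_TO_PROVIDER = {"gs": "GS", "s3": "S3", "az": "AzBlob"}


def storage_settings(bucket_url: str, subdir: str = "atacseq-pipeline") -> dict[str, str]:
    """Derive the two storage-related profile settings from a bucket URL."""
    scheme, sep, bucket = bucket_url.partition("://")
    bucket = bucket.rstrip("/")
    if not sep or scheme not in SCHEME_TO_PROVIDER or not bucket:
        raise ValueError(f"unsupported bucket URL: {bucket_url!r}")
    return {
        "default-remote-prefix": f"{bucket}/{subdir}",
        "default-remote-provider": SCHEME_TO_PROVIDER[scheme],
    }
```

For example, `storage_settings("s3://your-bucket")` yields the two values shown in the S3 variant.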
Running on Kubernetes
snakemake --profile profile/kubernetes
Always delete the cluster after the pipeline completes to stop VM billing:
# GKE
gcloud container clusters delete atacseq-cluster
# EKS
eksctl delete cluster --name atacseq-cluster
Cloud Profile Comparison
| Profile | Executor Plugin | Storage | Typical Use Case |
|---|---|---|---|
| profile/aws | snakemake-executor-plugin-tibanna | S3 | Serverless AWS, pay-per-invocation |
| profile/gcp | snakemake-executor-plugin-google-lifesciences | GCS | GCP-native managed VMs |
| profile/azure | snakemake-executor-plugin-azure-batch | Azure Blob | Azure-native batch pools |
| profile/kubernetes | snakemake-executor-plugin-kubernetes | Any (GS/S3/AzBlob) | Multi-cloud or on-premises K8s |
Dynamic Resource Scaling
All cloud profiles use default-resources as fallback values only. The per-rule resource lambdas defined in the Snakemake rules scale memory with input file size and multiply resources on retry:
# Example from chromap.smk
resources:
    mem_mb=lambda wildcards, input, attempt: max(config['chromap']['resources']['mem_mb'], int(input.size_mb * 1.5)) * attempt,
    time=lambda wildcards, attempt: config['chromap']['resources']['time'] * attempt,
Because every profile sets restart-times: 1, a failed job is retried once with attempt set to 2. A rule that requests 32 GB on its first attempt therefore requests 64 GB on the retry, with no manual intervention.
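The arithmetic behind the chromap example can be checked outside Snakemake. The sketch below restates the two lambdas as plain functions; the function names and the numbers in the usage note are illustrative, not values from the pipeline's config.yaml.

```python
def scaled_mem_mb(base_mem_mb: int, input_size_mb: float, attempt: int) -> int:
    """Take the larger of the configured floor and 1.5x the input size, then scale by attempt."""
    return max(base_mem_mb, int(input_size_mb * 1.5)) * attempt


def scaled_time(base_time: int, attempt: int) -> int:
    """Scale the configured time limit linearly with the attempt number."""
    return base_time * attempt


if __name__ == "__main__":
    # Illustrative inputs: a 32000 MB floor and a 10000 MB input file.
    for attempt in (1, 2):
        print(attempt, scaled_mem_mb(32000, 10000, attempt), scaled_time(120, attempt))
```

With a 32000 MB floor and a 10000 MB input, the floor wins (15000 < 32000), so the request is 32000 MB on attempt 1 and 64000 MB on attempt 2. With a 30000 MB input, the size term wins and the first request is 45000 MB.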