The BDB-Genomics ATAC-seq pipeline ships four cloud-ready Snakemake profiles that map rule execution to managed compute services on AWS, GCP, Azure, and Kubernetes. Each profile selects the appropriate Snakemake executor plugin (installed separately, as described under each profile’s prerequisites), routes intermediate files through cloud object storage, and lets Snakemake’s per-rule resource blocks drive instance selection automatically. All profiles share the same config.yaml and sample sheet format as local or SLURM execution, so switching targets is a single --profile flag change.
Cloud execution incurs real costs. Review your provider’s pricing before submitting large runs. Always delete compute resources and object storage after the pipeline completes to stop accruing charges.

AWS Batch + Tibanna

The AWS profile routes jobs through Tibanna, a serverless workflow manager that uses AWS Step Functions and Lambda to launch an EC2 instance sized to each job’s Snakemake resource request.

Profile

# profile/aws/config.yaml

executor: tibanna
use-conda: true
use-singularity: true
jobs: 50
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1

latency-wait: 120

default-resources:
  mem_mb: 4000
  time: 60
  threads: 1

# Tibanna settings (replace with your values)
tibanna-sfn: "tibanna_unicorn_atacseq"
default-remote-prefix: "YOUR_S3_BUCKET/atacseq-pipeline"
default-remote-provider: "S3"

Prerequisites

1. Install the executor plugin and Tibanna

    pip install snakemake-executor-plugin-tibanna
    pip install tibanna

2. Configure AWS credentials

    aws configure
    # Enter: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, region, output format

   Alternatively, set environment variables directly:

    export AWS_ACCESS_KEY_ID=your_key_id
    export AWS_SECRET_ACCESS_KEY=your_secret_key

3. Create an S3 bucket for intermediate storage

    aws s3 mb s3://your-bucket-name

4. Deploy the Tibanna unicorn (one-time setup per AWS account)

    tibanna deploy_unicorn -g atacseq -b your-bucket-name

5. Update the profile with your bucket name

   Edit profile/aws/config.yaml and replace YOUR_S3_BUCKET with the bucket you created:

    tibanna-sfn: "tibanna_unicorn_atacseq"
    default-remote-prefix: "your-bucket-name/atacseq-pipeline"
    default-remote-provider: "S3"
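
The final step above can also be scripted. The following is a minimal sketch, not part of the pipeline: the helper names `fill_bucket` and `unfilled_placeholders` are hypothetical, and the bucket-name check assumes the standard S3 rule of 3 to 63 lowercase letters, digits, dots, or hyphens. It fills in the placeholder and reports any `YOUR_*` values that are still present before a run is submitted.

```python
import re

# Placeholder string used in profile/aws/config.yaml.
PLACEHOLDER = "YOUR_S3_BUCKET"


def fill_bucket(profile_text: str, bucket: str) -> str:
    """Return profile_text with the S3 bucket placeholder replaced."""
    # Assumed S3 naming rule: 3-63 chars, lowercase letters, digits, dots,
    # hyphens, starting and ending with a letter or digit.
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", bucket):
        raise ValueError(f"not a valid S3 bucket name: {bucket!r}")
    return profile_text.replace(PLACEHOLDER, bucket)


def unfilled_placeholders(profile_text: str) -> list:
    """List any YOUR_* placeholders still present in the profile text."""
    return sorted(set(re.findall(r"YOUR_[A-Z0-9_]+", profile_text)))
```

Because `unfilled_placeholders` only looks for the `YOUR_` prefix, the same check can be applied to the other profiles on this page.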

Running on AWS

snakemake --profile profile/aws
For scATAC mode:
ATAC_MODE=scatac snakemake --profile profile/aws

Google Cloud Life Sciences

The GCP profile uses the Google Life Sciences API (formerly Pipelines API) executor to run each Snakemake rule as a containerised virtual machine job on Google Compute Engine.

Profile

# profile/gcp/config.yaml

executor: google-lifesciences
use-conda: true
use-singularity: true
jobs: 50
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1

latency-wait: 120

default-resources:
  mem_mb: 4000
  time: 60
  threads: 1
  machine_type: "n1-standard-4"

# Google Cloud settings (replace with your values)
google-lifesciences-project: "YOUR_GCP_PROJECT_ID"
google-lifesciences-region: "us-central1"
default-remote-prefix: "YOUR_GCS_BUCKET/atacseq-pipeline"
default-remote-provider: "GS"
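
To illustrate how a rule's resource request relates to the machine_type default above, here is a minimal sketch that picks the smallest n1-standard machine type satisfying a given `mem_mb` and thread count. It is an illustration only: the function name `smallest_n1_standard` is hypothetical, and the executor's actual selection logic is not shown in this documentation. It assumes the n1-standard family's published shape of 3.75 GB of memory per vCPU.

```python
# n1-standard-N machine types provide N vCPUs and 3.75 GB of memory per vCPU.
N1_STANDARD_VCPUS = [1, 2, 4, 8, 16, 32, 64, 96]
MB_PER_VCPU = 3840  # 3.75 GB, treating 1 GB as 1024 MB


def smallest_n1_standard(mem_mb: int, threads: int) -> str:
    """Return the smallest n1-standard type that fits the request."""
    for vcpus in N1_STANDARD_VCPUS:
        if vcpus >= threads and vcpus * MB_PER_VCPU >= mem_mb:
            return f"n1-standard-{vcpus}"
    raise ValueError(
        f"no n1-standard type fits mem_mb={mem_mb}, threads={threads}"
    )
```

Under these assumptions, the profile's default request (4000 MB, 1 thread) would fit on an n1-standard-2, so the n1-standard-4 default leaves headroom for small rules.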

Prerequisites

1. Install the executor plugin

    pip install snakemake-executor-plugin-google-lifesciences

2. Authenticate with Google Cloud

    gcloud auth application-default login

3. Create a GCS bucket for intermediate storage

    gsutil mb gs://your-bucket-name

4. Update the profile with your project and bucket

   Edit profile/gcp/config.yaml and replace the placeholder values:

    google-lifesciences-project: "my-gcp-project-id"
    google-lifesciences-region: "us-central1"
    default-remote-prefix: "your-bucket-name/atacseq-pipeline"
    default-remote-provider: "GS"

Running on GCP

snakemake --profile profile/gcp
New Google Cloud accounts receive $300 of free credit valid for 90 days. The always-free tier includes one e2-micro instance, but this is insufficient for ATAC-seq alignment. Use the free credit for initial testing, then upgrade to a paid account for production runs.

Azure Batch

The Azure profile submits Snakemake rules as Azure Batch tasks backed by a managed pool of virtual machines. Intermediate files are stored in Azure Blob Storage.

Profile

# profile/azure/config.yaml

executor: azure-batch
use-conda: true
use-singularity: true
jobs: 50
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1

latency-wait: 120

default-resources:
  mem_mb: 4000
  time: 60
  threads: 1

# Azure Batch settings (replace with your values)
az-batch-account-url: "https://YOUR_BATCH_ACCOUNT.eastus.batch.azure.com"
default-remote-prefix: "YOUR_BLOB_CONTAINER/atacseq-pipeline"
default-remote-provider: "AzBlob"

Prerequisites

1. Install the executor plugin

    pip install snakemake-executor-plugin-azure-batch

2. Authenticate with Azure

    az login

3. Create a Batch account and a storage account

    az batch account create \
      --name atacseqbatch \
      --resource-group YOUR_RG \
      --location eastus

    az storage account create \
      --name atacseqstorage \
      --resource-group YOUR_RG \
      --location eastus

4. Create a Blob container for intermediate storage

    az storage container create \
      --name atacseq-pipeline \
      --account-name atacseqstorage

5. Update the profile with your account details

   Edit profile/azure/config.yaml and replace the placeholder values:

    az-batch-account-url: "https://atacseqbatch.eastus.batch.azure.com"
    default-remote-prefix: "atacseq-pipeline/atacseq-pipeline"
    default-remote-provider: "AzBlob"
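
The account URL in the profile combines the Batch account name and the region passed to `az batch account create`. As a small sketch (the helper names are hypothetical, and the URL pattern is inferred from the example values on this page), the following builds the URL from those two values and parses it back, which can help catch a mismatch between the profile and the account you created.

```python
from urllib.parse import urlparse


def batch_account_url(account: str, region: str) -> str:
    """Build the URL in the form used by az-batch-account-url above."""
    return f"https://{account}.{region}.batch.azure.com"


def parse_batch_account_url(url: str) -> tuple:
    """Split a Batch account URL into (account, region)."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    # Expect exactly: <account>.<region>.batch.azure.com
    if len(parts) != 5 or parts[2:] != ["batch", "azure", "com"]:
        raise ValueError(f"unexpected Batch account URL: {url!r}")
    return parts[0], parts[1]
```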

Running on Azure

snakemake --profile profile/azure
Azure offers $200 of free credit for new accounts (30-day validity). Delete Batch pools and storage containers after the run to avoid ongoing charges.

Kubernetes

The Kubernetes profile uses the Snakemake Kubernetes executor plugin to schedule each Snakemake rule as a Kubernetes Pod. This profile works with any Kubernetes distribution — GKE, EKS, AKS, or a local Minikube cluster.

Profile

# profile/kubernetes/config.yaml

executor: kubernetes
use-conda: true
use-singularity: true
jobs: 50
printshellcmds: true
show-failed-logs: true
keep-going: true
rerun-incomplete: true
restart-times: 1

latency-wait: 120

default-resources:
  mem_mb: 4000
  time: 60
  threads: 1

# Kubernetes settings (replace with your values)
kubernetes-namespace: "default"
default-remote-prefix: "YOUR_BUCKET/atacseq-pipeline"
default-remote-provider: "GS"       # or "S3" for AWS, "AzBlob" for Azure

Prerequisites

1. Install the executor plugin

    pip install snakemake-executor-plugin-kubernetes

2. Create or connect to a Kubernetes cluster

   GKE (Google):

    gcloud container clusters create atacseq-cluster --num-nodes=2

   EKS (AWS):

    eksctl create cluster --name atacseq-cluster --nodes=2

   Minikube (local):

    minikube start --memory=8192 --cpus=4

3. Verify kubectl context

    kubectl config current-context

4. Create a cloud storage bucket and update the profile

   The Kubernetes executor uses cloud object storage for intermediate files. Set default-remote-prefix and default-remote-provider to match your provider:

    # For GCS:
    default-remote-prefix: "your-bucket/atacseq-pipeline"
    default-remote-provider: "GS"

    # For S3:
    default-remote-prefix: "your-bucket/atacseq-pipeline"
    default-remote-provider: "S3"

    # For Azure Blob:
    default-remote-prefix: "your-container/atacseq-pipeline"
    default-remote-provider: "AzBlob"
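
In every variant above, the first path component of default-remote-prefix is the bucket (or container) and the rest is a key prefix inside it. The sketch below shows how a workflow-relative path ends up under that prefix. The helper names and the example path `results/sample1.bam` are hypothetical and only illustrate the naming convention, not Snakemake's internal implementation.

```python
def split_remote_prefix(prefix: str) -> tuple:
    """Split 'bucket/key/prefix' into (bucket, key_prefix)."""
    bucket, _, key_prefix = prefix.strip("/").partition("/")
    if not bucket:
        raise ValueError("default-remote-prefix must start with a bucket name")
    return bucket, key_prefix


def remote_path(prefix: str, workflow_path: str) -> str:
    """Prepend the remote prefix to a workflow-relative file path."""
    return f"{prefix.rstrip('/')}/{workflow_path.lstrip('/')}"
```

For example, with the prefix `your-bucket/atacseq-pipeline`, a hypothetical output `results/sample1.bam` would be stored under `your-bucket/atacseq-pipeline/results/sample1.bam`.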

Running on Kubernetes

snakemake --profile profile/kubernetes
Always delete the cluster after the pipeline completes to stop VM billing:
# GKE
gcloud container clusters delete atacseq-cluster

# EKS
eksctl delete cluster --name atacseq-cluster

Cloud Profile Comparison

| Profile | Executor Plugin | Storage | Typical Use Case |
| --- | --- | --- | --- |
| profile/aws | snakemake-executor-plugin-tibanna | S3 | Serverless AWS, pay-per-invocation |
| profile/gcp | snakemake-executor-plugin-google-lifesciences | GCS | GCP-native managed VMs |
| profile/azure | snakemake-executor-plugin-azure-batch | Azure Blob | Azure-native batch pools |
| profile/kubernetes | snakemake-executor-plugin-kubernetes | Any (GS/S3/AzBlob) | Multi-cloud or on-premises K8s |
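
Since the profiles in the table differ only in the --profile path, a launcher script can treat the target as a lookup key. The following is a minimal sketch with hypothetical helper names. It only builds the command and environment, using the `snakemake --profile ...` invocation and the `ATAC_MODE` variable shown earlier on this page.

```python
import os

# Profile directories as listed in the comparison table.
PROFILES = {
    "aws": "profile/aws",
    "gcp": "profile/gcp",
    "azure": "profile/azure",
    "kubernetes": "profile/kubernetes",
}


def snakemake_command(target: str) -> list:
    """Return the argument list for running the pipeline on a target."""
    if target not in PROFILES:
        raise ValueError(f"unknown target {target!r}; choose from {sorted(PROFILES)}")
    return ["snakemake", "--profile", PROFILES[target]]


def run_env(atac_mode=None, base=None) -> dict:
    """Copy the environment, optionally setting ATAC_MODE (e.g. 'scatac')."""
    env = dict(os.environ if base is None else base)
    if atac_mode is not None:
        env["ATAC_MODE"] = atac_mode
    return env
```

The two results can be passed to `subprocess.run(snakemake_command("gcp"), env=run_env("scatac"), check=True)` to launch a run; nothing is executed by the helpers themselves.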

Dynamic Resource Scaling

All cloud profiles use default-resources as fallback values only. The per-rule resource lambdas defined in the Snakemake rules scale memory with input file size and multiply resources on retry:
# Example from chromap.smk
resources:
    mem_mb=lambda wildcards, input, attempt: max(config['chromap']['resources']['mem_mb'], int(input.size_mb * 1.5)) * attempt,
    time=lambda wildcards, attempt: config['chromap']['resources']['time'] * attempt,
Because every profile sets restart-times: 1, a failed job is retried once with attempt set to 2. A rule that requested 32 GB on the first attempt therefore requests 64 GB on the retry, without any manual intervention.
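
The arithmetic behind the chromap.smk lambdas can be checked outside Snakemake. The sketch below restates the two expressions as plain functions. The function names and the numeric values in the usage note are hypothetical, since the actual values under `config['chromap']['resources']` are not shown here.

```python
def scaled_mem_mb(base_mem_mb: int, input_size_mb: float, attempt: int) -> int:
    """Mirror of the mem_mb lambda: the larger of the configured floor and
    1.5x the input size, multiplied by the attempt number."""
    return max(base_mem_mb, int(input_size_mb * 1.5)) * attempt


def scaled_time(base_time: int, attempt: int) -> int:
    """Mirror of the time lambda: the configured time times the attempt."""
    return base_time * attempt
```

With an assumed floor of 32000 MB and a 10000 MB input, `scaled_mem_mb(32000, 10000, 1)` gives 32000 and the retry (`attempt=2`) gives 64000. With a 40000 MB input the input-size term dominates, giving 60000 on the first attempt and 120000 on the retry.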
