Docker is the recommended deployment path for macOS and Windows hosts, or for any Linux environment where installing Conda and Singularity system-wide is impractical. The repository ships a Dockerfile that builds a lightweight host runner image based on mambaorg/micromamba:1.5-bullseye-slim. This image contains only Snakemake and Python — individual rule dependencies (Bowtie2, MACS2, ArchR, and so on) are still downloaded and cached dynamically by Snakemake at runtime through its --use-conda mechanism, exactly as they are in a bare-metal Conda installation.

What the Dockerfile Does

# Base image: micromamba on Debian Bullseye slim
FROM mambaorg/micromamba:1.5-bullseye-slim

LABEL maintainer="Himanshu Bhandary <2032ushimanshu@gmail.com>"
LABEL description="Host runner environment for BDB-Genomics ATAC-seq Pipeline"

WORKDIR /app

# Copy the host-runner environment file and install Snakemake + Python.
# This layer is cached separately from the pipeline code, so changing a
# script later does not re-trigger a full Conda solve.
COPY --chown=$MAMBA_USER:$MAMBA_USER envs/main.yaml /tmp/env.yaml

RUN micromamba install -y -n base -f /tmp/env.yaml && \
    micromamba clean --all --yes

# Copy the rest of the pipeline code into the container
COPY --chown=$MAMBA_USER:$MAMBA_USER . /app

ENV PATH="/opt/conda/bin:$PATH"

# ENTRYPOINT wraps snakemake so that all arguments passed to `docker run`
# are forwarded directly to the snakemake CLI.
ENTRYPOINT ["/usr/local/bin/_entrypoint.sh", "snakemake"]

# Show help if no arguments are provided
CMD ["--help"]

envs/main.yaml provides only the host runner dependencies: Snakemake and Python. It does not bundle Bowtie2, MACS2, ArchR, or any analysis tool. Those are resolved by Snakemake at job execution time via --use-conda.

Step-by-Step Setup

1. Build the Host Runner Image

Run this once from the repository root. Docker caches the micromamba installation layer, so subsequent rebuilds after code changes are fast.

docker build -t bdb-atacseq .

2. Prepare Your Workspace

Ensure your config.yaml, sample sheet (data/fastp/samples.tsv), and reference data are present in the current directory. The docker run commands in the following steps mount the current directory into the container at /app, which is the working directory the pipeline expects.

ls config.yaml data/fastp/samples.tsv data/reference/
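
The ls command above fails on the first missing path but does not say which inputs are still outstanding. As a minimal sketch, the helper below checks the same three paths and reports each missing one. The function name check_workspace is hypothetical and not part of the pipeline; only the paths come from this guide.

```shell
#!/bin/sh
# check_workspace is a hypothetical helper, not part of the pipeline.
# It verifies that the inputs named in this step exist under a directory
# (default: the current directory) and reports every missing path.
check_workspace() {
    dir="${1:-.}"
    missing=0
    # Regular files the guide expects in the workspace.
    for f in config.yaml data/fastp/samples.tsv; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $f" >&2
            missing=1
        fi
    done
    # Reference data directory.
    if [ ! -d "$dir/data/reference" ]; then
        echo "missing directory: data/reference" >&2
        missing=1
    fi
    # Exit status 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

After sourcing the function, calling check_workspace from the repository root before any docker run gives an early, readable failure instead of an error from inside the container.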

3. Run the Bulk Pipeline

Mount the workspace and run Snakemake with 8 cores. All Snakemake flags are passed directly through the ENTRYPOINT:

docker run -it --rm \
  -v $(pwd):/app \
  -v /var/run/docker.sock:/var/run/docker.sock \
  bdb-atacseq --use-conda --cores 8

Note: The -v /var/run/docker.sock:/var/run/docker.sock mount exposes the host Docker socket inside the container. This enables Docker-in-Docker (DinD): if any Snakemake rule uses a container: directive instead of conda:, Snakemake can spin up the required Singularity/Docker container from inside the host runner.

Warning: Mounting /var/run/docker.sock grants the container full access to the host Docker daemon. In shared or production environments, evaluate whether this privilege is acceptable before proceeding.

4. Run the scATAC Pipeline

Pass ATAC_MODE=scatac as an environment variable with -e:

docker run -it --rm \
  -v $(pwd):/app \
  -e ATAC_MODE=scatac \
  bdb-atacseq --use-conda --cores 8

5. Run with a Profile

Pass the --profile flag exactly as you would outside Docker:

docker run -it --rm \
  -v $(pwd):/app \
  bdb-atacseq --profile profile/local --cores 8
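
This guide does not show what a profile contains. As a rough sketch, the script below writes a hypothetical profile to profile/docker-example, an illustrative path that is not the repository's profile/local. In a Snakemake profile, each key in config.yaml corresponds to a command-line long option without the leading dashes. The values chosen here are assumptions, and newer Snakemake releases may prefer different option names for software deployment.

```shell
#!/bin/sh
# Writes a hypothetical Snakemake profile. The directory name and every
# value below are assumptions, not the contents of profile/local.
profile_dir="profile/docker-example"
mkdir -p "$profile_dir"

cat > "$profile_dir/config.yaml" <<'EOF'
# Each key is a Snakemake long option without the leading dashes.
use-conda: true
printshellcmds: true
rerun-incomplete: true
EOF

echo "wrote $profile_dir/config.yaml"
```

Because the workspace is mounted at /app, a profile written on the host this way is visible inside the container, so it could be selected with --profile profile/docker-example in place of profile/local.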

Common Docker Run Patterns

docker run -it --rm \
  -v $(pwd):/app \
  -v /var/run/docker.sock:/var/run/docker.sock \
  bdb-atacseq --use-conda --cores 8
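
The run variants in this guide differ only in whether the Docker socket is mounted and whether ATAC_MODE is set. One possible way to capture them is a small wrapper function, sketched below. The name run_atac and the variables WITH_DOCKER_SOCK and DRY_RUN are hypothetical and not part of the pipeline; the image name, mounts, and ATAC_MODE come from the commands shown in this guide. The sketch omits -it so that it also works in shells without a terminal attached.

```shell
#!/bin/sh
# run_atac is a hypothetical wrapper, not part of the pipeline.
# Arguments passed to it are forwarded to Snakemake inside the container.
#   WITH_DOCKER_SOCK=1  also mount the host Docker socket (assumed name)
#   ATAC_MODE=...       forwarded into the container when set
#   DRY_RUN=1           print the docker command instead of running it
run_atac() {
    # Image name first, followed by the caller's Snakemake arguments.
    set -- bdb-atacseq "$@"
    # Optionally expose the host Docker socket for container-based rules.
    if [ "${WITH_DOCKER_SOCK:-0}" = "1" ]; then
        set -- -v /var/run/docker.sock:/var/run/docker.sock "$@"
    fi
    # Forward ATAC_MODE only when the caller has set it.
    if [ -n "${ATAC_MODE:-}" ]; then
        set -- -e "ATAC_MODE=$ATAC_MODE" "$@"
    fi
    # Workspace mount shared by every variant.
    set -- docker run --rm -v "$(pwd):/app" "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

With the function sourced, setting DRY_RUN=1 and calling run_atac --use-conda --cores 8 prints the command that would be run, which is a cheap way to confirm the mounts before starting a long job.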

Docker-in-Docker (DinD) Explained

Some Snakemake rules specify a container: URL instead of (or in addition to) a conda: environment file. When Snakemake encounters these rules, it attempts to pull and run the specified container. Mounting /var/run/docker.sock into the host runner container allows this to work transparently. Containers started through the socket are created by the host Docker daemon, so they run alongside the host runner on the host rather than nested inside it:

Host Docker daemon
  └── bdb-atacseq (host runner container)
        └── snakemake
              └── Pulls and runs tool containers via host Docker socket

If you do not need container-based rules (i.e., all rules in your run use conda: environments), you can omit the /var/run/docker.sock mount:

docker run -it --rm \
  -v $(pwd):/app \
  bdb-atacseq --use-conda --cores 8

When to Use Docker

macOS

Conda and Singularity have limited native macOS support. Docker Desktop provides a consistent Linux environment on both Intel and Apple Silicon Macs.

Windows

The pipeline assumes a POSIX shell. Running inside Docker via WSL 2 or Docker Desktop eliminates path and shell compatibility issues.

Restricted Linux Hosts

Use Docker in environments where you cannot install system packages or where IT policy disallows a host Conda installation. The only host requirement is Docker itself.

Reproducibility

The image pins the Snakemake and Python versions via envs/main.yaml, producing a consistent execution environment across machines and CI systems.

Conda Cache Between Runs

By default, Snakemake creates per-rule Conda environments inside the working directory (.snakemake/conda/). Because the working directory is mounted from the host (-v $(pwd):/app), these environments persist between docker run invocations and are not re-solved on subsequent runs. If you need a clean environment cache, delete .snakemake/conda/ before rerunning.
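
As a minimal sketch of the cleanup described above, the helper below removes only the .snakemake/conda/ directory and leaves the rest of .snakemake in place. The function name clear_conda_cache is hypothetical and not part of the pipeline; the cache path is the one named in this section.

```shell
#!/bin/sh
# clear_conda_cache is a hypothetical helper, not part of the pipeline.
# It removes the per-rule Conda environments under <workdir>/.snakemake/conda
# and does not touch anything else inside .snakemake.
clear_conda_cache() {
    workdir="${1:-.}"
    cache="$workdir/.snakemake/conda"
    if [ -d "$cache" ]; then
        rm -rf -- "$cache"
        echo "removed $cache"
    else
        echo "no cache at $cache"
    fi
}
```

Calling clear_conda_cache from the repository root forces Snakemake to rebuild the per-rule environments on the next run, so expect that run to take noticeably longer.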
