The single-cell ATAC-seq (scatac) mode replaces the bulk alignment and peak-calling stack with a purpose-built single-cell analysis chain. Chromap aligns reads at high speed against the reference index; ArchR then constructs Arrow files, removes doublets, performs iterative LSI dimensionality reduction, clusters cells, and projects them with UMAP. Cicero then maps chromatin co-accessibility networks across a configurable genomic window. All QC decisions in this mode are delegated to ArchR’s internal thresholds: the pipeline-level QC gate that runs between alignment and peak calling in bulk mode is skipped entirely.
Activating scATAC Mode
Switch to single-cell mode by setting either the ATAC_MODE environment variable or the global.mode key in config.yaml.
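The way the two settings interact (the environment variable wins when both are present) can be sketched in Python. This is an illustrative helper, not the pipeline's own code: the function name `resolve_mode` is hypothetical, and `"bulk"` is an assumed spelling for the bulk-mode value.

```python
import os


def resolve_mode(config, environ=None):
    """Return the effective pipeline mode.

    ATAC_MODE (environment) takes priority over global.mode (config.yaml).
    """
    environ = os.environ if environ is None else environ
    env_mode = environ.get("ATAC_MODE", "").strip()
    if env_mode:
        return env_mode
    # Fall back to config.yaml; returns None if the key is absent.
    return config.get("global", {}).get("mode")


# "bulk" is an assumed value used only for this illustration.
config = {"global": {"mode": "bulk"}}
print(resolve_mode(config, {"ATAC_MODE": "scatac"}))  # environment variable wins
print(resolve_mode(config, {}))                       # config.yaml value is used
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.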
The environment variable takes priority over config.yaml. If you set ATAC_MODE=scatac on the command line, it overrides any value written in global.mode.
Sample Sheet Format
The scATAC pipeline reads from the same tab-separated sample sheet as bulk mode (path set by global.samples, default data/fastp/samples.tsv). Each row represents one library (or one barcode manifest file, depending on your experimental design). The condition and replicate columns are still parsed, but ArchR uses them for pseudo-bulk grouping rather than for IDR analysis.
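As a rough illustration of what grouping by condition means for a tab-separated sheet, the Python sketch below reads a small made-up sheet and collects rows per condition. The `sample` column name and the sample values are assumptions for this example. Only `condition` and `replicate` are named on this page, and the real grouping is done inside ArchR.

```python
import csv
import io
from collections import defaultdict


def group_by_condition(tsv_text):
    """Map each condition to a sorted list of (sample, replicate) pairs."""
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # "sample" is an assumed column name for this sketch.
        groups[row["condition"]].append((row["sample"], row["replicate"]))
    return {cond: sorted(rows) for cond, rows in groups.items()}


# Made-up sample sheet used only for illustration.
sheet = (
    "sample\tcondition\treplicate\n"
    "pbmc_a\tcontrol\t1\n"
    "pbmc_b\tcontrol\t2\n"
    "pbmc_c\ttreated\t1\n"
)
for condition, rows in group_by_condition(sheet).items():
    print(condition, rows)
```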
Pipeline Stages
Identical to bulk mode: fastp trims 5 bases from the front of both mates and enforces a minimum read length of 30 bp before FastQC generates per-sample HTML reports.
# config.yaml
fastp:
  output: "results/preprocessing/fastp"
  params:
    trim_front1: 5
    trim_front2: 5
    length_required: 30
  threads: 4
  resources:
    mem_mb: 8000
    time: 120
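To make the link between these settings and the fastp command line concrete, here is a small Python sketch that turns the fastp section into an argument list. It is not the pipeline's actual rule. The function name, the raw input file naming, and the assumption that threads sits beside params are all hypothetical. The long options used (`--in1`, `--in2`, `--out1`, `--out2`, `--trim_front1`, `--trim_front2`, `--length_required`, `--thread`) are standard fastp flags.

```python
def fastp_command(sample, fastp_cfg, raw_dir):
    """Translate the fastp section of config.yaml into a fastp argument list."""
    params = fastp_cfg["params"]
    out_dir = fastp_cfg["output"]
    return [
        "fastp",
        # Input naming under raw_dir is an assumption for this sketch.
        "--in1", f"{raw_dir}/{sample}_R1.fastq.gz",
        "--in2", f"{raw_dir}/{sample}_R2.fastq.gz",
        "--out1", f"{out_dir}/{sample}_R1_trimmed.fastq.gz",
        "--out2", f"{out_dir}/{sample}_R2_trimmed.fastq.gz",
        "--trim_front1", str(params["trim_front1"]),
        "--trim_front2", str(params["trim_front2"]),
        "--length_required", str(params["length_required"]),
        "--thread", str(fastp_cfg["threads"]),
    ]


fastp_cfg = {
    "output": "results/preprocessing/fastp",
    "params": {"trim_front1": 5, "trim_front2": 5, "length_required": 30},
    "threads": 4,
}
print(" ".join(fastp_command("sample1", fastp_cfg, "data/raw")))
```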
Trimmed reads are aligned with Chromap using the --preset atac flag, which applies ATAC-seq–specific alignment parameters. The rule uses 16 threads and requests 32 GB of memory to handle the large barcode-aware index:
# config.yaml
chromap:
  input: "results/preprocessing/fastp"
  output: "results/alignment/chromap"
  params:
    index: "data/reference/chromap/genome.index" # *CHROMAP_INDEX
    preset: "atac"
  threads: 16
  resources:
    mem_mb: 32000
    time: 120
Output files:
results/alignment/chromap/{sample}.bam
results/alignment/chromap/{sample}_tag.bam
chromap \
  --preset atac \
  -x data/reference/chromap/genome.index \
  -r data/reference/genome.fa \
  -1 {sample}_R1_trimmed.fastq.gz \
  -2 {sample}_R2_trimmed.fastq.gz \
  -t 16 \
  -o {sample}.bam \
  --SAM
The QC gate that runs in bulk mode (FRiP, TSS enrichment, mapping rate, duplicate rate checks) is not applied in scATAC mode. ArchR enforces its own per-cell QC thresholds during Arrow file creation.
All tagged BAMs are passed together to archr_pseudobulk, which creates ArchR Arrow files (one per sample). Cells that fall below the minimum TSS enrichment or outside the allowed fragment count range are excluded at this stage.
# config.yaml
archr:
  input:
    bam: "results/alignment/chromap"
  output:
    arrow: "results/scatac/archr/arrow"
  params:
    min_tss: 4.0
    min_frags: 1000
    max_frags: 100000
    tsse_method: "ArchR"
  threads: 16
  resources:
    mem_mb: 64000
    time: 240
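The effect of the three thresholds on individual cells can be sketched in Python. This is only an illustration of the filtering idea, not ArchR's implementation. The `CellQC` type, the barcodes, and the metric values are made up, and treating the bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    tss_enrichment: float
    n_frags: int


def passes_qc(cell, min_tss=4.0, min_frags=1000, max_frags=100000):
    """Keep a cell only if it meets the TSS and fragment-count thresholds.

    Inclusive bounds are an assumption of this sketch.
    """
    return (
        cell.tss_enrichment >= min_tss
        and min_frags <= cell.n_frags <= max_frags
    )


# Made-up per-cell metrics for illustration.
cells = [
    CellQC("AAAC-1", 8.2, 5400),    # passes all thresholds
    CellQC("CCGT-1", 2.1, 7000),    # TSS enrichment too low
    CellQC("GGTA-1", 9.0, 350),     # too few fragments
    CellQC("TTAG-1", 6.5, 250000),  # too many fragments
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)
```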
Parameters: min_tss, min_frags, max_frags, tsse_method.
archr_doublet_detection runs ArchR’s simulation-based doublet scoring on the raw Arrow files. Cells with a doublet enrichment score above doublet_threshold (0.2) are removed. A diagnostic PDF is written to results/scatac/archr/doublets/doublet_enrichment.pdf and the cleaned Arrow files are saved to results/scatac/archr/filtered_arrow/.
Values used for dimensionality reduction and clustering: dims_to_use 1:30, clustering_resolution 0.8.
# config.yaml (archr params excerpt)
archr:
  params:
    clustering_resolution: 0.8
    dims_to_use: 1:30
    force_dim_reduction: true
Cell Clusters: results/scatac/archr/clusters/cell_clusters.tsv (barcode-to-cluster assignment table)
UMAP Plot: results/scatac/archr/plots/umap_clusters.pdf (UMAP coloured by cluster identity)
Marker Genes: results/scatac/archr/markers/marker_genes.tsv (per-cluster marker gene table)
Full QC Report: results/scatac/archr/qc_report/ArchR_full_report.pdf (comprehensive ArchR PDF report)
Cicero models co-accessibility between distal regulatory elements and gene promoters using the filtered Arrow files and the cluster assignments from ArchR. The analysis runs within a 500 bp tiling window and considers peak pairs up to 250 kb apart.
# config.yaml
cicero:
  input:
    arrow: "results/scatac/archr/filtered_arrow"
    clusters: "results/scatac/archr/clusters"
  output:
    connections: "results/scatac/cicero/connections"
    ccans: "results/scatac/cicero/ccans"
    plots: "results/scatac/cicero/plots"
  params:
    window_size: 500
    distance_cutoff: 250000
  threads: 8
  resources:
    mem_mb: 32000
    time: 240
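To give a feel for what the distance_cutoff setting restricts, the Python sketch below enumerates peak pairs on the same chromosome that lie within the cutoff. It is not Cicero's algorithm, which goes on to estimate co-accessibility scores for such pairs. Measuring distance between peak midpoints is an assumption made here, and the peak coordinates are made up.

```python
from itertools import combinations


def midpoint(peak):
    """Midpoint of a (chrom, start, end) peak."""
    _, start, end = peak
    return (start + end) // 2


def candidate_pairs(peaks, distance_cutoff=250000):
    """Return same-chromosome peak pairs whose midpoints are within the cutoff."""
    pairs = []
    for a, b in combinations(sorted(peaks), 2):
        if a[0] != b[0]:
            continue  # never pair peaks across chromosomes
        if abs(midpoint(a) - midpoint(b)) <= distance_cutoff:
            pairs.append((a, b))
    return pairs


# Made-up peaks for illustration.
peaks = [
    ("chr1", 1000, 1500),
    ("chr1", 200000, 200500),
    ("chr1", 400000, 400500),
    ("chr2", 1000, 1500),
]
for a, b in candidate_pairs(peaks):
    print(a, b)
```

With the default cutoff, the first and third chr1 peaks are about 399 kb apart and are therefore not paired, while each is paired with the peak between them.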
Parameters: window_size, distance_cutoff.
Output files:
connections/coaccessibility_connections.rds
connections/coaccessibility_table.tsv
ccans/ccans.bed
chromVAR computes per-cell motif deviation scores using the JASPAR vertebrate motif database and the Tn5-shifted signal. In scATAC mode, it operates on the Chromap-aligned BAMs alongside bigwig generation and correlation analysis.
# config.yaml
chromvar_analysis:
  output:
    deviations: "results/peak_calling/chromvar/deviations"
    bias_corrected: "results/peak_calling/chromvar/bias_corrected"
    plots: "results/peak_calling/chromvar/plots"
  params:
    motif_db: "data/motifs/jaspar_vertebrates.meme"
    genome_fa: "data/reference/genome.fa"
    genome_sizes: "data/reference/genome.chrom.sizes"
  threads: 8
  resources:
    mem_mb: 16000
    time: 240
Output Structure
All scATAC outputs are written to results/scatac/ in addition to the shared results/visualization/ and results/peak_calling/chromvar/ directories.
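One way to check a finished run is to look for the scATAC output files named on this page. The Python sketch below does that. The helper name `missing_outputs` is hypothetical, and the list covers only files mentioned in this section, so it is not an exhaustive manifest of what the pipeline writes.

```python
from pathlib import Path

# Files named in this section, relative to the project root.
EXPECTED_OUTPUTS = [
    "results/scatac/archr/doublets/doublet_enrichment.pdf",
    "results/scatac/archr/clusters/cell_clusters.tsv",
    "results/scatac/archr/plots/umap_clusters.pdf",
    "results/scatac/archr/markers/marker_genes.tsv",
    "results/scatac/archr/qc_report/ArchR_full_report.pdf",
    "results/scatac/cicero/connections/coaccessibility_connections.rds",
    "results/scatac/cicero/connections/coaccessibility_table.tsv",
    "results/scatac/cicero/ccans/ccans.bed",
]


def missing_outputs(root="."):
    """Return the expected output paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED_OUTPUTS if not (root / p).is_file()]


for path in missing_outputs():
    print("missing:", path)
```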
Key Differences from Bulk Mode
No pipeline-level QC gate
In bulk mode, rules/scripts/parse_qc_metrics.py applies hard thresholds for FRiP, TSS enrichment, mapping rate, and duplicate rate before peak calling. In scATAC mode this gate is skipped: the QC_GATE_TARGETS list is empty. ArchR enforces per-cell quality thresholds (min_tss, min_frags, max_frags) during Arrow file creation instead.
No post-alignment filtering chain
Bulk mode runs samtools sort → fixmate → markdup → view (MAPQ ≥ 30) → blacklist removal → Tn5 shift as a sequential chain on every BAM. In scATAC mode, POST_FILTERING_TARGETS is empty; Chromap’s --preset atac handles alignment-level filtering internally and ArchR manages cell-level QC.
No IDR or DESeq2
IDR replicate concordance and DESeq2 differential accessibility are bulk-only steps. In scATAC mode, cluster-level differential accessibility is performed within ArchR using its marker peak identification routines.
No TOBIAS or HINT-ATAC footprinting
Transcription factor footprinting with TOBIAS and HINT-ATAC is not included in the scATAC rule graph. chromVAR provides single-cell–resolution motif deviation scores instead.
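The empty QC_GATE_TARGETS and POST_FILTERING_TARGETS lists mentioned above can be pictured with a short Python sketch. When a stage's target list is empty, nothing requests that stage's outputs, so its rules never run. This is an illustration only: the function name and the bulk-mode target paths are hypothetical and are not taken from the pipeline.

```python
def stage_targets(mode, samples):
    """Return per-stage target lists; scatac mode leaves bulk-only stages empty."""
    if mode == "scatac":
        qc_gate, post_filtering = [], []
    else:
        # Hypothetical bulk-mode target paths, for illustration only.
        qc_gate = [f"results/qc/{s}.qc_pass" for s in samples]
        post_filtering = [f"results/filtering/{s}.shifted.bam" for s in samples]
    return {
        "QC_GATE_TARGETS": qc_gate,
        "POST_FILTERING_TARGETS": post_filtering,
    }


for mode in ("bulk", "scatac"):
    targets = stage_targets(mode, ["sample1", "sample2"])
    requested = targets["QC_GATE_TARGETS"] + targets["POST_FILTERING_TARGETS"]
    print(mode, len(requested), "bulk-only targets requested")
```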