Raw ATAC-seq paired-end reads carry two sources of contamination that must be removed before alignment: residual Nextera adapter sequences introduced during library preparation, and the well-characterized Tn5 insertion bias at the extreme 5′ end of each read. The pipeline handles both in a single fastp pass per sample, then immediately feeds the trimmed FASTQs into FastQC to produce per-base-quality and adapter-content reports. Both steps are fully parallelized and containerized, and every output path is resolved from config.yaml so no rule file ever needs editing.
Why 5′ Bases Are Trimmed in ATAC-seq
During ATAC-seq library construction, the hyperactive Tn5 transposase cuts accessible chromatin and simultaneously inserts sequencing adapters. The nucleotides at the very 5′ end of each read are therefore directly adjacent to the Tn5 insertion site, and their base composition is dominated by the enzyme’s own sequence preference rather than by the underlying genomic sequence. Retaining these bases introduces a systematic GC and sequence bias that inflates false-positive peak calls near Tn5-preferred motifs. Trimming five bases from the 5′ end of both R1 (trim_front1: 5) and R2 (trim_front2: 5) removes this bias at its source before any reads are aligned.
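To make the effect of the trim concrete, here is a minimal sketch of what removing five 5′ bases does to a single read. The function name and the example read are illustrative only; in the pipeline the trimming is done by fastp, not by custom code.

```python
def trim_front(seq: str, qual: str, n: int = 5) -> tuple[str, str]:
    """Drop the first n bases, and their quality scores, from one read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must be the same length")
    return seq[n:], qual[n:]


# Made-up 17 bp read with uniform quality scores.
read_seq = "ACGTAGGCTTAACCGGT"
read_qual = "I" * len(read_seq)

trimmed_seq, trimmed_qual = trim_front(read_seq, read_qual)
print(trimmed_seq, len(trimmed_seq))
```

The sequence and quality strings are trimmed together so they stay the same length, which is what a FASTQ record requires.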
Configuration
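As a rough illustration of the split between config and sample sheet described in this section, the sketch below resolves raw FASTQ paths from a sample sheet instead of from the config. The column names (sample, fastq_1, fastq_2), the sheet contents, and the helper names are all hypothetical; the pipeline's actual sample-sheet layout is not shown on this page.

```python
import csv
import io

# Hypothetical sample sheet; the real column names may differ.
SAMPLE_SHEET = """sample,fastq_1,fastq_2
ctrl_rep1,raw/ctrl_rep1_R1.fastq.gz,raw/ctrl_rep1_R2.fastq.gz
treat_rep1,raw/treat_rep1_R1.fastq.gz,raw/treat_rep1_R2.fastq.gz
"""


def load_samples(handle):
    """Map each sample name to its (R1, R2) raw FASTQ paths."""
    return {
        row["sample"]: (row["fastq_1"], row["fastq_2"])
        for row in csv.DictReader(handle)
    }


samples = load_samples(io.StringIO(SAMPLE_SHEET))


def raw_fastqs(sample):
    """Look up raw reads for one sample, as a rule's input function might."""
    return samples[sample]


print(raw_fastqs("ctrl_rep1"))
```

Because the lookup is keyed on the sample name, adding a sample only means adding a row to the sheet.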
Raw FASTQ paths are not declared under fastp.input in config.yaml. They are resolved dynamically at runtime from the sample sheet (global.samples), so the config stays sample-agnostic.
fastp Trimming
Adapter Detection
fastp automatically detects Nextera/Illumina adapter sequences in paired-end mode (--detect_adapter_for_pe). No adapter sequence needs to be provided manually.
5′ Bias Removal
Five bases are hard-clipped from the 5′ end of R1 (--trim_front1 5) and R2 (--trim_front2 5) to eliminate Tn5 insertion-site bias before alignment.
Length Filtering
Any read shorter than 30 bp after trimming is discarded (--length_required 30). This prevents very short fragments, which align ambiguously, from polluting the BAM.
Snakemake Rule
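The pipeline's own rule is not reproduced here. As a minimal sketch, the fastp command such a rule would run can be assembled from the three options described above plus the output names listed under Output Files. The helper name, the example sample name, the raw input paths, and the thread count are assumptions for illustration.

```python
import shlex


def fastp_command(sample, r1, r2, outdir="results/preprocessing/fastp",
                  trim_front=5, min_length=30, threads=4):
    """Assemble a paired-end fastp invocation as an argument list."""
    return [
        "fastp",
        "--in1", r1,
        "--in2", r2,
        "--out1", f"{outdir}/{sample}_R1_trimmed.fastq.gz",
        "--out2", f"{outdir}/{sample}_R2_trimmed.fastq.gz",
        "--detect_adapter_for_pe",           # adapter auto-detection
        "--trim_front1", str(trim_front),    # 5' trim on R1
        "--trim_front2", str(trim_front),    # 5' trim on R2
        "--length_required", str(min_length),
        "--html", f"{outdir}/{sample}.html",
        "--json", f"{outdir}/{sample}.json",
        "--thread", str(threads),            # assumed value, not from the pipeline
    ]


cmd = fastp_command(
    "ctrl_rep1",                      # hypothetical sample name
    "raw/ctrl_rep1_R1.fastq.gz",      # hypothetical raw input paths
    "raw/ctrl_rep1_R2.fastq.gz",
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems if a path contains spaces; shlex.join is used only to print it.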
Output Files
| File | Description |
|---|---|
| {sample}_R1_trimmed.fastq.gz | Trimmed R1 reads; input to Bowtie2 |
| {sample}_R2_trimmed.fastq.gz | Trimmed R2 reads; input to Bowtie2 |
| {sample}.html | Interactive fastp quality report |
| {sample}.json | Machine-readable summary for MultiQC |
These files are written to results/preprocessing/fastp/.
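The JSON report can also be read directly for a quick check on how many reads survived trimming and length filtering. The sketch below uses the summary.before_filtering and summary.after_filtering fields of a fastp JSON report. The numbers are made up, and the helper name is illustrative.

```python
import json


def retained_fraction(report: dict) -> float:
    """Fraction of reads that survive fastp filtering, from its JSON summary."""
    before = report["summary"]["before_filtering"]["total_reads"]
    after = report["summary"]["after_filtering"]["total_reads"]
    return after / before if before else 0.0


# Made-up numbers in the shape of a fastp report (only the fields used here).
example = json.loads(
    '{"summary": {"before_filtering": {"total_reads": 2000000},'
    ' "after_filtering": {"total_reads": 1900000}}}'
)
print(f"{retained_fraction(example):.1%}")
```

For a real sample, the dictionary would come from json.load on the {sample}.json file instead of the inline string.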
FastQC Quality Reports
FastQC runs immediately after fastp and takes the trimmed FASTQs as input, not the raw reads. This means the reports reflect the actual data that will be aligned, giving accurate adapter-content and per-base-quality statistics.
Output Files
| File | Description |
|---|---|
| {sample}_R1_trimmed_fastqc.html | R1 interactive quality report |
| {sample}_R1_trimmed_fastqc.zip | R1 data archive consumed by MultiQC |
| {sample}_R2_trimmed_fastqc.html | R2 interactive quality report |
| {sample}_R2_trimmed_fastqc.zip | R2 data archive consumed by MultiQC |
These files are written to results/preprocessing/fastqc/.
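FastQC derives its report names from the input file name by dropping the FASTQ extension and appending _fastqc, which is why the files above carry the _trimmed_fastqc suffix. A small sketch of that mapping follows. The helper name is hypothetical, and it covers only the common FASTQ extensions.

```python
from pathlib import Path


def fastqc_outputs(fastq, outdir="results/preprocessing/fastqc"):
    """Return the (html, zip) report paths FastQC would produce for one FASTQ."""
    name = Path(fastq).name
    # Strip the first matching FASTQ extension; longer suffixes are tried first.
    for suffix in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return f"{outdir}/{name}_fastqc.html", f"{outdir}/{name}_fastqc.zip"


html_report, zip_archive = fastqc_outputs(
    "results/preprocessing/fastp/ctrl_rep1_R1_trimmed.fastq.gz"  # hypothetical sample
)
print(html_report)
print(zip_archive)
```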
Resource Scaling
Both rules use adaptive memory allocation: if the input FASTQ is larger than the config floor, the rule requests 1.5× the input size in RAM. On retry (Snakemake’s attempt variable), both memory and wall-time scale linearly with the attempt number.
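The scaling logic can be sketched as two small functions. This is one reading of the description above, not the pipeline's actual code: the function names, the units (MB and minutes), and the example floor are assumptions.

```python
def scaled_mem_mb(input_size_mb: float, floor_mb: int, attempt: int = 1) -> int:
    """Memory request: the config floor, or 1.5x input if the input exceeds it."""
    base = 1.5 * input_size_mb if input_size_mb > floor_mb else floor_mb
    # Linear scaling on retry: attempt is 1 on the first run, 2 on the first retry.
    return int(base * attempt)


def scaled_runtime_min(base_minutes: int, attempt: int = 1) -> int:
    """Wall-time request, scaled linearly with the attempt number."""
    return base_minutes * attempt


# Hypothetical 4000 MB floor from the config.
print(scaled_mem_mb(2000, floor_mb=4000))             # small input: floor applies
print(scaled_mem_mb(6000, floor_mb=4000))             # large input: 1.5x input
print(scaled_mem_mb(6000, floor_mb=4000, attempt=2))  # first retry doubles it
```

In a Snakemake rule, functions like these would typically be wired in through callables under resources, which can receive the rule's input (including its size in MB) and the current attempt.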
Container Support
Both rules ship with Conda environment definitions and Singularity container URIs. Pass --use-singularity to Snakemake instead of --use-conda to run entirely inside containers, with no local tool installation required.