Integrate AlphaGenomeR output with Bioconductor tools

AlphaGenomeR is designed to fit naturally into existing Bioconductor analysis pipelines. Every extractor function returns a plain R list containing a numeric matrix ($values) and a data.frame ($metadata). Because these are native R structures, they need little or no conversion before you pass them to Bioconductor packages such as GenomicRanges, edgeR, or DESeq2, or to visualization tools such as ggplot2.

Output structure

Each extractor function, such as alphagenome_get_rna_seq() or alphagenome_get_atac(), returns a list with two named elements:

Element	Type	Contents
`$values`	`matrix` (numeric)	Rows = genomic positions, columns = tracks (cell types / experiments)
`$metadata`	`data.frame`	One row per track with cell type, assay, and experiment details

library(AlphaGenomeR)

api_key <- Sys.getenv("ALPHAGENOME_API_KEY")

results <- alphagenome_query(
  access_token      = api_key,
  genomic_region    = "chr17:42560601-43609177",
  ontology_terms    = c("UBERON:0002048"),
  requested_outputs = c("RNA_SEQ", "ATAC")
)

rna_data  <- alphagenome_get_rna_seq(results)
atac_data <- alphagenome_get_atac(results)

# Inspect dimensions: rows = positions, columns = tracks
dim(rna_data$values)

# Preview the first few rows of the prediction matrix
head(rna_data$values)

# Preview the track metadata
head(rna_data$metadata)
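
To try the downstream steps without an API key, it can help to build a small stand-in object with the same shape. The sketch below is not part of AlphaGenomeR: `make_mock_result()` and `check_extractor_output()` are hypothetical helpers, and the metadata column names (`name`, `tissue`, `assay`) are assumptions chosen for illustration. Real column names vary by modality.

```r
# Hypothetical helper: builds a list shaped like an extractor result.
# The metadata column names are illustrative assumptions.
make_mock_result <- function(n_positions = 8L, tissues = c("Lung", "Liver")) {
  n_tracks <- length(tissues)
  values <- matrix(
    runif(n_positions * n_tracks),
    nrow = n_positions,
    ncol = n_tracks
  )
  metadata <- data.frame(
    name   = paste0("track_", seq_len(n_tracks)),
    tissue = tissues,
    assay  = "RNA_SEQ",
    stringsAsFactors = FALSE
  )
  list(values = values, metadata = metadata)
}

# Hypothetical helper: checks the structure described in the table above,
# including one metadata row per column of $values.
check_extractor_output <- function(x) {
  stopifnot(
    is.list(x),
    is.matrix(x$values),
    is.numeric(x$values),
    is.data.frame(x$metadata),
    ncol(x$values) == nrow(x$metadata)
  )
  invisible(TRUE)
}

set.seed(1)  # reproducible mock values
mock <- make_mock_result()
check_extractor_output(mock)
dim(mock$values)
```

The same check can be run on a real extractor result before handing it to other packages, since a mismatch between tracks and metadata rows would silently mislabel columns later on.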

Integration examples

Plot predicted ATAC-seq signal

Use base R graphics or ggplot2 to visualize the chromatin accessibility signal along the queried interval:

Base R

atac_data <- alphagenome_get_atac(results)

plot(
  atac_data$values[, 1],
  type = "l",
  main = "Predicted ATAC-seq signal (lung)",
  xlab = "Genomic position (bp offset)",
  ylab = "Predicted accessibility"
)

ggplot2

library(ggplot2)

atac_data <- alphagenome_get_atac(results)

df <- data.frame(
  position = seq_len(nrow(atac_data$values)),
  signal   = atac_data$values[, 1]
)

ggplot(df, aes(x = position, y = signal)) +
  geom_line(colour = "#4595db") +
  labs(
    title = "Predicted ATAC-seq signal (lung)",
    x     = "Genomic position (bp offset)",
    y     = "Predicted accessibility"
  ) +
  theme_minimal()
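
The plots above draw only the first track. To overlay several tracks, one option is to reshape $values into long format first. This is a base R sketch on mock data: `tracks_to_long()` is a hypothetical helper, and the `name` metadata column is an assumption.

```r
# Mock stand-in for atac_data (assumed shape: positions x tracks)
set.seed(42)
atac_mock <- list(
  values   = matrix(runif(10 * 3), nrow = 10, ncol = 3),
  metadata = data.frame(
    name = c("track_a", "track_b", "track_c"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: one row per (position, track) pair
tracks_to_long <- function(values, track_names) {
  stopifnot(ncol(values) == length(track_names))
  data.frame(
    position = rep(seq_len(nrow(values)), times = ncol(values)),
    track    = rep(track_names, each = nrow(values)),
    signal   = as.vector(values),  # column-major, matching the rep() calls
    stringsAsFactors = FALSE
  )
}

long_df <- tracks_to_long(atac_mock$values, atac_mock$metadata$name)
head(long_df)
```

With ggplot2, mapping `colour = track` in `aes()` and adding `geom_line()` would then draw one line per track from `long_df`.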

Use with GenomicRanges

Attach coordinates to predicted signal values by building a GRanges object from the queried region and the position axis of $values:

library(GenomicRanges)

region_start <- 42560601L
region_end   <- 43609177L
n_positions  <- nrow(rna_data$values)

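# Build per-position ranges, assuming rows are evenly spaced bins across the region
bp_step <- (region_end - region_start) / n_positions
stopifnot(bp_step == round(bp_step))  # IRanges needs whole-number coordinates
bp_step <- as.integer(bp_step)

gr <- GRanges(
  seqnames = "chr17",
  ranges   = IRanges(
    start = region_start + (seq_len(n_positions) - 1L) * bp_step,
    width = bp_step
  )
)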

# Attach RNA-seq track 1 as a metadata column
mcols(gr)$rna_track1 <- rna_data$values[, 1]

gr
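
The coordinate arithmetic can be checked without Bioconductor installed. The sketch below uses a hypothetical helper, `bin_coordinates()`, and assumes that rows are evenly spaced bins covering `region_end - region_start` bases, as in the example above. The row count of 8192 is made up for illustration, and whether the API's coordinates are 0-based or 1-based should be verified separately.

```r
# Hypothetical helper: per-row start/end coordinates for a binned signal.
# Assumes evenly spaced bins covering (region_end - region_start) bases.
bin_coordinates <- function(chrom, region_start, region_end, n_positions) {
  span <- region_end - region_start
  if (span %% n_positions != 0) {
    stop("Region length is not a multiple of the number of rows")
  }
  bp_step <- as.integer(span / n_positions)
  starts  <- as.integer(region_start) + (seq_len(n_positions) - 1L) * bp_step
  data.frame(
    seqnames = chrom,
    start    = starts,
    end      = starts + bp_step - 1L,
    stringsAsFactors = FALSE
  )
}

# The example region spans 2^20 bases; 8192 rows would imply 128 bp bins
coords <- bin_coordinates("chr17", 42560601L, 43609177L, n_positions = 8192L)
head(coords, 3)

# If GenomicRanges is installed, the data.frame converts directly
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  gr_bins <- GenomicRanges::makeGRangesFromDataFrame(coords)
}
```

Failing loudly when the region length is not a multiple of the row count avoids silently building fractional coordinates.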

Compare tissues with edgeR or DESeq2

When you query multiple tissues, $values contains one column per track and $metadata records which tissue each track belongs to. You can use the metadata to select and group columns, then build an input object for differential analysis tools:

library(edgeR)

# Query two tissues
results_multi <- alphagenome_query(
  access_token      = api_key,
  genomic_region    = "chr17:42560601-43609177",
  ontology_terms    = c("UBERON:0002048", "UBERON:0002107"),  # Lung, Liver
  requested_outputs = c("RNA_SEQ")
)

rna_multi <- alphagenome_get_rna_seq(results_multi)

# Identify which columns belong to each tissue using metadata.
# The `tissue` column name is assumed here; check colnames(rna_multi$metadata).
lung_cols  <- which(rna_multi$metadata$tissue == "UBERON:0002048")
liver_cols <- which(rna_multi$metadata$tissue == "UBERON:0002107")

# Construct a count matrix and group labels for edgeR
count_matrix <- rna_multi$values[, c(lung_cols, liver_cols)]
group        <- factor(c(rep("Lung", length(lung_cols)), rep("Liver", length(liver_cols))))

dge <- DGEList(counts = count_matrix, group = group)

Note: AlphaGenome returns predicted signal values, not raw integer read counts. Count-based methods in edgeR and DESeq2 assume read counts, so check each package's documentation for guidance on continuous (non-count) inputs, or transform the values as appropriate for your analysis.
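
As one possible transformation for continuous predictions, the sketch below applies log2(x + 1) and compares per-tissue means at each position. It is a descriptive comparison on made-up values, not a statistical test and not a recommendation from the edgeR or DESeq2 authors. The grouping of tracks is assumed.

```r
# Mock predicted signal: 6 positions x 4 tracks (2 lung, 2 liver); values are made up
values <- matrix(
  c(1, 3, 7, 15, 0, 1,
    1, 3, 7, 15, 0, 1,
    3, 3, 1,  3, 0, 7,
    3, 3, 1,  3, 0, 7),
  nrow = 6, ncol = 4
)
group <- factor(c("Lung", "Lung", "Liver", "Liver"))

# log2(x + 1) keeps zeros finite and compresses the dynamic range
log_values <- log2(values + 1)

# Mean log signal per tissue at each position
lung_mean  <- rowMeans(log_values[, group == "Lung",  drop = FALSE])
liver_mean <- rowMeans(log_values[, group == "Liver", drop = FALSE])

# Positive values: higher predicted signal in lung than in liver
log_fc <- lung_mean - liver_mean
log_fc
# -1  0  2  2  0 -2
```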

Inspecting metadata

$metadata is a data.frame with one row per column in $values. Use it to map tracks back to their biological context:

# Number of tracks returned
nrow(rna_data$metadata)

# Column names vary by modality; print all to see what is available
colnames(rna_data$metadata)

# View the first few tracks
head(rna_data$metadata)

Use rna_data$metadata to subset $values by tissue, cell type, or assay before passing the matrix to downstream tools. This avoids mixing tracks from unrelated biological contexts in the same analysis.
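
A minimal sketch of that subsetting step, on mock data. The `tissue` and `assay` column names and their values are assumptions. Substitute whatever `colnames()` reports for your modality.

```r
# Mock extractor output; the `tissue` and `assay` columns are assumptions
rna_mock <- list(
  values = matrix(seq_len(12), nrow = 3, ncol = 4),
  metadata = data.frame(
    tissue = c("Lung", "Liver", "Lung", "Heart"),
    assay  = c("total RNA-seq", "total RNA-seq",
               "polyA plus RNA-seq", "total RNA-seq"),
    stringsAsFactors = FALSE
  )
)

# Logical index over tracks (rows of metadata correspond to columns of values)
keep <- rna_mock$metadata$tissue == "Lung"

# Subset both elements together so they stay aligned; drop = FALSE keeps
# a matrix even if only one track matches
lung_only <- list(
  values   = rna_mock$values[, keep, drop = FALSE],
  metadata = rna_mock$metadata[keep, , drop = FALSE]
)

dim(lung_only$values)
```

Subsetting $values and $metadata with the same index keeps the one-row-per-column relationship intact for later steps.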

Available extractor functions

Function	Modality
`alphagenome_get_rna_seq()`	RNA-seq gene expression
`alphagenome_get_atac()`	ATAC-seq chromatin accessibility
`alphagenome_get_cage()`	CAGE transcription start sites
`alphagenome_get_dnase()`	DNase-seq hypersensitivity
`alphagenome_get_chip_tf()`	ChIP-seq (transcription factors)
`alphagenome_get_chip_histone()`	ChIP-seq (histone marks)
`alphagenome_get_splice_sites()`	Predicted splice sites
`alphagenome_get_splice_junctions()`	Splice junction predictions
`alphagenome_get_splice_usage()`	Splice site usage fractions
`alphagenome_get_procap()`	PRO-cap (capped RNA)
`alphagenome_get_contact_maps()`	3D chromatin contact maps
