
AlphaGenomeR is designed to fit naturally into existing Bioconductor analysis pipelines. Every extractor function returns a plain R list containing a numeric matrix ($values) and a data.frame ($metadata). Because these are standard R structures, no AlphaGenomeR-specific conversion is needed before using them with Bioconductor packages such as GenomicRanges, edgeR, or DESeq2, or with plotting packages such as ggplot2 (from CRAN). Each tool still expects its own input layout.

Output structure

Each extractor function (alphagenome_get_rna_seq(), alphagenome_get_atac(), and so on) returns a list with two named elements:

Element     Type              Contents
$values     matrix (numeric)  Rows = genomic positions, columns = tracks (cell types / experiments)
$metadata   data.frame        One row per track with cell type, assay, and experiment details

For example, the following query requests RNA-seq and ATAC-seq predictions for lung tissue and inspects the returned objects:
library(AlphaGenomeR)

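# Read the API key from the environment instead of hard-coding it
api_key <- Sys.getenv("ALPHAGENOME_API_KEY")
if (!nzchar(api_key)) stop("Set the ALPHAGENOME_API_KEY environment variable first")

# Request RNA-seq and ATAC-seq predictions for lung (UBERON:0002048)
# over a ~1 Mb window on chr17
results <- alphagenome_query(
  access_token      = api_key,
  genomic_region    = "chr17:42560601-43609177",
  ontology_terms    = c("UBERON:0002048"),
  requested_outputs = c("RNA_SEQ", "ATAC")
)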

rna_data  <- alphagenome_get_rna_seq(results)
atac_data <- alphagenome_get_atac(results)

# Inspect dimensions: rows = positions, columns = tracks
dim(rna_data$values)

# Preview the first few rows of the prediction matrix
head(rna_data$values)

# Preview the track metadata
head(rna_data$metadata)
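You may want to prototype downstream code without calling the API. One way is to build a stand-in object with the same documented shape and check that shape before use. The sketch below is hypothetical: `mock_extractor_output()` and `check_extractor_output()` are illustrative helpers, not part of AlphaGenomeR. The metadata columns (`track_name`, `tissue`, `assay`) are assumptions, not the package's actual schema.

```r
# Hypothetical helper: build a list shaped like an extractor result
# ($values = positions x tracks matrix, $metadata = one row per track).
# The $metadata column names are illustrative assumptions only.
mock_extractor_output <- function(n_positions, tissues, assay = "RNA_SEQ") {
  n_tracks <- length(tissues)
  values <- matrix(
    abs(rnorm(n_positions * n_tracks)),   # non-negative fake signal
    nrow = n_positions,
    ncol = n_tracks
  )
  metadata <- data.frame(
    track_name = paste0("track_", seq_len(n_tracks)),
    tissue     = tissues,
    assay      = assay,
    stringsAsFactors = FALSE
  )
  list(values = values, metadata = metadata)
}

# Hypothetical helper: verify the documented invariants before analysis
check_extractor_output <- function(x) {
  stopifnot(
    is.list(x),
    all(c("values", "metadata") %in% names(x)),
    is.matrix(x$values), is.numeric(x$values),
    is.data.frame(x$metadata),
    nrow(x$metadata) == ncol(x$values)   # one metadata row per track
  )
  invisible(TRUE)
}

mock <- mock_extractor_output(
  n_positions = 1000,
  tissues     = c("UBERON:0002048", "UBERON:0002107")
)
check_extractor_output(mock)
dim(mock$values)   # 1000 rows (positions) x 2 columns (tracks)
```

You can run the same check on a real result such as `rna_data`. It only tests the two documented elements, so it ignores any other fields the object carries.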

Integration examples

Plot predicted ATAC-seq signal

Use base R graphics or ggplot2 to visualize the chromatin accessibility signal along the queried interval:
atac_data <- alphagenome_get_atac(results)

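# Plot the first ATAC-seq track (column 1); rows of $values run along the region in order
plot(
  atac_data$values[, 1],
  type = "l",
  main = "Predicted ATAC-seq signal (lung)",
  xlab = "Position in region (row index of $values)",
  ylab = "Predicted accessibility"
)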
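The example region spans roughly 1 Mb, so at base-pair resolution $values can have around a million rows, which makes plotting every row slow. ggplot2 also expects a long-format data.frame rather than a matrix. The sketch below averages consecutive rows into fixed-size bins and then reshapes the result into one row per (bin, track) pair. `bin_and_reshape()` is a hypothetical helper, and the bin size of 1,000 rows is an arbitrary choice for illustration.

```r
# Hypothetical helper: average every `bin_size` consecutive rows of a
# positions x tracks matrix, then reshape to long format for ggplot2.
bin_and_reshape <- function(values, bin_size = 1000L) {
  n      <- nrow(values)
  bin    <- (seq_len(n) - 1L) %/% bin_size + 1L   # bin index for each row
  sums   <- rowsum(values, bin)                   # per-bin column sums, bins in order
  counts <- tabulate(bin)                         # rows per bin (last bin may be short)
  means  <- sums / counts                         # divides each row by its bin's count

  track_names <- colnames(values)
  if (is.null(track_names)) track_names <- paste0("track_", seq_len(ncol(values)))

  # as.vector() flattens column by column, so bins repeat within each track
  data.frame(
    bin_start_row = rep((seq_len(nrow(means)) - 1L) * bin_size + 1L, times = ncol(means)),
    track         = rep(track_names, each = nrow(means)),
    value         = as.vector(means),
    stringsAsFactors = FALSE
  )
}

# Toy matrix standing in for atac_data$values (values are random)
set.seed(42)
demo <- matrix(runif(2500 * 2), ncol = 2, dimnames = list(NULL, c("lung_1", "lung_2")))
long <- bin_and_reshape(demo, bin_size = 1000L)
nrow(long)   # 3 bins x 2 tracks = 6 rows

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = bin_start_row, y = value, colour = track)) +
    ggplot2::geom_line() +
    ggplot2::labs(x = "Row of $values (bin start)", y = "Mean predicted signal")
  print(p)
}
```

With real data, pass `atac_data$values` in place of `demo`. Larger bins trade spatial detail for faster rendering.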

Use with GenomicRanges

Attach coordinates to predicted signal values by building a GRanges object from the queried region and the position axis of $values:
library(GenomicRanges)

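# Coordinates of the queried region (must match genomic_region in the query)
region_start <- 42560601L
region_end   <- 43609177L
n_positions  <- nrow(rna_data$values)

# Base pairs covered by each row of $values. Integer division keeps
# start and width as whole numbers, as IRanges expects.
bp_step <- (region_end - region_start) %/% n_positions

# Build one range per row of $values
gr <- GRanges(
  seqnames = "chr17",
  ranges   = IRanges(
    start = region_start + (seq_len(n_positions) - 1L) * bp_step,
    width = bp_step
  )
)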

# Attach RNA-seq track 1 as a metadata column
mcols(gr)$rna_track1 <- rna_data$values[, 1]

gr
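Hard-coding `region_start` and `region_end` separately from the `genomic_region` string makes it easy for the two to drift apart. A small parser keeps them in sync. `parse_region()` is a hypothetical helper. It assumes the `chr:start-end` format used in the examples on this page and does not handle thousands separators or strand suffixes.

```r
# Hypothetical helper: split a "chr:start-end" string into its parts.
# Assumes plain integers with no thousands separators.
parse_region <- function(region) {
  m <- regmatches(region, regexec("^([^:]+):([0-9]+)-([0-9]+)$", region))[[1]]
  if (length(m) != 4L) stop("Expected 'chr:start-end', got: ", region)
  out <- list(seqname = m[2], start = as.integer(m[3]), end = as.integer(m[4]))
  if (out$end < out$start) stop("Region end is before start: ", region)
  out
}

region <- parse_region("chr17:42560601-43609177")
region$seqname               # "chr17"
region$end - region$start    # 1048576 (2^20)
```

The parsed fields can replace the hard-coded values, for example `region_start <- region$start` and `seqnames = region$seqname`. Pass the same string as `genomic_region` to `alphagenome_query()`.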

Compare tissues with edgeR or DESeq2

When you query multiple tissues, $values contains one column per track across all requested tissues. Use $metadata to select and label the columns for each tissue, then build the input object your differential analysis tool expects:
library(edgeR)

# Query two tissues
results_multi <- alphagenome_query(
  access_token      = api_key,
  genomic_region    = "chr17:42560601-43609177",
  ontology_terms    = c("UBERON:0002048", "UBERON:0002107"),  # Lung, Liver
  requested_outputs = c("RNA_SEQ")
)

rna_multi <- alphagenome_get_rna_seq(results_multi)

# Identify which columns belong to each tissue using metadata.
# The ontology-term column is assumed to be named "tissue" here; check
# colnames(rna_multi$metadata) for the name used in your results.
lung_cols  <- which(rna_multi$metadata$tissue == "UBERON:0002048")
liver_cols <- which(rna_multi$metadata$tissue == "UBERON:0002107")
stopifnot(length(lung_cols) > 0, length(liver_cols) > 0)

# Construct a signal matrix and group labels for edgeR
count_matrix <- rna_multi$values[, c(lung_cols, liver_cols)]
group        <- factor(c(rep("Lung", length(lung_cols)), rep("Liver", length(liver_cols))))

dge <- DGEList(counts = count_matrix, group = group)
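AlphaGenome returns predicted signal values, not raw integer read counts. DESeq2 in particular expects non-negative integer counts, and edgeR's models are also built around count data. Check the edgeR or DESeq2 documentation for guidance on using continuous (non-count) inputs, or transform the values as appropriate for your analysis.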
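For a quick descriptive comparison before you choose a statistical model, you can average the tracks within each tissue and compute a per-position log2 ratio. This is a sketch, not a substitute for edgeR or DESeq2. `tissue_log2_ratio()` is a hypothetical helper. The `tissue` metadata column follows the example above and may have a different name in your results. The default pseudocount of 1 is arbitrary and should suit the scale of your values.

```r
# Hypothetical helper: per-position log2 ratio of mean signal between two
# tissues. Assumes non-negative values; `tissue_col` names the metadata
# column holding ontology terms.
tissue_log2_ratio <- function(x, tissue_a, tissue_b,
                              tissue_col = "tissue", pseudocount = 1) {
  tissues <- x$metadata[[tissue_col]]
  if (is.null(tissues)) stop("No metadata column named '", tissue_col, "'")
  cols_a <- which(tissues == tissue_a)
  cols_b <- which(tissues == tissue_b)
  if (length(cols_a) == 0L || length(cols_b) == 0L) {
    stop("Each tissue needs at least one track")
  }
  # drop = FALSE keeps a matrix even when a tissue has a single track
  mean_a <- rowMeans(x$values[, cols_a, drop = FALSE])
  mean_b <- rowMeans(x$values[, cols_b, drop = FALSE])
  log2((mean_a + pseudocount) / (mean_b + pseudocount))
}

# Toy input with the same shape as an extractor result (values are made up)
toy <- list(
  values   = cbind(c(3, 1, 0), c(5, 1, 0), c(1, 1, 0)),
  metadata = data.frame(tissue = c("UBERON:0002048", "UBERON:0002048", "UBERON:0002107"))
)
tissue_log2_ratio(toy, "UBERON:0002048", "UBERON:0002107")
# lung means c(4, 1, 0), liver c(1, 1, 0): log2(c(2.5, 1, 1)), approximately c(1.32, 0, 0)
```

Positive values mark positions where the first tissue has higher predicted signal. No statistical test is applied.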

Inspecting metadata

$metadata is a data.frame with one row per column in $values. Use it to map tracks back to their biological context:
# Number of tracks returned
nrow(rna_data$metadata)

# Column names vary by modality; print all to see what is available
colnames(rna_data$metadata)

# View the first few tracks
head(rna_data$metadata)
Use rna_data$metadata to subset $values by tissue, cell type, or assay before passing the matrix to downstream tools. This avoids mixing tracks from unrelated biological contexts in the same analysis.
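Because the rows of $metadata line up with the columns of $values, apply any filter to both at once so they stay aligned. `subset_tracks()` below is a hypothetical helper. The `assay` column and its values in the toy example are made up, so substitute a column reported by `colnames(rna_data$metadata)`.

```r
# Hypothetical helper: keep the tracks selected by `keep` (a logical vector
# over metadata rows) in both $values and $metadata, preserving alignment.
subset_tracks <- function(x, keep) {
  stopifnot(is.logical(keep), length(keep) == nrow(x$metadata))
  keep <- keep & !is.na(keep)                    # treat NA as "drop"
  x$values   <- x$values[, keep, drop = FALSE]
  x$metadata <- x$metadata[keep, , drop = FALSE]
  rownames(x$metadata) <- NULL
  x
}

# Toy result: 3 positions x 4 tracks, with a made-up assay column
toy <- list(
  values   = matrix(1:12, nrow = 3),
  metadata = data.frame(assay = c("assay_1", "assay_2", "assay_1", "assay_2"))
)
subset_a <- subset_tracks(toy, toy$metadata$assay == "assay_1")
dim(subset_a$values)   # 3 2
subset_a$values[1, ]   # 1 7 (first row of original columns 1 and 3)
```

Using `drop = FALSE` keeps $values a matrix even when only one track survives the filter, so downstream code that calls `nrow()` or `ncol()` still works.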

Available extractor functions

Function                            Modality
alphagenome_get_rna_seq()           RNA-seq gene expression
alphagenome_get_atac()              ATAC-seq chromatin accessibility
alphagenome_get_cage()              CAGE transcription start sites
alphagenome_get_dnase()             DNase-seq hypersensitivity
alphagenome_get_chip_tf()           ChIP-seq (transcription factors)
alphagenome_get_chip_histone()      ChIP-seq (histone marks)
alphagenome_get_splice_sites()      Predicted splice sites
alphagenome_get_splice_junctions()  Splice junction predictions
alphagenome_get_splice_usage()      Splice site usage fractions
alphagenome_get_procap()            PRO-cap (capped RNA)
alphagenome_get_contact_maps()      3D chromatin contact maps
