AlphaGenome predicts three complementary transcription-related signals from DNA sequence: steady-state gene expression (RNA-seq), capped transcript initiation sites (CAGE), and nascent capped RNA at active promoters (PRO-cap). You can request any combination of these in a single alphagenome_query() call and extract each with its dedicated function.

RNA-seq gene expression

RNA-seq signal represents the predicted steady-state abundance of RNA across the queried genomic interval. The model produces tracks for each cell type or tissue, so you can compare expression levels across biological contexts.
Requested output token: "RNA_SEQ"
Extractor function: alphagenome_get_rna_seq(response_body)
Returns: list($values, $metadata), where $values is a positions × tracks numeric matrix and $metadata is a track annotation data frame.
library(AlphaGenomeR)

api_key <- Sys.getenv("ALPHAGENOME_API_KEY")
if (!nzchar(api_key)) stop("Set the ALPHAGENOME_API_KEY environment variable first")
region  <- "chr17:42560601-43609177"

results <- alphagenome_query(
  access_token     = api_key,
  genomic_region   = region,
  ontology_terms   = c("UBERON:0002048"),  # Lung
  requested_outputs = c("RNA_SEQ")
)

rna_data <- alphagenome_get_rna_seq(results)

# Prediction matrix: positions x tracks
dim(rna_data$values)

# Track metadata: cell types, experimental details
head(rna_data$metadata)

# Plot expression signal for the first track
plot(rna_data$values[, 1], type = "l",
     xlab = "Position within region", ylab = "Predicted RNA-seq signal",
     main = "RNA-seq prediction (track 1)")
The $metadata data frame maps each column in $values to a specific cell type or tissue. Use it to identify which tracks correspond to your biological context of interest before subsetting the matrix.
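The metadata-driven subsetting described above can be sketched as a small filter. This is an illustration under assumptions, not an AlphaGenomeR function. `select_tracks()` is hypothetical. It assumes that the rows of `$metadata` line up with the columns of `$values`, and that a text column describes the biosample. The column name `biosample_name` is a placeholder, so check `names(rna_data$metadata)` for the real one. The input below is random toy data, not model output.

```r
# Sketch only: `select_tracks()` is a hypothetical helper, not part of AlphaGenomeR.
# ASSUMPTIONS: rows of `metadata` line up with columns of `values`, and the
# biosample description lives in a text column. "biosample_name" is a
# placeholder; inspect names(rna_data$metadata) for the real column name.
select_tracks <- function(values, metadata, pattern, column = "biosample_name") {
  stopifnot(ncol(values) == nrow(metadata), column %in% names(metadata))
  keep <- grepl(pattern, metadata[[column]], ignore.case = TRUE)
  list(values   = values[, keep, drop = FALSE],
       metadata = metadata[keep, , drop = FALSE])
}

# Toy stand-in for rna_data (random numbers, NOT AlphaGenome output):
# 1,000 positions x 3 tracks with made-up biosample labels.
set.seed(1)
toy_rna <- list(
  values   = matrix(rexp(3000), nrow = 1000, ncol = 3),
  metadata = data.frame(biosample_name = c("lung", "liver", "upper lobe of left lung"))
)

lung <- select_tracks(toy_rna$values, toy_rna$metadata, pattern = "lung")
dim(lung$values)       # 1000 rows, 2 matching tracks
colMeans(lung$values)  # mean signal per selected track, for cross-context comparison
```

With real output, you would pass `rna_data$values` and `rna_data$metadata` and set `column` to whichever metadata column actually names the tissue or cell type.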

CAGE transcription start sites

CAGE (Cap Analysis of Gene Expression) measures the 5’ ends of capped RNA molecules, pinpointing the exact nucleotide positions where transcription initiates. Predicted CAGE signal highlights active transcription start sites (TSSs) at base-pair resolution. Requested output token: "CAGE"
Extractor function: alphagenome_get_cage(response_body)
Returns: list($values, $metadata), where $values is a positions × tracks numeric matrix of TSS signal and $metadata annotates each track.
# Reuses api_key and region from the RNA-seq example
results <- alphagenome_query(
  access_token     = api_key,
  genomic_region   = region,
  ontology_terms   = c("UBERON:0002048"),
  requested_outputs = c("CAGE")
)

cage_data <- alphagenome_get_cage(results)

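# Flag rows of track 1 above its 99th percentile, strongest first.
# These are row indices into $values, not chromosome coordinates.
signal  <- cage_data$values[, 1]
top_tss <- which(signal > quantile(signal, 0.99))
top_tss <- top_tss[order(signal[top_tss], decreasing = TRUE)]
cat("Top TSS rows:", head(top_tss), "\n")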
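The quantile filter above flags individual rows, and one initiation site usually spans several adjacent flagged rows. A minimal sketch of collapsing those rows into candidate TSSs is shown below. It uses a simple threshold-and-group heuristic, not the package's method or a dedicated CAGE clustering tool. `call_tss_peaks()` is hypothetical, and the toy signal is made up.

```r
# Sketch only: `call_tss_peaks()` is a hypothetical helper, not part of AlphaGenomeR.
# It thresholds one track at a quantile, groups flagged rows into runs, and
# reports the strongest row in each run as a candidate TSS.
# ASSUMPTION (unverified): if row 1 is the first base of `region`, the
# chromosome coordinate of a peak would be region start + peak - 1.
call_tss_peaks <- function(signal, probs = 0.99, max_gap = 0L) {
  threshold <- quantile(signal, probs, names = FALSE)
  idx <- which(signal > threshold)
  if (length(idx) == 0L) {
    return(data.frame(start = integer(0), end = integer(0),
                      peak = integer(0), value = numeric(0)))
  }
  # Start a new run when more than `max_gap` unflagged rows separate two flagged rows
  run_id <- cumsum(c(TRUE, diff(idx) > max_gap + 1L))
  runs <- unname(split(idx, run_id))
  peak <- vapply(runs, function(r) r[which.max(signal[r])], integer(1))
  data.frame(start = vapply(runs, min, integer(1)),
             end   = vapply(runs, max, integer(1)),
             peak  = peak,
             value = signal[peak])
}

# Toy signal (made-up values): one site spanning rows 50-52 and one at row 150.
toy_cage <- numeric(200)
toy_cage[50:52] <- c(3, 8, 5)
toy_cage[150] <- 6
tss <- call_tss_peaks(toy_cage, probs = 0.95)
tss  # two candidate sites, peaking at rows 51 and 150
```

On real output, call `call_tss_peaks(cage_data$values[, 1])`. Raising `max_gap` merges nearby runs that are separated by a few unflagged rows.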

PRO-cap nascent capped RNA

PRO-cap, a variant of Precision Run-On sequencing (PRO-seq) that enriches for capped RNAs, maps the 5’ ends of nascent transcripts made by engaged RNA polymerase. RNA-seq reflects steady-state RNA levels. PRO-cap signal instead marks promoters that are actively initiating transcription at the time of measurement.
Requested output token: "PROCAP"
Extractor function: alphagenome_get_procap(response_body)
Returns: list($values, $metadata), where $values is a positions × tracks numeric matrix of nascent capped RNA signal and $metadata annotates each track.
results <- alphagenome_query(
  access_token     = api_key,
  genomic_region   = region,
  ontology_terms   = c("UBERON:0002048"),
  requested_outputs = c("PROCAP")
)

procap_data <- alphagenome_get_procap(results)
dim(procap_data$values)
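A roughly 1 Mb query gives about a million rows per track. Summing the signal into fixed-width bins can make PRO-cap tracks easier to plot or to line up against other modalities. The sketch below is an illustration, not an AlphaGenomeR feature. `bin_signal()` is hypothetical, and the 128-row default is an arbitrary choice. It sums within each bin on the assumption that the signal is count-like, and you could swap in a mean if that suits your data better.

```r
# Sketch only: `bin_signal()` is a hypothetical helper, not part of AlphaGenomeR.
# Sums consecutive rows of a positions x tracks matrix into fixed-width bins;
# the last bin keeps whatever rows remain.
bin_signal <- function(values, bin_size = 128L) {
  values <- as.matrix(values)
  bin_id <- (seq_len(nrow(values)) - 1L) %/% bin_size + 1L
  rowsum(values, group = bin_id)
}

# Toy input (all ones, NOT AlphaGenome output): 1,000 positions x 2 tracks
toy_procap <- matrix(1, nrow = 1000, ncol = 2)
binned <- bin_signal(toy_procap, bin_size = 128L)
dim(binned)    # 8 bins x 2 tracks
binned[, 1]    # seven bins of 128, then a final bin of 104
```

With real output, `bin_signal(procap_data$values)` returns one row per bin and keeps the original track columns.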

Querying all three together

Request RNA-seq, CAGE, and PRO-cap in a single call to view steady-state expression, transcript initiation, and nascent initiation side by side across your locus of interest.
results <- alphagenome_query(
  access_token     = api_key,
  genomic_region   = region,
  ontology_terms   = c("UBERON:0002048"),  # Lung
  requested_outputs = c("RNA_SEQ", "CAGE", "PROCAP")
)

rna_data    <- alphagenome_get_rna_seq(results)
cage_data   <- alphagenome_get_cage(results)
procap_data <- alphagenome_get_procap(results)

# Stack track 1 of each modality along the locus for visual comparison
old_par <- par(mfrow = c(3, 1), mar = c(2, 4, 1, 1))
plot(rna_data$values[, 1],    type = "l", ylab = "RNA-seq")
plot(cage_data$values[, 1],   type = "l", ylab = "CAGE")
plot(procap_data$values[, 1], type = "l", ylab = "PRO-cap")
par(old_par)  # restore the previous graphics settings
If a modality token is not included in requested_outputs, its extractor returns NULL. When requested_outputs is built conditionally, check the extractor result with is.null() before accessing $values or $metadata.
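A small guard can turn that silent NULL into an explicit error. The sketch below pairs it with a coarse comparison between two modalities: it sums each track into bins and computes a Spearman (rank) correlation between the binned values. Rank correlation is one reasonable choice for sparse, heavy-tailed signals, and the 1,000-row bin size is arbitrary. Both `first_track()` and `binned_cor()` are hypothetical helpers, not AlphaGenomeR functions, and the toy signals are made up.

```r
# Sketch only: both helpers are hypothetical, not part of AlphaGenomeR.

# Return track 1 of an extractor result, failing with a clear message when the
# extractor returned NULL because its token was not requested.
first_track <- function(extracted, label) {
  if (is.null(extracted) || is.null(extracted$values)) {
    stop(label, " was not returned; add its token to requested_outputs",
         call. = FALSE)
  }
  extracted$values[, 1]
}

# Sum two equal-length signals into bins and return the Spearman (rank)
# correlation between the binned values.
binned_cor <- function(x, y, bin_size = 1000L) {
  stopifnot(length(x) == length(y))
  bin_id <- (seq_along(x) - 1L) %/% bin_size
  cor(as.vector(tapply(x, bin_id, sum)),
      as.vector(tapply(y, bin_id, sum)),
      method = "spearman")
}

# Toy check with made-up signals: y is a scaled copy of x, so rho is 1.
toy_x <- rep(1:10, each = 1000)
toy_y <- 2 * toy_x
rho <- binned_cor(toy_x, toy_y)

# With the combined query above (requires a real API response):
# binned_cor(first_track(cage_data, "CAGE"), first_track(procap_data, "PROCAP"))
```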
