AlphaGenomeR is designed to fit naturally into existing Bioconductor analysis pipelines. Every extractor function returns a plain R list containing a numeric matrix ($values) and a data.frame ($metadata). These are native R structures, so they need little or no preparation before use with Bioconductor packages such as GenomicRanges, edgeR, and DESeq2, or with visualization tools like ggplot2.
Output structure
Each extractor function — alphagenome_get_rna_seq(), alphagenome_get_atac(), and so on — returns a list with two named elements:
| Element | Type | Contents |
|---|---|---|
| $values | matrix (numeric) | Rows = genomic positions, columns = tracks (cell types / experiments) |
| $metadata | data.frame | One row per track with cell type, assay, and experiment details |
library(AlphaGenomeR)
api_key <- Sys.getenv("ALPHAGENOME_API_KEY")
results <- alphagenome_query(
  access_token = api_key,
  genomic_region = "chr17:42560601-43609177",
  ontology_terms = c("UBERON:0002048"),
  requested_outputs = c("RNA_SEQ", "ATAC")
)
rna_data <- alphagenome_get_rna_seq(results)
atac_data <- alphagenome_get_atac(results)
# Inspect dimensions: rows = positions, columns = tracks
dim(rna_data$values)
# Preview the first few rows of the prediction matrix
head(rna_data$values)
# Preview the track metadata
head(rna_data$metadata)
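The relationship between the two elements can also be checked programmatically. The sketch below uses a small mock object in place of real API output: the numbers and the metadata column names `track_name` and `assay` are made up for illustration, and `check_extractor_output()` is a hypothetical helper that is not part of AlphaGenomeR. It confirms that $values is a numeric matrix and that $metadata has one row per column of $values.

```r
# Mock extractor output; real objects come from alphagenome_get_*() calls.
mock <- list(
  values = matrix(
    c(0.1, 0.4, 0.3, 0.8, 0.2, 0.5),
    nrow = 3,
    dimnames = list(NULL, c("track_1", "track_2"))
  ),
  metadata = data.frame(
    track_name = c("track_1", "track_2"),  # hypothetical column name
    assay = c("RNA_SEQ", "RNA_SEQ")        # hypothetical column name
  )
)

# Hypothetical helper (not part of AlphaGenomeR).
# Stops with an error if the structure is not as described above.
check_extractor_output <- function(x) {
  stopifnot(
    is.list(x),
    is.matrix(x$values),
    is.numeric(x$values),
    is.data.frame(x$metadata),
    ncol(x$values) == nrow(x$metadata)
  )
  invisible(TRUE)
}

check_extractor_output(mock)
```

Running a check like this once after each extractor call may catch mismatched inputs early, before they reach a downstream package.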
Integration examples
Plot predicted ATAC-seq signal
Use base R graphics or ggplot2 to visualize the chromatin accessibility signal along the queried interval:
# Option 1: base R graphics
atac_data <- alphagenome_get_atac(results)
plot(
  atac_data$values[, 1],
  type = "l",
  main = "Predicted ATAC-seq signal: Lung",
  xlab = "Genomic position (bp offset)",
  ylab = "Predicted accessibility"
)
# Option 2: ggplot2
library(ggplot2)
atac_data <- alphagenome_get_atac(results)
# ggplot2 expects a data.frame, so pair each row index with its signal value
df <- data.frame(
  position = seq_len(nrow(atac_data$values)),
  signal = atac_data$values[, 1]
)
ggplot(df, aes(x = position, y = signal)) +
  geom_line(colour = "#4595db") +
  labs(
    title = "Predicted ATAC-seq signal: Lung",
    x = "Genomic position (bp offset)",
    y = "Predicted accessibility"
  ) +
  theme_minimal()
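Both plots above draw a single track. To compare several tracks in one figure, the matrix can be reshaped to long format first. The following is a minimal sketch on a made-up 4 x 2 matrix. It assumes the matrix has column names, which may not hold for every extractor, and it only draws the plot if ggplot2 is installed.

```r
# Mock prediction matrix: 4 positions x 2 tracks (values are made up).
values <- matrix(
  c(0.1, 0.3, 0.2, 0.6, 0.5, 0.7, 0.4, 0.9),
  nrow = 4,
  dimnames = list(NULL, c("track_1", "track_2"))
)

# Reshape to long format: one row per (position, track) pair.
long_df <- data.frame(
  position = rep(seq_len(nrow(values)), times = ncol(values)),
  track = rep(colnames(values), each = nrow(values)),
  signal = as.vector(values)  # column-major order matches the rep() calls above
)

# Draw one line per track, only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    long_df,
    ggplot2::aes(x = position, y = signal, colour = track)
  ) +
    ggplot2::geom_line() +
    ggplot2::theme_minimal()
  print(p)
}
```

With real output, `values` would be replaced by something like `atac_data$values`, and the `track` labels could be taken from $metadata instead of the column names.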
Use with GenomicRanges
Attach coordinates to predicted signal values by building a GRanges object from the queried region and the position axis of $values:
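library(GenomicRanges)
region_start <- 42560601L
region_end <- 43609177L
n_positions <- nrow(rna_data$values)
# Width of the bin covered by each row; rows are assumed to tile the region evenly
region_width <- region_end - region_start
stopifnot(region_width %% n_positions == 0L)
bp_step <- region_width %/% n_positions
# Build per-position ranges (integer starts and widths, as IRanges expects)
gr <- GRanges(
  seqnames = "chr17",
  ranges = IRanges(
    start = region_start + (seq_len(n_positions) - 1L) * bp_step,
    width = bp_step
  )
)
# Attach RNA-seq track 1 as a metadata column
mcols(gr)$rna_track1 <- rna_data$values[, 1]
gr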
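The same arithmetic works in reverse when you want the prediction at a specific coordinate. The sketch below defines `row_for_position()`, a hypothetical helper that is not part of AlphaGenomeR, which maps a genomic coordinate to the row of $values that covers it. It assumes the rows tile the queried region evenly, and the row count of 8192 in the example is an arbitrary illustrative value, not something the API is known to return.

```r
# Hypothetical helper (not part of AlphaGenomeR): map a genomic coordinate to
# the row of $values that covers it, assuming rows tile the region evenly.
row_for_position <- function(pos, region_start, region_end, n_rows) {
  region_width <- region_end - region_start
  stopifnot(
    region_width %% n_rows == 0,
    pos >= region_start,
    pos < region_end
  )
  bin_width <- region_width %/% n_rows
  (pos - region_start) %/% bin_width + 1
}

# Example with an assumed 8192 rows (128 bp per row) over the region above.
row_for_position(
  pos = 42560729,
  region_start = 42560601,
  region_end = 43609177,
  n_rows = 8192
)
# returns 2
```

With real output, `n_rows` would be `nrow(rna_data$values)`, and the returned index can be used directly, for example `rna_data$values[idx, ]`.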
Compare tissues with edgeR or DESeq2
When you query multiple tissues, $values contains one column per track, and $metadata identifies the tissue each track belongs to. You can build the inputs for differential analysis tools from this matrix and a matching vector of group labels:
library(edgeR)
# Query two tissues
results_multi <- alphagenome_query(
  access_token = api_key,
  genomic_region = "chr17:42560601-43609177",
  ontology_terms = c("UBERON:0002048", "UBERON:0002107"), # Lung, Liver
  requested_outputs = c("RNA_SEQ")
)
rna_multi <- alphagenome_get_rna_seq(results_multi)
# Identify which columns belong to each tissue using metadata
lung_cols <- which(rna_multi$metadata$tissue == "UBERON:0002048")
liver_cols <- which(rna_multi$metadata$tissue == "UBERON:0002107")
# Construct a count matrix and group labels for edgeR
count_matrix <- rna_multi$values[, c(lung_cols, liver_cols)]
group <- factor(c(rep("Lung", length(lung_cols)), rep("Liver", length(liver_cols))))
dge <- DGEList(counts = count_matrix, group = group)
AlphaGenome returns predicted signal values, not raw integer read counts. Check the edgeR or DESeq2 documentation for guidance on using continuous (non-count) inputs, or transform values as appropriate for your analysis.
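As one illustration of a transformation that does not rely on count-based models, the sketch below computes a per-position log2 ratio of mean predicted signal between two tissue groups using base R only. The matrix, track names, and pseudocount are made up for the example. This is a descriptive comparison, not a statistical test, and it is not a method prescribed by AlphaGenomeR, edgeR, or DESeq2.

```r
# Mock predicted signal: 3 positions x 4 tracks (values are made up).
values <- matrix(
  c(1, 3, 7,
    1, 3, 7,
    3, 3, 1,
    3, 3, 1),
  nrow = 3,
  dimnames = list(NULL, c("lung_1", "lung_2", "liver_1", "liver_2"))
)
group <- factor(c("Lung", "Lung", "Liver", "Liver"))

# Mean predicted signal per position within each tissue group.
lung_mean <- rowMeans(values[, group == "Lung", drop = FALSE])
liver_mean <- rowMeans(values[, group == "Liver", drop = FALSE])

# log2 ratio with a pseudocount (an arbitrary choice here) to avoid log2(0).
pseudocount <- 1
log2_ratio <- log2((lung_mean + pseudocount) / (liver_mean + pseudocount))
log2_ratio
```

Positive values mark positions where the predicted signal is higher in lung, negative values where it is higher in liver. The choice of pseudocount affects the result at low signal and should be adapted to the scale of the real predictions.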
$metadata is a data.frame with one row per column in $values. Use it to map tracks back to their biological context:
# Number of tracks returned
nrow(rna_data$metadata)
# Column names vary by modality; print all to see what is available
colnames(rna_data$metadata)
# View the first few tracks
head(rna_data$metadata)
Use rna_data$metadata to subset $values by tissue, cell type, or assay before passing the matrix to downstream tools. This avoids mixing tracks from unrelated biological contexts in the same analysis.
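A minimal sketch of that subsetting step follows, using a mock object rather than real API output. The `tissue` column matches the name used in the edgeR example above, while `track_name` and all values are made up. Since column names vary by modality, check `colnames()` on your own metadata first.

```r
# Mock extractor output (values and the track_name column are made up).
rna_mock <- list(
  values = matrix(
    1:12 / 10,
    nrow = 3,
    dimnames = list(NULL, paste0("track_", 1:4))
  ),
  metadata = data.frame(
    track_name = paste0("track_", 1:4),
    tissue = c("UBERON:0002048", "UBERON:0002107",
               "UBERON:0002048", "UBERON:0002107")
  )
)

# Logical index over tracks; row i of $metadata describes column i of $values.
keep <- rna_mock$metadata$tissue == "UBERON:0002048"

# Subset both elements together so they stay aligned.
lung_only <- list(
  values = rna_mock$values[, keep, drop = FALSE],
  metadata = rna_mock$metadata[keep, , drop = FALSE]
)

dim(lung_only$values)
```

Using `drop = FALSE` keeps $values a matrix even when only one track matches, so downstream code that expects two dimensions continues to work.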
| Function | Modality |
|---|---|
| alphagenome_get_rna_seq() | RNA-seq gene expression |
| alphagenome_get_atac() | ATAC-seq chromatin accessibility |
| alphagenome_get_cage() | CAGE transcription start sites |
| alphagenome_get_dnase() | DNase-seq hypersensitivity |
| alphagenome_get_chip_tf() | ChIP-seq (transcription factors) |
| alphagenome_get_chip_histone() | ChIP-seq (histone marks) |
| alphagenome_get_splice_sites() | Predicted splice sites |
| alphagenome_get_splice_junctions() | Splice junction predictions |
| alphagenome_get_splice_usage() | Splice site usage fractions |
| alphagenome_get_procap() | PRO-cap (capped RNA) |
| alphagenome_get_contact_maps() | 3D chromatin contact maps |