Every AlphaGenome prediction is anchored to a specific stretch of the genome. You provide that stretch as a single string in chr:start-end format, and the package parses it into a chromosome name, a start position, and an end position before calling the API. The model was trained on and is optimized for intervals of approximately 1 megabase (Mb), about 1,000,000 bp. Windows that deviate significantly from this size may produce lower-quality or unexpected results.

Region format

The genomic_region parameter expects a string with the following structure:
"chr<N>:<start>-<end>"
Component   Description                        Example
chr<N>      Chromosome identifier              chr17, chr1, chrX
<start>     0-based start position (integer)   42560601
<end>       End position (integer)             43609177
Example region:
region <- "chr17:42560601-43609177"
This region spans chromosome 17 from position 42,560,601 to 43,609,177 — a window of 1,048,576 bp on the hg38 human genome assembly.

How the package parses your region

Internally, alphagenome_query() splits the string on : and - to extract the three components, then constructs a Python Interval object:
# From alphagenome-api.R
parts <- strsplit(genomic_region, "[:-]")[[1]]

chrom <- parts[1]           # e.g. "chr17"
start <- as.integer(parts[2])  # e.g. 42560601
end   <- as.integer(parts[3])  # e.g. 43609177

interval <- ag_genome$Interval(chromosome = chrom, start = start, end = end)
If the string does not contain at least three parts after splitting, the function stops:
Error in alphagenome_query(...) : genomic_region must be in 'chr:start-end' format.
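As far as the excerpt above shows, the check only counts the parts produced by the split, so a string such as "chr17:42,560,601-43,609,177" or a region whose start exceeds its end would pass it. A minimal sketch of a stricter pre-check follows. parse_region() is a hypothetical helper written for this page, not a function exported by AlphaGenomeR, and its second error message is invented for illustration.

```r
# Hypothetical helper (not part of AlphaGenomeR): validate a region string
# before handing it to alphagenome_query().
parse_region <- function(genomic_region) {
  # chr name, then digits only for start and end (no commas or spaces)
  pattern <- "^(chr[A-Za-z0-9_]+):([0-9]+)-([0-9]+)$"
  if (!is.character(genomic_region) || length(genomic_region) != 1L ||
      is.na(genomic_region) || !grepl(pattern, genomic_region)) {
    stop("genomic_region must be in 'chr:start-end' format.", call. = FALSE)
  }
  # regexec() returns the full match followed by the three capture groups
  m <- regmatches(genomic_region, regexec(pattern, genomic_region))[[1]]
  start <- suppressWarnings(as.integer(m[3]))
  end   <- suppressWarnings(as.integer(m[4]))
  # as.integer() yields NA for values beyond the 32-bit integer range
  if (is.na(start) || is.na(end) || start >= end) {
    stop("start must be a valid integer smaller than end.", call. = FALSE)
  }
  list(chrom = m[2], start = start, end = end)
}

str(parse_region("chr17:42560601-43609177"))
```

Running this check first turns a malformed string into an immediate, local error instead of an NA coordinate reaching the API call.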

Coordinate system

All coordinates use the hg38 (GRCh38) human genome assembly. Positions are 0-based, matching the convention used by BED files and most Python genomics libraries.
Do not mix genome assemblies. If you pass hg19 coordinates, they are interpreted as hg38 positions, so the predictions describe the wrong locus and no error is raised.
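Coordinates copied from 1-based sources such as GFF/GTF or VCF files need a shift before they match the 0-based convention described above. The sketch below assumes the interval is half-open like a BED record (start included, end excluded), which is consistent with the window size being computed as end minus start on this page. one_based_to_bed() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: convert a 1-based, fully closed interval (GFF/GTF style)
# to a 0-based, half-open region string (BED style).
one_based_to_bed <- function(chrom, start_1based, end_1based) {
  stopifnot(start_1based >= 1, end_1based >= start_1based)
  # Only the start shifts: a closed 1-based end and an exclusive 0-based end
  # are the same number.
  sprintf("%s:%d-%d", chrom,
          as.integer(start_1based - 1), as.integer(end_1based))
}

one_based_to_bed("chr17", 42560602, 43609177)
#> [1] "chr17:42560601-43609177"
```

This only converts the numbering convention. Moving between assemblies (hg19 to hg38) requires a liftover tool and cannot be done with arithmetic.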

~1 Mb window requirement

The AlphaGenome model processes DNA in context windows of about 1 Mb (megabase). The API accepts other sizes, but predictions are most accurate when the interval is close to 1,048,576 bp (2^20).
Regions substantially shorter or longer than ~1 Mb may return lower-confidence predictions or be rejected by the API. Always verify your window size before submitting a large batch of queries.
You can compute the window size in R before querying:
region <- "chr17:42560601-43609177"
parts  <- as.integer(strsplit(region, "[:-]")[[1]][-1])
window <- parts[2] - parts[1]
cat("Window size:", format(window, big.mark = ","), "bp\n")
#> Window size: 1,048,576 bp
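When you start from a single position of interest rather than a finished interval, you can build a 2^20 bp window around it. The following is a minimal sketch. centered_region() is a hypothetical helper, and it clamps only at the chromosome start because it has no chromosome lengths to check the end against.

```r
# Hypothetical helper: build a region string of `width` bp centered on `center`.
centered_region <- function(chrom, center, width = 2^20) {
  start <- as.integer(center - width %/% 2)
  if (start < 0L) start <- 0L        # clamp at the chromosome start
  end <- as.integer(start + width)   # keep the full width after clamping
  sprintf("%s:%d-%d", chrom, start, end)
}

centered_region("chr17", 43084889)
#> [1] "chr17:42560601-43609177"
```

The example reproduces the chromosome 17 region used throughout this page, whose midpoint is position 43,084,889.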

Region examples

library(AlphaGenomeR)

api_key <- Sys.getenv("ALPHAGENOME_API_KEY")

# Chromosome 17: window containing the BRCA1 locus (BRCA2 is on chromosome 13)
results_chr17 <- alphagenome_query(
  access_token    = api_key,
  genomic_region  = "chr17:42560601-43609177"
)

# Chromosome 1: 1 Mb window near the start of the chromosome
results_chr1 <- alphagenome_query(
  access_token    = api_key,
  genomic_region  = "chr1:1000000-2048576"
)

# X chromosome
results_chrX <- alphagenome_query(
  access_token    = api_key,
  genomic_region  = "chrX:50000000-51048576"
)
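Before sending several regions like the ones above, it can help to check all window sizes in one pass. The sketch below reuses the same split-based parsing shown earlier on this page. window_sizes() is a hypothetical helper and makes no API calls.

```r
# Hypothetical helper: window size (end - start) for each region string.
window_sizes <- function(regions) {
  parts <- strsplit(regions, "[:-]")
  vapply(parts, function(p) as.integer(p[3]) - as.integer(p[2]), integer(1))
}

regions <- c(
  "chr17:42560601-43609177",
  "chr1:1000000-2048576",
  "chrX:50000000-51048576"
)

sizes <- window_sizes(regions)
names(sizes) <- regions

# Regions whose size differs from 2^20 bp; empty when every window matches
off_target <- sizes[sizes != 2^20]
print(sizes)
print(length(off_target))
```

All three example regions span exactly 1,048,576 bp, so off_target is empty here.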

Supported organisms

The default organism is human (HOMO_SAPIENS). You can specify a different organism using the organism parameter, which maps to the Organism enum in the AlphaGenome Python SDK:
results <- alphagenome_query(
  access_token   = api_key,
  genomic_region = "chr17:42560601-43609177",
  organism       = "HOMO_SAPIENS"   # default; change to another supported value if needed
)
If you pass an unrecognized organism string, the package lists the available options:
Error in alphagenome_query(...) : Invalid organism. Available: HOMO_SAPIENS, ...
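In a batch run, one invalid organism or malformed region would otherwise stop the whole script. A minimal sketch of catching such errors with base R's tryCatch() follows. safe_query() is a hypothetical wrapper, and fake_query() is a stand-in used only so the example runs without an API key; in practice you would pass alphagenome_query and its usual arguments instead.

```r
# Hypothetical wrapper: call a query function, returning NULL (with a warning)
# instead of stopping when the call raises an error.
safe_query <- function(query_fn, ...) {
  tryCatch(
    query_fn(...),
    error = function(e) {
      warning("Query failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Stand-in for alphagenome_query(), for illustration only.
fake_query <- function(organism = "HOMO_SAPIENS") {
  if (organism != "HOMO_SAPIENS") stop("Invalid organism.")
  "ok"
}

good <- safe_query(fake_query, organism = "HOMO_SAPIENS")             # "ok"
bad  <- suppressWarnings(safe_query(fake_query, organism = "UNKNOWN")) # NULL
```

Returning NULL lets a loop record the failure and continue with the next region.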
