Format and validate genomic regions for AlphaGenomeR

Every AlphaGenome prediction is anchored to a specific stretch of the genome. You provide that stretch as a single string in chr:start-end format, and the package parses it into a chromosome name, a start position, and an end position before calling the API. The model is optimized for intervals of about 1 megabase (1,048,576 bp, or 2^20); windows that deviate significantly from this size may produce lower-quality or unexpected results.

Region format

The genomic_region parameter expects a string with the following structure:

"chr<N>:<start>-<end>"

Component	Description	Example
`chr<N>`	Chromosome identifier	`chr17`, `chr1`, `chrX`
`<start>`	0-based start position (integer)	`42560601`
`<end>`	End position (integer)	`43609177`

Example region:

region <- "chr17:42560601-43609177"

This region spans chromosome 17 from position 42,560,601 to 43,609,177, a window of 1,048,576 bp (2^20) on the hg38 human genome assembly.

How the package parses your region

Internally, alphagenome_query() splits the string on the ":" and "-" delimiters to extract the three components, converts the two coordinates to integers, and then constructs a Python Interval object:

# From alphagenome-api.R
parts <- strsplit(genomic_region, "[:-]")[[1]]

chrom <- parts[1]           # e.g. "chr17"
start <- as.integer(parts[2])  # e.g. 42560601
end   <- as.integer(parts[3])  # e.g. 43609177

interval <- ag_genome$Interval(chromosome = chrom, start = start, end = end)

If the string does not contain at least three parts after splitting, the function stops:

Error in alphagenome_query(...) : genomic_region must be in 'chr:start-end' format.
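The three-part check shown above only counts the pieces: a string such as "chr17:abc-def" still splits into three parts, and as.integer() then returns NA with a warning. A stricter pre-check can catch this before a query is sent. The following is a minimal sketch, not part of AlphaGenomeR; parse_region() is a hypothetical helper, and the "chr" prefix rule in its pattern is an assumption based on the examples on this page.

```r
# Hypothetical helper (not part of AlphaGenomeR): strict pre-check of a
# "chr:start-end" string before it is passed to alphagenome_query().
parse_region <- function(genomic_region) {
  # Assumption: chromosome names start with "chr", as in the examples here
  pattern <- "^(chr[0-9A-Za-z_]+):([0-9]+)-([0-9]+)$"
  ok <- is.character(genomic_region) && length(genomic_region) == 1L &&
    !is.na(genomic_region) && grepl(pattern, genomic_region)
  if (!ok) {
    stop("genomic_region must be in 'chr:start-end' format.", call. = FALSE)
  }
  # regmatches() returns the full match followed by the three capture groups
  m <- regmatches(genomic_region, regexec(pattern, genomic_region))[[1]]
  start <- suppressWarnings(as.integer(m[3]))
  end   <- suppressWarnings(as.integer(m[4]))
  if (is.na(start) || is.na(end)) {
    stop("coordinates do not fit in an R integer.", call. = FALSE)
  }
  if (end <= start) {
    stop("end must be greater than start.", call. = FALSE)
  }
  list(chrom = m[2], start = start, end = end)
}

str(parse_region("chr17:42560601-43609177"))
```

Calling parse_region() first means a malformed string fails locally with a clear message instead of reaching the API.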

Coordinate system

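All coordinates use the hg38 (GRCh38) human genome assembly. Positions are 0-based, matching the convention used by BED files and most Python genomics libraries. Under the BED convention the end coordinate is exclusive, so the window length is simply end minus start.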

Do not mix coordinate systems. Using hg19 coordinates with hg38 annotations will produce predictions for the wrong locus without raising an error.
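Coordinates copied from a genome browser or from a GFF or VCF file are usually 1-based and inclusive, so they need shifting before use here. The sketch below shows that conversion. one_based_to_region() is a hypothetical helper, not a package function, and it assumes the end coordinate expected here is exclusive, as in BED.

```r
# Hypothetical helper (not part of AlphaGenomeR): convert 1-based, inclusive
# coordinates to a 0-based region string.
# Assumption: the end coordinate is exclusive (half-open), as in BED.
one_based_to_region <- function(chrom, start1, end1) {
  stopifnot(start1 >= 1, end1 >= start1)
  # The 0-based start is one less than the 1-based start; the exclusive end
  # is numerically equal to the 1-based inclusive end.
  sprintf("%s:%d-%d", chrom, as.integer(start1 - 1), as.integer(end1))
}

one_based_to_region("chr17", 42560602, 43609177)
#> [1] "chr17:42560601-43609177"
```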

~1 Mb window requirement

The AlphaGenome model processes DNA in 1 Mb (megabase) context windows. The API accepts other sizes, but predictions are most accurate when the interval is close to 1,048,576 bp (2^20).

Regions substantially shorter or longer than ~1 Mb may return lower-confidence predictions or be rejected by the API. Always verify your window size before submitting a large batch of queries.

You can compute the window size in R before querying:

region <- "chr17:42560601-43609177"
parts  <- as.integer(strsplit(region, "[:-]")[[1]][-1])
window <- parts[2] - parts[1]
cat("Window size:", format(window, big.mark = ","), "bp\n")
#> Window size: 1,048,576 bp
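When you start from a single position of interest rather than a ready-made interval, it can help to build the window around it. The sketch below is one way to do that. centered_region() is a hypothetical helper, not part of the package; it clamps at position 0 but does not know chromosome lengths, so it cannot clamp at the chromosome end.

```r
# Hypothetical helper (not part of AlphaGenomeR): build a region string of
# `width` bp centered on `center`.
centered_region <- function(chrom, center, width = 2^20) {
  half  <- width %/% 2
  start <- max(0, center - half)  # clamp at the start of the chromosome
  end   <- start + width          # keep the requested width after clamping
  sprintf("%s:%d-%d", chrom, as.integer(start), as.integer(end))
}

centered_region("chr17", 43084889)
#> [1] "chr17:42560601-43609177"
```

Centering on 43,084,889 reproduces the example region used throughout this page, because 43,084,889 minus 524,288 (half of 2^20) is 42,560,601.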

Region examples

library(AlphaGenomeR)

api_key <- Sys.getenv("ALPHAGENOME_API_KEY")

# Chromosome 17: window containing the BRCA1 locus
results_chr17 <- alphagenome_query(
  access_token    = api_key,
  genomic_region  = "chr17:42560601-43609177"
)

# Chromosome 1: 1 Mb window near the start of the chromosome
results_chr1 <- alphagenome_query(
  access_token    = api_key,
  genomic_region  = "chr1:1000000-2048576"
)

# X chromosome
results_chrX <- alphagenome_query(
  access_token    = api_key,
  genomic_region  = "chrX:50000000-51048576"
)
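Before sending several regions like these, the width of each can be checked in a single pass. This is a minimal sketch using base R only; region_width() is a hypothetical helper, and the 2^20 target comes from the window guidance above.

```r
# Regions from the examples above
regions <- c(
  "chr17:42560601-43609177",
  "chr1:1000000-2048576",
  "chrX:50000000-51048576"
)

# Hypothetical helper: width of one "chr:start-end" string in bp
region_width <- function(region) {
  coords <- as.integer(strsplit(region, "[:-]")[[1]][-1])
  coords[2] - coords[1]
}

widths <- vapply(regions, region_width, integer(1))

# Regions whose width is not exactly 2^20 bp; empty for the examples above
off_target <- regions[widths != 2^20]
off_target
#> character(0)
```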

Supported organisms

The default organism is human (HOMO_SAPIENS). You can specify a different organism using the organism parameter, which maps to the Organism enum in the AlphaGenome Python SDK:

results <- alphagenome_query(
  access_token   = api_key,
  genomic_region = "chr17:42560601-43609177",
  organism       = "HOMO_SAPIENS"   # default; change to another supported value if needed
)

If you pass an unrecognized organism string, the package lists the available options:

Error in alphagenome_query(...) : Invalid organism. Available: HOMO_SAPIENS, ...
