schex Hexagonal Binning Visualization

Overview

Reduced-dimension plots (UMAP, PCA, t-SNE) are essential for single-cell analysis, but as datasets grow, cells overlap and obscure information, even when points are drawn with transparency. schex addresses this by binning cells into hexagons and plotting a summary statistic for each bin instead of individual points. Benefits:

Eliminates overplotting in large datasets
Preserves the visual structure of the embedding
Supports plotting metadata, cluster labels, and gene expression per bin
Works seamlessly with Seurat objects

Citation: Saskia Freytag (2019). schex: Hexagonal binning for single cell data. R package.

Original biology reference: Delile, Julien et al. Single cell transcriptomics reveals spatial and temporal dynamics of gene expression in the developing mouse spinal cord. doi: 10.1242/dev.173807

Source: SaskiaFreytag/schex

Installation

remotes::install_github('SaskiaFreytag/schex')

You will also need SeuratData for the example data:

remotes::install_github('satijalab/seurat-data')

Key functions

Function	Description
`make_hexbin()`	Computes hexagon bin assignments for each cell
`plot_hexbin_density()`	Plots cell count per hexagon bin
`plot_hexbin_meta()`	Colors hexagons by a metadata variable
`plot_hexbin_gene()`	Colors hexagons by gene expression
`make_hexbin_label()`	Computes label positions for factor variables

Complete workflow

Load libraries

library(Seurat)
library(SeuratData)
library(ggplot2)
library(ggrepel)
library(schex)

theme_set(theme_classic())

Load and preprocess data

This example uses the PBMC 3k dataset:

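# Download and install the dataset package (only needed once)
InstallData("pbmc3k")
# The Seurat object is available under the dataset's name
pbmc <- pbmc3k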

Filter low-quality cells:

pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc,
  subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5
)
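To make the filter above concrete, here is a minimal base R sketch of the same idea on a toy count matrix. The matrix, gene names, and cell names are invented for illustration; only the `^MT-` pattern and the 5% threshold come from the workflow.

```r
# Toy count matrix (genes x cells); all values are made up for illustration
counts <- matrix(
  c(5, 90, 5, 0,    # cell_a
    1, 50, 0, 49),  # cell_b
  nrow = 4,
  dimnames = list(c("MT-CO1", "GAPDH", "MT-ND1", "CD3E"), c("cell_a", "cell_b"))
)

# Percentage of each cell's counts that come from mitochondrial genes
is_mt <- grepl("^MT-", rownames(counts))
percent_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)

# Number of genes detected (non-zero) in each cell
n_features <- colSums(counts > 0)

# Keep cells below the 5% mitochondrial threshold used in the workflow
keep <- percent_mt < 5
counts_filtered <- counts[, keep, drop = FALSE]
```

In this toy example `cell_a` has 10% mitochondrial counts and is dropped, while `cell_b` has 1% and is kept.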

Normalize, identify variable genes, and scale

pbmc <- NormalizeData(pbmc,
  normalization.method = "LogNormalize",
  scale.factor = 10000,
  verbose = FALSE
)
pbmc <- FindVariableFeatures(pbmc,
  selection.method = "vst",
  nfeatures = 2000,
  verbose = FALSE
)

all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes, verbose = FALSE)
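For intuition about the `LogNormalize` step, the sketch below reproduces the calculation in base R on a toy matrix: each cell's counts are divided by that cell's total, multiplied by the scale factor, and passed through `log1p()`. The helper name `log_normalize()` and the data are hypothetical; in the workflow, `NormalizeData()` does this for you.

```r
# Toy count matrix (genes x cells); values are invented for illustration
counts <- matrix(
  c(1, 3, 0, 6,   # cell_a, total 10
    2, 2, 4, 2),  # cell_b, total 10
  nrow = 4,
  dimnames = list(paste0("g", 1:4), c("cell_a", "cell_b"))
)

# Hypothetical helper mirroring the LogNormalize calculation
log_normalize <- function(counts, scale_factor = 10000) {
  per_cell_total <- colSums(counts)
  # Divide every column (cell) by its total, rescale, then take log(1 + x)
  log1p(sweep(counts, 2, per_cell_total, "/") * scale_factor)
}

norm <- log_normalize(counts)
norm
```

Zero counts stay at zero after the transform, which is why `log1p()` is used rather than a plain logarithm.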

Dimensionality reduction and clustering

pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc), verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:10, verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:10, verbose = FALSE)
pbmc <- FindClusters(pbmc, resolution = 0.5, verbose = FALSE)

Compute hexagon bin representation

make_hexbin() assigns each cell to a hexagon bin in the specified embedding. The nbins parameter controls the number of bins along the x-axis:

pbmc <- make_hexbin(pbmc, nbins = 40, dimension_reduction = "UMAP")

Choose nbins based on dataset size. Larger datasets generally need a higher nbins value so that individual bins do not become too coarse. Start with 20–40 for datasets under 10k cells and increase it for larger datasets. The density plot in the next step helps you assess whether bins are evenly populated.

Plot bin density

Check how many cells fall into each hexagon. Bins should be relatively evenly populated; if one bin has far more cells than others, increase nbins:

plot_hexbin_density(pbmc)
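If you want numbers alongside the plot, the effect of nbins on bin population can be explored with the hexbin package directly. This is a rough sketch under several assumptions: hexbin is installed, the coordinates are simulated rather than taken from a real embedding, and `cells_per_bin()` is a hypothetical helper. It is not necessarily identical to what `make_hexbin()` does internally.

```r
library(hexbin)  # assumed to be installed

set.seed(42)
# Simulated 2-D coordinates standing in for a UMAP embedding (not real data)
n_cells <- 3000
emb <- cbind(rnorm(n_cells), rnorm(n_cells))

# Hypothetical helper: bin the coordinates and return the number of
# cells in each non-empty hexagon
cells_per_bin <- function(coords, nbins) {
  hb <- hexbin(coords[, 1], coords[, 2], xbins = nbins, IDs = TRUE)
  hb@count
}

coarse <- cells_per_bin(emb, nbins = 10)
fine <- cells_per_bin(emb, nbins = 40)

# Low nbins gives fewer, fuller bins; high nbins gives more, sparser bins
summary(coarse)
summary(fine)
```

Comparing the two summaries shows the trade-off: a low nbins concentrates many cells in a few hexagons, while a high nbins spreads them over many hexagons that each hold only a handful of cells.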

Plot metadata in hexagon representation

Color hexagons by a metadata column. Use action to specify how to summarize the column within each bin:

# Median total count per bin
plot_hexbin_meta(pbmc, col = "nCount_RNA", action = "median")

# Majority cluster label per bin
plot_hexbin_meta(pbmc, col = "RNA_snn_res.0.5", action = "majority")

Add cluster labels with ggrepel for readability:

label_df <- make_hexbin_label(pbmc, col = "RNA_snn_res.0.5")

pp <- plot_hexbin_meta(pbmc, col = "RNA_snn_res.0.5", action = "majority")
pp + ggrepel::geom_label_repel(
  data = label_df,
  aes(x = x, y = y, label = label),
  colour = "black",
  label.size = NA,
  fill = NA
)
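The call above relies on `label_df` having `x`, `y`, and `label` columns. One plausible way to place such labels is at the median coordinate of each group; the base R sketch below shows that idea on invented data. It is an assumption for illustration, not a description of how `make_hexbin_label()` actually computes positions.

```r
# Toy bin centres with a group label each; all values are invented
cells <- data.frame(
  x = c(0, 2, 4, 10, 12),
  y = c(1, 3, 5, 0, 2),
  label = c("0", "0", "0", "1", "1")
)

# One row per label, positioned at the median x and y of its members
label_pos <- aggregate(cbind(x, y) ~ label, data = cells, FUN = median)
label_pos
```

The median is less sensitive than the mean to a few stray bins far from the bulk of a cluster, which tends to keep the label over the cluster's core.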

Plot gene expression in hexagon representation

Visualize gene expression averaged per hexagon bin:

gene_id <- "CD19"
plot_hexbin_gene(
  pbmc,
  type = "logcounts",
  gene = gene_id,
  action = "mean",
  xlab = "UMAP1",
  ylab = "UMAP2",
  title = paste0("Mean of ", gene_id)
)

`action` parameter reference

The action parameter in plot_hexbin_meta() and plot_hexbin_gene() controls how values are summarized within each bin:

Action	Use case
`"median"`	Numeric metadata (e.g., `nCount_RNA`, `percent.mt`)
`"mean"`	Gene expression values
`"majority"`	Factor/categorical metadata (e.g., cluster labels)
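As an illustration of what each action computes, the following base R sketch summarizes toy values per bin. The function `summarise_bins()`, the bin IDs, and the values are all made up for this example; it is not schex's implementation.

```r
# Hypothetical helper: summarise one value per cell into one value per bin
summarise_bins <- function(values, bin_id,
                           action = c("median", "mean", "majority")) {
  action <- match.arg(action)
  fun <- switch(action,
    median = stats::median,
    mean = mean,
    # Most frequent value in the bin; ties go to the first level in table order
    majority = function(v) names(which.max(table(v)))
  )
  tapply(values, bin_id, fun)
}

# Six toy cells assigned to three bins
bin_id <- c(1, 1, 1, 2, 2, 3)
n_count <- c(10, 20, 60, 5, 7, 3)
cluster <- c("a", "a", "b", "b", "b", "a")

summarise_bins(n_count, bin_id, "median")    # bins 1, 2, 3: 20, 6, 3
summarise_bins(n_count, bin_id, "mean")      # bins 1, 2, 3: 30, 6, 3
summarise_bins(cluster, bin_id, "majority")  # bins 1, 2, 3: "a", "b", "a"
```

Bin 1 shows why the choice matters: a single large value pulls the mean to 30 while the median stays at 20.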

Choosing `nbins`

The nbins parameter in make_hexbin() specifies how many bins divide the x-axis range. Adjust it based on dataset size:

Dataset size	Suggested `nbins`
< 5,000 cells	20–30
5,000–20,000 cells	30–50
> 20,000 cells	50+

Always check plot_hexbin_density() after changing nbins to confirm bins are not over- or under-populated.
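If you process many datasets, the table above can be encoded as a small lookup. `suggest_nbins()` is a hypothetical helper, not part of schex, and the ranges it returns are only the starting points listed in the table.

```r
# Hypothetical helper (not part of schex): map a cell count to the
# suggested nbins range from the table. NA marks an open upper end.
suggest_nbins <- function(n_cells) {
  stopifnot(is.numeric(n_cells), length(n_cells) == 1, n_cells > 0)
  if (n_cells < 5000) {
    c(lower = 20, upper = 30)
  } else if (n_cells <= 20000) {
    c(lower = 30, upper = 50)
  } else {
    c(lower = 50, upper = NA)
  }
}

suggest_nbins(2700)
suggest_nbins(50000)
```

Treat the result as a first guess and confirm it with `plot_hexbin_density()`; a dataset with a very uneven embedding may need a value outside the suggested range.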

Additional resources

schex GitHub repository
