Overview

Reduced-dimension plots (UMAP, PCA, t-SNE) are essential for single-cell analysis, but as datasets grow, cells overlap and obscure information, even with transparency settings. schex addresses this by binning cells into hexagons and plotting a summary statistic for each bin instead of individual points. Benefits:
  • Reduces overplotting in large datasets
  • Preserves the visual structure of the embedding
  • Supports plotting metadata, cluster labels, and gene expression per bin
  • Works directly with Seurat objects
Citation: Saskia Freytag (2019). schex: Hexagonal binning for single cell data. R package.
Original biology reference: Delile, Julien et al. Single cell transcriptomics reveals spatial and temporal dynamics of gene expression in the developing mouse spinal cord. doi: 10.1242/dev.173807
Source: SaskiaFreytag/schex

Installation

remotes::install_github('SaskiaFreytag/schex')
You will also need SeuratData for the example data:
remotes::install_github('satijalab/seurat-data')

Key functions

| Function | Description |
| --- | --- |
| make_hexbin() | Computes hexagon bin assignments for each cell |
| plot_hexbin_density() | Plots cell count per hexagon bin |
| plot_hexbin_meta() | Colors hexagons by a metadata variable |
| plot_hexbin_gene() | Colors hexagons by gene expression |
| make_hexbin_label() | Computes label positions for factor variables |

Complete workflow

1. Load libraries

library(Seurat)
library(SeuratData)
library(ggplot2)
library(ggrepel)
library(schex)

theme_set(theme_classic())

2. Load and preprocess data

This example uses the PBMC 3k dataset:
InstallData("pbmc3k")
pbmc <- pbmc3k
Filter low-quality cells:
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc,
  subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5
)
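
To make the filter concrete, here is a minimal base-R sketch of what percent.mt measures: the share of each cell's counts that comes from genes whose names start with MT-. The toy matrix and its values are invented for illustration and are not the PBMC data. The nFeature_RNA thresholds are left out because the toy matrix has only four genes.

```r
# Toy count matrix (genes x cells); values are made up for illustration.
counts <- matrix(
  c(1, 1, 10, 8,    # cell1
    0, 1, 30, 19,   # cell2
    0, 0, 5, 5),    # cell3
  nrow = 4,
  dimnames = list(c("MT-CO1", "MT-ND1", "CD3E", "MS4A1"),
                  c("cell1", "cell2", "cell3"))
)

# Percentage of each cell's counts assigned to mitochondrial genes.
mt_genes <- grepl("^MT-", rownames(counts))
percent_mt <- colSums(counts[mt_genes, , drop = FALSE]) / colSums(counts) * 100

# Number of genes detected (non-zero) per cell, analogous to nFeature_RNA.
n_feature <- colSums(counts > 0)

# Keep only cells below the 5% mitochondrial threshold used above.
keep <- percent_mt < 5
counts_filtered <- counts[, keep, drop = FALSE]
```

In this toy example, cell1 has 2 of its 20 counts in mitochondrial genes (10%) and is dropped, while the other two cells pass.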

3. Normalize, identify variable genes, and scale

pbmc <- NormalizeData(pbmc,
  normalization.method = "LogNormalize",
  scale.factor = 10000,
  verbose = FALSE
)
pbmc <- FindVariableFeatures(pbmc,
  selection.method = "vst",
  nfeatures = 2000,
  verbose = FALSE
)

all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes, verbose = FALSE)
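
As a rough illustration of the "LogNormalize" step, the sketch below reproduces the arithmetic in base R: each cell's counts are divided by that cell's total, multiplied by the scale factor, and transformed with log1p. The toy matrix is invented for illustration; the real function works on the sparse count matrix stored in the Seurat object.

```r
# Toy count matrix (genes x cells); values are made up for illustration.
counts <- matrix(
  c(1, 1, 10, 8,
    0, 1, 30, 19,
    0, 0, 5, 5),
  nrow = 4,
  dimnames = list(c("MT-CO1", "MT-ND1", "CD3E", "MS4A1"),
                  c("cell1", "cell2", "cell3"))
)

scale_factor <- 10000

# Divide every column (cell) by its total count.
frac <- sweep(counts, 2, colSums(counts), "/")

# Scale to counts per 10,000 and apply the natural log of (1 + x).
log_norm <- log1p(frac * scale_factor)
```

Undoing the log with expm1() gives columns that each sum to the scale factor, which is a quick sanity check on the result.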

4. Dimensionality reduction and clustering

pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc), verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:10, verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:10, verbose = FALSE)
pbmc <- FindClusters(pbmc, resolution = 0.5, verbose = FALSE)

5. Compute hexagon bin representation

make_hexbin() assigns each cell to a hexagon bin in the specified embedding. The nbins parameter controls the number of bins along the x-axis:
pbmc <- make_hexbin(pbmc, nbins = 40, dimension_reduction = "UMAP")
Choose nbins based on dataset size. More cells generally require a higher nbins value to avoid bins that are too coarse. Start with 20–40 for datasets under 10k cells; increase for larger datasets. The density plot in the next step helps you assess whether bins are evenly populated.
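
To build intuition for what a hexagon bin assignment is, here is a minimal base-R sketch that assigns 2-D points to the nearest centre of a hexagonal lattice, with nbins setting the hexagon width along the x-axis. This is not the schex implementation; hex_assign() is a hypothetical helper, and the random points stand in for a real embedding.

```r
# Assign 2-D points to hexagonal bins by nearest hexagon centre.
# Hypothetical helper for illustration only; not part of schex.
hex_assign <- function(x, y, nbins) {
  w <- diff(range(x)) / nbins   # horizontal centre-to-centre spacing
  h <- w * sqrt(3) / 2          # vertical spacing between adjacent rows
  x0 <- x - min(x)
  y0 <- y - min(y)

  # Candidate 1: nearest centre on the lattice of even rows.
  cx1 <- round(x0 / w) * w
  cy1 <- round(y0 / (2 * h)) * 2 * h

  # Candidate 2: nearest centre on the odd rows, shifted by half a hexagon.
  cx2 <- (floor(x0 / w) + 0.5) * w
  cy2 <- (floor(y0 / (2 * h)) * 2 + 1) * h

  # Keep whichever candidate centre is closer to the point.
  use1 <- (x0 - cx1)^2 + (y0 - cy1)^2 <= (x0 - cx2)^2 + (y0 - cy2)^2
  cx <- ifelse(use1, cx1, cx2)
  cy <- ifelse(use1, cy1, cy2)

  data.frame(
    bin = paste(round(2 * cx / w), round(cy / h), sep = "_"),  # integer grid id
    cx = cx + min(x),
    cy = cy + min(y)
  )
}

# Simulated 2-D coordinates standing in for a UMAP embedding.
set.seed(1)
emb_x <- rnorm(1000)
emb_y <- rnorm(1000)
bins <- hex_assign(emb_x, emb_y, nbins = 10)

# Cells per bin, most crowded first.
head(sort(table(bins$bin), decreasing = TRUE))
```

Raising nbins shrinks the hexagons, so the same cells are spread over more bins with fewer cells in each.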

6. Plot bin density

Check how many cells fall into each hexagon. Bins should be relatively evenly populated; if one bin has far more cells than others, increase nbins:
plot_hexbin_density(pbmc)
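
The visual check can be complemented with a few numbers. The sketch below assumes you have a vector with one bin identifier per cell; how schex stores these assignments internally is not covered here, so bin_occupancy() and its input are hypothetical.

```r
# Summarize how evenly cells are spread across bins.
# `bin_id` is any vector with one bin identifier per cell (hypothetical input).
bin_occupancy <- function(bin_id) {
  n <- table(bin_id)
  list(
    n_bins = length(n),                        # number of non-empty bins
    median_cells = median(as.vector(n)),       # typical bin size
    max_cells = max(n),                        # most crowded bin
    max_share = max(n) / length(bin_id)        # fraction of cells in that bin
  )
}

# Tiny example: three of five cells fall into bin "a".
occ <- bin_occupancy(c("a", "a", "a", "b", "c"))
str(occ)
```

A large gap between max_cells and median_cells suggests that one bin dominates, which is the situation where a higher nbins is worth trying.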

7. Plot metadata in hexagon representation

Color hexagons by a metadata column. Use action to specify how to summarize the column within each bin:
# Median total count per bin
plot_hexbin_meta(pbmc, col = "nCount_RNA", action = "median")

# Majority cluster label per bin
plot_hexbin_meta(pbmc, col = "RNA_snn_res.0.5", action = "majority")
Add cluster labels with ggrepel for readability:
label_df <- make_hexbin_label(pbmc, col = "RNA_snn_res.0.5")

pp <- plot_hexbin_meta(pbmc, col = "RNA_snn_res.0.5", action = "majority")
pp + ggrepel::geom_label_repel(
  data = label_df,
  aes(x = x, y = y, label = label),
  colour = "black",
  label.size = NA,
  fill = NA
)

8. Plot gene expression in hexagon representation

Visualize gene expression averaged per hexagon bin:
gene_id <- "CD19"
plot_hexbin_gene(
  pbmc,
  type = "logcounts",
  gene = gene_id,
  action = "mean",
  xlab = "UMAP1",
  ylab = "UMAP2",
  title = paste0("Mean of ", gene_id)
)

action parameter reference

The action parameter in plot_hexbin_meta() and plot_hexbin_gene() controls how values are summarized within each bin:
| Action | Use case |
| --- | --- |
| "median" | Numeric metadata (e.g., nCount_RNA, percent.mt) |
| "mean" | Gene expression values |
| "majority" | Factor/categorical metadata (e.g., cluster labels) |

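The three actions correspond to ordinary per-group summaries. The base-R sketch below applies them to a toy set of six cells in three bins; the bin labels and values are invented, and tie-breaking for "majority" here (first level wins) is an assumption rather than documented schex behavior.

```r
# Six toy cells assigned to three hypothetical bins.
bins    <- c("h1", "h1", "h1", "h2", "h2", "h3")
n_count <- c(1200, 1500, 900, 3000, 2800, 700)   # numeric metadata
expr    <- c(0, 1.2, 0.6, 2.0, 0, 0)             # normalized expression
cluster <- c("0", "0", "1", "2", "2", "1")       # categorical metadata

# "median": robust summary for numeric metadata.
median_per_bin <- tapply(n_count, bins, median)

# "mean": average expression of a gene within each bin.
mean_per_bin <- tapply(expr, bins, mean)

# "majority": most frequent category within each bin.
majority_per_bin <- tapply(cluster, bins, function(v) names(which.max(table(v))))
```

Each result has one value per bin, which is what gets mapped to the hexagon fill color.
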
Choosing nbins

The nbins parameter in make_hexbin() specifies how many bins divide the x-axis range. Adjust it based on dataset size:
| Dataset size | Suggested nbins |
| --- | --- |
| < 5,000 cells | 20–30 |
| 5,000–20,000 cells | 30–50 |
| > 20,000 cells | 50+ |
Always check plot_hexbin_density() after changing nbins to confirm bins are not over- or under-populated.
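
If you want this guidance in a script, the table can be encoded as a small lookup. suggest_nbins() is a hypothetical helper, not a schex function, and it only restates the starting ranges above; the density plot remains the real test.

```r
# Return the suggested nbins range for a given number of cells.
# Hypothetical helper that mirrors the table above; not part of schex.
suggest_nbins <- function(n_cells) {
  stopifnot(is.numeric(n_cells), length(n_cells) == 1, n_cells > 0)
  if (n_cells < 5000) {
    c(lower = 20, upper = 30)
  } else if (n_cells <= 20000) {
    c(lower = 30, upper = 50)
  } else {
    c(lower = 50, upper = NA)   # no upper bound is given for large datasets
  }
}

suggest_nbins(2700)
suggest_nbins(150000)
```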
