LIGER Integration

LIGER (Linked Inference of Genomic Experimental Relationships) uses integrative non-negative matrix factorization (iNMF) to identify shared and dataset-specific factors across multiple single-cell datasets. SeuratWrappers provides RunOptimizeALS() and RunQuantileNorm() to run LIGER directly on Seurat objects.

LIGER does not center data during scaling, because iNMF requires non-negative input. Pass do.center = FALSE to ScaleData() before running LIGER, and use split.by to scale each dataset independently.

Update your rliger package to version 0.5.0 or above before following this workflow. Install it from CRAN: install.packages('rliger').
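A quick way to confirm the version requirement before starting is sketched below. The helper name has_min_version is hypothetical (it is not part of rliger or SeuratWrappers); it relies only on base R's requireNamespace() and utils::packageVersion().

```r
# Hypothetical helper: TRUE only if `pkg` is installed at `min_version` or newer.
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

# Warn early instead of failing partway through the workflow
if (!has_min_version("rliger", "0.5.0")) {
  message("rliger >= 0.5.0 not found; install it with install.packages('rliger')")
}
```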

Citation

If you use LIGER in your work, please cite:

Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity
Joshua Welch, Velina Kozareva, Ashley Ferreira, Charles Vanderburg, Carly Martin, Evan Z. Macosko
Cell, 2019
doi: 10.1016/j.cell.2019.05.006
GitHub: https://github.com/welch-lab/liger

Installation

# Install rliger from CRAN
install.packages('rliger')

# Install SeuratWrappers
remotes::install_github('satijalab/seurat-wrappers')

Workflow

Load libraries and data

library(rliger)
library(Seurat)
library(SeuratData)
library(SeuratWrappers)

InstallData("pbmcsca")
data("pbmcsca")

Normalize and identify variable features

pbmcsca <- NormalizeData(pbmcsca)
pbmcsca <- FindVariableFeatures(pbmcsca)

Scale data without centering

LIGER requires uncentered scaled data. Use split.by to scale each dataset subset separately.

pbmcsca <- ScaleData(pbmcsca, split.by = "Method", do.center = FALSE)
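To see why centering matters, the sketch below scales a toy matrix per batch with and without centering. It uses base R's scale() as a stand-in (with center = FALSE it divides each gene by its root mean square); ScaleData() may differ in detail, and the matrix, batch labels, and helper names are all made up for illustration.

```r
set.seed(1)

# Toy genes-by-cells matrix of non-negative values standing in for normalized expression
expr <- matrix(rexp(5 * 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("cell", 1:8)))
batch <- rep(c("A", "B"), each = 4)  # hypothetical dataset label per cell

# Scale each gene (row) within one batch; scale() works on columns, hence t()
scale_genes <- function(m, center) t(scale(t(m), center = center, scale = TRUE))

# Apply the scaling separately to the cells of each batch, then recombine
scale_by_batch <- function(m, groups, center) {
  idx_by_group <- split(seq_len(ncol(m)), groups)
  do.call(cbind, lapply(idx_by_group, function(idx) {
    scale_genes(m[, idx, drop = FALSE], center = center)
  }))
}

uncentered <- scale_by_batch(expr, batch, center = FALSE)
centered   <- scale_by_batch(expr, batch, center = TRUE)

all(uncentered >= 0)  # non-negativity is preserved without centering
any(centered < 0)     # centering pushes below-average values negative
```

Only the uncentered result stays non-negative, which is what a non-negative factorization needs as input.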

Run iNMF factorization with RunOptimizeALS

RunOptimizeALS() factorizes the scaled data using alternating least squares (ALS) and stores the result as the iNMF_raw reduction.

pbmcsca <- RunOptimizeALS(pbmcsca, k = 20, lambda = 5, split.by = "Method")
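For intuition about what k and lambda control, the sketch below evaluates the iNMF objective from the LIGER paper on toy matrices: the sum over datasets of ||E_i - H_i (W + V_i)||_F^2 plus lambda times ||H_i V_i||_F^2. It only computes the objective for given factors and does not perform the ALS optimization; the function name inmf_objective and all matrices are illustrative assumptions.

```r
# E, H, V: lists with one matrix per dataset
#   E[[i]]: cells x genes, H[[i]]: cells x k, V[[i]]: k x genes (dataset-specific)
# W: k x genes (shared across datasets)
inmf_objective <- function(E, H, W, V, lambda = 5) {
  stopifnot(length(E) == length(H), length(E) == length(V))
  total <- 0
  for (i in seq_along(E)) {
    reconstruction <- H[[i]] %*% (W + V[[i]])
    fit_error <- sum((E[[i]] - reconstruction)^2)
    specific_penalty <- sum((H[[i]] %*% V[[i]])^2)
    total <- total + fit_error + lambda * specific_penalty
  }
  total
}

set.seed(1)
k <- 3
n_genes <- 6
W <- matrix(runif(k * n_genes), k, n_genes)

# Build a toy dataset that is reconstructed exactly by its own factors
make_dataset <- function(n_cells) {
  H <- matrix(runif(n_cells * k), n_cells, k)
  V <- matrix(runif(k * n_genes, max = 0.1), k, n_genes)
  list(E = H %*% (W + V), H = H, V = V)
}
d1 <- make_dataset(10)
d2 <- make_dataset(15)

E <- list(d1$E, d2$E)
H <- list(d1$H, d2$H)
V <- list(d1$V, d2$V)

obj_lambda0 <- inmf_objective(E, H, W, V, lambda = 0)  # reconstruction error only
obj_lambda5 <- inmf_objective(E, H, W, V, lambda = 5)  # adds the dataset-specific penalty
```

With the same factors, a larger lambda makes any non-zero dataset-specific component more expensive, which is why the optimizer is pushed toward explaining the data with the shared factors W.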

Quantile normalize the joint embeddings

RunQuantileNorm() aligns the cell factor loadings across datasets via quantile normalization, producing the final integrated iNMF reduction.

pbmcsca <- RunQuantileNorm(pbmcsca, split.by = "Method")
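The core idea of quantile normalization can be sketched in base R for a single factor: map each value from one dataset onto the matching quantile of a reference dataset. This is a simplified illustration only. The actual procedure works per cluster and per factor, and the function name quantile_match and the simulated loadings are assumptions.

```r
# Map `x` onto the distribution of `ref` by matching quantiles and
# interpolating linearly between them.
quantile_match <- function(x, ref, n_quantiles = 50) {
  probs <- seq(0, 1, length.out = n_quantiles + 1)
  q_x   <- quantile(x, probs, names = FALSE)
  q_ref <- quantile(ref, probs, names = FALSE)
  approx(q_x, q_ref, xout = x, ties = mean)$y
}

set.seed(42)
ref_loadings   <- rgamma(200, shape = 2, rate = 1)  # toy loadings, reference dataset
other_loadings <- rgamma(150, shape = 2, rate = 4)  # same shape, smaller scale

aligned <- quantile_match(other_loadings, ref_loadings)

# After alignment the two distributions cover the same range,
# while the ordering of cells within the aligned dataset is unchanged
range(ref_loadings)
range(aligned)
```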

Cluster and visualize

Optionally run Louvain clustering on the integrated embedding, then compute UMAP.

pbmcsca <- FindNeighbors(pbmcsca, reduction = "iNMF", dims = 1:20)
pbmcsca <- FindClusters(pbmcsca, resolution = 0.3)
pbmcsca <- RunUMAP(pbmcsca, dims = 1:ncol(pbmcsca[["iNMF"]]), reduction = "iNMF")
DimPlot(pbmcsca, group.by = c("Method", "ident", "CellType"), ncol = 3)

Examples

Interferon-stimulated and control PBMC

# Load the interferon-stimulated and control PBMC dataset
InstallData("ifnb")
data("ifnb")
# Preprocess, scaling each condition separately without centering
ifnb <- NormalizeData(ifnb)
ifnb <- FindVariableFeatures(ifnb)
ifnb <- ScaleData(ifnb, split.by = "stim", do.center = FALSE)
# Factorize with iNMF, then quantile normalize the joint embedding
ifnb <- RunOptimizeALS(ifnb, k = 20, lambda = 5, split.by = "stim")
ifnb <- RunQuantileNorm(ifnb, split.by = "stim")
# Cluster and visualize on the integrated iNMF reduction
ifnb <- FindNeighbors(ifnb, reduction = "iNMF", dims = 1:20)
ifnb <- FindClusters(ifnb, resolution = 0.55)
ifnb <- RunUMAP(ifnb, dims = 1:ncol(ifnb[["iNMF"]]), reduction = "iNMF")
DimPlot(ifnb, group.by = c("stim", "ident", "seurat_annotations"), ncol = 3)

Eight human pancreatic islet datasets

# Load the eight pancreatic islet datasets
InstallData("panc8")
data("panc8")
# Preprocess, scaling each replicate separately without centering
panc8 <- NormalizeData(panc8)
panc8 <- FindVariableFeatures(panc8)
panc8 <- ScaleData(panc8, split.by = "replicate", do.center = FALSE)
# Factorize with iNMF, then quantile normalize the joint embedding
panc8 <- RunOptimizeALS(panc8, k = 20, lambda = 5, split.by = "replicate")
panc8 <- RunQuantileNorm(panc8, split.by = "replicate")
# Cluster and visualize on the integrated iNMF reduction
panc8 <- FindNeighbors(panc8, reduction = "iNMF", dims = 1:20)
panc8 <- FindClusters(panc8, resolution = 0.4)
panc8 <- RunUMAP(panc8, dims = 1:ncol(panc8[["iNMF"]]), reduction = "iNMF")
DimPlot(panc8, group.by = c("replicate", "ident", "celltype"), ncol = 3)

Functions

RunOptimizeALS

Runs iNMF factorization via alternating least squares on a merged Seurat object. Stores per-dataset factor loading matrices in the tool slot (accessible with Tool()), and combined cell embeddings in the iNMF_raw reduction by default.

object

Seurat

required

A merged Seurat object. Data must be scaled (without centering) before calling this function.

k

integer

required

Number of factors (latent dimensions) to compute.

split.by

character

default:"orig.ident"

Metadata column used to split cells into per-dataset subsets for factorization.

lambda

numeric

default:"5"

Regularization parameter. Controls the weight of the penalty on the dataset-specific factors. Higher values penalize dataset-specific effects more strongly, so alignment across datasets generally increases with lambda.

thresh

numeric

default:"1e-6"

Convergence threshold. Optimization stops when the relative change in the objective between iterations falls below this value.

max.iters

integer

default:"30"

Maximum number of ALS iterations.
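The interaction between thresh and max.iters can be sketched as a stopping rule. The snippet assumes the relative-change criterion |obj0 - obj| / mean(obj0, obj) < thresh described in rliger's optimizeALS documentation. The function name run_until_converged and the decaying objective curve are made up for illustration and do not come from a real factorization.

```r
# Iterate until the relative change in the objective drops below `thresh`
# or `max.iters` iterations have been used, whichever comes first.
run_until_converged <- function(objective_at, thresh = 1e-6, max.iters = 30) {
  obj_prev <- objective_at(0)
  for (iter in seq_len(max.iters)) {
    obj <- objective_at(iter)
    delta <- abs(obj_prev - obj) / mean(c(obj_prev, obj))
    if (delta < thresh) {
      return(list(iterations = iter, converged = TRUE, objective = obj))
    }
    obj_prev <- obj
  }
  list(iterations = max.iters, converged = FALSE, objective = obj_prev)
}

# Hypothetical objective that halves its distance to a floor of 100 each iteration
toy_objective <- function(iter) 100 + 50 * 0.5^iter

default_run <- run_until_converged(toy_objective)                  # stops on thresh
capped_run  <- run_until_converged(toy_objective, max.iters = 5)   # stops on max.iters
```

A run that ends with converged = FALSE hit the iteration cap first, which is a hint to raise max.iters or loosen thresh.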

nrep

integer

default:"1"

Number of factorization restarts. The run with the lowest final objective is retained.

rand.seed

integer

default:"1"

Random seed for reproducibility.

reduction.name

character

default:"iNMF_raw"

Name under which the raw iNMF embedding is stored.

reduction.key

character

default:"riNMF_"

Key prefix for the raw iNMF reduction dimensions.

RunQuantileNorm

Aligns iNMF factor loadings across datasets using quantile normalization. Produces the final integrated embedding stored in the iNMF reduction by default. Also assigns cluster identities to cells via Idents().

object

Seurat

required

A Seurat object after running RunOptimizeALS().

split.by

character

default:"orig.ident"

Metadata column identifying which dataset each cell belongs to.

reduction

character

default:"iNMF_raw"

Name of the raw iNMF reduction to normalize.

quantiles

integer

default:"50"

Number of quantile bins used in the normalization procedure.

ref_dataset

character

default:"NULL"

Name of the reference dataset for alignment. Defaults to the largest dataset.
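To preview which dataset the default would pick, count cells per dataset and take the largest. The sketch uses a made-up vector of per-cell labels; with a real object the labels would come from the metadata column passed as split.by.

```r
# Hypothetical per-cell dataset labels
method <- c(rep("10x", 5), rep("Smart-seq2", 3), rep("inDrops", 7))

# Count cells per dataset and pick the largest as the reference
cells_per_dataset <- table(method)
ref_dataset <- names(which.max(cells_per_dataset))
ref_dataset
```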

min_cells

integer

default:"20"

Minimum number of cells required in a cluster for it to be used in alignment.

knn_k

integer

default:"20"

Number of nearest neighbors used in the kNN graph for quantile normalization.

do.center

logical

default:"FALSE"

Whether to center embeddings before normalization. Should match the centering used in ScaleData().

max_sample

integer

default:"1000"

Maximum number of cells to sample per dataset when computing quantiles.

eps

numeric

default:"0.9"

Epsilon parameter for approximate nearest neighbor search.

refine.knn

logical

default:"TRUE"

Whether to refine the kNN graph after initial construction.

reduction.name

character

default:"iNMF"

Name under which the normalized embedding is stored.

reduction.key

character

default:"iNMF_"

Key prefix for the normalized iNMF reduction dimensions.

Deprecated Functions

RunSNF() and RunQuantileAlignSNF() are deprecated. Both now redirect to RunQuantileNorm(). Use RunQuantileNorm() directly in all new workflows.
