miQC Quality Control

miQC jointly models mitochondrial read percentage and library complexity using a two-distribution mixture model, enabling probabilistic rather than threshold-based identification of compromised cells. This is particularly useful for archived or tumor tissues where fixed mitochondrial cutoffs are often too stringent or too lenient.

Citation: Hippen et al. (2021) miQC: An adaptive probabilistic framework for quality control of single-cell RNA-sequencing data. bioRxiv. doi: 10.1101/2021.03.03.433798

Source: greenelab/miQC (Bioconductor)

Installation

# Optional: install the miQC reference implementation from Bioconductor
BiocManager::install('miQC')

# Install flexmix (required dependency)
install.packages('flexmix')

The Seurat adaptation of miQC needs only flexmix: SeuratWrappers calls flexmix directly. The miQC Bioconductor package provides the reference implementation and is not required for the workflow below.

Key Functions

RunMiQC() — Fits a two-distribution mixture model and assigns each cell a probability of being compromised. Stores results in object metadata.
PlotMiQC() — Visualizes the fitted mixture model overlaid on a scatter plot of mitochondrial percentage vs. unique gene count.

How It Works

miQC assumes that a scRNA-seq dataset contains two populations of cells: intact cells (low mitochondrial reads, higher gene counts) and compromised cells (high mitochondrial reads, lower gene counts). It fits a two-component mixture model to the joint distribution of percent.mt and nFeature_RNA, then computes a posterior probability for each cell of belonging to the compromised component. Cells above a configurable posterior.cutoff are labeled for removal. This approach adapts to each dataset’s specific quality profile rather than requiring a universal threshold.
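
The posterior step described above can be sketched in base R. This is a minimal illustration, not the package's code: it assumes each component is a straight line in nFeature_RNA with Gaussian noise, and every number in `components` is a hypothetical value rather than a fitted one. In practice flexmix estimates these parameters from the data.

```r
# Hypothetical parameters for two regression components (not fitted values).
# Each component models percent.mt as a line in nFeature_RNA plus Gaussian noise.
components <- list(
  intact      = list(weight = 0.8, intercept = 5,  slope = -0.001, sigma = 1.5),
  compromised = list(weight = 0.2, intercept = 40, slope = -0.015, sigma = 8)
)

# Toy cells: number of detected genes and mitochondrial percentage.
cells <- data.frame(
  nFeature_RNA = c(2000, 1500, 400, 300),
  percent.mt   = c(2.5, 4.0, 30.0, 45.0)
)

# Weighted density of every cell under one component.
weighted_density <- function(comp, cells) {
  mu <- comp$intercept + comp$slope * cells$nFeature_RNA
  comp$weight * dnorm(cells$percent.mt, mean = mu, sd = comp$sigma)
}

# One column per component, one row per cell.
dens <- sapply(components, weighted_density, cells = cells)

# Bayes' rule: share of the total weighted density owed to the compromised component.
cells$prob_compromised <- dens[, "compromised"] / rowSums(dens)
print(round(cells$prob_compromised, 3))
```

Cells with many detected genes and little mitochondrial signal get a probability near 0, while cells with few genes and high mitochondrial signal get a probability near 1.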

Workflow

Load libraries and data

library(Seurat)
library(SeuratData)
library(SeuratWrappers)
library(flexmix)

InstallData("pbmc3k")
data("pbmc3k")

Calculate mitochondrial percentage

miQC requires percent.mt and nFeature_RNA to be present in the object metadata. nFeature_RNA is computed automatically by CreateSeuratObject. Calculate percent.mt with PercentageFeatureSet.

For human data, mitochondrial genes start with MT-. For mouse data, use mt-.

pbmc3k[["percent.mt"]] <- PercentageFeatureSet(object = pbmc3k, pattern = "^MT-")
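
To make the quantity concrete, here is a base R sketch of the same calculation on a toy count matrix: the percentage of each cell's counts that come from genes matching the pattern. The matrix, gene names, and cell names are made up for illustration, and the sketch does not call Seurat.

```r
# Toy count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(10,  0,  5,
     5, 20,  5,
    80, 70, 40,
     5, 10, 50),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "CD3E"),
                  c("cell1", "cell2", "cell3"))
)

# Genes whose names match the human mitochondrial prefix.
is_mito <- grepl("^MT-", rownames(counts))

# Percentage of each cell's counts that come from mitochondrial genes.
percent_mt <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)
print(percent_mt)
```

For mouse data the same sketch would use the pattern "^mt-" instead.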

Inspect the distribution before running the model:

FeatureScatter(pbmc3k, feature1 = "nFeature_RNA", feature2 = "percent.mt")

Look for a distinctive triangular shape: a wide range of mitochondrial percentages at lower gene counts tapering to low mitochondrial percentage at higher gene counts. If this pattern is absent, the two-distribution assumption may not hold for your data.

Run the miQC mixture model

pbmc3k <- RunMiQC(
  pbmc3k,
  percent.mt = "percent.mt",
  nFeature_RNA = "nFeature_RNA",
  posterior.cutoff = 0.75,
  model.slot = "flexmix_model"
)

After running, two new metadata columns are added:

miQC.probability — posterior probability of belonging to the compromised distribution
miQC.keep — "keep" or "discard" decision based on posterior.cutoff
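
The relationship between the two columns can be sketched on a toy data frame. The probabilities below are made up, `label_cells` is a hypothetical helper, and treating a probability exactly equal to the cutoff as "keep" is an assumption of this sketch.

```r
# Toy stand-in for the two metadata columns; probabilities are made up.
meta <- data.frame(
  cell = paste0("cell", 1:6),
  miQC.probability = c(0.02, 0.10, 0.55, 0.76, 0.91, 0.99)
)

# Hypothetical helper: cells whose probability exceeds the cutoff are discarded.
label_cells <- function(prob, cutoff) {
  ifelse(prob > cutoff, "discard", "keep")
}

meta$miQC.keep <- label_cells(meta$miQC.probability, cutoff = 0.75)
print(table(meta$miQC.keep))

# Number of discarded cells at several cutoffs: a higher cutoff discards fewer cells.
cutoffs <- c(0.5, 0.75, 0.9)
n_discard <- sapply(cutoffs, function(k) {
  sum(label_cells(meta$miQC.probability, k) == "discard")
})
names(n_discard) <- cutoffs
print(n_discard)
```

Raising posterior.cutoff makes the filter more permissive, because a cell must be more confidently assigned to the compromised distribution before it is discarded.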

Visualize the model

Plot the mixture model with cells colored by their compromise probability:

PlotMiQC(pbmc3k, color.by = "miQC.probability") +
  ggplot2::scale_color_gradient(low = "grey", high = "purple")

Or visualize the keep/discard decisions directly:

PlotMiQC(pbmc3k, color.by = "miQC.keep")

Filter cells

Subset the Seurat object to retain only high-quality cells:

pbmc3k_filtered <- subset(pbmc3k, miQC.keep == "keep")
pbmc3k_filtered
# An object of class Seurat
# 13714 features across 2593 samples within 1 assay

RunMiQC Parameters

object (Seurat object, required)

The Seurat object to run miQC on.

percent.mt (character, default: "percent.mt")

Name of the metadata column containing the percentage of reads attributed to mitochondrial genes.

nFeature_RNA (character, default: "nFeature_RNA")

Name of the metadata column containing the number of unique genes detected per cell.

posterior.cutoff (numeric, default: 0.75)

Posterior probability threshold for the compromised distribution. Cells with probability above this value are marked as "discard". Must be between 0 and 1. When processing multiple samples for the same experiment, use the same cutoff across all samples for consistency.

model.type (character, default: "linear")

Type of mixture model to fit. Options:

"linear" — linear mixture model (recommended)
"spline" — b-spline mixture model
"polynomial" — two-degree polynomial mixture model

model.slot (character, default: "flexmix_model")

Name of the misc slot in the Seurat object where the fitted flexmix model is stored.

backup.option (character, default: "percentile")

Fallback strategy when flexmix fails to fit a two-cluster model. Options:

"percentile" — filter by backup.percentile of the mitochondrial distribution
"percent" — filter by a fixed backup.percent mitochondrial cutoff
"pass" — return the object unchanged without miQC stats
"halt" — stop with an error

backup.percentile (numeric, default: 0.99)

Percentile cutoff for mitochondrial percentage when backup.option = "percentile".

backup.percent (numeric, default: 5)

Fixed mitochondrial percentage cutoff when backup.option = "percent".

verbose (logical, default: TRUE)

Whether to print progress messages.

PlotMiQC Parameters

seurat_object (Seurat object, required)

A Seurat object that has already been processed with RunMiQC.

percent.mt (character, default: "percent.mt")

Name of the metadata column with mitochondrial percentage.

nFeature_RNA (character, default: "nFeature_RNA")

Name of the metadata column with unique gene counts.

model.slot (character, default: "flexmix_model")

The misc slot where the flexmix model was stored during RunMiQC.

color.by (character, default: "miQC.probability")

Metadata column to use for coloring points. Common choices are "miQC.probability" (continuous gradient) and "miQC.keep" (categorical).

Non-linear Models

For datasets where a linear relationship between mitochondrial percentage and gene count does not hold, RunMiQC supports b-spline and polynomial models via the model.type parameter:

# B-spline model
pbmc3k <- RunMiQC(
  pbmc3k,
  percent.mt = "percent.mt",
  nFeature_RNA = "nFeature_RNA",
  posterior.cutoff = 0.75,
  model.slot = "flexmix_model",
  model.type = "spline"
)
PlotMiQC(pbmc3k, color.by = "miQC.keep")

All visualization and filtering functions work identically regardless of model type.
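
To see how the three mean structures differ, the sketch below fits each one as an ordinary single-component regression on simulated data. It is an illustration only: the formulas are an assumption about how each model.type shapes the trend, the data are simulated, and no mixture is fitted.

```r
library(splines)  # ships with R; provides bs()

# Simulated cells following a single curved trend (all values are made up).
set.seed(1)
n_genes <- round(runif(200, min = 200, max = 3000))
pct_mt  <- 30 * exp(-n_genes / 800) + rnorm(200, sd = 1)
toy <- data.frame(nFeature_RNA = n_genes, percent.mt = pct_mt)

# One formula per model.type (assumed mean structure, for illustration).
formulas <- list(
  linear     = percent.mt ~ nFeature_RNA,
  spline     = percent.mt ~ bs(nFeature_RNA),
  polynomial = percent.mt ~ poly(nFeature_RNA, degree = 2)
)

# Fit each formula with lm() and compare the residual error.
fits <- lapply(formulas, lm, data = toy)
rmse <- sapply(fits, function(fit) sqrt(mean(residuals(fit)^2)))
print(round(rmse, 2))
```

On a curved trend like this one, the more flexible forms track the data more closely. A lower residual error alone does not justify a non-linear model, so compare the PlotMiQC output for each model type before choosing one.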

Handling Model Failures

Some datasets — particularly very clean ones — do not have a meaningful second population of compromised cells, so flexmix may fail to find two clusters. RunMiQC will issue a warning and fall back to the strategy set by backup.option:

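# pbmc3k_extreme is assumed to be an existing Seurat object with percent.mt
# already computed, for example an unusually clean dataset
pbmc3k_extreme <- RunMiQC(
  pbmc3k_extreme,
  percent.mt = "percent.mt",
  nFeature_RNA = "nFeature_RNA",
  posterior.cutoff = 0.9,
  model.slot = "flexmix_model",
  backup.option = "percentile",
  backup.percentile = 0.95
)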

Before running miQC, use FeatureScatter to check whether the two-distribution assumption is appropriate for your data.
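
As a rough picture of what the "percentile" and "percent" fallbacks amount to, the base R sketch below applies both rules to a toy vector of mitochondrial percentages. The values are made up, and keeping cells at or below the cutoff is an assumption of this sketch rather than a statement about the package internals.

```r
# Toy mitochondrial percentages for a very clean dataset (values are made up).
percent_mt <- c(0.8, 1.2, 1.5, 1.9, 2.1, 2.4, 2.8, 3.0, 3.5, 12.0)

# "percentile" style: discard cells above the chosen quantile of percent.mt.
percentile_cutoff <- quantile(percent_mt, probs = 0.95)
keep_percentile <- ifelse(percent_mt <= percentile_cutoff, "keep", "discard")

# "percent" style: discard cells above a fixed mitochondrial percentage.
percent_cutoff <- 5
keep_percent <- ifelse(percent_mt <= percent_cutoff, "keep", "discard")

print(table(keep_percentile))
print(table(keep_percent))
```

The percentile rule always removes roughly the same fraction of cells regardless of their absolute values, whereas the fixed percentage rule can remove none at all in a clean dataset.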

Inspecting Model Parameters

The raw flexmix model is stored in the object’s misc slot and can be accessed directly:

# View mixture model parameters
flexmix::parameters(Misc(pbmc3k, "flexmix_model"))

# View posterior probabilities for the first few cells
head(flexmix::posterior(Misc(pbmc3k, "flexmix_model")))
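
When reading the parameter matrix, it helps to know which component represents compromised cells. The sketch below works on a toy matrix shaped like a two-component linear fit. The numbers and row names are made up, and identifying the compromised component by its larger intercept is an assumed convention, not something the stored model labels for you.

```r
# Toy matrix shaped like the parameters of a two-component linear model;
# the numbers and row names are made up for illustration.
params <- matrix(
  c( 4.8,    38.2,
    -0.0009, -0.0140,
     1.4,     7.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("coef.(Intercept)", "coef.nFeature_RNA", "sigma"),
                  c("Comp.1", "Comp.2"))
)

# Assumed convention: the compromised component has the larger intercept,
# meaning a higher expected percent.mt when few genes are detected.
intercepts <- params["coef.(Intercept)", ]
compromised <- names(which.max(intercepts))
print(compromised)
```

Component numbering is arbitrary between fits, so check the intercepts each time instead of assuming that a given column is the compromised one.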
