miQC jointly models mitochondrial read percentage and library complexity using a two-distribution mixture model, enabling probabilistic rather than threshold-based identification of compromised cells. This is particularly useful for archived or tumor tissues where fixed mitochondrial cutoffs are often too stringent or too lenient.
Citation: Hippen et al. (2021) miQC: An adaptive probabilistic framework for quality control of single-cell RNA-sequencing data. bioRxiv. doi: 10.1101/2021.03.03.433798
Source: greenelab/miQC (Bioconductor)

Installation

# Install miQC from Bioconductor
BiocManager::install('miQC')

# Install flexmix (required dependency)
install.packages('flexmix')
The SeuratWrappers adaptation of miQC requires only flexmix. The miQC Bioconductor package provides the reference implementation, but SeuratWrappers calls flexmix directly.

Key Functions

  • RunMiQC() — Fits a two-distribution mixture model and assigns each cell a probability of being compromised. Stores results in object metadata.
  • PlotMiQC() — Visualizes the fitted mixture model overlaid on a scatter plot of mitochondrial percentage vs. unique gene count.

How It Works

miQC assumes that a scRNA-seq dataset contains two populations of cells: intact cells (low mitochondrial reads, higher gene counts) and compromised cells (high mitochondrial reads, lower gene counts). It fits a two-component mixture model to the joint distribution of percent.mt and nFeature_RNA, then computes, for each cell, the posterior probability of belonging to the compromised component. Cells whose posterior probability exceeds the configurable posterior.cutoff are labeled for removal. This approach adapts to each dataset’s specific quality profile rather than requiring a universal threshold.
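To make the posterior calculation concrete, the following base-R sketch applies Bayes' rule to two hypothetical linear components. Every parameter value here (mixing weights, intercepts, slopes, standard deviations) is invented for illustration and is not fitted miQC output; in practice flexmix estimates these quantities from the data.

```r
# Hypothetical parameters for two linear-regression components.
# These numbers are illustrative assumptions, not estimates from real data.
intact      <- list(weight = 0.8, intercept = 4,  slope = -0.0005, sd = 1)
compromised <- list(weight = 0.2, intercept = 40, slope = -0.01,   sd = 6)

# Density of the observed percent.mt under one component,
# scaled by that component's mixing weight
weighted_density <- function(comp, n_feature, percent_mt) {
  comp$weight * dnorm(percent_mt,
                      mean = comp$intercept + comp$slope * n_feature,
                      sd   = comp$sd)
}

# Posterior probability of the compromised component (Bayes' rule)
posterior_compromised <- function(n_feature, percent_mt) {
  num <- weighted_density(compromised, n_feature, percent_mt)
  num / (num + weighted_density(intact, n_feature, percent_mt))
}

# Three made-up cells: one clearly intact, two with elevated percent.mt
cells <- data.frame(nFeature_RNA = c(2000, 600, 900),
                    percent.mt   = c(3, 30, 12))
cells$prob <- posterior_compromised(cells$nFeature_RNA, cells$percent.mt)
print(cells)
```

Under these assumed parameters, the first cell sits close to the intact regression line and receives a posterior near 0, while the other two lie far from it and receive posteriors near 1.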

Workflow

1. Load libraries and data

library(Seurat)
library(SeuratData)
library(SeuratWrappers)
library(flexmix)

InstallData("pbmc3k")
data("pbmc3k")
2. Calculate mitochondrial percentage

miQC requires percent.mt and nFeature_RNA to be present in the object metadata. nFeature_RNA is computed automatically by CreateSeuratObject. Calculate percent.mt with PercentageFeatureSet.
For human data, mitochondrial gene names start with MT-. For mouse data, use mt-.
pbmc3k[["percent.mt"]] <- PercentageFeatureSet(object = pbmc3k, pattern = "^MT-")
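As a rough illustration of what these two metadata columns measure, the base-R sketch below computes them by hand on a toy counts matrix. The gene names and counts are made up, and the calculation is only conceptually equivalent to what Seurat does; it is not Seurat's own code.

```r
# Toy counts matrix (genes x cells); gene names and counts are made up
counts <- matrix(
  c(10,  0,  5,
    30,  2,  8,
    60, 98, 87),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "ACTB"),
                  c("cell1", "cell2", "cell3"))
)

# Rows whose names match the human mitochondrial prefix
is_mito <- grepl("^MT-", rownames(counts))

# Percentage of each cell's counts that come from mitochondrial genes
percent_mt <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)

# Number of genes with at least one count per cell (analogous to nFeature_RNA)
n_feature <- colSums(counts > 0)

print(data.frame(percent_mt, n_feature))
```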
Inspect the distribution before running the model:
FeatureScatter(pbmc3k, feature1 = "nFeature_RNA", feature2 = "percent.mt")
Look for a distinctive triangular shape: a wide range of mitochondrial percentages at lower gene counts tapering to low mitochondrial percentage at higher gene counts. If this pattern is absent, the two-distribution assumption may not hold for your data.
3. Run the miQC mixture model

pbmc3k <- RunMiQC(
  pbmc3k,
  percent.mt = "percent.mt",
  nFeature_RNA = "nFeature_RNA",
  posterior.cutoff = 0.75,
  model.slot = "flexmix_model"
)
After running, two new metadata columns are added:
  • miQC.probability: posterior probability of belonging to the compromised distribution
  • miQC.keep: "keep" or "discard" decision based on posterior.cutoff
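The relationship between the two columns can be sketched in base R. The probabilities below are made up, label_cells is a hypothetical helper, and the strict "greater than" comparison is an assumption based on the description that cells with probability above the cutoff are discarded.

```r
# Hypothetical posterior probabilities for five cells (made-up values)
prob_compromised <- c(0.02, 0.40, 0.75, 0.76, 0.99)

# Hypothetical helper: label a cell "discard" when its probability
# exceeds the cutoff (strict comparison is an assumption)
label_cells <- function(prob, cutoff) {
  ifelse(prob > cutoff, "discard", "keep")
}

labels_075 <- label_cells(prob_compromised, cutoff = 0.75)
labels_090 <- label_cells(prob_compromised, cutoff = 0.90)

# A stricter (higher) cutoff discards fewer cells
print(table(labels_075))
print(table(labels_090))
```

On a real object, the analogous check would be to tabulate the miQC.keep metadata column after RunMiQC.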
4. Visualize the model

Plot the mixture model with cells colored by their compromise probability:
PlotMiQC(pbmc3k, color.by = "miQC.probability") +
  ggplot2::scale_color_gradient(low = "grey", high = "purple")
Or visualize the keep/discard decisions directly:
PlotMiQC(pbmc3k, color.by = "miQC.keep")
5. Filter cells

Subset the Seurat object to retain only high-quality cells:
pbmc3k_filtered <- subset(pbmc3k, miQC.keep == "keep")
pbmc3k_filtered
# An object of class Seurat
# 13714 features across 2593 samples within 1 assay

RunMiQC Parameters

object (Seurat object, required)
The Seurat object to run miQC on.

percent.mt (character, default: "percent.mt")
Name of the metadata column containing the percentage of reads attributed to mitochondrial genes.

nFeature_RNA (character, default: "nFeature_RNA")
Name of the metadata column containing the number of unique genes detected per cell.

posterior.cutoff (numeric, default: 0.75)
Posterior probability threshold for the compromised distribution. Cells with probability above this value are marked as "discard". Must be between 0 and 1. When processing multiple samples from the same experiment, use the same cutoff across all samples for consistency.

model.type (character, default: "linear")
Type of mixture model to fit. Options:
  • "linear": linear mixture model (recommended)
  • "spline": b-spline mixture model
  • "polynomial": second-degree polynomial mixture model

model.slot (character, default: "flexmix_model")
Name of the misc slot in the Seurat object where the fitted flexmix model is stored.

backup.option (character, default: "percentile")
Fallback strategy when flexmix fails to fit a two-cluster model. Options:
  • "percentile": filter by backup.percentile of the mitochondrial distribution
  • "percent": filter by a fixed backup.percent mitochondrial cutoff
  • "pass": return the object unchanged without miQC stats
  • "halt": stop with an error

backup.percentile (numeric, default: 0.99)
Percentile cutoff for mitochondrial percentage when backup.option = "percentile".

backup.percent (numeric, default: 5)
Fixed mitochondrial percentage cutoff when backup.option = "percent".

verbose (logical, default: TRUE)
Whether to print progress messages.

PlotMiQC Parameters

seurat_object (Seurat object, required)
A Seurat object that has already been processed with RunMiQC.

percent.mt (character, default: "percent.mt")
Name of the metadata column with mitochondrial percentage.

nFeature_RNA (character, default: "nFeature_RNA")
Name of the metadata column with unique gene counts.

model.slot (character, default: "flexmix_model")
The misc slot where the flexmix model was stored during RunMiQC.

color.by (character, default: "miQC.probability")
Metadata column to use for coloring points. Common choices are "miQC.probability" (continuous gradient) and "miQC.keep" (categorical).

Non-linear Models

For datasets where a linear relationship between mitochondrial percentage and gene count does not hold, RunMiQC supports b-spline and polynomial models via the model.type parameter:
# B-spline model
pbmc3k <- RunMiQC(
  pbmc3k,
  percent.mt = "percent.mt",
  nFeature_RNA = "nFeature_RNA",
  posterior.cutoff = 0.75,
  model.slot = "flexmix_model",
  model.type = "spline"
)
PlotMiQC(pbmc3k, color.by = "miQC.keep")
All visualization and filtering functions work identically regardless of model type.
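To show how the three model types differ in flexibility, the sketch below fits ordinary single-component regressions to simulated data using three formula shapes. That the wrapper builds its models from splines::bs() and poly(..., degree = 2) is an assumption inferred from the option descriptions ("b-spline", "polynomial"), and the data are simulated, so treat this as an illustration of the basis expansions rather than of RunMiQC internals.

```r
library(splines)  # ships with R; provides bs()

set.seed(42)
# Simulated data: percent.mt declines with gene count (values are made up)
toy <- data.frame(nFeature_RNA = runif(50, 200, 3000))
toy$percent.mt <- 30 - 0.008 * toy$nFeature_RNA + rnorm(50)

# Assumed formula shapes for each model.type
formulas <- list(
  linear     = percent.mt ~ nFeature_RNA,
  spline     = percent.mt ~ bs(nFeature_RNA),
  polynomial = percent.mt ~ poly(nFeature_RNA, degree = 2)
)

# Number of regression coefficients (including the intercept) per formula:
# more coefficients means a more flexible curve
n_coef <- sapply(formulas, function(f) length(coef(lm(f, data = toy))))
print(n_coef)
```

More flexible curves can follow a non-linear trend more closely, at the cost of more parameters to estimate per mixture component.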

Handling Model Failures

Some datasets — particularly very clean ones — do not have a meaningful second population of compromised cells, so flexmix may fail to find two clusters. RunMiQC will issue a warning and fall back to the strategy set by backup.option:
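# pbmc3k_extreme is assumed to be an existing Seurat object with too few
# compromised cells for a two-cluster fit; it is not created on this page
pbmc3k_extreme <- RunMiQC(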
  pbmc3k_extreme,
  percent.mt = "percent.mt",
  nFeature_RNA = "nFeature_RNA",
  posterior.cutoff = 0.9,
  model.slot = "flexmix_model",
  backup.option = "percentile",
  backup.percentile = 0.95
)
Before running miQC, use FeatureScatter to check whether the two-distribution assumption is appropriate for your data.
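The two threshold-based fallbacks can be compared with a small base-R sketch on simulated mitochondrial percentages. The data are made up, and the inclusive "less than or equal" comparison is an assumption; RunMiQC's exact boundary handling may differ.

```r
set.seed(7)
# Simulated mitochondrial percentages for 1,000 cells (illustrative only)
percent_mt <- rexp(1000, rate = 1 / 3)

# backup.option = "percentile": data-dependent cutoff at the chosen quantile
percentile_cutoff <- quantile(percent_mt, probs = 0.95)
keep_percentile <- percent_mt <= percentile_cutoff

# backup.option = "percent": fixed cutoff, independent of the data
keep_percent <- percent_mt <= 5

# Cells retained under each fallback
print(c(percentile = sum(keep_percentile), percent = sum(keep_percent)))
```

The percentile option always removes roughly the same fraction of cells, whereas the fixed-percent option removes however many cells exceed the cutoff, which may be none in a very clean dataset.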

Inspecting Model Parameters

The raw flexmix model is stored in the object’s misc slot and can be accessed directly:
# View mixture model parameters
flexmix::parameters(Misc(pbmc3k, "flexmix_model"))

# View posterior probabilities for the first few cells
head(flexmix::posterior(Misc(pbmc3k, "flexmix_model")))
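To see what these accessors return without loading a Seurat object, the sketch below fits a two-component flexmix model to simulated cells. The simulated values are invented, and picking the compromised component by the larger intercept is an assumed heuristic for this illustration, not necessarily the rule RunMiQC applies.

```r
library(flexmix)

set.seed(2021)
# Simulated cells: 300 "intact" and 100 "compromised" (all values are made up)
n_genes <- c(runif(300, 500, 3000), runif(100, 200, 1500))
pct_mt  <- c(rnorm(300, mean = 3, sd = 0.5),
             40 - 0.02 * n_genes[301:400] + rnorm(100, sd = 3))
sim <- data.frame(nFeature_RNA = n_genes, percent.mt = pct_mt)

# Two-component mixture of linear regressions
model <- flexmix(percent.mt ~ nFeature_RNA, data = sim, k = 2)

params <- parameters(model)  # coefficients and sigma, one column per component
post   <- posterior(model)   # one row per cell, one column per component

# Assumed heuristic: the component with the larger intercept is "compromised"
compromised <- which.max(params[1, ])
print(params)
print(summary(post[, compromised]))
```

Because flexmix starts from a random initialization, component numbering can change between runs, so it is safer to identify the compromised component from its parameters than from its index.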