Nested dissection ordering for sparse matrix fill reduction

When you solve a sparse linear system with a direct method — such as LU factorization or Cholesky — the factorization introduces new nonzero entries called fill-in. The amount of fill-in depends critically on the order in which you eliminate variables. A poor ordering can cause the factored matrix to be many times denser than the original, making the solve infeasible. Nested dissection ordering solves this problem by exploiting graph structure to minimize fill-in.

Why ordering matters

The sparsity pattern of a matrix corresponds to a graph: each variable is a vertex, and a nonzero off-diagonal entry A[i][j] is an edge between vertices i and j. During Gaussian elimination, eliminating vertex i may cause fill-in edges between all of i’s current neighbors. A fill-reducing ordering arranges variables so that the elimination order creates as few new edges as possible. For a finite element mesh, fill-in is directly related to the connectivity of the mesh elements. A well-ordered mesh produces an elimination order where each eliminated variable has few active neighbors, resulting in a sparse factorization.

Nested dissection orderings are especially effective for 2D and 3D meshes. For 2D meshes with n vertices, optimal nested dissection achieves O(n log n) fill-in; for 3D meshes, O(n^{4/3}).

How nested dissection works

Find a vertex separator

ParMETIS finds a small set of vertices (the separator) whose removal splits the graph into two roughly equal subgraphs. Finding this separator uses the same multilevel coarsen–partition–refine framework as the partitioning algorithms, with the PARMETIS_SRTYPE_2PHASE refinement strategy to produce compact separators.

Recurse on the subgraphs

ParMETIS recursively finds separators for each of the two subgraphs. This recursive bisection continues until each subgraph is small enough to order serially with METIS (METIS_NodeND).

Assign elimination order

Variables in the subgraphs receive lower elimination indices (they are eliminated first). Variables in the separator receive higher indices (they are eliminated last). Because the separator vertices are eliminated after their subgraph has been fully factored, they cannot introduce fill-in between the two subgraphs.This creates a hierarchical elimination tree structure where separator vertices at each level of the recursion act as independent blocks in the factored matrix.

Local serial ordering

Once each process owns its local subgraph (after the parallel bisection phases), ParMETIS calls METIS_NodeND on the local subgraph to compute a high-quality local ordering. The s_nseps parameter controls how many separator trials METIS performs at each serial bisection step.

Output: `order` and `sizes`

ParMETIS_V3_NodeND and ParMETIS_V32_NodeND produce two output arrays: order — a permutation array of length equal to the local vertex count on each process. order[i] gives the new (reordered) index of local vertex i. After all processes complete, the combined order arrays form a global permutation of all gnvtxs vertices. sizes — an array of length 2 * npes that describes the nested dissection tree. The first npes entries give the number of vertices owned by each process’s local subgraph (after redistribution). The remaining npes - 1 entries store the separator sizes at each level of the recursive bisection, with the top-level (largest) separator at index 2 * npes - 2.

idx_t  numflag = 0;
idx_t  options[3] = {0, 0, 0};
idx_t *order = malloc(nvtxs * sizeof(idx_t));  /* local vertex count */
idx_t *sizes = malloc(2 * npes * sizeof(idx_t));

ParMETIS_V3_NodeND(
    vtxdist, xadj, adjncy,
    &numflag, options,
    order, sizes, &comm
);

Using `order` to permute a sparse matrix

After computing the ordering, you reorder the matrix so that row and column i of the original matrix become row and column order[i] in the permuted matrix. This is a symmetric permutation P A P^T.

/*
 * Apply the nested dissection ordering to a CSR matrix.
 *
 * Inputs:
 *   n       - local number of rows
 *   rowptr  - CSR row pointers of original matrix (length n+1)
 *   colidx  - CSR column indices of original matrix
 *   values  - CSR values of original matrix
 *   order   - output of ParMETIS_V3_NodeND for local vertices
 *
 * The permuted matrix B = P * A * P^T satisfies:
 *   B[ order[i] ][ order[j] ] = A[i][j]
 */
void apply_nd_ordering(idx_t n, idx_t *rowptr, idx_t *colidx,
                       double *values, idx_t *order,
                       idx_t *new_rowptr, idx_t *new_colidx, double *new_values)
{
    /* Step 1: compute the inverse permutation */
    idx_t *inv_order = malloc(n * sizeof(idx_t));
    for (idx_t i = 0; i < n; i++)
        inv_order[order[i]] = i;

    /* Step 2: count entries per new row */
    for (idx_t i = 0; i < n; i++)
        new_rowptr[ order[i] + 1 ] = rowptr[i+1] - rowptr[i];

    /* convert counts to CSR offsets */
    new_rowptr[0] = 0;
    for (idx_t i = 0; i < n; i++)
        new_rowptr[i+1] += new_rowptr[i];

    /* Step 3: fill in new column indices and values */
    idx_t *pos = calloc(n, sizeof(idx_t));
    for (idx_t i = 0; i < n; i++) {
        idx_t new_i = order[i];
        for (idx_t k = rowptr[i]; k < rowptr[i+1]; k++) {
            idx_t dest = new_rowptr[new_i] + pos[new_i]++;
            new_colidx[dest] = order[colidx[k]];
            new_values[dest] = values[k];
        }
    }

    free(inv_order);
    free(pos);
}

Most sparse direct solver libraries — SuperLU, MUMPS, PETSc, Trilinos — accept a permutation vector directly. You can pass the order array (gathered from all processes) as the reordering input rather than permuting the matrix manually.

API options

ParMETIS provides three ordering functions with different levels of control:

ParMETIS_V3_NodeND
ParMETIS_V32_NodeND
ParMETIS_SerialNodeND

The standard interface. Uses default parameters for matching type, refinement type, and separator counts.

int ParMETIS_V3_NodeND(
    idx_t *vtxdist, idx_t *xadj, idx_t *adjncy,
    idx_t *numflag, idx_t *options,
    idx_t *order, idx_t *sizes,
    MPI_Comm *comm);

Internally delegates to ParMETIS_V32_NodeND with:

mtype = PARMETIS_MTYPE_GLOBAL
rtype = PARMETIS_SRTYPE_2PHASE
p_nseps = 1, s_nseps = 1
ubfrac = ORDER_UNBALANCE_FRACTION (1.10)

This is the right choice for most applications.

The extended interface with full parameter control.

int ParMETIS_V32_NodeND(
    idx_t *vtxdist, idx_t *xadj, idx_t *adjncy, idx_t *vwgt,
    idx_t *numflag, idx_t *mtype, idx_t *rtype,
    idx_t *p_nseps, idx_t *s_nseps,
    real_t *ubfrac, idx_t *seed, idx_t *dbglvl,
    idx_t *order, idx_t *sizes,
    MPI_Comm *comm);

Additional parameters:

Parameter	Type	Description
`vwgt`	`idx_t *`	Vertex weights for weighted separator balance. Pass `NULL` for unweighted.
`mtype`	`idx_t *`	Matching type: `PARMETIS_MTYPE_LOCAL` or `PARMETIS_MTYPE_GLOBAL`.
`rtype`	`idx_t *`	Separator refinement type: `PARMETIS_SRTYPE_GREEDY` or `PARMETIS_SRTYPE_2PHASE`.
`p_nseps`	`idx_t *`	Number of parallel separator trials at each bisection level. Higher values improve quality at the cost of more communication.
`s_nseps`	`idx_t *`	Number of serial separator trials passed to `METIS_NodeND` for the local subgraph ordering.
`ubfrac`	`real_t *`	Maximum allowed imbalance fraction for separator bisections. Default is 1.10 (10% imbalance).
`seed`	`idx_t *`	Random number seed for reproducible results.
`dbglvl`	`idx_t *`	Debug level (same flags as partitioning functions).

idx_t  mtype   = PARMETIS_MTYPE_GLOBAL;
idx_t  rtype   = PARMETIS_SRTYPE_2PHASE;
idx_t  p_nseps = 3;   /* try 3 parallel separators, keep best */
idx_t  s_nseps = 2;   /* try 2 serial separators per subgraph */
real_t ubfrac  = 1.10;
idx_t  seed    = 42;
idx_t  dbglvl  = 0;

ParMETIS_V32_NodeND(
    vtxdist, xadj, adjncy, NULL,
    &numflag, &mtype, &rtype,
    &p_nseps, &s_nseps, &ubfrac,
    &seed, &dbglvl,
    order, sizes, &comm
);

Gathers the distributed graph to a single process and computes the ordering using the serial METIS library. Suitable for small graphs or when the distributed ordering quality is insufficient.

int ParMETIS_SerialNodeND(
    idx_t *vtxdist, idx_t *xadj, idx_t *adjncy,
    idx_t *numflag, idx_t *options,
    idx_t *order, idx_t *sizes,
    MPI_Comm *comm);

ParMETIS_SerialNodeND requires that the entire graph fit in the memory of a single process. For large distributed graphs, use ParMETIS_V3_NodeND instead.

PARMETIS_SRTYPE_2PHASE is the default refinement strategy for ordering and generally produces better results than PARMETIS_SRTYPE_GREEDY alone. PARMETIS_SRTYPE_GREEDY visits boundary vertices from highest to lowest gain (the gain of moving a vertex is the change in separator size). It makes each move immediately if it improves the separator, but it cannot undo moves even if a different sequence would yield a better overall result. PARMETIS_SRTYPE_2PHASE first constructs a separator using a boundary-expansion phase, then applies the greedy refinement pass. The first phase explores a larger neighborhood before committing to moves, which allows it to find improvements that greedy alone would miss.

/* Default: use 2-phase refinement */
idx_t rtype = PARMETIS_SRTYPE_2PHASE;

/* Faster but lower quality: greedy only */
idx_t rtype = PARMETIS_SRTYPE_GREEDY;

Use cases for nested dissection ordering

Nested dissection ordering is used whenever you need to factorize a large sparse matrix directly:

Finite element simulations: The stiffness matrix of a finite element mesh has the same sparsity pattern as the mesh graph. A nested dissection ordering of the mesh minimizes fill-in during the Cholesky or LU factorization of the stiffness matrix.
Preconditioning: Incomplete LU (ILU) preconditioners benefit from a fill-reducing ordering because it concentrates the ignored fill-in in less important positions.
Graph-structured problems: Any symmetric positive definite system arising from graph Laplacians, Poisson equations, or similar stencil problems benefits from nested dissection.

Libraries that accept a permutation array for reordering include:

SuperLU (superlu_options.ColPerm = MY_PERMC)
MUMPS (ICNTL(7) = 1 with user-supplied ordering)
PETSc (MatGetOrdering with external ordering)

Understanding the sizes array

The sizes array has length 2 * npes. After ParMETIS_V3_NodeND returns:

sizes[0] through sizes[npes - 1] contain the number of locally owned vertices on each process after the final graph redistribution step.
sizes[npes] through sizes[2 * npes - 2] contain the separator sizes at each level of the nested dissection tree (from the finest level to the coarsest).
The top-level separator (the one that divides the entire graph) has size sizes[2 * npes - 2].

These values are useful for partitioning the work of a sparse direct solve: each subdomain at the leaf level can be factored independently on a separate process, and the separator blocks must be factored last using a reduced system (Schur complement).

Get Started

Concepts

Guides

Configuration

Nested dissection ordering for sparse matrix fill reduction

Why ordering matters

How nested dissection works

Output: `order` and `sizes`

Using `order` to permute a sparse matrix

API options

Two-phase separator refinement

Build docs developers (and LLMs) love

Get Started

Concepts

Guides

Configuration

Documentation Index

​Why ordering matters

​How nested dissection works

​Output: order and sizes

​Using order to permute a sparse matrix

​API options

​Two-phase separator refinement

Build docs developers (and LLMs) love

Why ordering matters

How nested dissection works

Output: `order` and `sizes`

Using `order` to permute a sparse matrix

API options

Two-phase separator refinement