Compute fill-reducing orderings for sparse solvers

Sparse direct solvers (LU, Cholesky, LDLT) factorize a matrix by eliminating variables one at a time. The order in which variables are eliminated determines how much fill — new nonzeros introduced during factorization — is created. A good fill-reducing ordering can reduce both memory usage and factorization time by orders of magnitude. ParMETIS_V3_NodeND computes a parallel nested dissection ordering on a distributed graph, producing the permutation vector and separator tree sizes needed by downstream sparse solvers.

When to use ordering

Apply NodeND before calling a sparse direct solver such as SuperLU, PARDISO, MUMPS, or PETSc’s direct solve. The typical workflow is:

Assemble the sparse matrix and represent its sparsity pattern as a graph.
Compute a fill-reducing ordering with NodeND.
Pass the ordering to the solver’s symbolic factorization step.
Factor and solve.

Do this once per matrix structure. If only the values change (not the sparsity pattern) you can reuse the ordering across multiple solves.

Output arrays

NodeND produces two output arrays:

Array	Length	Meaning
`order`	`local_nvtxs`	`order[i]` is the new (permuted) index of local vertex `i`.
`sizes`	`2 * npes`	Sizes of the separators in the nested dissection tree. Used by sparse solvers to set up the elimination tree.

Step-by-step workflow

Build the distributed graph

Populate vtxdist, xadj, and adjncy exactly as you would for PartKway. The graph should represent the sparsity pattern of the symmetric matrix: vertices are unknowns, edges are off-diagonal nonzeros.

Allocate output arrays

idx_t local_nvtxs = vtxdist[mype+1] - vtxdist[mype];
idx_t *order = (idx_t *)malloc(local_nvtxs * sizeof(idx_t));
idx_t *sizes = (idx_t *)malloc(2 * npes    * sizeof(idx_t));

Set options

Set options[0] = 0 to use defaults. Set options[0] = 1 to enable custom debug level and seed via options[PMV3_OPTION_DBGLVL] and options[PMV3_OPTION_SEED].

idx_t options[3] = {0, 0, 0};

Call NodeND

idx_t numflag = 0;
ParMETIS_V3_NodeND(vtxdist, xadj, adjncy, &numflag,
                   options, order, sizes, &comm);

Pass ordering to the sparse solver

Gather order on all processes (or use the distributed ordering directly, depending on the solver’s API). Use sizes to reconstruct the separator tree if the solver requires it.

Complete example

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <parmetis.h>

int main(int argc, char *argv[])
{
    MPI_Comm comm;
    int npes, mype;

    MPI_Init(&argc, &argv);
    MPI_Comm_dup(MPI_COMM_WORLD, &comm);
    MPI_Comm_size(comm, &npes);
    MPI_Comm_rank(comm, &mype);

    /* 8-vertex ring graph distributed across 2 processes */
    idx_t vtxdist[3] = {0, 4, 8};
    idx_t nvtxs_local = vtxdist[mype+1] - vtxdist[mype];

    idx_t xadj_p0[]   = {0, 2, 4, 6, 8};
    idx_t adjncy_p0[] = {7, 1,  0, 2,  1, 3,  2, 4};
    idx_t xadj_p1[]   = {0, 2, 4, 6, 8};
    idx_t adjncy_p1[] = {3, 5,  4, 6,  5, 7,  6, 0};

    idx_t *xadj   = (mype == 0) ? xadj_p0   : xadj_p1;
    idx_t *adjncy = (mype == 0) ? adjncy_p0 : adjncy_p1;

    idx_t  numflag    = 0;
    idx_t  options[3] = {0, 0, 0};

    idx_t *order = (idx_t *)malloc(nvtxs_local * sizeof(idx_t));
    idx_t *sizes = (idx_t *)malloc(2 * npes    * sizeof(idx_t));

    ParMETIS_V3_NodeND(vtxdist, xadj, adjncy, &numflag,
                       options, order, sizes, &comm);

    for (int p = 0; p < npes; p++) {
        if (mype == p) {
            for (idx_t i = 0; i < nvtxs_local; i++)
                printf("process %d: vertex %d -> order %d\n",
                       mype, (int)(vtxdist[mype] + i), (int)order[i]);
        }
        MPI_Barrier(comm);
    }

    if (mype == 0) {
        printf("Separator sizes:");
        for (int i = 0; i < 2 * npes; i++)
            printf(" %d", (int)sizes[i]);
        printf("\n");
    }

    free(order); free(sizes);
    MPI_Comm_free(&comm);
    MPI_Finalize();
    return 0;
}

Extended control: ParMETIS_V32_NodeND

ParMETIS_V32_NodeND exposes additional parameters for fine-grained control of the ordering algorithm.

Parameter	Type	Description
`vwgt`	`idx_t *`	Vertex weights. Pass `NULL` for uniform weights.
`mtype`	`idx_t *`	Matching type: `PARMETIS_MTYPE_LOCAL` (1) or `PARMETIS_MTYPE_GLOBAL` (2).
`rtype`	`idx_t *`	Separator refinement type: `PARMETIS_SRTYPE_GREEDY` (1) or `PARMETIS_SRTYPE_2PHASE` (2).
`p_nseps`	`idx_t *`	Number of separators to compute in parallel.
`s_nseps`	`idx_t *`	Number of separators to compute in serial.
`ubfrac`	`real_t *`	Imbalance tolerance for the nested dissection bisections (e.g. `1.05`).
`seed`	`idx_t *`	Random seed for reproducibility.
`dbglvl`	`idx_t *`	Debug level (OR of `PARMETIS_DBGLVL_*` constants).

idx_t  mtype    = PARMETIS_MTYPE_GLOBAL;
idx_t  rtype    = PARMETIS_SRTYPE_2PHASE;
idx_t  p_nseps  = 1, s_nseps = 1;
real_t ubfrac   = 1.05f;
idx_t  seed     = 15, dbglvl = 0;

ParMETIS_V32_NodeND(vtxdist, xadj, adjncy,
                    NULL,       /* vwgt */
                    &numflag, &mtype, &rtype,
                    &p_nseps, &s_nseps, &ubfrac,
                    &seed, &dbglvl,
                    order, sizes, &comm);

Serial ordering: ParMETIS_SerialNodeND

ParMETIS_SerialNodeND gathers the full graph on a single process and runs the serial METIS nested dissection algorithm. Use it for small graphs or when you need the higher quality of the serial ordering algorithm.

ParMETIS_SerialNodeND(vtxdist, xadj, adjncy, &numflag,
                      options, order, sizes, &comm);

ParMETIS_SerialNodeND replicates the entire graph on one process. For large graphs this can exhaust memory. Use ParMETIS_V3_NodeND for distributed-memory scalability.

Function signatures

int ParMETIS_V3_NodeND(
    idx_t *vtxdist, idx_t *xadj, idx_t *adjncy,
    idx_t *numflag, idx_t *options,
    idx_t *order, idx_t *sizes, MPI_Comm *comm);

int ParMETIS_V32_NodeND(
    idx_t *vtxdist, idx_t *xadj, idx_t *adjncy, idx_t *vwgt,
    idx_t *numflag, idx_t *mtype, idx_t *rtype,
    idx_t *p_nseps, idx_t *s_nseps, real_t *ubfrac,
    idx_t *seed, idx_t *dbglvl,
    idx_t *order, idx_t *sizes, MPI_Comm *comm);

int ParMETIS_SerialNodeND(
    idx_t *vtxdist, idx_t *xadj, idx_t *adjncy,
    idx_t *numflag, idx_t *options,
    idx_t *order, idx_t *sizes, MPI_Comm *comm);

Compilation

mpicc -o ordering ordering.c -lparmetis -lmetis

Get Started

Concepts

Guides

Configuration

Compute fill-reducing orderings for sparse solvers

When to use ordering

Output arrays

Step-by-step workflow

Complete example

Extended control: ParMETIS_V32_NodeND

Serial ordering: ParMETIS_SerialNodeND

Function signatures

Compilation

Build docs developers (and LLMs) love

Get Started

Concepts

Guides

Configuration

Documentation Index

​When to use ordering

​Output arrays

​Step-by-step workflow

​Complete example

​Extended control: ParMETIS_V32_NodeND

​Serial ordering: ParMETIS_SerialNodeND

​Function signatures

​Compilation

Build docs developers (and LLMs) love

When to use ordering

Output arrays

Step-by-step workflow

Complete example

Extended control: ParMETIS_V32_NodeND

Serial ordering: ParMETIS_SerialNodeND

Function signatures

Compilation