Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/KarypisLab/ParMETIS/llms.txt

Use this file to discover all available pages before exploring further.

ParMETIS provides several partitioning algorithms for different scenarios. All of them build on the same multilevel framework — a three-phase approach that produces high-quality partitions efficiently — but they differ in how they initialize the partition and how much graph structure or coordinate data they use.

The multilevel framework

Every graph partitioning and ordering algorithm in ParMETIS follows the same three-phase multilevel structure. This approach scales to large distributed graphs because the expensive initial partitioning happens only on a much smaller coarsened graph.
1

Coarsening

ParMETIS repeatedly collapses the graph by finding a matching — a set of edges or vertex pairs — and merging each matched pair into a single coarser vertex. The weight of the new vertex is the sum of its constituents’ weights, and the edges of the new vertex are the union of their neighbors (excluding internal edges).This process repeats until the graph has fewer than CoarsenTo vertices (computed internally as a function of nparts and ncon). The sequence of graphs from finest to coarsest forms a multilevel hierarchy.Two matching strategies control the coarsening quality:
ConstantValueBehavior
PARMETIS_MTYPE_LOCAL1Each vertex can only match with vertices on the same process. Faster but may produce lower-quality coarsenings at process boundaries.
PARMETIS_MTYPE_GLOBAL2Vertices can match with vertices on neighboring processes. Produces better coarsening quality, especially for highly connected distributed graphs.
The coarsening stops when the graph size reduction between successive levels falls below COARSEN_FRACTION (0.75), indicating that further coarsening is no longer effective.
2

Initial partitioning

Once the coarse graph is small enough, ParMETIS computes an initial partition. For k-way partitioning, it runs NIPARTS (8) random initial partition trials using a greedy algorithm and retains the best result.If the coarse graph fits on a single process, ParMETIS calls the serial METIS library (METIS_PartGraphKway) directly. This produces a high-quality initial partition at low cost because the graph is small.
3

Uncoarsening and refinement

ParMETIS projects the partition back up through the coarsening hierarchy. At each level, it runs k-way FM (Fiduccia-Mattheyses) refinement (KWayFM) to improve the partition. The refinement moves boundary vertices between partitions to reduce the edge cut while keeping partition weights within the balance tolerance.For multi-constraint problems (ncon > 1), KWayBalance runs a balancing pass before the FM pass whenever the imbalance exceeds ubavg + 0.035.
The coarsening target is computed as min(gnvtxs + 1, 25 * ncon * max(npes, nparts)). For typical problems this means the coarse graph has on the order of tens of thousands of vertices regardless of the original size.

k-way partitioning: ParMETIS_V3_PartKway

ParMETIS_V3_PartKway is the primary partitioning function. It computes a k-way partition directly — without recursively bisecting — using the full multilevel framework described above.
int ParMETIS_V3_PartKway(
    idx_t *vtxdist, idx_t *xadj, idx_t *adjncy, idx_t *vwgt,
    idx_t *adjwgt, idx_t *wgtflag, idx_t *numflag, idx_t *ncon, idx_t *nparts,
    real_t *tpwgts, real_t *ubvec, idx_t *options, idx_t *edgecut, idx_t *part,
    MPI_Comm *comm);
Use ParMETIS_V3_PartKway when:
  • You do not have vertex coordinate data.
  • You need a fresh partition from scratch (not a refinement of an existing one).
  • You have nparts that may differ from npes.
If the total graph has fewer than SMALLGRAPH (10,000) vertices or fewer than 20 vertices per process, ParMETIS partitions it serially using METIS instead of running the full parallel algorithm.

Geometry-assisted partitioning

When vertex coordinates are available, ParMETIS can use spatial information to seed a better initial partition or to compute a fast geometry-only partition.
ParMETIS_V3_PartGeom partitions the graph using space-filling curves (SFC) on coordinate data alone, without looking at the graph edges at all.
int ParMETIS_V3_PartGeom(
    idx_t *vtxdist, idx_t *ndims, real_t *xyz,
    idx_t *part, MPI_Comm *comm);
The algorithm:
  1. Maps each vertex’s coordinates into a Z-order (Morton) curve value using ndims-dimensional bit interleaving with 2^9 = 512 bins per dimension.
  2. Sorts vertices globally along the SFC using parallel sample sort (PseudoSampleSort).
  3. Assigns consecutive ranges of the sorted order to each of the npes partitions.
This produces a partition in O(n log n) time with minimal communication. The partition quality is lower than graph-based methods because edge connectivity is ignored, but the speed makes it useful for quick load balancing or as a building block.
ParMETIS_V3_PartGeom always produces exactly npes partitions (one per process). Use ParMETIS_V3_PartGeomKway if you need nparts != npes.

k-way refinement: ParMETIS_V3_RefineKway

ParMETIS_V3_RefineKway refines an existing partition without repartitioning from scratch. You provide a part array that already assigns each local vertex to a partition, and ParMETIS improves it using the multilevel k-way refinement loop.
int ParMETIS_V3_RefineKway(
    idx_t *vtxdist, idx_t *xadj, idx_t *adjncy, idx_t *vwgt,
    idx_t *adjwgt, idx_t *wgtflag, idx_t *numflag, idx_t *ncon, idx_t *nparts,
    real_t *tpwgts, real_t *ubvec, idx_t *options, idx_t *edgecut,
    idx_t *part, MPI_Comm *comm);
Use ParMETIS_V3_RefineKway when:
  • You already have a reasonable partition and want to improve it without the overhead of full repartitioning.
  • You are iterating: you computed a partition, modified some vertex weights slightly, and want to recover balance with minimal movement.
The options array element options[PMV3_OPTION_PSR] controls the processor-subdomain relationship:
ConstantValueMeaning
PARMETIS_PSR_COUPLED1The number of partitions equals the number of processes (nparts == npes). Each process manages exactly one partition during refinement.
PARMETIS_PSR_UNCOUPLED2The number of partitions differs from the number of processes. ParMETIS maps partitions to processes as needed.
Use PARMETIS_PSR_COUPLED when running a simulation where each MPI rank corresponds to one partition. Use PARMETIS_PSR_UNCOUPLED when you compute more partitions than processes, such as when preparing data for a run with more ranks.

Adaptive repartitioning: ParMETIS_V3_AdaptiveRepart

ParMETIS_V3_AdaptiveRepart is designed for evolving meshes — simulations where vertex weights change over time as load shifts (for example, due to adaptive mesh refinement). It balances the quality of the new partition against the cost of redistributing data.
int ParMETIS_V3_AdaptiveRepart(
    idx_t *vtxdist, idx_t *xadj, idx_t *adjncy, idx_t *vwgt,
    idx_t *vsize, idx_t *adjwgt, idx_t *wgtflag, idx_t *numflag, idx_t *ncon,
    idx_t *nparts, real_t *tpwgts, real_t *ubvec, real_t *ipc2redist,
    idx_t *options, idx_t *edgecut, idx_t *part, MPI_Comm *comm);
The key parameter is ipc2redist (inter-partition communication to redistribution cost ratio). It controls the trade-off:
  • Low ipc2redist (e.g., 0.001) — redistribution is cheap; aggressively minimize edge cut regardless of how many vertices move.
  • High ipc2redist (e.g., 1000.0) — redistribution is expensive; prefer keeping vertices in place even if it results in a higher edge cut.
/* Treat redistribution as moderately expensive */
real_t ipc2redist = 100.0;

/* vsize describes the data volume to move per vertex */
idx_t vsize[nvtxs]; /* populate with per-vertex data sizes */

ParMETIS_V3_AdaptiveRepart(
    vtxdist, xadj, adjncy,
    vwgt, vsize, adjwgt,
    &wgtflag, &numflag, &ncon, &nparts,
    tpwgts, ubvec, &ipc2redist,
    options, &edgecut, part, &comm
);
The ordering algorithms (ParMETIS_V3_NodeND, ParMETIS_V32_NodeND) use separator-based refinement instead of FM refinement. Two strategies are available via the rtype parameter:
ConstantValueBehavior
PARMETIS_SRTYPE_GREEDY1Visits boundary vertices from highest to lowest gain and moves each one that improves the separator. Fast but may miss improvements that require simultaneous moves.
PARMETIS_SRTYPE_2PHASE2First builds the separator, then refines it using PARMETIS_SRTYPE_GREEDY in a second phase. Generally produces smaller, better-balanced separators at the cost of additional computation.
PARMETIS_SRTYPE_2PHASE is the default when using ParMETIS_V3_NodeND (which calls ParMETIS_V32_NodeND with rtype = NULL).
The PARMETIS_PSR_COUPLED and PARMETIS_PSR_UNCOUPLED constants are relevant for ParMETIS_V3_RefineKway and ParMETIS_V3_AdaptiveRepart. They control how ParMETIS internally maps the nparts subdomains onto the npes processes during refinement.In the coupled case, each process is responsible for exactly one partition, which simplifies communication patterns and can improve performance. In the uncoupled case, the mapping is computed dynamically, allowing nparts to be any value.Set options[PMV3_OPTION_PSR] to one of these constants. PMV3_OPTION_PSR equals 3, the same index as PMV3_OPTION_IPART.
Set options[PMV3_OPTION_DBGLVL] to a bitwise OR of these flags to enable diagnostic output:
FlagValueOutput
PARMETIS_DBGLVL_TIME1Per-phase timing
PARMETIS_DBGLVL_INFO2General information
PARMETIS_DBGLVL_PROGRESS4Coarsening progress per level
PARMETIS_DBGLVL_REFINEINFO8Refinement statistics
PARMETIS_DBGLVL_MATCHINFO16Matching information
PARMETIS_DBGLVL_RMOVEINFO32Vertex movement information
PARMETIS_DBGLVL_REMAP64Remapping decisions
options[0] = 1;  /* enable options array */
options[PMV3_OPTION_DBGLVL] = PARMETIS_DBGLVL_TIME | PARMETIS_DBGLVL_PROGRESS;

Build docs developers (and LLMs) love