GridPACK targets high-performance computing platforms and is built on two parallel runtimes: MPI for inter-process messaging and the Global Arrays (GA) toolkit for one-sided remote memory access. This combination lets the framework balance ease of use — familiar MPI communicator semantics — with high-throughput data operations for tasks like ghost cell synchronization and distributed key-value stores. Application developers rarely interact with GA directly; instead they work through framework abstractions such as Communicator, TaskManager, GlobalStore, and GlobalVector.

MPI + Global Arrays

MPI provides point-to-point and collective communication used by the solver libraries (PETSc), the partitioner, and explicit ghost exchange operations. Global Arrays extends MPI with shared-memory-style access to distributed arrays, enabling efficient one-sided reads and atomic operations without explicit message passing. Every GridPACK component that performs parallel operations takes a Communicator as input. The Communicator maintains both an MPI communicator handle (via boost::mpi) and an internal GA process group handle, so the same object drives both runtimes.

The Communicator class

Communicator wraps a boost::mpi::communicator and a GA process group. It lives in the gridpack::parallel namespace.
#include "gridpack/parallel/communicator.hpp"

// Default constructor — wraps MPI_COMM_WORLD
gridpack::parallel::Communicator world;

// Query rank and size
int me   = world.rank();
int nprocs = world.size();

// Barrier across all processes
world.barrier();

// Split into sub-communicators by color (ranks with the same color
// end up in the same sub-communicator)
int color = me % 2;   // example: split into two groups
gridpack::parallel::Communicator sub = world.split(color);

// A single-process communicator for this rank (useful for serial solvers)
gridpack::parallel::Communicator self_comm = world.self();

// Divide into sub-communicators of at most nsize processes each
int nsize = 4;        // example group size
gridpack::parallel::Communicator group = world.divide(nsize);

// Collective reductions
double local = 0.0;   // this rank's contribution
world.sum(&local, 1); // in-place all-reduce sum across the communicator

// Convert to a raw MPI_Comm for interoperability with C libraries
MPI_Comm raw = static_cast<MPI_Comm>(world);
Communicator objects are passed by value throughout GridPACK; copies share the same underlying communicator via reference counting.
Every GridPACK object that uses Global Arrays — including networks, task managers, GlobalStore, and GlobalVector — must be constructed with a Communicator. The communicator ties the GA process group to the object's lifetime.
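As a minimal sketch of that requirement, the fragment below constructs a network on the world communicator. MyBus and MyBranch stand for application-defined component classes, and the base_network.hpp include path is assumed rather than quoted from the framework.
#include "gridpack/network/base_network.hpp"

// MyBus and MyBranch are the application's bus and branch classes (assumed here)
typedef gridpack::network::BaseNetwork<MyBus, MyBranch> MyNetwork;

gridpack::parallel::Communicator world;                      // wraps MPI_COMM_WORLD
boost::shared_ptr<MyNetwork> network(new MyNetwork(world));  // GA process group tied to 'world'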

Ghost cell exchange pattern

The most frequent parallel operation in a GridPACK solve loop is updating ghost buses and branches so that locally-owned components can read up-to-date neighbor data. This is a push-then-read pattern:
1. Allocate exchange buffers

After network->partition(), call factory.setExchange(). This queries getXCBufSize() on each component and allocates a buffer of that size. The buffer address is then pushed into the component via setXCBuf(void *buf).

2. Initialize update data structures

Call network->initBusUpdate() and/or network->initBranchUpdate() once. This builds the internal communication maps used for subsequent updates. Omit branch updates if no branch data needs synchronization.

3. Write state into exchange buffer

At the start of each iteration, each active component writes its current state values into the exchange buffer using the internal pointers set during setXCBuf.

4. Execute the update

network->updateBuses() triggers a collective operation that copies each active bus's buffer content to the ghost copies of that bus on neighboring ranks. After this call, ghost buses reflect current state from their home processor.
// Exchange buffer implementation in a bus class
int MyBus::getXCBufSize(void) {
    return 2 * sizeof(double); // angle + magnitude
}

void MyBus::setXCBuf(void *buf) {
    p_ang_ptr = static_cast<double*>(buf);
    p_mag_ptr = p_ang_ptr + 1;
}

// Before network->updateBuses(), copy state into buffer:
void MyBus::prepareExchange() {
    *p_ang_ptr = p_voltage_ang;
    *p_mag_ptr = p_voltage_mag;
}

// After network->updateBuses(), ghost buses can read:
void MyBus::readFromExchange() {
    p_voltage_ang = *p_ang_ptr;
    p_voltage_mag = *p_mag_ptr;
}
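Putting the four steps together at the driver level gives roughly the sequence below. This is a sketch rather than framework-prescribed code: factory, network, and the convergence test stand in for the application's own objects and logic.
// Driver-side sketch of the push-then-read exchange sequence
network->partition();            // distribute components, create ghost copies
factory.setExchange();           // step 1: size and attach exchange buffers
network->initBusUpdate();        // step 2: build bus communication maps once

bool converged = false;
while (!converged) {
  // ... local solve, then each active bus writes its state into its
  //     exchange buffer, e.g. prepareExchange() above (step 3) ...
  network->updateBuses();        // step 4: collective ghost update
  // ... ghost buses now hold current neighbor state, e.g. readFromExchange() ...
  // converged = checkConvergence();   // hypothetical convergence test
}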

The Shuffler for data redistribution

When GridPACK needs to move an arbitrary set of objects across processors — not just ghost updates, but wholesale redistribution — it uses the Shuffler utility (in gridpack/parallel/shuffler.hpp). The shuffler accepts a list of (object, destination-rank) pairs and performs an all-to-all redistribution in a single pass. This is used internally by the partitioner to move BusData and BranchData structs after the graph is partitioned.
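The call pattern is roughly the sketch below; the template parameters and operator signature shown here are assumptions, so consult shuffler.hpp for the authoritative interface.
#include "gridpack/parallel/shuffler.hpp"

// Hypothetical call pattern only; the real Shuffler signature may differ.
// Idea: pair each locally held object with a destination rank, then shuffle.
std::vector<BusData> my_buses;               // objects currently held on this rank
std::vector<int> dest(my_buses.size());      // destination rank for each object
// ... fill dest from the partitioner's assignment ...

gridpack::parallel::Shuffler<BusData, int> shuffle;
shuffle(comm, my_buses, dest);               // all-to-all redistribution in one pass
// my_buses now holds exactly the objects assigned to this rank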

The TaskManager pattern

Many power grid analyses involve processing a large collection of independent cases — contingency analysis is the canonical example, where each contingency is an independent network solve. The TaskManager class distributes these tasks across processors using a Global Arrays atomic counter, ensuring that each task is processed by exactly one processor with minimal synchronization overhead.
#include "gridpack/parallel/task_manager.hpp"

gridpack::parallel::Communicator comm;
gridpack::parallel::TaskManager task_mgr(comm);

int num_contingencies = 500;

// Set the total number of tasks (collective; called on every rank)
task_mgr.set(num_contingencies);

int task_id;
while (task_mgr.nextTask(&task_id)) {
    // Each rank independently picks up the next available task
    run_contingency(task_id, network);
}
// nextTask returns false and calls GA_Pgroup_sync when all tasks are done
For analyses that assign a full sub-communicator to each task (multi-level parallelism), the variant nextTask(Communicator &comm, int *next) ensures all processors in the sub-communicator receive the same task index atomically:
gridpack::parallel::Communicator world;
int ranks_per_task = 4;   // example: 4 processes cooperate on each task
gridpack::parallel::Communicator sub_comm = world.divide(ranks_per_task);
gridpack::parallel::TaskManager task_mgr(world);
task_mgr.set(num_contingencies);

int task_id;
while (task_mgr.nextTask(sub_comm, &task_id)) {
    // All processes in sub_comm work on task_id together
    run_contingency_parallel(task_id, sub_network, sub_comm);
}
TaskManager uses NGA_Read_inc — an atomic fetch-and-increment on a one-element GA array — so task distribution scales to large processor counts without a central scheduler bottleneck.
Use task_mgr.printStats() after the task loop to print a per-processor task count. Significant imbalance suggests that individual task runtimes vary widely and may benefit from task reordering or dynamic load balancing.

GlobalStore — distributed indexed vectors

GlobalStore<T> lets each processor contribute variable-length vectors indexed by an integer key. After all contributions are added, a single upload() call makes every vector accessible to any processor via one-sided GA reads.
#include "gridpack/parallel/global_store.hpp"

gridpack::parallel::GlobalStore<double> store(comm);

// Each processor adds vectors for the contingencies it processed
for (int i : my_contingency_indices) {
    std::vector<double> result = get_bus_voltages(i);
    store.addVector(i, result);  // i is the global contingency index
}

// Make all data globally visible (collective call)
store.upload();

// Any processor can now retrieve any contingency result
std::vector<double> voltages;
store.getVector(42, voltages);  // fetch result for contingency 42
After calling upload(), no further addVector() calls are permitted. Additionally, if getVector() is interleaved with MPI collective calls, insert a comm.sync() between the last getVector() and the next MPI operation to avoid hangs caused by GA/MPI interaction.
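A sketch of that ordering, assuming comm is the Communicator the store was built with (the local post-processing helper and the reduction are illustrative):
std::vector<double> voltages;
store.getVector(my_index, voltages);     // last one-sided GA read

comm.sync();                             // flush GA traffic before returning to MPI

double metric = evaluate_result(voltages);   // hypothetical local computation
comm.sum(&metric, 1);                        // MPI-style collective is now safe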

GlobalVector — distributed linear arrays

GlobalVector<T> targets the case where each processor contributes a contiguous block to a global linear array, and all processors need to read back arbitrary ranges after upload:
#include "gridpack/parallel/global_vector.hpp"

gridpack::parallel::GlobalVector<double> gvec(comm);

// Each processor assigns indices and values for its slice
for (int i = local_start; i < local_end; i++) {
    gvec.addElement(i, my_values[i - local_start]);
}

// Upload to global storage (collective)
gvec.upload();

// Read a contiguous range from global storage
std::vector<double> all_values;
gvec.getVector(0, total_size, all_values);
GlobalVector is used, for example, to collect per-bus results from all processors into a single vector that can be exported to a file or fed into a subsequent analysis stage.
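For instance, a driver might gather the full vector and write it from a single rank; the file name and loop below are illustrative rather than part of the framework API.
#include <fstream>

std::vector<double> all_values;
gvec.getVector(0, total_size, all_values);   // every rank can read the full range

if (comm.rank() == 0) {
  std::ofstream out("bus_results.dat");      // example output file
  for (size_t i = 0; i < all_values.size(); i++) {
    out << i << " " << all_values[i] << "\n";
  }
}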

PROGRESS_RANKS: asynchronous communication

Global Arrays supports multiple communication runtimes. The default uses MPI two-sided messaging, which is straightforward but does not scale well beyond approximately a dozen processors. For large core counts, the progress ranks runtime delivers significantly higher throughput by dedicating one MPI process per SMP node to managing communication asynchronously. When building GridPACK with the progress ranks runtime, set the CMake flag:
-D USE_PROGRESS_RANKS:BOOL=TRUE
and build Global Arrays with the --with-mpi-pr flag instead of --with-mpi-ts.
Progress ranks reserve one MPI process per SMP node for communication management. If you request 20 processes across 4 nodes (5 per node), only 16 processes are available to your application (4 per node). Account for this when specifying job allocations and setting MPIEXEC_NUM_PROCS in CMake.
When USE_PROGRESS_RANKS=TRUE, GridPACK adjusts internal communicator construction so that the GA process group excludes the reserved communication ranks, keeping application logic transparent to the difference.

Related pages

Framework overview: the four-layer architecture and Communicator usage patterns
Network model: how ghost buses and branches are created during partitioning
Bus and branch components: exchange buffer implementation in component classes
