Launching a HELICS co-simulation involves starting multiple independent processes—at least one broker and one or more federates—in the right order, with the right arguments, and often across multiple compute nodes. For small co-simulations this can be done manually from the command line; for larger ones running on high-performance computing (HPC) clusters, or for automated sweeps over many parameter combinations, a dedicated orchestration layer is needed. HELICS provides built-in tooling for local orchestration through helics-cli and integrates with the Merlin workflow system for HPC deployments.
Running multiple federates together with helics-cli
The helics run command (part of pyhelics) reads a JSON runner file that describes all the federates in a co-simulation and launches them together. This eliminates the need to open separate terminal windows or write custom shell scripts for every co-simulation.
Runner JSON format
A runner file describes the name of the federation and lists each federate as an object with the command to execute, the working directory, and the target host.

Launching a co-simulation
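As a sketch, a runner file for a broker and two Python federates might look like the following (federation name, file names, and commands are all illustrative):

```json
{
  "name": "fundamental_default",
  "federates": [
    {
      "name": "broker",
      "host": "localhost",
      "directory": ".",
      "exec": "helics_broker -f 2 --loglevel=warning"
    },
    {
      "name": "Battery",
      "host": "localhost",
      "directory": ".",
      "exec": "python -u Battery.py"
    },
    {
      "name": "Charger",
      "host": "localhost",
      "directory": ".",
      "exec": "python -u Charger.py"
    }
  ]
}
```

With this file saved as, say, cosim_runner.json, the whole federation is started with a single command: helics run --path=cosim_runner.json.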
The broker should always appear first in the federates list, or at minimum be launched before the federates that connect to it. HELICS federates retry connections for the duration of the connection timeout (default 30 seconds), but starting the broker first avoids unnecessary retries.

Orchestration with Merlin on HPC systems
Merlin is a distributed task queuing system designed for HPC workflows. It can interface with SLURM and Flux resource managers, handle resource allocation automatically, and run large numbers of co-simulations in parallel or in sequence. Within a Merlin workflow, individual HELICS co-simulations are launched via helics run.
Why use Merlin with HELICS
- Automatic resource allocation on HPC clusters via SLURM or Flux—you specify the number of nodes needed; Merlin handles the scheduler.
- Complex workflows with analysis steps: run a co-simulation, analyze results, conditionally launch a follow-up co-simulation with updated inputs.
- Parallel execution of many co-simulations (for example, sensitivity analysis or Monte Carlo sweeps) without manually managing job submissions.
Merlin specification structure
A Merlin spec is a YAML file organized into sections. The key sections for a HELICS workflow are:
- description — the name and a summary of the study.
- env — environment variables used throughout the spec. Here N_SAMPLES controls how many federate pairs to create.
- merlin — the input-generation step. This calls a Python script to produce one runner JSON file per co-simulation instance and writes the filenames to samples.csv.
- study — the execution steps. Each step has a name and a run block. The FED variable resolves to each row from samples.csv, so the launch command runs once per co-simulation instance. The cleanup step depends on all start_federates instances completing first.
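Assembled into one file, the sections above might look like the following skeleton (script names, paths, sample counts, and step names are illustrative, not prescribed):

```yaml
description:
  name: helics_cosim_sweep
  description: Run N_SAMPLES HELICS co-simulations in parallel

env:
  variables:
    N_SAMPLES: 10

merlin:
  samples:
    generate:
      # Illustrative script that writes one runner JSON per instance
      # and lists the file names in samples.csv.
      cmd: python3 $(SPECROOT)/make_samples.py $(N_SAMPLES) $(MERLIN_INFO)
    file: $(MERLIN_INFO)/samples.csv
    column_labels: [FED]

study:
  - name: start_federates
    description: Launch one co-simulation per runner file
    run:
      # FED resolves to one row of samples.csv per task.
      cmd: helics run --path=$(FED)
  - name: cleanup
    description: Remove intermediate files after all runs finish
    run:
      cmd: rm -f $(MERLIN_INFO)/samples.csv
      # Runs only after every start_federates instance completes.
      depends: [start_federates_*]
```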
Sequencing federate startup
When launching without an orchestration tool, the order of startup matters:
1. Launch the broker. Start the broker process first so it is ready to accept connections.
2. Launch federates. Start each federate process. Federates retry connecting to the broker for up to the connection timeout (default 30 seconds).
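From a shell, this sequence might look like the following (the federate commands are illustrative placeholders):

```shell
# 1. Start the broker, telling it to expect two federates.
helics_broker -f 2 --loglevel=warning &

# 2. Start the federates; each retries its broker connection for up to
#    the connection timeout (default 30 s).
python -u federate_one.py &
python -u federate_two.py &

# Wait for all background processes to finish.
wait
```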
Handling timeouts
HELICS has several timeout mechanisms to prevent co-simulations from hanging indefinitely.

Connection timeout
Controls how long a federate waits to establish a connection to the broker (the --timeout option). The default is 30 seconds.

Heartbeat and federate timeout
A heartbeat timer (--tick) runs in the background of every broker and core. If no communication is received within one tick, a ping is sent. If there is no response within a further period, an error is raised and the co-simulation is terminated.
A component that may legitimately stop responding for long periods (for example, one paused in a debugger) can set --slowresponding to prevent it from being treated as failed.
The --debugging flag is shorthand for --slowresponding --disable_timer and is intended for interactive debugging sessions.
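As an illustrative sketch, these options are passed on the broker (or core) command line; the values here are placeholders:

```shell
# Broker with a 5-second heartbeat tick. A component that may go quiet
# for long stretches can add --slowresponding, or use --debugging
# (shorthand for --slowresponding --disable_timer).
helics_broker -f 2 --tick=5s --slowresponding
```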
Grant timeout
At the federate level, --granttimeout triggers diagnostic output if a time grant takes longer than expected. This does not terminate the co-simulation but produces warnings:
- At 1× the timeout: a warning message is printed.
- At 3× the timeout: a resend of timing messages is requested.
- At 6× the timeout: full timing diagnostics are printed.
- At 10× the timeout: additional warnings are generated.
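For example, a federate might enable a 10-second grant timeout. How the flag reaches the federate depends on how that federate parses HELICS options; it is shown here as a command-line argument purely for illustration:

```shell
# Warn if any time grant takes more than 10 s, with escalating
# diagnostics at 3x, 6x, and 10x that interval as described above.
python -u my_federate.py --granttimeout=10s
```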
Maximum co-simulation duration
To cap the total wall-clock runtime of a co-simulation (useful in automated batch runs), set a maximum co-simulation duration on the broker with the --maxcosimduration option.

Profiling and timing analysis
HELICS includes a profiling capability (available since version 2.8/3.0.1) that records timestamps when federates enter and exit HELICS blocking call loops. This identifies which federates are spending the most wall-clock time waiting on others.

Enabling profiling
Profiling can be enabled at the broker, core, or federate level. Enabling it at a higher level automatically propagates to all children: broker-level profiling covers all cores and federates, while core-level profiling is set through the core's initialization string (coreinitstring).
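A sketch of enabling broker-level profiling, assuming the --profiler option described in the HELICS profiling documentation (the output file name is illustrative):

```shell
# Write profiling messages for the entire federation to profile.txt;
# the setting propagates to all cores and federates under this broker.
helics_broker -f 2 --profiler=profile.txt
```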
Reading profiling output
Each profiling message is wrapped in XML-like tags:
- HELICS CODE ENTRY: the federate is entering a HELICS blocking call (waiting for a time grant).
- HELICS CODE EXIT: the federate has received its time grant and is returning to user code.
- MARKER: a calibration timestamp that pairs the local system uptime with global wall-clock time, enabling correlation across multiple machines.
The difference between the HELICS CODE ENTRY and HELICS CODE EXIT timestamps for the same federate shows how long that federate waited for a time grant at each step.
Timestamps are nanosecond-precision monotonic clock values (system uptime). Because different machines have different uptimes, the MARKER messages provide a reference to calibrate across compute nodes. Network latency between nodes means cross-machine timestamp alignment is only accurate to microsecond or millisecond precision, depending on network conditions.

Program termination patterns
Clean shutdown
The normal termination path is for each federate to call helicsFederateFinalize() after completing its last time step, then call helicsFederateFree() and helicsCloseLibrary(). When all federates have finalized, the broker shuts down automatically.
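A minimal sketch of this sequence in the C API, assuming fed is a valid federate handle and err is an initialized HelicsError (creation and the main time loop are elided):

```c
/* Clean shutdown after the federate's last time step. */
helicsFederateFinalize(fed, &err);  /* tell the federation we are done */
helicsFederateFree(fed);            /* release the federate object     */
helicsCloseLibrary();               /* clean up library-wide resources */
```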
Ctrl-C handling
For C and C++ programs, Ctrl-C terminates the local process. For distributed co-simulations this leaves other processes running—they will either time out (if timeouts are enabled) or deadlock.
The C shared library provides signal handler utilities:
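For example, a federate can install the library's default handler at startup (a sketch using the HELICS C API signal-handler utilities, not an exhaustive usage):

```c
/* Install a SIGINT handler that attempts an orderly federation-wide
 * shutdown instead of killing only the local process. */
helicsLoadSignalHandler();

/* ... create and run the federate ... */

/* Restore the previous handler once normal shutdown has begun. */
helicsClearSignalHandler();
```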
Generating a global error
Any component can trigger immediate federation-wide termination by generating a global error. The --errortimeout option (default 10 seconds) controls how long the system waits after a global error before tearing down the co-simulation network, giving time for diagnostic queries.
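A sketch using the C API's helicsFederateGlobalError (the error code and message are illustrative):

```c
/* Raise a global error: every federate and broker in the federation
 * will be told to terminate. */
helicsFederateGlobalError(fed, -1, "fatal: required input missing", &err);
```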