Universe runs two background services that keep the cluster in a healthy state without manual intervention. The InstanceHealthMonitor checks every five seconds whether each instance on the local node is still running and marks it OFFLINE if the process has exited. The InstanceCountEnforcer, active only on the Master node, checks every five seconds whether each configuration’s active instance count is below its configured minimum and automatically spawns replacements. Together, these two loops let the cluster self-heal from process crashes and node failures with no operator action required.
Instance states
Every instance tracked in the cluster state map carries an InstanceState value that reflects where it is in its lifecycle.
| State | Meaning |
|---|---|
| CREATING | The deploy task has been dispatched to a Wrapper node; the instance process has not started yet |
| ONLINE | The instance is running and (optionally) sending regular heartbeats via PUT /api/instances/{id}/state |
| OFFLINE | The health monitor detected the process is no longer running, or the Wrapper node disconnected; the instance record is retained in the state map |
| STOPPED | The instance was stopped intentionally via instance stop or the stop REST endpoint |
OFFLINE instances are kept in the cluster state map deliberately. A Wrapper node disconnecting from Hazelcast does not remove instance records; it only stops heartbeats. The records remain until you explicitly stop or recreate them.

Instance health monitor
InstanceHealthMonitor runs on every node (Master and Wrapper alike). It uses a single-threaded scheduled executor with a five-second fixed-rate interval.
What the check loop does:
- Reads the local Hazelcast member UUID.
- Filters all cluster instances to those whose wrapperNodeId matches the local UUID and whose state is ONLINE.
- For each matching instance, looks up the RuntimeProvider registered under the configuration’s runtime key.
- Calls runtimeProvider.isRunning(instance.id), which checks whether the screen session, tmux window, or other runtime process is still alive.
- If the runtime reports the process is gone, calls markOffline().
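
The check loop above can be sketched in Python as a single pass over the instance list. This is only an illustration of the filtering logic, not the project's implementation: the `Instance` dataclass, the `health_check` function, and the dictionary of liveness probes are hypothetical stand-ins for the cluster state map and the registered RuntimeProvider objects, and the runtime key is stored directly on the instance for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Instance:
    id: str
    wrapper_node_id: str  # mirrors wrapperNodeId
    state: str            # "CREATING", "ONLINE", "OFFLINE" or "STOPPED"
    runtime: str          # runtime key (simplification: copied from the configuration)


def health_check(
    local_uuid: str,
    instances: Iterable[Instance],
    is_running: Dict[str, Callable[[str], bool]],
    mark_offline: Callable[[Instance], None],
) -> List[str]:
    """Run one pass of the check loop; return the ids that were marked offline."""
    marked = []
    for inst in instances:
        # Only ONLINE instances hosted on the local node are checked.
        if inst.wrapper_node_id != local_uuid or inst.state != "ONLINE":
            continue
        probe = is_running.get(inst.runtime)
        if probe is None:
            continue  # assumption: skip instances with no registered provider
        if not probe(inst.id):
            mark_offline(inst)
            marked.append(inst.id)
    return marked
```

In a real service this function would be invoked by a scheduler every five seconds; the sketch leaves scheduling out so the filtering rules stay easy to read.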
markOffline() does the following in order:
- Releases the instance’s allocated port back to the PortAllocator.
- Subtracts the instance’s allocatedRamMB and allocatedCpu from the node’s resource tracking.
- For non-static instances, deletes the working directory at ./running/<instance-id>/.
- Updates the instance’s state to OFFLINE in the shared Hazelcast IMap.
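
As a rough sketch of that ordering, the following Python function performs the four steps against plain dictionaries. The `mark_offline` signature, the `free_ports` set standing in for the PortAllocator, and the `static` flag stored on the instance are assumptions made for illustration; the real service works against Hazelcast structures.

```python
import shutil
from pathlib import Path


def mark_offline(instance: dict, node: dict, free_ports: set, running_root: Path) -> None:
    """Apply the four markOffline() steps, in order, to in-memory stand-ins."""
    # 1. Release the allocated port (free_ports stands in for the PortAllocator).
    free_ports.add(instance["port"])
    # 2. Subtract the instance's allocation from the node's resource tracking.
    node["usedRamMB"] -= instance["allocatedRamMB"]
    node["usedCpu"] -= instance["allocatedCpu"]
    # 3. Delete the working directory, but only for non-static instances.
    if not instance["static"]:
        shutil.rmtree(running_root / instance["id"], ignore_errors=True)
    # 4. Publish the new state last; the record itself is kept.
    instance["state"] = "OFFLINE"
```

Updating the state last means other readers of the record only see OFFLINE once the port and resources have already been released.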
Static instances (those with "static": true) have their working directory preserved when marked offline. The health monitor only deletes ./running/<instance-id>/ for non-static instances.

Instance count enforcer
InstanceCountEnforcer runs only on the Master node. If isMasterNode is false in config.json, the service logs a message and exits without scheduling anything.
What the enforcement loop does:
- Reads all loaded configurations from the cluster state map.
- Skips configurations with static: true or minimumServiceCount ≤ 0.
- Counts instances for each configuration whose state is ONLINE or CREATING.
- If the count is below minimumServiceCount, calls InstanceCreationService.createInstance() for each missing instance.
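
The counting rule can be illustrated with a short Python sketch that computes how many instances each configuration is missing. The function name `missing_instances` and the dictionary shapes (a `name` key on configurations, a `config` key on instances) are hypothetical; only the skip conditions and the ONLINE/CREATING rule come from the description above.

```python
from collections import Counter

# States that count toward minimumServiceCount.
ACTIVE_STATES = {"ONLINE", "CREATING"}


def missing_instances(configs, instances):
    """Return {configuration name: number of instances still to spawn}."""
    active = Counter(i["config"] for i in instances if i["state"] in ACTIVE_STATES)
    deficits = {}
    for cfg in configs:
        minimum = cfg.get("minimumServiceCount", 0)
        # Static configurations and non-positive minimums are never enforced.
        if cfg.get("static", False) or minimum <= 0:
            continue
        shortfall = minimum - active[cfg["name"]]
        if shortfall > 0:
            deficits[cfg["name"]] = shortfall
    return deficits
```

Counting CREATING instances as active keeps the enforcer from spawning duplicates while an earlier replacement is still starting up.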
Node selection happens inside createInstance(): the Master evaluates all connected Wrapper members and picks the first one with enough free RAM and CPU to satisfy the configuration’s ramMB and cpu requirements. If no node has sufficient headroom, the auto-spawn attempt fails and a warning is logged.
Set minimumServiceCount in ./configuration/<name>.json. For a configuration named default, the enforcer then keeps at least that many default instances running or being created at all times.
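
To make the setting concrete, the Python sketch below parses a configuration document and derives the minimum the enforcer would apply. The JSON shown is illustrative only: it uses the field names mentioned on this page (runtime, ramMB, cpu, static, minimumServiceCount) with made-up values, and the real file probably contains other fields. The helper `required_minimum` is hypothetical.

```python
import json

# Illustrative configuration text; the values are assumptions, not defaults.
EXAMPLE_CONFIG = """
{
  "runtime": "screen",
  "ramMB": 1024,
  "cpu": 1,
  "static": false,
  "minimumServiceCount": 2
}
"""


def required_minimum(config_text: str) -> int:
    """Return the instance count the enforcer would maintain for this configuration."""
    cfg = json.loads(config_text)
    if cfg.get("static", False):
        return 0  # static configurations are skipped by the enforcer
    return max(int(cfg.get("minimumServiceCount", 0)), 0)
```

With the example text, `required_minimum(EXAMPLE_CONFIG)` evaluates to 2, so two instances of that configuration would be kept ONLINE or CREATING.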
Checking instance state
Use instance info <id> to inspect the current state of any instance, including its last heartbeat timestamp and the PID reported by the runtime.
REST API: instance state and heartbeats
External processes (such as a Minecraft plugin running inside the instance) report their health by calling the state endpoint, PUT /api/instances/{id}/state. Each call updates the instance’s state and the lastHeartbeat timestamp in the shared IMap. A rising lastHeartbeat value is the signal that the application inside the instance is healthy, not just that the OS process is alive.
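
A heartbeat client could look roughly like the Python sketch below, which builds the PUT request with the standard library. Only the path /api/instances/{id}/state comes from this page; the base URL, the JSON body shape (`{"state": ...}`), the content type, and the absence of authentication are all assumptions, so check them against the actual REST API before use.

```python
import json
import urllib.parse
import urllib.request


def build_heartbeat(base_url: str, instance_id: str, state: str = "ONLINE") -> urllib.request.Request:
    """Build (but do not send) a heartbeat request for one instance."""
    # Assumed payload shape; the real endpoint may expect different fields.
    body = json.dumps({"state": state}).encode("utf-8")
    safe_id = urllib.parse.quote(instance_id, safe="")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/instances/{safe_id}/state",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Sending is a single call; run it on a timer inside the instance, for example:
# urllib.request.urlopen(build_heartbeat("http://localhost:8080", "abc123"), timeout=5)
```

Separating request construction from sending keeps the sketch testable without a running cluster.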
To list all instances, including their current states, call GET /api/instances.
Cluster resilience when a Wrapper disconnects
When a Wrapper node loses its Hazelcast connection (due to a network partition, container restart, or host failure), the following happens:
- The Hazelcast cluster detects the member departure and fires a memberRemoved event.
- ResilienceMembershipListener on the Master immediately marks all instances that were running on the disconnected Wrapper as OFFLINE in the shared IMap.
- Node resource tracking for the disconnected member is cleared.
- The InstanceInfo records are retained, not removed, so external services can see what was running.
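
The disconnect handling can be sketched in Python as a function that reacts to a departed member. The name `on_member_removed` and the dictionary layouts are hypothetical, and treating both ONLINE and CREATING instances as "running on the disconnected Wrapper" is an assumption made for this sketch.

```python
def on_member_removed(member_uuid: str, instances: dict, node_resources: dict) -> list:
    """Mark the departed member's instances OFFLINE and clear its resource tracking."""
    affected = []
    for instance_id, inst in instances.items():
        # Assumption: ONLINE and CREATING instances on the departed node are affected.
        if inst["wrapperNodeId"] == member_uuid and inst["state"] in ("ONLINE", "CREATING"):
            inst["state"] = "OFFLINE"  # the record is updated in place, never deleted
            affected.append(instance_id)
    # Resource tracking for the departed member is dropped entirely.
    node_resources.pop(member_uuid, None)
    return affected
```

Because the records stay in the map, a later pass of the count enforcer can see the shortfall and request replacements.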
Instance records are preserved after a Wrapper disconnects. The InstanceCountEnforcer will detect that the OFFLINE instances no longer count toward minimumServiceCount and will automatically spawn replacements on available nodes.

Resource-aware node selection
When the Master needs to place a new instance, either from instance create or from the count enforcer, it evaluates every connected Hazelcast member:
- Each member tracks its consumed RAM (usedRamMB) and CPU (usedCpu) in the cluster state map via NodeResources.
- The Master subtracts used resources from the node’s total capacity and selects the first member that can satisfy the configuration’s ramMB and cpu requirements.
- When an instance is marked OFFLINE or STOPPED, its allocatedRamMB and allocatedCpu are returned to the node’s available pool via ClusterStateService.removeNodeResources().
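
The selection rule amounts to a first-fit search, which the Python sketch below illustrates. The field names `totalRamMB` and `totalCpu` and the function `select_node` are hypothetical; usedRamMB and usedCpu follow the names used above.

```python
from typing import Optional


def select_node(nodes: list, ram_mb: int, cpu: float) -> Optional[str]:
    """Return the id of the first node with enough free RAM and CPU, or None."""
    for node in nodes:
        free_ram = node["totalRamMB"] - node["usedRamMB"]
        free_cpu = node["totalCpu"] - node["usedCpu"]
        if free_ram >= ram_mb and free_cpu >= cpu:
            return node["id"]
    # No node has headroom: the caller would log a warning and give up for now.
    return None
```

First-fit is simple and predictable, but it tends to fill earlier nodes before later ones rather than balancing load evenly.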
Monitoring summary
Health monitor
Runs on every node. 5-second interval. Marks instances OFFLINE when the runtime process exits and releases ports and resources.

Count enforcer
Runs on Master only. 5-second interval. Auto-spawns instances when the active count falls below minimumServiceCount.

Instance states
Four states: CREATING, ONLINE, OFFLINE, STOPPED. Inspect with instance info <id> or GET /api/instances.

Heartbeat API
External processes signal health via PUT /api/instances/{id}/state. The lastHeartbeat field is updated on each call.