A Ployz cluster is a flat mesh of peer machines. There is no master node. No machine holds state that others lack, and no machine’s removal breaks coordination. You can remove any node — including the one you are currently connected to — without a quorum ceremony or a controller migration. This peer model is what makes machine remove safe by construction.
Every node is a peer
When a machine joins the cluster it receives:

- A WireGuard identity (public key and overlay IPv6 address).
- A NATS leaf node connection to the cluster’s control-plane store.
- A subnet for workload container networking.
- A machine ID, region, and optional availability zone.
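The join-time identity above can be sketched as a plain record. This is an illustrative sketch only — the field names are assumptions, not the Ployz wire format — and the storage field anticipates the JetStream-quorum flag described in the next paragraph:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Machine:
    """What a machine holds after joining the mesh (illustrative sketch)."""
    machine_id: str
    wireguard_pubkey: str       # WireGuard identity
    overlay_ipv6: str           # address on the overlay network
    workload_subnet: str        # subnet for workload container networking
    region: str
    zone: Optional[str] = None  # availability zone is optional
    storage: bool = True        # participate in NATS JetStream quorum?

def quorum_members(machines: List[Machine]) -> List[str]:
    """storage=false machines still mesh and run workloads, but hold no
    durable control-plane state, so they sit outside the quorum."""
    return [m.machine_id for m in machines if m.storage]

fleet = [
    Machine("m-01", "pk1", "fd00::1", "fd00:1::/64", "eu-west"),
    Machine("m-02", "pk2", "fd00::2", "fd00:2::/64", "eu-west", storage=False),
]
print(quorum_members(fleet))  # ['m-01']
```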
The storage flag on a machine controls whether it participates in NATS JetStream quorum. Nodes with storage=false still join the mesh and run workloads — they just do not hold durable control-plane state. See Storage for ZFS implications.

NATS as the control-plane substrate
NATS is not a message bus bolted on for convenience. It is the native substrate for everything the control plane needs to do:

- Durable facts. Deploy commits, machine membership records, routing events, and instance status are stored in NATS JetStream streams and KV buckets. These survive daemon restarts.
- Coordination. Deploy leases, participant locks, and quorum decisions happen through NATS. A command that cannot acquire a lock fails loudly; it does not queue or retry silently.
- Request/reply commands. Small participant actions — start a container, probe readiness, confirm a volume transfer — use NATS request/reply on per-machine subjects. No responder or timeout is an immediate foreground failure.
- Ordered routing events. The gateway and DNS service consume a NATS stream of routing events and rebuild their view from it. If freshness becomes uncertain, they reload rather than serving stale projections.
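The replay behavior in the last bullet can be sketched as a fold over the ordered stream. The event shapes below are hypothetical, not the real routing-event schema; on any doubt about freshness, a consumer would discard its view and replay the whole stream rather than serve a stale projection:

```python
def rebuild_routes(events):
    """Rebuild the routing view by replaying an ordered event stream.
    Each event is a tuple: ('add' | 'remove', hostname, target)."""
    routes = {}
    for op, host, target in events:
        if op == "add":
            routes[host] = target
        elif op == "remove":
            routes.pop(host, None)
    return routes

stream = [
    ("add", "app.example", "fd00::1"),
    ("add", "api.example", "fd00::2"),
    ("remove", "app.example", None),
]
print(rebuild_routes(stream))  # {'api.example': 'fd00::2'}
```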
Three kinds of truth
Ployz separates state into three categories that are never mixed:

| Kind | What it represents | Examples |
|---|---|---|
| Intent | What an operator explicitly asked the cluster to do | Deploy commits, machine membership records, instance status, routing events |
| Status | Durable lifecycle facts emitted by operations | Deploy phase records, volume movement evidence, branch lineage |
| Observation | Live reachability, health, and capacity checked at decision time | Placement probes, WireGuard handshake state, participant readiness |
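A minimal sketch of the separation, assuming illustrative record-type names (these are not Ployz identifiers): every record maps to exactly one kind, and nothing belongs to two.

```python
from enum import Enum

class Kind(Enum):
    INTENT = "intent"            # what an operator explicitly asked for
    STATUS = "status"            # durable lifecycle facts emitted by operations
    OBSERVATION = "observation"  # live checks made at decision time

# Hypothetical record types, one kind each; the kinds never mix.
KIND_OF = {
    "deploy_commit": Kind.INTENT,
    "deploy_phase": Kind.STATUS,
    "placement_probe": Kind.OBSERVATION,
}

def kind_of(record_type: str) -> Kind:
    return KIND_OF[record_type]

print(kind_of("deploy_commit").value)  # intent
```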
Namespaces and machine membership
Workloads are grouped into namespaces. A namespace is the unit of deploy authority: one owning authority accepts durable deploy writes for a namespace, and routing events belong to that authority. Machines are members of the cluster itself, not of any particular namespace. A single machine can run workloads from multiple namespaces. Placement decisions — which machines receive which workloads — happen at deploy time, based on live machine capacity and region role.

Region roles and topology
Every machine has a region and an optional availability zone. These are operator-assigned topology labels used to guide placement decisions. Regions have one of four roles:

| Role | Placement behavior |
|---|---|
| home_data | Receives new placements; preferred for stateful workloads |
| compute | Receives new placements; preferred for stateless workloads |
| draining | No new placements; existing workloads drain off |
| disabled | No new placements; excluded from all placement decisions |
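The role table above translates into a simple candidate filter. This is an illustrative sketch, not Ployz’s actual placement algorithm: draining and disabled regions never receive new placements, while the preferred role sorts first.

```python
def placement_candidates(machines, stateful):
    """Filter and order candidate machines by region role.
    draining and disabled never receive new placements; home_data is
    preferred for stateful workloads, compute for stateless ones."""
    active = [m for m in machines if m["role"] in ("home_data", "compute")]
    preferred = "home_data" if stateful else "compute"
    # Preferred-role machines sort first (False < True).
    return sorted(active, key=lambda m: m["role"] != preferred)

fleet = [
    {"id": "m-01", "role": "compute"},
    {"id": "m-02", "role": "home_data"},
    {"id": "m-03", "role": "draining"},
    {"id": "m-04", "role": "disabled"},
]
print([m["id"] for m in placement_candidates(fleet, stateful=True)])  # ['m-02', 'm-01']
```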
Scale target: 1–200 nodes
Ployz targets clusters in the 1–200 node range. This is not an arbitrary limit — it is the range in which an operator can understand the whole system, explain every workload’s placement, and reason about a migration or branch operation end-to-end.

Single developer machine
ployzctl dev runs the full cluster model locally. All primitives — branch, migrate, rollback — work identically to a multi-node cluster.

Small office or bare-metal fleet
Up to 200 nodes joined into one WireGuard mesh. One model, one set of primitives, no operational bifurcation between “dev” and “production”.