Set up worker nodes and the dedicated network node

Cloud Repositorio discovers and registers worker nodes at startup. Before the CLI becomes available, it reads the worker IPs from workers_list in database.yaml and connects to each host over SSH to collect its hardware specs. Workers that cannot be reached are skipped silently; only reachable workers are registered in the workers section of the database.

workers_list

workers_list is the authoritative list of compute worker IP addresses:

workers_list: ["10.0.10.1", "10.0.10.2", "10.0.10.3"]

At startup, WorkerDiscovery.discover_all() iterates over this list and attempts an SSH connection to each IP. Each successful connection adds an entry to the workers dict containing the collected hardware specs. Discovery runs only at startup, so an IP added to workers_list is not probed until the next startup.
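The discovery loop can be pictured as the following minimal Python sketch. It is not the actual WorkerDiscovery code: the `probe` callable, the spec keys, and the logging call are assumptions for illustration. `probe` stands in for the SSH step and returns `None` when a host cannot be reached.

```python
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger("worker_discovery_sketch")

# Hypothetical shape of the specs collected for one worker.
Specs = Dict[str, int]


def discover_all(
    workers_list: List[str],
    probe: Callable[[str], Optional[Specs]],
) -> Dict[str, Specs]:
    """Return a workers dict keyed by IP that contains only reachable hosts.

    `probe` is a stand-in (assumption) for the SSH step: it returns the
    collected specs, or None when the host is unreachable.
    """
    workers: Dict[str, Specs] = {}
    for ip in workers_list:
        specs = probe(ip)
        if specs is None:
            # Unreachable hosts are skipped; no exception is raised,
            # but a log line records which worker failed discovery.
            logger.warning("worker %s unreachable, skipping", ip)
            continue
        workers[ip] = specs
    return workers
```

For example, `discover_all(["10.0.10.1", "10.0.10.2"], {"10.0.10.1": {"cpu": 4}}.get)` returns a dict containing only `10.0.10.1`, because `dict.get` yields `None` for the second IP.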

SSH requirements

WorkerDiscovery connects as the ubuntu user (the default value of RemoteExecutor’s remote_user parameter). SSH must work without a password prompt — key-based authentication is required. The three commands executed during discovery are:

Metric	Command
CPU cores	`nproc`
RAM (GB)	`free -g | grep Mem | awk '{print $2}'`
Disk (GB)	`df /tmp -B G | tail -1 | awk '{print $2}' | tr -d G`

If a command fails or the connection times out, the following fallback values are used instead: 2 CPU cores, 1 GB RAM, and 500 GB disk.
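A minimal Python sketch of how the three commands and the fallback values fit together. It shells out to the OpenSSH client with `BatchMode=yes`, which makes ssh fail instead of prompting for a password, matching the key-based requirement. The metric names, the `ssh_run` and `collect_specs` helpers, and the timeout values are assumptions for illustration, not RemoteExecutor's actual interface.

```python
import subprocess
from typing import Callable, Dict, Optional

# Commands from the table above, keyed by hypothetical metric names.
COMMANDS = {
    "cpu_cores": "nproc",
    "ram_gb": "free -g | grep Mem | awk '{print $2}'",
    "disk_gb": "df /tmp -B G | tail -1 | awk '{print $2}' | tr -d G",
}

# Documented fallback values used when a command fails.
FALLBACKS = {"cpu_cores": 2, "ram_gb": 1, "disk_gb": 500}


def ssh_run(ip: str, command: str, user: str = "ubuntu") -> Optional[str]:
    """Run one command on a worker over SSH; return stdout, or None on failure."""
    argv = [
        "ssh",
        "-o", "BatchMode=yes",     # fail instead of prompting for a password
        "-o", "ConnectTimeout=5",  # assumed connect timeout, in seconds
        f"{user}@{ip}",
        command,                   # pipes are interpreted by the remote shell
    ]
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None


def collect_specs(
    ip: str, run: Callable[[str, str], Optional[str]] = ssh_run
) -> Dict[str, int]:
    """Collect each metric, using the fallback value when it cannot be parsed."""
    specs: Dict[str, int] = {}
    for metric, command in COMMANDS.items():
        output = run(ip, command)
        try:
            specs[metric] = int(output.strip())
        except (AttributeError, ValueError):
            # AttributeError: command failed (None); ValueError: non-numeric output
            specs[metric] = FALLBACKS[metric]
    return specs
```

Passing a stub as `run` lets you exercise the fallback path without a live worker.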

Unreachable workers are silently skipped during discovery. No error is raised and the manager starts normally. Check the application logs to identify workers that failed discovery.

Required software on workers

Each compute worker (e.g., 10.0.10.1, 10.0.10.2) must have the following installed and configured:

qemu-system-x86_64 — QEMU/KVM hypervisor for running virtual machines
ovs-vsctl and Open vSwitch — with a br-int bridge already created (ovs-vsctl add-br br-int)
ip tuntap support — for creating TAP interfaces attached to OVS ports
Base VM images at /tmp/vm_images/ — cirros-0.6.2-x86_64-disk.img and/or focal-server-cloudimg-amd64.img (images are SCP-copied automatically if missing, but having them pre-staged speeds up first deployment)
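To verify a compute worker before adding it to workers_list, a preflight check along these lines can be run on the worker itself. This is a hypothetical helper, not part of Cloud Repositorio. It only confirms that the binaries are on PATH, that the kernel TUN/TAP device `/dev/net/tun` exists, that `br-int` exists (via `ovs-vsctl br-exists`, which exits with status 0 when the bridge is present), and whether a base image is pre-staged. It does not prove that KVM acceleration works.

```python
import os
import shutil
import subprocess
from typing import Callable, List, Optional

# Binaries listed in the requirements above; `ip` provides `ip tuntap`.
REQUIRED_BINARIES = ["qemu-system-x86_64", "ovs-vsctl", "ip"]
TUN_DEVICE = "/dev/net/tun"  # kernel TUN/TAP device behind `ip tuntap`
IMAGE_DIR = "/tmp/vm_images"
BASE_IMAGES = ["cirros-0.6.2-x86_64-disk.img", "focal-server-cloudimg-amd64.img"]


def br_int_exists() -> bool:
    """`ovs-vsctl br-exists br-int` exits with status 0 when the bridge exists."""
    try:
        result = subprocess.run(
            ["ovs-vsctl", "br-exists", "br-int"], capture_output=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):  # not installed or ovsdb not answering
        return False
    return result.returncode == 0


def check_compute_worker(
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.exists,
    has_bridge: Callable[[], bool] = br_int_exists,
) -> List[str]:
    """Return a list of problems; an empty list means the node looks ready."""
    problems = [f"missing binary: {b}" for b in REQUIRED_BINARIES if which(b) is None]
    if not exists(TUN_DEVICE):
        problems.append(f"{TUN_DEVICE} not found (TAP interfaces unavailable)")
    if not has_bridge():
        problems.append("br-int bridge not found (run: ovs-vsctl add-br br-int)")
    staged = [img for img in BASE_IMAGES if exists(os.path.join(IMAGE_DIR, img))]
    if not staged:
        # Not fatal: images are SCP-copied when missing, but first deployment is slower.
        problems.append(f"no base image pre-staged in {IMAGE_DIR} (optional)")
    return problems
```

Called with no arguments, `check_compute_worker()` probes the real system; the injectable parameters only exist so the logic can be tested without a worker.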

Network node requirements (10.0.10.3)

The designated network node (10.0.10.3 by default) runs DHCP and NAT services and requires additional software:

dnsmasq — for per-VLAN DHCP namespaces
iptables with MASQUERADE support — for internet access via NAT
ip netns support — for isolated DHCP network namespaces per VLAN
Open vSwitch with a br-int bridge — same as compute workers
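To show how these network-node components relate, here is a dry-run Python sketch that builds one plausible command sequence for a single VLAN: a network namespace, a VLAN-tagged OVS internal port moved into it, dnsmasq bound to that port, and a MASQUERADE rule for the subnet. This is an assumption about the general technique, not Cloud Repositorio's actual implementation. The namespace and port names, the address layout (.2 for DHCP, leases from .10), the uplink name `eth0`, and the 12h lease time are all hypothetical. The function only returns strings and executes nothing; routing between the VLAN gateway and the uplink is out of scope here.

```python
import ipaddress
from typing import List


def network_node_plan(vlan_id: int, cidr: str, uplink: str = "eth0") -> List[str]:
    """Build (but do not run) commands for one VLAN's DHCP namespace and NAT rule."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN IDs must be in the range 1-4094")
    net = ipaddress.ip_network(cidr)
    hosts = list(net.hosts())
    if len(hosts) < 20:
        raise ValueError("subnet too small for this address layout")
    ns = f"dhcp-vlan{vlan_id}"  # hypothetical namespace name
    port = f"dhcp{vlan_id}"     # hypothetical OVS internal port name (<= 15 chars)
    dhcp_addr = hosts[1]        # .2 for the DHCP server; .1 left for a gateway (assumption)
    first, last = hosts[9], hosts[-1]  # lease pool from .10 upward (assumption)
    return [
        # Isolated network namespace for this VLAN
        f"ip netns add {ns}",
        # Internal OVS port on br-int, tagged with the VLAN ID
        f"ovs-vsctl add-port br-int {port} tag={vlan_id} -- set interface {port} type=internal",
        f"ip link set {port} netns {ns}",
        f"ip netns exec {ns} ip addr add {dhcp_addr}/{net.prefixlen} dev {port}",
        f"ip netns exec {ns} ip link set {port} up",
        # dnsmasq bound only to the namespaced port
        f"ip netns exec {ns} dnsmasq --interface={port} --bind-interfaces "
        f"--dhcp-range={first},{last},12h",
        # Source NAT for the VLAN subnet out of the uplink
        f"iptables -t nat -A POSTROUTING -s {net} -o {uplink} -j MASQUERADE",
    ]
```

Printing `network_node_plan(100, "10.100.0.0/24")` line by line gives a reviewable script; keeping one namespace per VLAN means each dnsmasq instance sees only its own broadcast domain.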

Removing a worker

To remove a worker, delete its IP from workers_list in database.yaml and restart the manager. The workers dict entry for that IP will no longer be refreshed. Existing VMs assigned to the removed worker will remain in the database but cannot be started, stopped, or deleted until the worker is reachable again.

Cloud Repositorio does not support live migration. Removing a worker that has running VMs leaves those VMs orphaned in the database — they appear in slice listings but cannot be controlled. Clean up VMs before removing a worker from the list.
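Before deleting an IP from workers_list, it helps to list the VMs that would be left orphaned. A minimal sketch, assuming VM records are dicts with hypothetical `name` and `worker` fields; the actual layout of VM entries in database.yaml may differ.

```python
from typing import Dict, List


def vms_on_worker(vms: List[Dict[str, str]], worker_ip: str) -> List[str]:
    """Names of VMs whose (hypothetical) 'worker' field points at worker_ip."""
    return [vm["name"] for vm in vms if vm.get("worker") == worker_ip]


def orphaned_vms(vms: List[Dict[str, str]], workers_list: List[str]) -> List[str]:
    """Names of VMs assigned to a worker IP that is no longer in workers_list."""
    active = set(workers_list)
    return [
        vm["name"]
        for vm in vms
        if vm.get("worker") is not None and vm["worker"] not in active
    ]
```

Running `vms_on_worker` against the IP you plan to remove shows what must be cleaned up first; `orphaned_vms` against the edited list catches anything that was missed.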
