Cloud Repositorio follows a centralized orchestrator pattern where a single Python process coordinates VM lifecycle and networking across a cluster of SSH-accessible worker nodes. All control-plane logic (scheduling, VLAN allocation, state persistence) runs on the orchestrator machine. Worker nodes are treated as dumb hypervisors that execute commands over SSH and report resource availability at startup.
Components
OrchestratorAPI
The top-level control plane for slice, VM, and link lifecycle. Accepts calls from the CLI and dispatches work to `DeploymentAPI`, `VMLauncher`, and `VLANManager`. Implements round-robin scheduling to distribute VMs across available workers.

DeploymentAPI
Handles per-VM provisioning: copies the base QCOW2 image to the target worker via SSH, creates the `VM` model object with assigned interfaces and VNC port, and records the image path for later cleanup.

VLANManager
Configures OVS on the network node (`10.0.10.3`). Creates VLAN gateway ports, starts dnsmasq in a dedicated network namespace (`ns-dhcp-vlanN`) for DHCP, and installs MASQUERADE iptables rules for outbound internet access.

VMLauncher
Builds and runs the `qemu-system-x86_64` command line on the target worker over SSH. Creates TAP interfaces, attaches them to the OVS bridge with the correct VLAN tag, and daemonizes the process. Returns the QEMU PID on success.

RemoteExecutor
Thin wrapper around `subprocess` that executes shell commands on remote hosts via SSH. `execute_direct()` uses `-o StrictHostKeyChecking=no`; the SIGINT cleanup in `main.py` uses `-o BatchMode=yes`. Used by every component that interacts with worker nodes or the network node.

Database
Thread-safe YAML state store backed by `database.yaml`. Holds users, slices, worker specs, and auto-incrementing ID counters. Writes are serialized with `threading.RLock`. The `HealthMonitor` flushes state to disk every 15 seconds.

Request flow
The following steps describe what happens when `OrchestratorAPI.deploy_slice()` is called.
Configure per-link VLANs
For each `Link` in the slice, `VLANManager.create_vlan_with_gateway()` runs on the network node (`10.0.10.3`):
- Adds an OVS internal port to `br-int` tagged with the link’s VLAN ID.
- Assigns a gateway IP derived from the VLAN ID (e.g., VLAN 100 → `192.168.100.1/24`).
- Creates a network namespace named `ns-dhcp-vlanN`.
- Starts `dnsmasq` inside the namespace to serve DHCP leases to VMs on that VLAN.
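The addressing and naming conventions in this step can be sketched as two pure functions. This is a minimal illustration, not the project's code: it assumes the gateway always follows the `192.168.<vlan_id>.1/24` pattern suggested by the VLAN 100 example, and the function names `gateway_cidr` and `dhcp_namespace` are hypothetical.

```python
import ipaddress


def gateway_cidr(vlan_id: int) -> str:
    """Return the gateway address for a link VLAN, e.g. 100 -> 192.168.100.1/24.

    Assumption: the VLAN ID is used directly as the third octet, so it must
    fit in one byte. How larger IDs are handled is not documented here.
    """
    if not 1 <= vlan_id <= 255:
        raise ValueError(f"VLAN ID {vlan_id} does not fit in an IPv4 octet")
    # ip_interface validates the address and keeps the host bits intact.
    return str(ipaddress.ip_interface(f"192.168.{vlan_id}.1/24"))


def dhcp_namespace(vlan_id: int) -> str:
    """Return the DHCP namespace name for a VLAN (the ns-dhcp-vlanN pattern)."""
    return f"ns-dhcp-vlan{vlan_id}"
```

Under these assumptions, `gateway_cidr(100)` yields `192.168.100.1/24` and `dhcp_namespace(100)` yields `ns-dhcp-vlan100`. The octet check also makes it explicit that this pattern cannot describe VLAN 400, which uses a separate subnet.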
Configure internet access (VLAN 400)
If any VM in the slice has internet access enabled, `VLANManager` configures VLAN 400 on the network node:
- Creates an OVS gateway port for `10.60.7.0/24` with gateway `10.60.7.1`.
- Starts a dnsmasq DHCP namespace for VLAN 400.
- Installs an `iptables MASQUERADE` rule so that traffic from `10.60.7.0/24` is SNATed through the network node’s uplink interface.
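For readers unfamiliar with the NAT rule, here is a minimal sketch of how such a command could be assembled. The exact rule installed by `VLANManager` is not shown in this page; the code below uses the conventional `POSTROUTING` form, and the helper name `masquerade_command` and its `uplink_iface` parameter are hypothetical.

```python
import ipaddress
import shlex


def masquerade_command(subnet: str, uplink_iface: str) -> str:
    """Build an iptables command that masquerades traffic leaving `subnet`.

    Assumption: the rule is appended to the nat table's POSTROUTING chain
    and matched on the outgoing interface.
    """
    network = ipaddress.ip_network(subnet)  # raises ValueError on bad input
    argv = [
        "iptables", "-t", "nat", "-A", "POSTROUTING",
        "-s", str(network),
        "-o", uplink_iface,
        "-j", "MASQUERADE",
    ]
    # shlex.join quotes each argument so the string is safe to send over SSH.
    return shlex.join(argv)
```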
`eth0` management interface.

Launch VMs
For each VM in the slice, `VMLauncher.launch_vm()` runs on the assigned worker node:
- Creates one TAP interface per VM network interface.
- Attaches each TAP to the OVS bridge `br-int` with the interface’s VLAN tag using `ovs-vsctl set port`.
- Runs `qemu-system-x86_64` with the provisioned QCOW2 image, KVM acceleration, the configured RAM and CPU count, and a VNC server bound to `0.0.0.0:<vnc_port>`.
- The QEMU process is daemonized (`-daemonize`) so it survives the SSH session.
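The launch steps above can be sketched as two command builders. This is an illustration under stated assumptions rather than the actual `VMLauncher` code: the `VMSpec` fields, the helper names, the `ip tuntap` and `ovs-vsctl add-port` steps, and the `virtio-net-pci` device model are all hypothetical choices.

```python
import shlex
from dataclasses import dataclass, field
from typing import List


@dataclass
class VMSpec:
    """Hypothetical VM description; field names are illustrative only."""
    image_path: str
    ram_mb: int
    cpus: int
    vnc_port: int
    taps: List[str] = field(default_factory=list)


def tap_commands(tap: str, vlan_id: int, bridge: str = "br-int") -> List[str]:
    """Commands that create a TAP device and attach it to OVS with a VLAN tag."""
    return [
        f"ip tuntap add dev {tap} mode tap",
        f"ip link set {tap} up",
        f"ovs-vsctl add-port {bridge} {tap}",
        f"ovs-vsctl set port {tap} tag={vlan_id}",
    ]


def qemu_command(vm: VMSpec) -> str:
    """Build a daemonized, KVM-accelerated QEMU command line for `vm`."""
    argv = [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-m", str(vm.ram_mb),
        "-smp", str(vm.cpus),
        "-drive", f"file={vm.image_path},format=qcow2",
        # QEMU reads the number after the colon as a display number
        # (TCP port 5900 + N). Whether the project's vnc_port is a display
        # number or a TCP port is not stated, so it is passed through as-is.
        "-vnc", f"0.0.0.0:{vm.vnc_port}",
        "-daemonize",
    ]
    for i, tap in enumerate(vm.taps):
        argv += [
            "-netdev", f"tap,id=net{i},ifname={tap},script=no,downscript=no",
            "-device", f"virtio-net-pci,netdev=net{i}",
        ]
    return shlex.join(argv)
```

Building the argument list first and joining it with `shlex.join` keeps quoting correct when the resulting string is passed to a remote shell over SSH.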
Update database state
After each VM starts successfully, `deploy_slice()` updates the in-memory database:
- Sets `vm["status"] = "running"` and `vm["pid"] = <qemu_pid>` for each VM.
- Sets `slice_data["status"] = "running"` for the slice.
- Calls `db.update_slice()`, which acquires the `threading.RLock` and writes the updated state. The `HealthMonitor` will persist it to disk within the next 15 seconds.
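A minimal sketch of this update, assuming a dictionary-based slice record. `InMemoryStore` is a hypothetical stand-in for the project's `Database` class, and the `vms` and `id` keys as well as the shape of the `pids` mapping are assumptions made for illustration.

```python
import copy
import threading


class InMemoryStore:
    """Hypothetical stand-in for the Database class; holds slices in memory."""

    def __init__(self):
        self._lock = threading.RLock()
        self._slices = {}

    def update_slice(self, slice_id, slice_data):
        # The lock serializes writers; deepcopy avoids sharing mutable state.
        with self._lock:
            self._slices[slice_id] = copy.deepcopy(slice_data)

    def get_slice(self, slice_id):
        with self._lock:
            return copy.deepcopy(self._slices[slice_id])


def mark_running(db, slice_id, slice_data, pids):
    """Record QEMU PIDs, flip statuses to "running", then store the slice.

    `pids` maps a VM id to the PID returned by the launcher (assumed shape).
    """
    for vm in slice_data["vms"]:
        vm["status"] = "running"
        vm["pid"] = pids[vm["id"]]
    slice_data["status"] = "running"
    db.update_slice(slice_id, slice_data)
```

Nothing in this sketch touches the disk, which mirrors the description above: the update is visible in memory immediately and only becomes durable at the next periodic flush.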
State model
All persistent state lives in a single YAML file, `database.yaml`, with the following top-level keys:

| Key | Description |
|---|---|
| `users` | Map of username → user record (password hash, quota, slice list). |
| `workers` | Map of worker IP → resource specs (cores, RAM, disk, used amounts). |
| `workers_list` | Ordered list of worker IPs used for round-robin scheduling. |
| `slices` | Map of slice ID → full slice record (VMs, links, VLAN pool, status). |
| `next_vm_id` | Auto-incrementing integer used for both VM IDs and slice IDs (starts at 1000). |
| `next_vlan_id` | Auto-incrementing integer for globally unique VLAN allocation (starts at 100). |

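The two counters in the table can be illustrated with a small allocator. This is a sketch only: the class and method names are hypothetical, and it assumes the first value handed out equals the documented starting value.

```python
import threading


class IdCounters:
    """Hypothetical holder for the next_vm_id and next_vlan_id counters."""

    def __init__(self, next_vm_id: int = 1000, next_vlan_id: int = 100):
        self._lock = threading.RLock()
        self.next_vm_id = next_vm_id
        self.next_vlan_id = next_vlan_id

    def allocate_id(self) -> int:
        """Return the next ID from the counter shared by VMs and slices."""
        with self._lock:
            value = self.next_vm_id
            self.next_vm_id += 1
            return value

    def allocate_vlan(self) -> int:
        """Return the next globally unique VLAN ID."""
        with self._lock:
            value = self.next_vlan_id
            self.next_vlan_id += 1
            return value
```

Because VM IDs and slice IDs draw from the same counter, a slice and its VMs never collide on an ID, at the cost of gaps in each individual sequence.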
Writes are serialized with a `threading.RLock` to prevent concurrent modification from the `HealthMonitor` background thread and the CLI foreground thread. On startup, `main.py` copies `database.yaml` to `database.yaml.backup` before any writes occur. The `HealthMonitor` saves the in-memory state back to disk every 15 seconds while the orchestrator is running.
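The backup-then-flush behavior can be sketched as follows. The names `backup_state` and `PeriodicFlusher` are hypothetical, and the `save` callable stands in for whatever the project uses to write the YAML file; only the 15-second interval and the `.backup` suffix come from the description above.

```python
import shutil
import threading
from pathlib import Path
from typing import Callable


def backup_state(path: Path) -> Path:
    """Copy the state file to <name>.backup and return the backup path."""
    backup = path.with_name(path.name + ".backup")
    if path.exists():
        shutil.copy2(path, backup)
    return backup


class PeriodicFlusher:
    """Call `save` every `interval` seconds on a background thread."""

    def __init__(self, save: Callable[[], None], interval: float = 15.0):
        self._save = save
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait returns False on timeout, so each timeout triggers a
        # save; it returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self._interval):
            self._save()
```

Waiting on an `Event` instead of calling `time.sleep` lets the thread exit promptly on shutdown rather than finishing a full 15-second sleep first.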
Worker topology
The default cluster consists of three nodes:

| Node | Role |
|---|---|
| `10.0.10.1` | Compute (VM hosting) |
| `10.0.10.2` | Compute (VM hosting) |
| `10.0.10.3` | Compute + network node (OVS) |

`WorkerDiscovery.discover_all()` connects to each node over SSH and runs three commands to populate resource specs. The results are stored in `database.yaml` under the `workers` key and used for capacity tracking. If a node is unreachable, `WorkerDiscovery` logs an error and skips that node; default values (2 cores, 1 GB RAM, 500 GB disk) are used as a fallback.
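One possible reading of the fallback behavior is sketched below. The probing commands themselves are not reproduced here, so the sketch takes a `probe` callable as a parameter; that callable, the spec key names (`cores`, `ram_gb`, `disk_gb`), and the choice to catch `OSError` are all assumptions.

```python
import logging
from typing import Callable, Dict, Iterable

logger = logging.getLogger("worker_discovery")

# Fallback values quoted in the text; the key names are hypothetical.
DEFAULT_SPECS: Dict[str, int] = {"cores": 2, "ram_gb": 1, "disk_gb": 500}


def discover_all(
    nodes: Iterable[str],
    probe: Callable[[str], Dict[str, int]],
) -> Dict[str, Dict[str, int]]:
    """Probe each node and fall back to DEFAULT_SPECS when it is unreachable."""
    workers: Dict[str, Dict[str, int]] = {}
    for ip in nodes:
        try:
            workers[ip] = dict(probe(ip))
        except OSError as exc:  # covers ConnectionError and TimeoutError
            logger.error("worker %s unreachable: %s", ip, exc)
            workers[ip] = dict(DEFAULT_SPECS)
    return workers
```

Passing the probe in as a function keeps the fallback logic testable without a live SSH connection.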
VMs are distributed across all nodes in `workers_list` using round-robin. The same list includes `10.0.10.3`, so the network node may also host VMs. The `round_robin_idx` counter is kept in memory on `OrchestratorAPI` and is not persisted between restarts.
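A minimal sketch of round-robin selection with an in-memory index follows. The class and method names are hypothetical, the index is assumed to start at 0, and no capacity check is performed; the real scheduler may differ on all three points.

```python
from typing import List, Sequence


class RoundRobinScheduler:
    """Pick workers in list order, wrapping around at the end."""

    def __init__(self, workers_list: Sequence[str]):
        if not workers_list:
            raise ValueError("workers_list must not be empty")
        self.workers_list: List[str] = list(workers_list)
        # Kept only in memory, so a new instance starts over from index 0.
        self.round_robin_idx = 0

    def next_worker(self) -> str:
        worker = self.workers_list[self.round_robin_idx % len(self.workers_list)]
        self.round_robin_idx += 1
        return worker
```

With the three default nodes, four consecutive calls return `10.0.10.1`, `10.0.10.2`, `10.0.10.3`, and then `10.0.10.1` again.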