Deploy forkd-controller on Kubernetes with KVM Nodes

forkd’s execution model maps cleanly onto Kubernetes, but with a different shape than runtimes that schedule one Pod per sandbox. A single forkd-controller Pod hosts the daemon and all of its child Firecracker processes. The K8s scheduler runs exactly once when the Pod starts, regardless of how many sandboxes you subsequently fork — no per-sandbox scheduling overhead, no per-sandbox Pod churn. This makes forkd better suited to AI agent fan-out workloads than Kata Containers or generic Firecracker-on-K8s designs that require one Pod per VM. The starter manifest lives at packaging/k8s/forkd-controller.yaml and has been verified end-to-end on k3s on bare-metal Ubuntu 24.04 / Linux 6.14 / KVM.

Node requirements

forkd requires KVM hardware virtualisation. Standard managed Kubernetes nodes on GKE, EKS, and AKS do not expose /dev/kvm unless you select a metal SKU or explicitly enable nested virtualisation for your node pool.

Each node that will run a forkd-controller Pod must have:

/dev/kvm present and accessible (VMX or SVM enabled in BIOS / hypervisor settings)
cgroup v2 unified hierarchy (mount -t cgroup2 cgroup2 /sys/fs/cgroup — the daemon writes to /sys/fs/cgroup/forkd/)
x86_64 architecture (the starter manifest and this guide target amd64 nodes only)
A kernel image and parent rootfs reachable on the node, either placed directly on the node filesystem or mounted via a PersistentVolume
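
The first three requirements can be checked with a short script run on the node. This is a minimal sketch and not part of forkd: the helper names (arch_ok, cgroup_fs_ok, kvm_ok, report) are hypothetical, and it assumes GNU stat is available. It does not check the kernel image or rootfs, since their location depends on your setup.

```shell
# Preflight sketch for a candidate forkd node (hypothetical helpers, not a forkd tool).

# arch_ok <machine>: true when the machine type is x86_64.
arch_ok() { [ "$1" = "x86_64" ]; }

# cgroup_fs_ok <fstype>: true when /sys/fs/cgroup is the cgroup v2 unified hierarchy.
cgroup_fs_ok() { [ "$1" = "cgroup2fs" ]; }

# kvm_ok <path>: true when the path is a character device the current user can read and write.
kvm_ok() { [ -c "$1" ] && [ -r "$1" ] && [ -w "$1" ]; }

# report <label> <command...>: run the check and print PASS or FAIL without aborting.
report() {
  local label=$1
  shift
  if "$@"; then echo "PASS  $label"; else echo "FAIL  $label"; fi
}

report "x86_64 architecture" arch_ok "$(uname -m)"
report "cgroup v2 unified"   cgroup_fs_ok "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)"
report "/dev/kvm accessible" kvm_ok /dev/kvm
```

Run it as the user the container will run as (root in the starter manifest); a FAIL line points at the requirement to fix before scheduling the Pod there.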

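The starter manifest’s nodeSelector targets kubernetes.io/arch: amd64, a well-known label that the kubelet sets automatically on x86_64 nodes. To additionally restrict scheduling to nodes with confirmed KVM access, label those nodes and add the same key to the nodeSelector. The example below uses the key format that Node Feature Discovery publishes for CPUID flags, here Intel VMX; on AMD hosts the equivalent CPU flag is SVM: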

kubectl label node <node-name> feature.node.kubernetes.io/cpu-cpuid.VMX=true
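
One way to add the new key to the nodeSelector without editing the YAML by hand is a merge patch applied after the manifest. The sketch below assumes the Deployment is named forkd-controller in the forkd namespace; node_selector_patch is a hypothetical helper that only builds the patch body.

```shell
# Build a JSON merge patch that adds one key to the Pod template's nodeSelector.
# Existing keys such as kubernetes.io/arch are kept, because a merge patch
# merges maps key by key instead of replacing them.
node_selector_patch() {
  printf '{"spec":{"template":{"spec":{"nodeSelector":{"%s":"%s"}}}}}\n' "$1" "$2"
}

# Print the patch for the label used above.
node_selector_patch feature.node.kubernetes.io/cpu-cpuid.VMX true
```

To apply it, pass the output to kubectl -n forkd patch deployment forkd-controller --type merge -p "$(node_selector_patch feature.node.kubernetes.io/cpu-cpuid.VMX true)". Changing the Pod template triggers a new rollout, so do this before any sandboxes are running.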

Apply the starter manifest

Generate and patch the bearer token

The manifest ships with a placeholder token that the daemon refuses to accept at startup — a forgotten sed step fails noisily rather than silently. Replace it before applying:

TOKEN=$(head -c 32 /dev/urandom | base64)
sed -i "s|REPLACE_ME_WITH_32_BYTES_BASE64|$TOKEN|" packaging/k8s/forkd-controller.yaml

Apply the manifest

kubectl apply -f packaging/k8s/forkd-controller.yaml
kubectl -n forkd get pods -w

The Deployment uses strategy: Recreate because forkd holds live VM state and cannot do a rolling update.

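Smoke-test through a port-forward

# Terminal 1: forward the Service port to localhost (blocks until interrupted)
kubectl -n forkd port-forward svc/forkd-controller 8889:8889

# Terminal 2: $TOKEN must hold the value you patched into the manifest
curl -H "Authorization: Bearer $TOKEN" http://127.0.0.1:8889/healthz
# {"ok":true}
curl -H "Authorization: Bearer $TOKEN" http://127.0.0.1:8889/v1/snapshots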

Key manifest fields explained

The manifest creates four resources: a Namespace, a Secret (the bearer token), a Deployment, and a ClusterIP Service.

Secret

apiVersion: v1
kind: Secret
metadata:
  name: forkd-token
  namespace: forkd
type: Opaque
stringData:
  token: REPLACE_ME_WITH_32_BYTES_BASE64

The Deployment mounts this Secret at /etc/forkd, so the daemon reads the token from /etc/forkd/token (mode 0400). The value you substitute must be at least 16 bytes and must not begin with REPLACE_ME or CHANGE_ME; the daemon validates this at startup.

Container args

args:
  - --bind=0.0.0.0:8889
  - --state=/var/lib/forkd/state.json
  - --snapshot-root=/var/lib/forkd/snapshots
  - --audit-log=/var/lib/forkd/audit.log
  - --token-file=/etc/forkd/token

Inside the cluster the daemon binds on all interfaces (0.0.0.0) so that the Service can route traffic to it. Access is gated by the bearer token, and any NetworkPolicy you add restricts which Pods can connect at all. The audit log is written into the same volume as the state file.

Volumes

Volume	Mount	Purpose
`kvm` (hostPath)	`/dev/kvm`	Exposes the KVM character device to Firecracker.
`cgroup` (hostPath)	`/sys/fs/cgroup`	Lets the daemon write per-child `memory.max` limits.
`token` (Secret)	`/etc/forkd`	Bearer token, mode `0400`.
`state` (emptyDir)	`/var/lib/forkd`	State file, snapshots, and audit log. Survives container restarts, but is deleted when the Pod is removed or rescheduled.

For production, replace the emptyDir state volume with a PersistentVolumeClaim so snapshots survive Pod deletion and rescheduling. Without a PVC, you must rebuild snapshots every time the Pod is recreated, including after a rollout restart.
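
A claim for the state volume could look like the output of the sketch below. The claim name forkd-state, the size, and the storage class are assumptions for illustration (local-path is the default class on k3s; use whatever your cluster provides), and render_state_pvc is a hypothetical helper.

```shell
# render_state_pvc <size> <storage-class>: print a PersistentVolumeClaim manifest.
# The name "forkd-state" is an assumed example, not something the starter manifest defines.
render_state_pvc() {
  local size=$1 class=$2
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: forkd-state
  namespace: forkd
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ${class}
  resources:
    requests:
      storage: ${size}
EOF
}

render_state_pvc 50Gi local-path
```

Pipe the output to kubectl apply -f -, then change the state volume in the Deployment from emptyDir to persistentVolumeClaim with claimName: forkd-state. ReadWriteOnce is sufficient here because the Recreate strategy never runs two controller Pods at once.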

Probes

readinessProbe:
  httpGet:
    path: /healthz
    port: api
  initialDelaySeconds: 3
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /healthz
    port: api
  initialDelaySeconds: 30
  periodSeconds: 10

/healthz is always unauthenticated, so probes work without a credential even when --token-file is set.
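
Because /healthz needs no credential, the same endpoint is convenient for deployment scripts that must wait until the daemon is up. A minimal sketch, assuming curl is installed and the API is reachable at the URL you pass in (for example through a port-forward); probe_once and wait_for_healthz are hypothetical helper names.

```shell
# probe_once <url>: succeed when the endpoint answers with a non-error HTTP status.
probe_once() { curl -fsS -o /dev/null --max-time 2 "$1"; }

# wait_for_healthz <url> <attempts> <delay-seconds>:
# poll the endpoint until it succeeds or the attempts run out.
wait_for_healthz() {
  local url=$1 attempts=$2 delay=$3 i=1
  while [ "$i" -le "$attempts" ]; do
    if probe_once "$url" 2>/dev/null; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$(( i + 1 ))
    sleep "$delay"
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}
```

For example, wait_for_healthz http://127.0.0.1:8889/healthz 10 5 polls every five seconds for up to ten attempts, which matches the readiness probe's period.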

Sizing guidance

With a 512 MiB warmed Python + numpy parent snapshot, use these rough heuristics for resource requests and limits:

Resource	Guidance
vCPU — actively running agents	~1 actively-running agent per vCPU. Compute-bound bottleneck.
RAM — idle-pooled agents	~50 idle-pooled agents per 8 GiB Pod RAM. Process-state bottleneck, not memory.
CoW overhead per child	0.12 MiB at N=100 (bench data). Memory rarely caps fan-out — vCPU and process count dominate.

The starter manifest requests 4 CPU / 8 GiB and limits to 16 CPU / 32 GiB. Tune based on your parent snapshot size and expected concurrency. Heavier parents (browser, ML inference) hit vCPU ceilings sooner — benchmark with forkd bench --tag <your-tag> --n 20 inside the Pod before committing to node shape.
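
The two heuristics in the table reduce to simple arithmetic. The sketch below applies them to a given Pod shape; estimate_capacity is a hypothetical helper, and the numbers only hold for a parent comparable to the 512 MiB Python + numpy snapshot described above.

```shell
# estimate_capacity <vcpus> <ram-gib>: rough agent counts from the sizing table.
estimate_capacity() {
  local vcpus=$1 ram_gib=$2
  local active=$vcpus                  # ~1 actively-running agent per vCPU
  local idle=$(( ram_gib * 50 / 8 ))   # ~50 idle-pooled agents per 8 GiB
  echo "active=$active idle=$idle"
}

estimate_capacity 4 8     # starter manifest requests -> active=4 idle=50
estimate_capacity 16 32   # starter manifest limits   -> active=16 idle=200
```

Treat the result as a starting point for the forkd bench run rather than a guarantee; heavier parents will land below these figures.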

Security implications

The forkd-controller Pod runs privileged: true and runAsUser: 0. This is intentional: the daemon needs /dev/kvm access, cgroup v2 writes for per-child memory.max, and tap-device creation. The blast radius is node-level: a compromised Pod can escape to the underlying node.

The manifest takes the simplest working path (privileged: true). For tighter security at the cost of additional platform work:

Replace privileged: true with a KVM device plugin (e.g. kubevirt/kvm-device-plugin) so /dev/kvm is granted as a K8s resource rather than a host mount.
Drop privileged and enumerate only the capabilities you need: NET_ADMIN (tap setup), SYS_ADMIN (cgroup writes).
Replace emptyDir with a PersistentVolumeClaim backed by fast local storage (NVMe hostPath or a CSI driver with local volumes).

Multi-tenant deployments

Treat the forkd-controller Pod’s bearer token like SSH-root on the node. Rotate it on any access change. Never share one token across tenants.

Because one forkd-controller Pod has node-level blast radius, multi-tenant deployments must isolate at the node boundary:

Run one forkd-controller Pod per tenant on dedicated nodes (use node labels + nodeSelector / nodeAffinity + taints + tolerations).
Each tenant gets their own Namespace, their own Secret, and their own bearer token.
Apply a NetworkPolicy limiting ingress to port 8889 to only that tenant’s agent backplane.
Do not co-schedule untrusted tenant pods on the same node as a forkd-controller.
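
The NetworkPolicy item in the list above could be realised along these lines. This is a sketch under assumptions: the Pod label app: forkd-controller and the client label role: agent-backplane are placeholders for whatever labels your manifests actually use, and render_netpol is a hypothetical helper. It also only has an effect if the cluster's CNI plugin enforces NetworkPolicy.

```shell
# render_netpol <namespace>: print a NetworkPolicy that admits only labelled
# agent-backplane Pods in the same namespace to TCP port 8889.
render_netpol() {
  local ns=$1
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: forkd-controller-ingress
  namespace: ${ns}
spec:
  podSelector:
    matchLabels:
      app: forkd-controller          # assumed controller Pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: agent-backplane  # assumed client Pod label
      ports:
        - protocol: TCP
          port: 8889
EOF
}

render_netpol forkd
```

Render one policy per tenant namespace and pipe it to kubectl apply -f -. If the backplane runs in a different namespace, the from entry needs a namespaceSelector as well.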

Token management

The manifest ships with token: REPLACE_ME_WITH_32_BYTES_BASE64 in the Secret. The daemon’s validate_token() function rejects any token that:

Begins with REPLACE_ME or CHANGE_ME
Is shorter than 16 bytes

This means a kubectl apply with the un-patched placeholder causes the daemon to refuse to start — a noisy fail rather than a silent compromise. Always substitute the token before applying:

TOKEN=$(head -c 32 /dev/urandom | base64)
# Patch the YAML before apply, or use kubectl create secret:
kubectl -n forkd create secret generic forkd-token \
  --from-literal=token="$TOKEN" \
  --dry-run=client -o yaml | kubectl apply -f -
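
The two rejection rules are easy to mirror locally, so a bad token is caught before it reaches the cluster. The following is a sketch of such a pre-check, not the daemon's validate_token() itself; token_ok is a hypothetical name, and it counts characters, which equals bytes for base64 output.

```shell
# token_ok <token>: apply the two documented rules locally.
token_ok() {
  case "$1" in
    REPLACE_ME*|CHANGE_ME*) return 1 ;;   # placeholder prefixes are rejected
  esac
  [ "${#1}" -ge 16 ]                      # shorter than 16 bytes is rejected
}

TOKEN=$(head -c 32 /dev/urandom | base64)
if token_ok "$TOKEN"; then
  echo "token passes the local pre-check"
else
  echo "token would be rejected by the daemon" >&2
fi
```

A token that passes here can still be rejected if the daemon enforces rules beyond the two listed, so the daemon's startup log remains the authority.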

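To rotate: update the Secret, then restart the Pod (kubectl -n forkd rollout restart deployment/forkd-controller). Existing sandboxes inside the Pod are killed on restart. Snapshots survive only if the state volume is a PersistentVolumeClaim, because the restart creates a new Pod and the default emptyDir is discarded with the old one.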

What this manifest does not cover

The starter manifest is a single-node starting point. The following are out of scope for v0.1 and noted here so you know what to add for production:

DaemonSet shape — for multi-node deployments (one controller per node), convert to a DaemonSet with nodeSelector for KVM-capable nodes.
netns provisioning — per-child netns (forkd-child-N) must be provisioned by scripts/netns-setup.sh on each node before forking. Wire this as a Pod init container or a separate DaemonSet.
HPA / autoscaling — each controller instance owns its state; horizontal scaling of the controller itself doesn’t apply. Scale by adding more KVM nodes and more controller instances (one per node).
NetworkPolicy — lock down port 8889 to your agent backplane only.
