forkd’s core primitive is fork-from-warm: boot a Firecracker microVM once, load it with your runtime (Python + dependencies, a JIT-warmed JVM, a pre-loaded ML model), pause it to disk as a snapshot, then restore N independent children from that snapshot. Each child mmaps the parent’s memory.bin file with MAP_PRIVATE; the Linux kernel implements copy-on-write at the 4 KiB page level, so children share every unmodified resident page until they actually write to it. The result is KVM-grade hardware isolation at a per-child memory overhead of 0.12 MiB on a 512 MiB Python + numpy parent.
The warm-fork lifecycle
Boot
The parent Firecracker process is started with a BootConfig specifying the guest kernel, rootfs, vCPU count, and memory size. InstanceStart is issued through Firecracker’s Unix-socket API, the guest kernel boots, and PID 1 (forkd-init.sh) mounts pseudo-filesystems, fixes DNS, and launches the guest agent (forkd-agent.py) on TCP port 8888.
This boot happens once per snapshot tag, never per child.
Warmup
After boot, user-space warm-up runs inside the VM. For a Python parent this means import numpy, import torch, or whichever libraries your agent workload needs. Any work the parent does (JIT compilation, model weight loading, disk prefetch) lands in resident RAM pages that every future child will inherit without paying the cost again.
Pause
forkd snapshot issues a PATCH /vm {"state": "Paused"} to the parent’s Firecracker socket. The guest vCPUs are halted; the VM is frozen in a deterministic state. The parent process keeps running but is no longer executing guest instructions.
Snapshot to disk
Firecracker’s PUT /snapshot/create writes two files to the snapshot directory:
- memory.bin: the full guest physical memory image, one contiguous file.
- vmstate: serialised vCPU register state, device state, and metadata.
forkd pull/forkd pack can ship the snapshot as a single .tar.zst file (typically 23× compression: a 512 MiB memory.bin becomes ~22 MiB on disk).
Restore N children (fork-out)
Snapshot::restore_many_with spawns N Firecracker processes in parallel. Each one receives:
- PUT /snapshot/load with mem_backend.backend_type: "File" and MEMORY_LOAD_PRIVATE: Firecracker calls mmap(memory.bin, MAP_PRIVATE), not read(). The kernel maps the file pages into the child’s address space but does not copy them. All N children point at the same physical pages.
- Placement into a dedicated cgroup v2 leaf (/sys/fs/cgroup/forkd/child-N/) with memory.max set to the configured quota.
- Assignment to a pre-provisioned network namespace (forkd-child-N) with its own tap device, IP stack, and veth pair to the shared forkd-br0 bridge.
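The Pause, Snapshot, and Restore steps boil down to three HTTP requests on Firecracker's Unix socket. The sketch below builds them by hand in Python, purely as an illustration: the helper names (build_request, snapshot_requests, restore_request, send) are hypothetical and are not forkd's actual code, and the JSON field names follow the upstream Firecracker snapshot API as commonly documented. It does not model the MEMORY_LOAD_PRIVATE detail, cgroup placement, or network namespaces.

```python
import json
import socket


def build_request(method: str, path: str, body: dict) -> bytes:
    """Serialise one HTTP/1.1 request with a JSON body."""
    payload = json.dumps(body).encode()
    head = (
        f"{method} {path} HTTP/1.1\r\n"
        "Host: localhost\r\n"
        "Accept: application/json\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode()
    return head + payload


def snapshot_requests(snap_dir: str) -> list:
    """Pause the VM, then write vmstate + memory.bin (lifecycle order)."""
    return [
        build_request("PATCH", "/vm", {"state": "Paused"}),
        build_request("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": f"{snap_dir}/vmstate",
            "mem_file_path": f"{snap_dir}/memory.bin",
        }),
    ]


def restore_request(snap_dir: str) -> bytes:
    """Load a snapshot with a file-backed memory image and resume the guest."""
    return build_request("PUT", "/snapshot/load", {
        "snapshot_path": f"{snap_dir}/vmstate",
        "mem_backend": {
            "backend_type": "File",
            "backend_path": f"{snap_dir}/memory.bin",
        },
        "resume_vm": True,
    })


def send(sock_path: str, request: bytes) -> bytes:
    """Send one request to a Firecracker API socket; return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(request)
        return s.recv(65536)
```

Building the bytes separately from sending them keeps the request shapes easy to inspect without a running Firecracker process.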
Copy-on-write memory model
After restore, each child has a MAP_PRIVATE mapping of memory.bin. The Linux page cache holds the physical pages backing that file. When child processes read a page, the kernel services the fault from the shared backing store: no copy, no additional memory. When a child writes to a page for the first time, the kernel:
- Allocates a new physical page for that child.
- Copies the original page contents into the new page (copy-on-write fault).
- Remaps the child’s virtual address to the new private page.
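The same copy-on-write behaviour can be observed from user space with an ordinary file. The sketch below is an analogy, not forkd code: a temporary file stands in for memory.bin, and several MAP_PRIVATE mappings inside one Python process stand in for the children. The function name cow_demo is hypothetical.

```python
import mmap
import os
import tempfile


def cow_demo(n_children: int = 3, n_pages: int = 4) -> dict:
    """Map one file privately several times; write through one mapping only."""
    page = mmap.PAGESIZE
    fd, path = tempfile.mkstemp()
    os.write(fd, b"P" * page * n_pages)  # stand-in for the parent's memory image
    os.close(fd)

    files, maps = [], []
    try:
        for _ in range(n_children):
            fh = open(path, "r+b")
            files.append(fh)
            maps.append(mmap.mmap(
                fh.fileno(), 0,
                flags=mmap.MAP_PRIVATE,
                prot=mmap.PROT_READ | mmap.PROT_WRITE,
            ))

        # First write by "child 0": the kernel copies that page for this
        # mapping only; other mappings and the file keep the original bytes.
        maps[0][0:1] = b"X"

        with open(path, "rb") as f:
            on_disk = f.read(1)
        return {"writer": maps[0][0:1], "sibling": maps[1][0:1], "file": on_disk}
    finally:
        for m in maps:
            m.close()
        for fh in files:
            fh.close()
        os.unlink(path)


if __name__ == "__main__":
    print(cow_demo())  # {'writer': b'X', 'sibling': b'P', 'file': b'P'}
```

Only the writer sees its change; the sibling mapping and the file on disk still hold the parent's bytes.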
| Metric | Value |
|---|---|
| Host memory delta per child (N=100, 512 MiB Python+numpy parent) | 0.12 MiB |
| Firecracker process resident size before any guest state | ~5 MiB |
| Wall-clock to fork N=100 children | 101 ms |
The 0.12 MiB per-child overhead figure covers only the pages that diverged during the measurement workload (import numpy; numpy.zeros(5).tolist()). Heavier agents will diverge more pages over time; the parent’s resident size sets the upper bound on per-child divergence, not the lower bound. On typical workloads, vCPU count and process count dominate capacity planning before memory does.
memory.bin is written to ext4 by default. The host page cache backs hot pages; with hugepages provisioned (512 × 2 MiB pages, per scripts/setup-host.sh), the kernel can back hot regions with 2 MiB TLB entries, reducing page-table pressure at high N.
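To make the bounds concrete, the arithmetic can be written out. This is a back-of-the-envelope sketch using only the figures quoted above (0.12 MiB measured delta, 512 MiB parent); the function name is hypothetical, and the result covers private diverged pages only, not the shared page cache or per-process Firecracker overhead.

```python
def divergence_bounds_mib(n_children: int,
                          measured_delta_mib: float = 0.12,
                          parent_mib: int = 512) -> tuple:
    """Return (measured, worst-case) private memory across all children, in MiB."""
    measured = n_children * measured_delta_mib   # light workload, as benchmarked
    worst_case = n_children * parent_mib         # every child rewrites every page
    return measured, worst_case


if __name__ == "__main__":
    measured, worst = divergence_bounds_mib(100)
    print(f"measured: {measured:.1f} MiB, worst case: {worst} MiB")
    # measured: 12.0 MiB, worst case: 51200 MiB
```

A real agent sits somewhere between the two numbers, which is why the measured delta should be treated as a floor when sizing a host.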
BRANCH: forking a live running VM
BRANCH is the inverse of the warmup snapshot. Instead of snapshotting a freshly booted parent, you snapshot a running sandbox mid-execution, then resume the source and fork children from the new snapshot. An agent can branch mid-thought: three children each receive a different steering hint while inheriting the same prior reasoning state and filesystem. forkd offers three BRANCH modes with different pause-window tradeoffs:
| Mode | Source pause window | Total BRANCH API time | Notes |
|---|---|---|---|
| Full | 29+ s (bandwidth-bound copy of full memory.bin) | 29+ s | Baseline; not recommended for running agents |
| Diff | ~150–205 ms (only dirty pages are diffed) | bandwidth-bound on background cp | Source pauses ~200 ms; background merge runs in parallel |
| Live (v0.4) | 56 ms p50 / 64 ms p90 | ~70 ms with wait: false | Source pauses ~56 ms at p50; background copy is disk-independent |
The Diff mode improvement over Full mode is 143× on a 4 GiB SSD source at idle (29.3 s → 205 ms). For a typical agent workload with 30–300 MiB of dirty pages, the reduction is 6–15×. v0.3.4 fixed a multi-BRANCH pause anomaly where repeated BRANCHes on the same parent ballooned to 2.7 s; a 30-line posix_fallocate fix keeps consecutive BRANCHes flat at ~150 ms (17.6× faster on the 6th consecutive BRANCH).
Live mode applies to sources started with live_fork: true, which backs guest RAM with a memfd shared between the Firecracker process and the controller. When BRANCH is issued:
- The controller installs a UFFD_WP (userfaultfd write-protect) watch on the shared memfd — dirty pages will be captured out-of-band.
- vCPUs are halted and the vmstate is dumped. The source’s pause window ends here (~56 ms p50).
- The source vCPUs resume immediately.
- In the background, dirty pages tracked by UFFD_WP are copied into the new snapshot’s memory image. The copy runs asynchronously — disk I/O does not extend the source’s downtime.
With wait: false, the BRANCH API call returns after ~10 ms (as soon as the source has resumed). Poll list_snapshots until status: "ready" to know when the background copy is complete and children can be forked from the new tag.
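The polling loop can be sketched as follows. The response shape of list_snapshots is not specified here, so the helper takes a caller-supplied get_status callable (hypothetical) that returns the status string for the new tag; only the "ready" value comes from the text above, and "pending" in the usage example is a made-up placeholder.

```python
import time
from typing import Callable


def wait_until_ready(get_status: Callable[[], str],
                     timeout_s: float = 30.0,
                     interval_s: float = 0.05,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll get_status() until it returns "ready" or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if get_status() == "ready":
            return True          # background copy finished; safe to fork children
        if clock() >= deadline:
            return False         # caller decides whether to retry or give up
        sleep(interval_s)


if __name__ == "__main__":
    # Fake status source standing in for a list_snapshots lookup.
    statuses = iter(["pending", "pending", "ready"])
    print(wait_until_ready(lambda: next(statuses), interval_s=0.01))  # True
```

Injecting sleep and clock keeps the helper easy to test without real delays.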
Live BRANCH requires the vendored Firecracker fork at deeplethe/firecracker:forkd-v0.4-mem-backend-shared-v1.12. Live mode needs guest memory mapped with MAP_SHARED (mem_backend.backend_type: "Shared" with shared: true), and that is the one gap that could not be worked around in vanilla upstream Firecracker. An upstream proposal is open; once it lands, the vendor requirement goes away.
Architecture components
forkd is composed of four cooperating components: the forkd CLI, the forkd-controller daemon, the forkd-vmm library, and the guest agent.
forkd CLI
The forkd binary is the operator’s interface. Key verbs:
| Verb | What it does |
|---|---|
| forkd quickstart | One-command preflight + snapshot + fork |
| forkd doctor | 16-check host diagnostic with fix hints |
| forkd snapshot | Boot a parent, warm it, pause, write memory.bin + vmstate |
| forkd fork | Restore N children from a snapshot tag |
| forkd bench | Measure spawn, exec, BRANCH, and fan-out latency against a tag |
| forkd pack / forkd unpack | Bundle a snapshot (+ chain ancestors) into a single .tar.zst |
| forkd push / forkd pull | Publish to or fetch from the Snapshot Hub |
| forkd images | List local snapshots with sizes |
| forkd snapshot-diff | Build a diff-snapshot layer on top of an existing tag (v0.5) |
| forkd snapshot-compact | Flatten a deep diff chain into a single layer |
forkd-controller daemon
The controller is a long-running daemon that owns the authoritative state of all snapshots and live sandboxes. It exposes:
- REST API on 127.0.0.1:8889 (or a configured address): POST /v1/sandboxes, GET /v1/sandboxes, POST /v1/sandboxes/:id/branch, DELETE /v1/sandboxes/:id.
- Bearer-token auth via a secret loaded from /etc/forkd/token. Constant-time comparison guards against timing oracles.
- Prometheus /metrics: forkd_snapshots_total, forkd_sandboxes_active, forkd_build_info.
- Append-only JSON audit log at /var/log/forkd/audit.log: one JSON-Lines object per request with RFC3339 timestamp, method, path, status, latency in microseconds, and user-agent.
- Graceful shutdown and reconciliation on restart (prunes sandbox entries whose Firecracker PID is gone from /proc).
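Two of these behaviours are easy to illustrate: the constant-time token check and the shape of an audit record. The sketch below is in Python for illustration only and is not the controller's implementation; the function names and the JSON key names (ts, method, path, status, latency_us, user_agent) are assumptions, since the text lists the fields but not their keys.

```python
import hmac
import json
from datetime import datetime, timezone
from typing import Optional


def check_bearer(header_value: Optional[str], expected_token: str) -> bool:
    """Validate an Authorization header without a timing side channel."""
    scheme, _, token = (header_value or "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest takes time independent of where the inputs first differ.
    return hmac.compare_digest(token.encode(), expected_token.encode())


def audit_line(method: str, path: str, status: int, latency_us: int,
               user_agent: str, now: Optional[datetime] = None) -> str:
    """Render one JSON-Lines audit record (key names are assumed)."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "ts": now.isoformat(),      # RFC3339-compatible, e.g. 2024-01-02T03:04:05+00:00
        "method": method,
        "path": path,
        "status": status,
        "latency_us": latency_us,
        "user_agent": user_agent,
    })


if __name__ == "__main__":
    print(check_bearer("Bearer s3cret", "s3cret"))  # True
    print(audit_line("GET", "/v1/sandboxes", 200, 412, "curl/8.5.0"))
```

A plain == on the token would return early at the first mismatching byte, which is the timing oracle the constant-time comparison avoids.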
forkd-vmm library
The forkd-vmm crate is the Firecracker wrapper that all other components build on. It provides:
- BootConfig: typed configuration for kernel, rootfs, vCPU count, memory, and network.
- Vm: lifecycle methods boot, pause, snapshot_to, resume, kill.
- Snapshot: restore_many_with(n, opts) spawns N Firecracker processes in parallel, each with its own MAP_PRIVATE memory mapping.
- ForkOpts: controls memory_limit_mib, per_child_netns, live_fork, and related options.
- cgroup helpers: creates and populates cgroup v2 leaves under /sys/fs/cgroup/forkd/child-N/.
- Network namespace plumbing: setns(2) into forkd-child-N for each child’s agent communication.
- Raw HTTP/1.1 over Unix socket: typed wrappers around every Firecracker API endpoint without pulling in an async HTTP client.
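As an illustration of what the cgroup helpers amount to, the sketch below creates a leaf and writes the standard cgroup v2 control files memory.max and cgroup.procs. It is a Python approximation, not the crate's Rust code; create_child_cgroup is a hypothetical name, and the root directory is a parameter so the logic can be exercised against a scratch directory. On a real host the memory controller must also be enabled in the parent's cgroup.subtree_control, which this sketch does not do.

```python
import tempfile
from pathlib import Path
from typing import Optional


def create_child_cgroup(root: Path, index: int, memory_limit_mib: int,
                        pid: Optional[int] = None) -> Path:
    """Create <root>/child-<index>, set memory.max, optionally attach a PID."""
    leaf = root / f"child-{index}"
    leaf.mkdir(parents=True, exist_ok=True)
    # cgroup v2 expects memory.max in bytes.
    (leaf / "memory.max").write_text(f"{memory_limit_mib * 1024 * 1024}\n")
    if pid is not None:
        # Writing a PID to cgroup.procs moves that process into the leaf.
        (leaf / "cgroup.procs").write_text(f"{pid}\n")
    return leaf


if __name__ == "__main__":
    # Scratch directory instead of /sys/fs/cgroup/forkd, so no root is needed.
    with tempfile.TemporaryDirectory() as scratch:
        leaf = create_child_cgroup(Path(scratch), 0, 256)
        print((leaf / "memory.max").read_text().strip())  # 268435456
```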
Guest agent (forkd-agent.py)
forkd-agent.py is a minimal TCP server running on port 8888 inside every sandbox (parent and child alike). It handles three message types:
- ping: health check; the controller uses this to confirm a child is alive after restore.
- exec: run a shell command, capture stdout/stderr/exit-code.
- eval: evaluate a Python expression in the already-running interpreter (PID 1’s namespace). This is the ~1 ms path vs ~96 ms for a cold subprocess.
forkd-init.sh (PID 1) mounts pseudo-filesystems, fixes /etc/resolv.conf, and launches forkd-agent.py before entering an idle wait loop.
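The three message types can be pictured as a small dispatcher. The wire format of forkd-agent.py is not described here, so everything below is a hypothetical sketch: messages are assumed to be dicts with a type key plus cmd or expr, and the reply keys are invented for illustration. It shows why eval is the fast path: the expression runs in an interpreter that is already warm, while exec pays for a new process.

```python
import subprocess

# Namespace that persists across eval calls, mimicking a long-lived interpreter.
_NAMESPACE: dict = {}


def handle(msg: dict) -> dict:
    """Dispatch one agent message (assumed shape: {"type": ..., ...})."""
    kind = msg.get("type")
    if kind == "ping":
        return {"ok": True}
    if kind == "exec":
        # Cold path: spawns a shell and waits for it to exit.
        proc = subprocess.run(msg["cmd"], shell=True,
                              capture_output=True, text=True)
        return {"ok": True, "stdout": proc.stdout,
                "stderr": proc.stderr, "exit_code": proc.returncode}
    if kind == "eval":
        # Warm path: no new process, imports already loaded in this interpreter.
        try:
            return {"ok": True, "value": repr(eval(msg["expr"], _NAMESPACE))}
        except Exception as exc:
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return {"ok": False, "error": f"unknown message type: {kind!r}"}


if __name__ == "__main__":
    print(handle({"type": "ping"}))
    print(handle({"type": "eval", "expr": "1 + 2"}))
    print(handle({"type": "exec", "cmd": "echo hi"}))
```

Both exec and eval run arbitrary code by design, which is acceptable only because the agent lives inside the sandbox it serves.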
System requirements
| Requirement | Minimum | Recommended | Notes |
|---|---|---|---|
| Kernel | Linux 5.7 | Linux 6.0+ | 6.0+ (the release once slated as 5.20) for automatic RNG re-seed via vmgenid |
| Architecture | x86_64 | x86_64 | aarch64 is tracked; not yet tested in CI |
| KVM | /dev/kvm present + writable | bare-metal | Nested virt works but adds overhead |
| cgroup | v2 unified hierarchy | v2 | mount -t cgroup2 must succeed |
| Firecracker | v1.7+ | v1.10.1 | v0.4 live-fork requires vendored fork |
| Network | iproute2, iptables | + bridge-utils | tap + veth + MASQUERADE rule on forkd-br0 |
| Per-child netns | scripts/netns-setup.sh N | — | One named netns per child, pre-provisioned |
| uffd_wp | vm.unprivileged_userfaultfd=1 or CAP_SYS_PTRACE | — | Required for v0.4 live BRANCH only |
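A few of these requirements can be checked by hand before running forkd doctor. The sketch below is a minimal, unofficial preflight covering three rows of the table (kernel version, /dev/kvm access, cgroup v2); it is not a reimplementation of forkd doctor's 16 checks, and the function names are hypothetical. The cgroup test relies on cgroup.controllers existing at the mount root only under the unified v2 hierarchy.

```python
import os
import re
from typing import Optional


def kernel_at_least(release: str, minimum: tuple) -> bool:
    """Compare the major.minor prefix of a release string such as '5.15.0-91-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        return False
    return (int(m.group(1)), int(m.group(2))) >= minimum


def preflight(release: Optional[str] = None) -> dict:
    """Return a name -> bool map for a handful of host requirements."""
    release = release or os.uname().release
    return {
        "kernel >= 5.7": kernel_at_least(release, (5, 7)),
        "/dev/kvm readable and writable": os.access("/dev/kvm", os.R_OK | os.W_OK),
        "cgroup v2 unified hierarchy": os.path.exists("/sys/fs/cgroup/cgroup.controllers"),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok  ' if ok else 'FAIL'} {name}")
```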