The forkd Python SDK exposes two complementary surfaces. Controller is the host-side lifecycle client: it talks to the forkd-controller daemon’s REST API to create and delete snapshots, spawn sandboxes from them, branch a running sandbox into a new snapshot tag, and kill individual VMs. Sandbox is the in-guest execution client: it connects over TCP to the agent running inside one specific child VM so you can exec commands and eval expressions without shelling out to the CLI. Most agent runtimes use both — Controller to spawn, branch, and kill; Sandbox to drive code execution inside the VMs.

Installation

pip install forkd

Controller

Controller wraps the /v1/snapshots and /v1/sandboxes REST endpoints. All network I/O uses Python’s stdlib urllib; no extra dependencies are required.

Constructor

Controller(base_url=None, token=None, timeout=60.0)
base_url
str
default:"$FORKD_URL → http://127.0.0.1:8889"
Daemon base URL. Resolved in order: the base_url argument, the FORKD_URL environment variable, then the hardcoded default. Trailing slashes are stripped.
token
str
default:"$FORKD_TOKEN"
Bearer token sent as Authorization: Bearer <token>. Defaults to the FORKD_TOKEN environment variable. Required only when the daemon was started with --token-file.
timeout
float
default:"60.0"
Per-request timeout in seconds. Branching a large source VM can take 0.5–8 s for a full snapshot; the generous default accommodates that. Reduce it for latency-sensitive read-only calls if you want faster failure.
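The resolution order described above can be sketched with the standard library alone. The helper names `resolve_base_url` and `auth_headers` are hypothetical and exist only for illustration; they are not part of the forkd package, and the real constructor may differ in detail.

```python
import os

DEFAULT_URL = "http://127.0.0.1:8889"  # default documented above


def resolve_base_url(base_url=None, env=None):
    """Argument first, then FORKD_URL, then the default; strip trailing slashes."""
    env = os.environ if env is None else env
    url = base_url or env.get("FORKD_URL") or DEFAULT_URL
    return url.rstrip("/")


def auth_headers(token=None, env=None):
    """Build the Authorization header, or no header when no token is configured."""
    env = os.environ if env is None else env
    token = token or env.get("FORKD_TOKEN")
    return {"Authorization": f"Bearer {token}"} if token else {}


# Passing env={} keeps the demo independent of the real environment.
print(resolve_base_url("http://10.0.0.5:8889/", env={}))
print(resolve_base_url(env={"FORKD_URL": "http://forkd.internal:8889"}))
print(auth_headers(env={}))
```

Passing an explicit `env` mapping is a testing convenience in this sketch; in normal use both helpers would read `os.environ`.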

Snapshot methods

list_snapshots

list_snapshots() -> list[dict]
GET /v1/snapshots — returns every snapshot registered with the daemon.
returns
list[dict]
A list of SnapshotInfo dicts. Each dict contains tag, dir, created_at_unix, and optionally branched_from, pause_ms, and status.
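Because several SnapshotInfo fields are optional, code that consumes the list should read them defensively. The following is a minimal sketch over a hypothetical payload. The helper names are invented, and treating a missing `status` as complete is an assumption rather than documented behaviour.

```python
def ready_snapshots(snapshots):
    """Keep snapshots that are not still being written.

    Assumption: a snapshot with no "status" key is complete, because the
    field is documented as optional.
    """
    return [s for s in snapshots if s.get("status", "ready") == "ready"]


def newest_first(snapshots):
    """Sort by creation time, newest first."""
    return sorted(snapshots, key=lambda s: s["created_at_unix"], reverse=True)


# Hypothetical payload shaped like the SnapshotInfo dicts described above.
sample = [
    {"tag": "pyagent", "dir": "/snap/pyagent", "created_at_unix": 1700000000},
    {
        "tag": "checkpoint-1",
        "dir": "/snap/checkpoint-1",
        "created_at_unix": 1700000100,
        "branched_from": "sb-67a1b3-0000",
        "pause_ms": 12,
        "status": "writing",
    },
]
print([s["tag"] for s in newest_first(ready_snapshots(sample))])
```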

delete_snapshot

delete_snapshot(tag: str) -> None
DELETE /v1/snapshots/:tag — removes the snapshot from the registry and deletes its on-disk files.
tag
str
required
The snapshot tag to delete.

Sandbox methods

spawn_sandboxes

spawn_sandboxes(
    snapshot_tag: str,
    n: int = 1,
    per_child_netns: bool = False,
    memory_limit_mib: int | None = None,
    prewarm: bool = False,
    live_fork: bool = False,
    hugepages: bool = False,
) -> list[dict]
POST /v1/sandboxes — fork n children from a registered snapshot tag.
snapshot_tag
str
required
Name of a registered snapshot to fork from. Must exist in list_snapshots().
n
int
default:"1"
Number of child sandboxes to spawn, 1–1000. All children share the parent snapshot’s memory image via copy-on-write.
per_child_netns
bool
default:"False"
When True, each child is placed in a dedicated network namespace named forkd-child-<i>. The host must have run scripts/netns-setup.sh N beforehand. Required for per-child network isolation.
memory_limit_mib
int
default:"None"
Sets the cgroup memory.max for each child in MiB. When None, no memory limit is applied.
prewarm
bool
default:"False"
v0.2.5+. When True, each child performs a throwaway snapshot to scratch storage immediately after restore. This faults in all guest pages upfront, trading roughly 170 ms per 512 MiB of extra spawn time for predictable latency on the first user-visible BRANCH, which avoids the 2–9× cold-cache penalty on first access. Useful when you have a BRANCH SLO and are fanning out 3 or more sandboxes from the same source.
live_fork
bool
default:"False"
v0.4+. Boots the sandbox with a memfd-backed RAM region so later BRANCHes from it can use mode="live" (UFFD_WP). Requires Linux kernel 5.7+ and the vendored Firecracker fork. No cost at spawn time beyond the backend swap; the overhead shows up on the first live BRANCH.
hugepages
bool
default:"False"
v0.4+. Backs the memfd with 2 MiB hugepages (MFD_HUGETLB | MFD_HUGE_2MB). Only meaningful when live_fork=True. Reduces TLB pressure during spawn-many and live BRANCH bulk-copy. Requires non-zero HugePages_Free in /proc/meminfo; forkd doctor checks availability. Falls back to 4 KiB pages with a warning when the pool is exhausted.
returns
list[dict]
A list of SandboxInfo dicts. Each dict contains: id, snapshot_tag, netns, guest_addr, created_at_unix, pid, and optionally memory_limit_mib.
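A common follow-up to spawning is to collect each child's `guest_addr` so it can be handed to `Sandbox(target=..., spawn=False)`. The sketch below assumes only the documented return shape. `StubController` is a hypothetical stand-in so the snippet runs without a daemon, and its addresses and ids are invented.

```python
class StubController:
    """Hypothetical stand-in that mimics the documented return shape."""

    def spawn_sandboxes(self, snapshot_tag, n=1, per_child_netns=False, **_):
        return [
            {
                "id": f"sb-stub-{i:04d}",
                "snapshot_tag": snapshot_tag,
                "netns": f"forkd-child-{i}" if per_child_netns else None,
                "guest_addr": f"10.42.{i}.2:8888",  # invented addressing
                "created_at_unix": 0,
                "pid": 1000 + i,
            }
            for i in range(n)
        ]


def spawn_address_map(controller, snapshot_tag, n):
    """Spawn n children and return {sandbox id: guest_addr}."""
    if not 1 <= n <= 1000:  # documented range for n
        raise ValueError("n must be between 1 and 1000")
    children = controller.spawn_sandboxes(snapshot_tag, n=n, per_child_netns=True)
    return {child["id"]: child["guest_addr"] for child in children}


print(spawn_address_map(StubController(), "pyagent", 3))
```

With the real SDK, passing a `Controller()` instance in place of the stub should work the same way, provided the daemon returns the fields listed above.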

list_sandboxes

list_sandboxes() -> list[dict]
GET /v1/sandboxes — returns every live sandbox the daemon is tracking.
returns
list[dict]
A list of SandboxInfo dicts.

get_sandbox

get_sandbox(sandbox_id: str) -> dict
GET /v1/sandboxes/:id — fetch metadata for one sandbox.
sandbox_id
str
required
The sandbox id (e.g. "sb-67a1b3-0000").
returns
dict
A single SandboxInfo dict.

kill_sandbox

kill_sandbox(sandbox_id: str) -> None
DELETE /v1/sandboxes/:id — terminate one sandbox. Kills the Firecracker process and removes its cgroup leaf.
sandbox_id
str
required
The sandbox id to terminate.

branch_sandbox

branch_sandbox(
    sandbox_id: str,
    tag: str | None = None,
    diff: bool = False,
    measure_diff: bool = False,
    mode: BranchMode | None = None,
    wait: bool = True,
) -> dict
POST /v1/sandboxes/:id/branch — pause the source sandbox, write a snapshot, resume the source. The returned snapshot is independent of the source’s lifecycle; pass its tag back to spawn_sandboxes to fan out grandchildren that inherit the source’s exact state.
sandbox_id
str
required
Id of the sandbox to branch from.
tag
str
default:"None"
Optional tag for the new snapshot. When unset, the daemon generates branch-<sandbox-id>-<unix-ts>.
mode
BranchMode
default:"None"
v0.4+ canonical mode selector. One of "full", "diff", or "live". When set, takes precedence over the legacy diff boolean. Passing both mode and diff=True raises ControllerError (HTTP 400). Prefer mode in new code.
  • "full" — copy entire guest RAM under pause (0.5–8 s, default for v0.x).
  • "diff" — Firecracker Diff snapshot (v0.3+). ~200 ms source pause for idle sources; 6–15× speedup on typical agent workloads.
  • "live" — UFFD_WP live BRANCH (v0.4+). Sub-50 ms source pause; memory streams from the running parent. Source must have been spawned with live_fork=True.
diff
bool
default:"False"
Legacy. Equivalent to mode="diff". Kept so this SDK can drive v0.3.x daemons that don’t understand the mode field. Mutually exclusive with mode (server returns HTTP 400 if both are set).
measure_diff
bool
default:"False"
v0.3+. Measurement-only hook. Takes a Diff snapshot inside the existing Full pause to report what diff would have cost, without changing semantics. Mutually exclusive with diff (daemon returns 400 if both are True).
wait
bool
default:"True"
v0.4+, only meaningful with mode="live". Default True blocks until the background memory copy finishes and the returned snapshot has status="ready". Set to False for fire-and-forget: returns as soon as the source resumes (~10 ms) with status="writing". Poll list_snapshots() to detect when the snapshot becomes status="ready".
returns
dict
A SnapshotInfo dict. Contains tag, dir, created_at_unix, branched_from, and pause_ms. When mode="live", also includes status ("writing" or "ready").
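When `wait=False` is used, the caller has to poll until the snapshot reports `status="ready"`. One way to do that is a small helper with a deadline. `wait_until_ready` is a hypothetical name, not an SDK function, and the poll interval and timeout are arbitrary choices.

```python
import time


def wait_until_ready(list_snapshots, tag, timeout=30.0, interval=0.05):
    """Poll list_snapshots() until the snapshot `tag` reports status "ready".

    `list_snapshots` is any zero-argument callable returning SnapshotInfo
    dicts, for example a bound Controller.list_snapshots.
    """
    deadline = time.monotonic() + timeout
    while True:
        for snap in list_snapshots():
            if snap.get("tag") == tag and snap.get("status") == "ready":
                return snap
        if time.monotonic() >= deadline:
            raise TimeoutError(f"snapshot {tag!r} not ready after {timeout} s")
        time.sleep(interval)


# Fake daemon responses: "writing" twice, then "ready".
responses = iter(["writing", "writing", "ready"])


def fake_list():
    return [{"tag": "checkpoint-1", "status": next(responses)}]


print(wait_until_ready(fake_list, "checkpoint-1", interval=0.01))
```

Against a live daemon the call would look like `wait_until_ready(c.list_snapshots, branch["tag"])`, where `c` is a `Controller` and `branch` is the dict returned by `branch_sandbox`.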

exec_command

exec_command(sandbox_id: str, args: list[str], timeout_secs: int = 30) -> dict
POST /v1/sandboxes/:id/exec — run a subprocess inside the sandbox and return its output.
sandbox_id
str
required
The target sandbox id.
args
list[str]
required
Argv list. The first element is the executable path; subsequent elements are its arguments (e.g. ["python3", "-c", "print(2+2)"]).
timeout_secs
int
default:"30"
Maximum seconds to wait for the command to complete before the daemon kills it.
returns
dict
{"stdout": str, "stderr": str, "exit_code": int}
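Since `exec_command` reports failure through `exit_code` rather than by raising, callers may want a small wrapper that turns a non-zero exit into an exception. This is a sketch over the documented dict shape; `require_success` is a hypothetical helper.

```python
def require_success(result):
    """Return stdout if the command exited 0, else raise with stderr attached."""
    if result["exit_code"] != 0:
        raise RuntimeError(
            f"command failed with exit code {result['exit_code']}: "
            f"{result['stderr'].strip()}"
        )
    return result["stdout"]


# Hypothetical results shaped like the dict documented above.
ok = {"stdout": "4\n", "stderr": "", "exit_code": 0}
failed = {"stdout": "", "stderr": "python3: not found\n", "exit_code": 127}

print(require_success(ok).strip())
try:
    require_success(failed)
except RuntimeError as err:
    print(err)
```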

eval_code

eval_code(sandbox_id: str, code: str) -> dict
POST /v1/sandboxes/:id/eval — evaluate code against the sandbox’s warmed PID-1 process. Bypasses subprocess overhead: the parent VM’s runtime is already warm, so eval returns in single-digit milliseconds instead of ~100 ms for a fresh python3 -c "..." subprocess.
sandbox_id
str
required
The target sandbox id.
code
str
required
Python expression to evaluate against the warmed interpreter. Imports such as numpy are already in scope if the parent snapshot pre-imported them.
returns
dict
{"result": str | None, "error": str | None, "exit_code": int}

ping_sandbox

ping_sandbox(sandbox_id: str) -> dict
POST /v1/sandboxes/:id/ping — round-trip health check to the in-guest agent.
sandbox_id
str
required
The target sandbox id.
returns
dict
Dict with at minimum "pong" and "numpy_version". Exact shape is recipe-specific.

Sandbox

Sandbox provides an E2B-compatible in-guest execution API. It connects directly to the TCP agent running inside one specific child VM (default 10.42.0.2:8888) and sends JSON messages to exec commands or eval expressions.

Constructor

Sandbox(
    tag: str | None = None,
    target: str | None = None,
    timeout: int = 30,
    *,
    spawn: bool = True,
)
tag
str
default:"$FORKD_TAG → pyagent"
Snapshot tag to fork from. Resolved from FORKD_TAG env var, then "pyagent". Only used when spawn=True.
target
str
default:"$FORKD_TARGET → 10.42.0.2:8888"
host:port of the in-guest TCP agent. Resolved from FORKD_TARGET, then the default. When driving a Controller-spawned sandbox, pass the guest_addr field from SandboxInfo.
timeout
int
default:"30"
Seconds to wait for a response from the in-guest agent per call.
spawn
bool
default:"True"
When True, invokes the forkd CLI to fork a new child at construction time. Set to False when attaching to an already-running sandbox (e.g. one spawned via Controller.spawn_sandboxes).
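Before attaching with `spawn=False`, it can help to confirm that the target address parses and that the agent port accepts connections. The helpers below are hypothetical and use only the standard library. The environment lookup for `target` mirrors the description above, but is not taken from the SDK source.

```python
import os
import socket

DEFAULT_TARGET = "10.42.0.2:8888"  # default documented above


def resolve_target(target=None, env=None):
    """Argument first, then FORKD_TARGET, then the default; returns (host, port)."""
    env = os.environ if env is None else env
    raw = target or env.get("FORKD_TARGET") or DEFAULT_TARGET
    host, _, port = raw.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {raw!r}")
    return host, int(port)


def is_reachable(host, port, timeout=1.0):
    """True if a TCP connection to the agent address can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(resolve_target("10.42.3.2:8888", env={}))
```

A caller might check `is_reachable(*resolve_target(info["guest_addr"]))` on a SandboxInfo dict before constructing `Sandbox(target=info["guest_addr"], spawn=False)`.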

Context manager

with Sandbox() as sb:
    print(sb.commands.run("echo hi").stdout)  # hi
__exit__ calls kill(). Use this pattern to guarantee cleanup even on exception.
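The cleanup guarantee can be illustrated with a stand-in class. `StubSandbox` is hypothetical and only mirrors the documented `__exit__` to `kill()` relationship; it does not talk to a VM.

```python
class StubSandbox:
    """Hypothetical stand-in showing the documented __exit__ -> kill() behaviour."""

    def __init__(self):
        self.killed = False

    def kill(self):
        self.killed = True  # idempotent: calling it twice changes nothing

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.kill()
        return False  # do not swallow the exception


sb = StubSandbox()
try:
    with sb:
        raise ValueError("work inside the block failed")
except ValueError:
    pass
print(sb.killed)
```

The exception still propagates to the caller, but by then the sandbox has already been killed.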

Sandbox.create

Sandbox.create(*args, **kwargs) -> Sandbox
Class method alias for Sandbox(...). Mirrors E2B’s Sandbox.create() style for drop-in compatibility.

commands.run

sb.commands.run(cmd: str | list[str], timeout: int = 30) -> CommandResult
Run a command inside the sandbox. A string cmd is executed via sh -c; a list is executed directly as argv.
cmd
str | list[str]
required
Command to run. Pass a string for shell expansion or a list for direct argv execution.
timeout
int
default:"30"
Seconds before the in-guest agent kills the subprocess.
returns
CommandResult
A CommandResult dataclass with fields stdout: str, stderr: str, and exit_code: int. Mirrors E2B’s CommandResult shape.
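The difference between the string and list forms can be tried locally. This sketch assumes a POSIX host with `sh` and `echo` available and runs on the local machine, not inside a sandbox. `to_argv` is a hypothetical helper that mirrors the documented rule.

```python
import subprocess


def to_argv(cmd):
    """Mirror the documented rule: a string goes through sh -c, a list is argv."""
    if isinstance(cmd, str):
        return ["sh", "-c", cmd]
    return list(cmd)


# The string form gets shell expansion; the list form passes the text literally.
shell_form = subprocess.run(
    to_argv("echo $((1 + 2))"), capture_output=True, text=True
)
argv_form = subprocess.run(
    to_argv(["echo", "$((1 + 2))"]), capture_output=True, text=True
)
print(shell_form.stdout.strip())
print(argv_form.stdout.strip())
```

Prefer the list form when arguments come from untrusted input, since nothing is interpreted by a shell.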

eval

sb.eval(code: str) -> object
Evaluate code against the warmed PID-1 process. This is forkd’s key performance advantage: the parent VM’s runtime is already warm, so eval returns in single-digit milliseconds (vs ~100 ms for a fresh subprocess spawned by commands.run). Semantics depend on the recipe:
  • Python recipes (default): code is a Python expression evaluated in the agent’s interpreter. numpy is in scope when the image has it installed. Returns the repr() of the evaluated value as a string.
  • Node recipes (FORKD_AGENT_LANG=node, e.g. playwright-browser): code is an async-function body run with (browser, context, page) in scope. Top-level await is supported; use return to send a value back. Returns the JSON-decoded result as a native Python object.
code
str
required
Expression or statement body to evaluate.
returns
object
The repr() string for Python recipes, or a JSON-decoded native Python value for Node recipes. Raises RuntimeError on eval error.
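For Python recipes the return value is a `repr()` string, so callers that want a native value need to decode it. One possible approach, assuming the evaluated value is a plain literal (numbers, strings, lists, dicts, tuples), is `ast.literal_eval`. `decode_python_result` is a hypothetical helper, not part of the SDK.

```python
import ast


def decode_python_result(result):
    """Turn a repr() string back into a Python value when it is a plain literal."""
    try:
        return ast.literal_eval(result)
    except (ValueError, SyntaxError):
        return result  # not a literal (e.g. an object repr); keep the string


print(decode_python_result("[0.0, 0.0, 0.0, 0.0, 0.0]"))
print(decode_python_result("<module 'numpy'>"))
```

Values such as numpy arrays do not round-trip through `repr()`, which is why the full example on this page calls `.tolist()` inside the evaluated expression.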

ping

sb.ping() -> dict
Probe the in-guest agent. Returns a dict with at least "pong" and "numpy_version".

kill

sb.kill() -> None
Terminate the underlying Firecracker process. Idempotent. Called automatically on __exit__.

Types

BranchMode

BranchMode = Literal["full", "diff", "live"]
Canonical BRANCH mode selector (v0.4+):
Value | Pause window | Notes
----- | ------------ | -----
"full" | 0.5–8 s | Copies entire guest RAM. Default for v0.x.
"diff" | ~200 ms idle | Firecracker Diff snapshot (v0.3+). 6–15× faster on typical agent workloads, 143× ceiling on 4 GiB SSD.
"live" | sub-50 ms | UFFD_WP live BRANCH (v0.4+). Source must be spawned with live_fork=True. Combine with wait=False to return after ~10 ms.

CommandResult

@dataclass
class CommandResult:
    stdout: str
    stderr: str
    exit_code: int
Return type of Sandbox.commands.run(). Mirrors E2B’s CommandResult API.

Exception: ControllerError

Raised on any non-2xx response from the forkd-controller daemon.
class ControllerError(RuntimeError):
    status: int    # HTTP status code
    body: Any      # Parsed JSON body, or raw string if not JSON
    url: str       # Full URL of the failed request
Use status to distinguish errors:
Status | Meaning
------ | -------
404 | Sandbox or snapshot not found
409 | Tag collision (snapshot with that name already exists)
400 | Bad request (e.g. mode and diff both set)
500 | Internal daemon error
from forkd import Controller, ControllerError

c = Controller()
try:
    c.kill_sandbox("sb-missing")
except ControllerError as e:
    if e.status == 404:
        print("sandbox not found")
    else:
        raise
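The same status-based dispatch can drive recovery, for example retrying a BRANCH under a different tag after a 409. The sketch below is offline-only. The local `ControllerError` class is a stand-in with the documented attributes (its constructor signature is invented), and `branch_with_free_tag` and the suffix scheme are hypothetical.

```python
class ControllerError(RuntimeError):
    """Local stand-in with the documented attributes, so the sketch runs offline."""

    def __init__(self, status, body=None, url=""):
        super().__init__(f"HTTP {status}")
        self.status, self.body, self.url = status, body, url


def branch_with_free_tag(branch, base_tag, max_attempts=5):
    """Call branch(tag) and retry with a numeric suffix on HTTP 409."""
    for attempt in range(max_attempts):
        tag = base_tag if attempt == 0 else f"{base_tag}-{attempt}"
        try:
            return branch(tag)
        except ControllerError as err:
            if err.status != 409:
                raise  # 400, 404, 500: not a tag collision
    raise RuntimeError(f"no free tag after {max_attempts} attempts")


taken = {"checkpoint", "checkpoint-1"}  # tags the fake daemon already holds


def fake_branch(tag):
    if tag in taken:
        raise ControllerError(409, {"error": "tag exists"})
    return {"tag": tag}


print(branch_with_free_tag(fake_branch, "checkpoint"))
```

With the real SDK, the callable could be `lambda t: c.branch_sandbox(sb_id, tag=t)` and the exception class would be the one imported from `forkd`.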

Full example

This example uses both SDK surfaces together. Controller handles VM lifecycle; Sandbox drives in-guest execution. It then demonstrates a live BRANCH with fire-and-forget to fan out grandchildren.
import time
from forkd import Controller, Sandbox

# ── Sandbox: E2B-compatible in-guest execution ───────────────────────
print("=== with-block (auto-kill on exit) ===")
with Sandbox() as sb:
    print("agent:", sb.ping())

    t = time.perf_counter()
    r = sb.commands.run("python3 -c 'import numpy; print(numpy.eye(3))'")
    print(f"exec (fresh subprocess) [{(time.perf_counter()-t)*1000:.0f} ms]")
    print(r.stdout.rstrip())

    t = time.perf_counter()
    r = sb.commands.run("echo hello from sandbox")
    print(f"exec echo [{(time.perf_counter()-t)*1000:.0f} ms]: {r.stdout.rstrip()}")

    t = time.perf_counter()
    result = sb.eval("numpy.zeros(5).tolist()")
    print(f"eval (warm PID-1 numpy) [{(time.perf_counter()-t)*1000:.0f} ms]: {result}")

print()

# ── Controller: lifecycle + live BRANCH fan-out ──────────────────────
c = Controller()  # reads FORKD_URL, FORKD_TOKEN from env

# Spawn a live-fork-capable parent. live_fork=True backs guest RAM with
# a memfd so later BRANCHes can use mode="live" (UFFD_WP).
children = c.spawn_sandboxes("pyagent", n=1, per_child_netns=True, live_fork=True)
sb_id = children[0]["id"]
print(f"spawned {sb_id} at {children[0]['guest_addr']}")

# Drive the sandbox via Sandbox (attach, don't spawn a new VM)
with Sandbox(target=children[0]["guest_addr"], spawn=False) as sb:
    print(sb.commands.run("uname -a").stdout.rstrip())

# BRANCH: mode="live" drops source pause to sub-50 ms.
# wait=False returns after ~10 ms; background copy completes asynchronously.
branch = c.branch_sandbox(sb_id, tag="checkpoint-1", mode="live", wait=False)
print(f"branch tag={branch['tag']} status={branch.get('status')}")

# Poll c.list_snapshots() until status="ready" before spawning grandchildren.
while not any(
    s["tag"] == branch["tag"] and s.get("status") == "ready"
    for s in c.list_snapshots()
):
    time.sleep(0.05)

# Fan out 5 grandchildren from the branch.
grandchildren = c.spawn_sandboxes(branch["tag"], n=5, per_child_netns=True)
print(f"spawned {len(grandchildren)} grandchildren")

# Cleanup
c.kill_sandbox(sb_id)
for gc in grandchildren:
    c.kill_sandbox(gc["id"])
