The coding agent recipes cover the SWE-bench evaluation pattern: N agents working on N isolated repository checkouts in parallel, each running git clone, pip install, and pytest in its own microVM, with no shared filesystem, no shared process state, and no interference between workers. The recipes/coding-agent/ directory provides the parent snapshot, and recipes/coding-agent-fork/ demonstrates the BRANCH variant, where a binary blob too large to fit in a prompt is distributed byte-identically across four sandboxes (the source and its three grandchildren).
Two patterns
coding-agent/ — fork-per-task evaluation harness
Each child inherits a fully warmed dev environment from the parent snapshot: Python 3.12, git, gh (GitHub CLI), build-essential, make, ruff, black, mypy, pytest, and requests. Fork a child per task, clone the repo, install dependencies, run the test suite — all in an isolated KVM microVM.
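The per-task sequence described above (clone, install, test) can be sketched as follows. This is a minimal illustration, not the recipe's harness: task_commands, run_task, and evaluate are hypothetical names, and the run parameter is a stand-in for whatever mechanism executes a command inside a forked child. With the default subprocess.run the commands run on the local host, which does not provide microVM isolation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def task_commands(repo_url: str, checkout: Path) -> list[list[str]]:
    """Commands one child runs for one task: clone, install, test."""
    return [
        ["git", "clone", "--depth", "1", repo_url, str(checkout)],
        ["python3", "-m", "pip", "install", "-e", str(checkout)],
        ["python3", "-m", "pytest", str(checkout)],
    ]


def run_task(repo_url: str, checkout: Path, run=subprocess.run) -> bool:
    """Run the steps in order; stop at the first non-zero exit code."""
    for argv in task_commands(repo_url, checkout):
        if run(argv).returncode != 0:
            return False
    return True


def evaluate(tasks: dict[str, str], workdir: Path, run=subprocess.run) -> dict[str, bool]:
    """Run every task concurrently; returns task id -> whether all steps succeeded."""
    with ThreadPoolExecutor(max_workers=len(tasks) or 1) as pool:
        futures = {
            task_id: pool.submit(run_task, url, workdir / task_id, run)
            for task_id, url in tasks.items()
        }
        return {task_id: future.result() for task_id, future in futures.items()}
```

Each task gets its own checkout directory under workdir, which mirrors the one-checkout-per-worker layout; in the real recipe that separation comes from each child being its own microVM rather than from directory names.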
coding-agent-fork/ — BRANCH distributes large binary state
This recipe answers the “but couldn’t you just parallel-prompt the LLM?” objection directly. A source agent builds a Python package, runs a failing test suite (populating __pycache__/), and writes a 50 MiB synthetic binary (vendored.bin) representing real agent-accumulated state: pip caches, downloaded weights, compiled extensions. The source is BRANCHed; three grandchildren each apply a different fix strategy.
The key properties the demo verifies:
- The 50 MiB vendored.bin is byte-identical across all four sandboxes (md5 verified)
- The __pycache__/ directory is byte-identical at branch time; each child’s own __pycache__/ diverges only after it re-imports with its modified source
- Each child’s mathy/__init__.py is different after applying its strategy
- Test outcomes diverge: two strategies fix the bug; one backfires
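The md5 byte-identity check can be illustrated with a short sketch. It assumes the copies of the file are reachable as local paths; in the actual demo the hashes would be computed inside each sandbox. md5_of and byte_identical are hypothetical helper names, not part of the recipe.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a 50 MiB blob is never fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_identical(paths: list[Path]) -> bool:
    """True when every file has the same md5 digest."""
    return len({md5_of(p) for p in paths}) == 1
```

A call such as byte_identical([source_copy, minimal_copy, rewrite_copy, skip_copy]) returns True only when all four digests match.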
Build the snapshot
Build the parent rootfs
The rootfs is built from the python:3.12 image with the full dev toolchain. Rootfs: ~1.8 GB. Allow ~5 minutes the first time.
What’s in the snapshot
Every fork inherits the following dev tools, already installed and on PATH:
| Tool | Version | Use |
|---|---|---|
| python3 + pip | 3.12 | Runtime and package installer |
| git | system | Repository checkout |
| gh | latest | GitHub CLI for API access |
| build-essential + make | system | C extension compilation |
| ruff | pinned | Fast Python linter |
| black | pinned | Code formatter |
| mypy | pinned | Static type checker |
| pytest | pinned | Test runner |
| requests | pinned | HTTP client |
The BRANCH pattern from coding-agent-fork/
The three fix strategies applied by grandchildren after the BRANCH:
| Strategy | What it does | Test outcome |
|---|---|---|
| minimal | One-line sed to flip a - b → a + b | ✅ Tests pass |
| rewrite | Full function rewrite with type-checks | ✅ Tests pass |
| skip | Decorate tests with @unittest.expectedFailure | ❌ Failed: test_add_zero (0 - 0 == 0) unexpectedly passes, breaking the contract |
The skip strategy backfires because test_add_zero happens to pass despite the bug: unittest flags this as an “unexpected success” and fails the suite. This is the kind of emergent behavior that branch-and-compare reveals and a simple parallel API call cannot.
Results from the real run (2026-05-19)
See bench/pause-window/RESULTS-v0.3.md for the full curve and the v0.4 live-BRANCH path that cuts source pause to 56 ms p50.
Run the BRANCH demo
Results are written to recipes/coding-agent-fork/results/<unix-ts>/:
| File | Contents |
|---|---|
| summary.md | Per-agent state evidence + divergent code side-by-side |
| summary.json | Machine-readable version of the summary |
| branch.json | Daemon’s BRANCH response including pause_ms |
| state-evidence.txt | Raw md5 hashes proving byte-identity |
| {source,minimal,rewrite,skip}-init-py.txt | Each agent’s mathy/__init__.py after their strategy |
| {source,minimal,rewrite,skip}-agent.log | Full per-agent shell log including unittest output |
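A small sketch for pulling the pause time out of the most recent run directory. The layout of branch.json is not documented here, so the code assumes pause_ms is a top-level key and that run directories are named with a plain integer Unix timestamp; latest_run and read_pause_ms are hypothetical helper names.

```python
import json
from pathlib import Path


def latest_run(results_root: Path) -> Path:
    """Pick the run directory with the largest numeric (Unix timestamp) name."""
    runs = [p for p in results_root.iterdir() if p.is_dir() and p.name.isdigit()]
    return max(runs, key=lambda p: int(p.name))


def read_pause_ms(run_dir: Path):
    """Return pause_ms from branch.json, or None if the key is absent.

    Assumption: pause_ms sits at the top level of the JSON object.
    """
    data = json.loads((run_dir / "branch.json").read_text())
    return data.get("pause_ms")
```

For example, read_pause_ms(latest_run(Path("recipes/coding-agent-fork/results"))) would report the source pause for the newest run, provided those assumptions hold.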
Key takeaway: bytes can’t fit in a prompt
To run 3 fix attempts via API-only parallelism, each request would need to carry the entire /workspace directory (source files, binary cache, populated __pycache__, and the 50 MiB vendored.bin). That’s:
- Technically impossible above ~50 KiB on most LLM APIs
- Meaningless for binary blobs — the LLM doesn’t understand them
- Wasteful — 3× the bytes transferred, 3× the context tokens
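The size argument can be checked with quick arithmetic. The sketch below uses the 50 MiB blob and the ~50 KiB ceiling quoted above, and assumes the blob would have to be base64-encoded to travel in a text prompt; that encoding is an assumption for illustration, not something the recipe specifies.

```python
KIB = 1024
MIB = 1024 * KIB

BLOB_BYTES = 50 * MIB     # vendored.bin, from the recipe
PROMPT_LIMIT = 50 * KIB   # rough ceiling quoted in the text
ATTEMPTS = 3              # one request per fix strategy


def base64_len(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes (assumed transport encoding)."""
    return 4 * ((n_bytes + 2) // 3)


encoded = base64_len(BLOB_BYTES)         # bytes per request after encoding
over_limit = BLOB_BYTES // PROMPT_LIMIT  # how many times the raw blob exceeds the ceiling
total_sent = ATTEMPTS * encoded          # bytes sent across all requests

print(f"raw blob is {over_limit}x the limit")
print(f"base64 size per request: {encoded / MIB:.1f} MiB")
print(f"total across {ATTEMPTS} requests: {total_sent / MIB:.1f} MiB")
```

Even before counting source files and __pycache__/, the raw blob alone exceeds the quoted ceiling by three orders of magnitude, and encoding it as text only makes each request larger.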