Branch a LangGraph ReAct agent mid-thought with forkd

The langgraph-react recipe is the canonical forkd “fork a thinking agent” demo. A ReAct agent works on a trip-planning task, builds up reasoning state over several steps (tool calls, conversation history, a partial answer), and then gets BRANCHed while it’s still mid-thought. Three grandchildren are spawned from the branch, each given a different steering hint: one aimed at thoroughness, one at minimalism, one at cost. All three inherit the parent’s prior reasoning state identically; only the next LLM call diverges. The result is three independent itineraries that differ in ways the model was never explicitly told to produce.

Prerequisites

forkd installed and forkd doctor passing all checks
The langgraph parent snapshot built or pulled (see below)
An LLM API key — the demo defaults to SiliconFlow (OpenAI-compatible, hosts DeepSeek-V3 / Qwen). Set SILICONFLOW_API_KEY, or point LLM_BASE_URL + LLM_API_KEY at any OpenAI-compatible endpoint.
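The endpoint choice described in the last prerequisite can be sketched as a small resolver. This is a hypothetical helper, not code from agent.py: the precedence (an explicit LLM_BASE_URL + LLM_API_KEY pair wins over SILICONFLOW_API_KEY) and the SiliconFlow base URL constant are assumptions to check against the recipe and your provider’s documentation.

```python
from collections.abc import Mapping

# Assumption: SiliconFlow's OpenAI-compatible base URL; confirm against its docs.
SILICONFLOW_BASE_URL = "https://api.siliconflow.cn/v1"


def resolve_llm_endpoint(env: Mapping[str, str]) -> tuple[str, str]:
    """Pick (base_url, api_key) from environment-style variables.

    Hypothetical precedence: an explicit LLM_BASE_URL + LLM_API_KEY pair wins;
    otherwise SILICONFLOW_API_KEY is used with the SiliconFlow default URL.
    """
    base_url = env.get("LLM_BASE_URL", "").strip()
    api_key = env.get("LLM_API_KEY", "").strip()
    if base_url and api_key:
        return base_url, api_key
    sf_key = env.get("SILICONFLOW_API_KEY", "").strip()
    if sf_key:
        return SILICONFLOW_BASE_URL, sf_key
    raise RuntimeError("set SILICONFLOW_API_KEY, or both LLM_BASE_URL and LLM_API_KEY")
```

In a real script you would call `resolve_llm_endpoint(os.environ)`; passing a plain mapping instead keeps the function easy to test.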

Build the snapshot

Build the parent rootfs

```bash
cd recipes/langgraph-react
sudo SILICONFLOW_API_KEY=$SILICONFLOW_API_KEY bash build.sh
```

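This builds the parent rootfs (parent.ext4) from a python:3.12-slim image with langgraph, langchain-openai, and requests installed. Allow about 5 minutes the first time; the pip wheels are heavy.

Then register the rootfs with the daemon as a snapshot tagged langgraph, replacing the /path/to/ placeholders with your kernel image and checkout location: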

```bash
curl -fsS -H "Authorization: Bearer $FORKD_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"tag":"langgraph","kernel":"/path/to/vmlinux","rootfs":"/path/to/recipes/langgraph-react/parent.ext4","rw":true,"tap":"forkd-tap0","boot_wait_secs":20}' \
  $FORKD_URL/v1/snapshots
```
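If you drive the daemon from Python rather than the shell, the same registration call can be built with the standard library. This is a sketch that mirrors the curl command above: the endpoint, headers, and payload fields come from that command, while the helper name `snapshot_create_request` is hypothetical. It only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request

# Same fields as the curl payload; replace the /path/to/... placeholders.
SNAPSHOT_SPEC = {
    "tag": "langgraph",
    "kernel": "/path/to/vmlinux",
    "rootfs": "/path/to/recipes/langgraph-react/parent.ext4",
    "rw": True,
    "tap": "forkd-tap0",
    "boot_wait_secs": 20,
}


def snapshot_create_request(forkd_url: str, token: str, spec: dict) -> urllib.request.Request:
    """Build (without sending) the POST /v1/snapshots request."""
    return urllib.request.Request(
        forkd_url.rstrip("/") + "/v1/snapshots",
        data=json.dumps(spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen()`, for example with `os.environ["FORKD_URL"]` and `os.environ["FORKD_TOKEN"]` as arguments. Like `curl -f`, `urlopen` treats 4xx/5xx responses as failures (it raises `HTTPError`).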

Alternatively, pull the pre-built snapshot from the Hub (skips the build entirely):

```bash
forkd pull deeplethe/langgraph-react
```

Run the demo

```bash
export SILICONFLOW_API_KEY=...
export FORKD_URL=http://127.0.0.1:8889
export FORKD_TOKEN=$(cat /etc/forkd/token)
bash recipes/langgraph-react/demo.sh
```

The demo writes all artifacts to results/<timestamp>/ in the current directory.

What happens

Source agent runs a ReAct loop

The agent receives the task: “Plan a 2-day trip to Kyoto and Osaka. Use the tools to check weather and find places.” It runs weather and search_places tool calls across multiple steps, building up a conversation history and partial answer.
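To make “tool calls across multiple steps” concrete, here is a sketch of how two such tools could be declared and dispatched in the OpenAI-compatible function-calling format the demo’s endpoint speaks. The tool names weather and search_places come from the recipe; their parameter schemas and the stub implementations are hypothetical, and agent.py’s real TOOLS_SPEC may differ.

```python
import json

# Hypothetical schemas: agent.py's real TOOLS_SPEC may name its parameters differently.
TOOLS_SPEC = [
    {
        "type": "function",
        "function": {
            "name": "weather",
            "description": "Get the weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_places",
            "description": "Find places to visit in a city that match a query.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "query": {"type": "string"},
                },
                "required": ["city", "query"],
            },
        },
    },
]


def weather(city: str) -> str:
    # Stub: a real tool would call a weather service.
    return f"{city}: no forecast available (stub)"


def search_places(city: str, query: str) -> str:
    # Stub: a real tool would call a places/search API.
    return f"{city}: no places found for {query!r} (stub)"


TOOL_IMPLS = {"weather": weather, "search_places": search_places}


def dispatch_tool_call(tool_call: dict) -> dict:
    """Execute one OpenAI-style tool call and build the "tool" message to append."""
    fn = tool_call["function"]
    args = json.loads(fn.get("arguments") or "{}")  # arguments arrive as a JSON string
    result = TOOL_IMPLS[fn["name"]](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}
```

Each returned message is appended to the conversation, which is how the history that later gets BRANCHed accumulates tool results.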

Agent emits READY_TO_BRANCH and pauses

After the configured number of steps (--branch-after-step, default 3), the agent emits a {"event":"ready_to_branch"} JSONL marker and sleeps for --branch-wait-s seconds (default 30). That sleep is the window in which the orchestrator must BRANCH the agent and plant the hints.
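demo.sh detects the marker in bash. As a hypothetical Python equivalent, the sketch below scans JSONL output for a given event and polls a growing file until it appears. It assumes the agent’s JSONL stream is readable as a file on the orchestrator side; how demo.sh actually collects the stream is not shown here. Keep in mind that once the marker appears, the BRANCH has to complete within the agent’s --branch-wait-s sleep.

```python
import json
import time
from pathlib import Path
from typing import Iterable, Optional


def find_event(lines: Iterable[str], event: str) -> Optional[dict]:
    """Return the first JSONL record whose "event" equals `event`, else None.

    Lines that are not JSON objects (ordinary log output) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written last line
        if isinstance(record, dict) and record.get("event") == event:
            return record
    return None


def wait_for_event(path: Path, event: str, timeout_s: float, poll_s: float = 0.5) -> dict:
    """Poll a growing JSONL file until `event` appears or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if path.exists():
            with path.open(encoding="utf-8", errors="replace") as f:
                record = find_event(f, event)
            if record is not None:
                return record
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no {event!r} event in {path} after {timeout_s}s")
        time.sleep(poll_s)
```

Re-reading the whole file on each poll is simple and fine for short transcripts; a long-running orchestrator might instead remember its file offset between polls.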

Orchestrator BRANCHes and spawns three grandchildren

demo.sh polls for the ready_to_branch marker, then calls POST /v1/sandboxes/:id/branch on the daemon. Three grandchildren are spawned from the resulting snapshot, and the orchestrator uses exec to write a different hint into each grandchild’s /tmp/forkd-hint.txt.
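The orchestration in this step can be sketched with the daemon calls injected as functions, so the control flow is visible without guessing at API details. Only the branch endpoint and the hint path come from this page; the spawn and file-write calls are left as caller-supplied placeholders, and `fan_out` is a hypothetical name. The hint strings are the ones reported in the Results table.

```python
from typing import Callable, Mapping

HINT_PATH = "/tmp/forkd-hint.txt"

# Hint text as reported in the Results table; keys match the child-*-transcript.jsonl names.
HINTS = {
    "thorough": "cultural depth, slow",
    "minimal": "daylight outside, no shopping",
    "cost": "avoid $$$, prefer free or $",
}


def fan_out(
    source_id: str,
    hints: Mapping[str, str],
    branch: Callable[[str], str],
    spawn: Callable[[str], str],
    write_file: Callable[[str, str, str], None],
) -> dict[str, str]:
    """BRANCH once, spawn one grandchild per hint, and plant each hint.

    branch:     source sandbox id -> snapshot id (POST /v1/sandboxes/:id/branch)
    spawn:      snapshot id -> new sandbox id (daemon-specific, not shown here)
    write_file: (sandbox id, path, text) -> None, e.g. via the daemon's exec
    """
    snapshot_id = branch(source_id)  # a single BRANCH; every child shares it
    children: dict[str, str] = {}
    for name, hint in hints.items():
        child_id = spawn(snapshot_id)
        # Must land before the child's agent wakes and makes its next LLM call.
        write_file(child_id, HINT_PATH, hint + "\n")
        children[name] = child_id
    return children
```

The design point is that there is exactly one BRANCH: all three grandchildren start from the same snapshot, so the hint file is the only thing that differs between them.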

Each grandchild reads its hint and completes the loop

When the agent wakes from its --branch-wait-s sleep (in the source or in a grandchild), it reads /tmp/forkd-hint.txt before the next LLM call. If a hint is present, it is appended to the end of the conversation as a system-level steering message. The source has no hint and serves as the control; the three grandchildren make different choices from the same prior state.

Results

The run summary in results-2026-05-18/ shows the three hinted children diverging from the unhinted parent (the control), and from each other in how they frame their itineraries:

| Agent | Hint | Day-1 afternoon (Kyoto) | Notable framing |
|---|---|---|---|
| parent | (none; control) | Nishiki Market ($$) | Baseline; no special framing |
| thorough | `"cultural depth, slow"` | Arashiyama Bamboo Grove (free) | Replaced shopping with a cultural/nature activity |
| minimal | `"daylight outside, no shopping"` | Arashiyama Bamboo Grove (free) | Replaced shopping with an outdoor activity |
| cost | `"avoid $$$, prefer free or $"` | Arashiyama Bamboo Grove (free) | Added `"may be pricey"` warning labels; explicit cost-optimization footer |

The model was never told to drop Nishiki Market or add Arashiyama. All three hinted children independently re-ranked the Day-1 afternoon based on their hint, and the cost-focused child went further, annotating dining stops with budget warnings that the others didn’t include. Metrics from the same run:

| Metric | Value |
|---|---|
| Daemon-measured pause window | 4,007 ms (SATA SSD) / 163 ms (tmpfs) |
| Memory image size | 513 MiB |
| Grandchildren spawned | 3 |
| Network retries | 0 (clean run) |
| Per-agent token cost | 1,395–1,546 tokens |
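A quick back-of-envelope check relates the two pause figures to the image size. This assumes, without confirmation from the run data, that the full 513 MiB memory image is written within the pause. Under that assumption, image size divided by pause time is a lower bound on the effective write rate: roughly 128 MiB/s on the SATA SSD versus about 3,147 MiB/s on tmpfs, with the tmpfs pause about 24.6x shorter. That gap would be consistent with storage write speed dominating the pause.

```python
IMAGE_MIB = 513                              # memory image size from the table
PAUSE_MS = {"SATA SSD": 4007, "tmpfs": 163}  # daemon-measured pause windows


def implied_mib_per_s(image_mib: float, pause_ms: float) -> float:
    """Image size divided by pause time.

    If writing the image takes at most the whole pause (an assumption), this
    is a lower bound on the effective write rate during the pause.
    """
    return image_mib / (pause_ms / 1000.0)


for medium, ms in PAUSE_MS.items():
    print(f"{medium}: >= {implied_mib_per_s(IMAGE_MIB, ms):.0f} MiB/s")
print(f"tmpfs pause is {PAUSE_MS['SATA SSD'] / PAUSE_MS['tmpfs']:.1f}x shorter")
```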

Key code: the hint side-channel in `agent.py`

The agent reads /tmp/forkd-hint.txt before every LLM call. If a hint is present, it is appended as a system message at the end of the conversation, after the original system prompt, so the most recent steering is the last thing the model sees. The prior conversation history and tool results are not modified: the hint is added to a new list (messages + [...]) rather than appended in place.

```python
from pathlib import Path

HINT_FILE = Path("/tmp/forkd-hint.txt")

# emit(), chat_completion() and TOOLS_SPEC are defined elsewhere in agent.py.


def read_hint() -> str:
    """Read /tmp/forkd-hint.txt. Empty string on any failure."""
    try:
        return HINT_FILE.read_text(encoding="utf-8", errors="replace").strip()
    except OSError:  # includes FileNotFoundError: no hint has been planted
        return ""


def run_step(
    *,
    step: int,
    messages: list,
    base_url: str,
    api_key: str,
    model: str,
    temperature: float,
) -> tuple[bool, int]:
    """One ReAct step. Returns (done, tokens_used)."""
    hint = read_hint()
    if hint:
        # Build a new list so the caller's conversation history is not mutated.
        messages = messages + [
            {"role": "system", "content": f"Updated steering hint: {hint}"}
        ]
        emit({"event": "hint", "step": step, "hint": hint})

    resp = chat_completion(
        base_url=base_url,
        api_key=api_key,
        model=model,
        messages=messages,
        tools=TOOLS_SPEC,
        temperature=temperature,
    )
    # ... tool dispatch, answer detection, etc.
```

And the branch-point pause that gives the orchestrator time to act:

```python
# Inside the agent's step loop; args holds the parsed --branch-after-step
# and --branch-wait-s flags.
if step == args.branch_after_step:
    emit({"event": "ready_to_branch"})
    # Orchestrator uses this window to BRANCH + spawn grandchildren
    # + plant hints via `forkd-controller exec`.
    time.sleep(args.branch_wait_s)
    emit({"event": "resumed"})
```
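The hint mechanism can be exercised outside a VM. The sketch below reuses the same read-and-append logic with the hint path passed in (the names `read_hint_from` and `messages_for_call` are hypothetical) and checks three properties: no file means no hint, the hint lands at the end of the list, and repeated calls do not pile up hint messages. That last property assumes, as the excerpt suggests, that the hint message is never written back into the persistent history.

```python
import tempfile
from pathlib import Path


def read_hint_from(path: Path) -> str:
    """Same logic as read_hint() above, with the path passed in for testing."""
    try:
        return path.read_text(encoding="utf-8", errors="replace").strip()
    except OSError:
        return ""


def messages_for_call(history: list[dict], hint_path: Path) -> list[dict]:
    """Return the message list for one LLM call without mutating `history`."""
    hint = read_hint_from(hint_path)
    if not hint:
        return history
    return history + [{"role": "system", "content": f"Updated steering hint: {hint}"}]


with tempfile.TemporaryDirectory() as tmp:
    hint_file = Path(tmp) / "forkd-hint.txt"
    history = [
        {"role": "system", "content": "You are a trip planner."},
        {"role": "user", "content": "Plan a 2-day trip to Kyoto and Osaka."},
    ]
    before = messages_for_call(history, hint_file)  # no hint planted yet
    hint_file.write_text("avoid $$$, prefer free or $\n", encoding="utf-8")
    step_a = messages_for_call(history, hint_file)  # first call after the hint lands
    step_b = messages_for_call(history, hint_file)  # a later call re-reads the same hint
```

Because the hint is re-read on every call, rewriting /tmp/forkd-hint.txt mid-run would change the steering from the following call onward.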

Why BRANCH instead of calling the LLM three times in parallel?

If you instead run the agent three times in parallel with different system prompts, each run starts from scratch with no shared prior state. Each run re-does its tool calls, re-spends tokens on the same early steps, and may reach different intermediate conclusions before the hint ever takes effect. With BRANCH, the three grandchildren inherit:

The conversation history built up by the parent (tool calls, tool results, prior reasoning)
The Python heap — loaded packages, the in-memory message list, any caches
The in-guest filesystem state — any files written to /tmp

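The hint takes effect from the next LLM call onward. Everything before the branch point is shared and identical, so differences between the children come from the hint (and, at a nonzero temperature, from ordinary sampling variation). Anything three parallel runs can do, BRANCH can do as well, and because the pre-branch steps and their tool calls run once rather than once per variant, it’s cheaper in tokens and wall-clock time.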
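The token saving can be made concrete with a simplified model: call the tokens spent before the branch point the prefix, and the tokens each variant spends afterwards the suffix (which still includes re-sending the inherited history as prompt tokens). Independent runs pay the prefix once per variant; BRANCH pays it once, so the saving is (N − 1) × prefix. The numbers below are illustrative only, not measurements from the run above, and the model ignores provider-side prompt caching and the cost of the BRANCH itself.

```python
def tokens_parallel(n_variants: int, prefix: int, suffix: int) -> int:
    """Independent runs: every variant pays for the pre-branch prefix again."""
    return n_variants * (prefix + suffix)


def tokens_branched(n_variants: int, prefix: int, suffix: int) -> int:
    """BRANCH: the prefix is paid once; only post-branch work is per variant."""
    return prefix + n_variants * suffix


# Illustrative numbers only, not measurements from the run above.
N, PREFIX, SUFFIX = 3, 900, 600
saved = tokens_parallel(N, PREFIX, SUFFIX) - tokens_branched(N, PREFIX, SUFFIX)
# Algebraically, saved == (N - 1) * PREFIX.
```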

See /guides/branching for a full explanation of BRANCH mode, the diff vs. live snapshot options, and how to tune the pause window for your workload.

Demo artifacts

After demo.sh completes, results/<timestamp>/ contains:

| File | Contents |
|---|---|
| `source-parent-transcript.jsonl` | Source agent’s full step history (JSONL, one event per line) |
| `child-thorough-transcript.jsonl` | Thorough child’s history after the divergence |
| `child-minimal-transcript.jsonl` | Minimal child’s history after the divergence |
| `child-cost-transcript.jsonl` | Cost-focused child’s history after the divergence |
| `summary.md` | Auto-generated side-by-side comparison of all four final answers |
| `summary.json` | Machine-readable version of the summary |
| `branch.json` | Daemon’s BRANCH response, including `pause_ms` |
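For a first look at a run without opening each transcript, a small script can tally events per agent. This is a sketch that assumes each transcript line is a JSON object with an "event" key, as the emit() calls in agent.py suggest; the `event_counts` helper is hypothetical and does not parse summary.json, whose schema isn’t documented here.

```python
import json
from collections import Counter
from pathlib import Path


def event_counts(results_dir: Path) -> dict[str, Counter]:
    """Count JSONL events per transcript file in a results/<timestamp>/ directory."""
    counts: dict[str, Counter] = {}
    for path in sorted(results_dir.glob("*-transcript.jsonl")):
        agent = path.name.removesuffix("-transcript.jsonl")
        tally: Counter = Counter()
        with path.open(encoding="utf-8", errors="replace") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # tolerate a truncated last line
                if isinstance(record, dict):
                    tally[record.get("event", "<none>")] += 1
        counts[agent] = tally
    return counts
```

For example, `event_counts(Path("results/<timestamp>"))` should show hint events in the child transcripts and none in the source’s, since only the grandchildren have a hint file.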
