OpenSandbox supports pausing and resuming Kubernetes-backed sandboxes without losing filesystem state. When you pause a sandbox, the controller commits its root filesystem as an OCI image to a configured registry, then releases the underlying cluster resources (Pods and pooled allocations). When you resume, the same sandbox ID is reused — the controller rewrites the pod template to the latest snapshot image and recreates the runtime. This lets you free cluster capacity between agent tasks while retaining the exact filesystem state produced by prior work.
Pause and resume is a Kubernetes-only feature. It requires the OpenSandbox controller to be deployed and a reachable OCI registry for storing snapshots. The feature is not available in Docker (single-host) mode.

Overview

| Aspect | Behavior |
| --- | --- |
| Pause | Creates an internal SandboxSnapshot, commits the running container root filesystem as an OCI image, then quiesces the sandbox runtime and releases Pods and pooled allocations |
| Resume | Reuses the same BatchSandbox, rewrites its template to the latest snapshot image, and recreates the runtime from that image |
| Sandbox ID | Stable across pause/resume cycles; callers use the same ID throughout the sandbox lifetime |
| Replica support | Currently limited to BatchSandbox.spec.replicas=1 |

Sandbox Lifecycle States

The sandbox transitions through both stable and intermediate states during pause and resume:
| State | Type | Description |
| --- | --- | --- |
| Running | Stable | Sandbox is active and processing requests |
| Pausing | Intermediate | Pause in progress; snapshot commit is coordinated through an internal SandboxSnapshot resource |
| Paused | Stable | Sandbox is paused, the latest rootfs snapshot is ready, and runtime Pods and pooled allocations have been released |
| Resuming | Intermediate | Resume in progress; the controller is rewriting the sandbox template to the latest snapshot image and recreating the runtime |
| Failed | Stable | Operation failed; check reason and message for details |
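
The lifecycle above can be read as a small state machine. The following is a minimal sketch for client-side reasoning only: the transition map is an assumption inferred from the state descriptions, and the names `SandboxState`, `ALLOWED_TRANSITIONS`, `can_transition`, and `is_stable` are hypothetical, not part of the OpenSandbox SDK. The controller's actual rules may differ, for example in how a sandbox leaves the Failed state.

```python
from enum import Enum


class SandboxState(str, Enum):
    RUNNING = "Running"
    PAUSING = "Pausing"
    PAUSED = "Paused"
    RESUMING = "Resuming"
    FAILED = "Failed"


# Assumed transitions, inferred from the state table; not taken from controller code.
ALLOWED_TRANSITIONS = {
    SandboxState.RUNNING: {SandboxState.PAUSING},
    SandboxState.PAUSING: {SandboxState.PAUSED, SandboxState.FAILED},
    SandboxState.PAUSED: {SandboxState.RESUMING},
    SandboxState.RESUMING: {SandboxState.RUNNING, SandboxState.FAILED},
    SandboxState.FAILED: set(),  # recovery from Failed is not described in this page
}

# States the table marks as "Stable".
STABLE_STATES = {SandboxState.RUNNING, SandboxState.PAUSED, SandboxState.FAILED}


def can_transition(current: SandboxState, target: SandboxState) -> bool:
    """Return True if the assumed map allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS[current]


def is_stable(state: SandboxState) -> bool:
    """Return True for states in which no pause or resume is in progress."""
    return state in STABLE_STATES
```

A caller that polls sandbox status could use `is_stable` to decide when to stop waiting, and `can_transition` to reject a pause request against a sandbox that is already Pausing or Paused.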

SandboxSnapshot Internal States

For detailed progress tracking during a pause, inspect the internal SandboxSnapshot resource:
| Phase | Description |
| --- | --- |
| Pending | Snapshot request accepted; waiting to resolve source Pod or create commit Job |
| Committing | Commit Job is running and pushing snapshot images to the registry |
| Succeed | Snapshot is ready and will be used for the next resume |
| Failed | Snapshot creation failed |
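
One way to track these phases is to poll until a terminal phase is reached. The sketch below assumes you already have some way to read the current phase (for example through your Kubernetes client of choice); that reader is passed in as `get_phase`, a hypothetical callable, because this page does not specify the resource's field layout. The function name `wait_for_snapshot` and the 600-second default, chosen to mirror the documented 10-minute commit Job timeout, are likewise assumptions.

```python
import time
from typing import Callable

# Phases after which the snapshot will not change any further.
TERMINAL_PHASES = {"Succeed", "Failed"}


def wait_for_snapshot(
    get_phase: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_phase() until it returns a terminal phase or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"snapshot still in phase {phase!r} after {timeout_s}s")
        sleep(interval_s)
```

The `sleep` parameter exists only so the loop can be exercised in tests without real delays.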

What Is Preserved

| Item | Preserved? |
| --- | --- |
| Root filesystem contents | ✅ Yes, committed as OCI image |
| Environment variables | ✅ Yes, from BatchSandbox template |
| Running processes / memory | ❌ No, process state is not checkpointed |
| Explicit volume mounts | Depends on volume type |

Key Design Principle

Controller-level configuration — registry URL and push/pull secrets are configured on the Kubernetes controller manager, not in ~/.sandbox.toml. SDK users and API callers require no code changes to use pause and resume. They simply call pause() and resume() on the existing sandbox ID.
Pause and resume is currently limited to BatchSandbox.spec.replicas=1. Server-created Kubernetes sandboxes use replicas: 1 by default. If you create BatchSandbox CRs directly with a different replica count, the controller will reject the pause request.
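
If you author BatchSandbox manifests yourself, a pre-flight check can catch an unsupported replica count before the controller rejects the pause. This is a minimal sketch operating on a manifest loaded as a plain dictionary; `pause_supported` is a hypothetical helper, and treating an omitted `replicas` field as unsupported is an assumption, since the CRD default is not stated here.

```python
def pause_supported(batch_sandbox: dict) -> bool:
    """Return True only when the manifest explicitly sets spec.replicas to 1."""
    replicas = batch_sandbox.get("spec", {}).get("replicas")
    # An omitted field is treated as unsupported because its default is unknown here.
    return replicas == 1
```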

Prerequisites

Before using pause and resume, ensure the following are in place:

Kubernetes Runtime

Your OpenSandbox server must be running in Kubernetes mode with the controller deployed to the cluster.

OCI Registry

An OCI-compatible registry (Docker Hub, GHCR, Harbor, or a private registry:2 instance) must be reachable from the cluster: commit Jobs push snapshot images to it, and the kubelet pulls them when a sandbox resumes.

Registry Secrets

Kubernetes Secrets of type kubernetes.io/dockerconfigjson must exist in the sandbox namespace for both push (commit Job) and pull (resumed Pod).
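
For reference, a Secret of this type carries a base64-encoded Docker config under the `.dockerconfigjson` key. The sketch below builds such a manifest as a Python dictionary using the standard Kubernetes layout; the helper name `docker_config_secret` and every argument value shown in the usage note are placeholders, not values required by OpenSandbox.

```python
import base64
import json


def docker_config_secret(
    name: str, namespace: str, registry: str, username: str, password: str
) -> dict:
    """Build a kubernetes.io/dockerconfigjson Secret manifest as a dict."""
    # The "auth" field is base64("username:password"), as in a Docker config file.
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    docker_config = {
        "auths": {
            registry: {"username": username, "password": password, "auth": auth}
        }
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        # Secret "data" values must themselves be base64-encoded.
        "data": {
            ".dockerconfigjson": base64.b64encode(
                json.dumps(docker_config).encode()
            ).decode()
        },
    }
```

Calling `docker_config_secret("registry-snapshot-push-secret", "my-namespace", "registry.example.com", "user", "token")` yields a manifest you can serialize and apply; the pull secret follows the same shape under a different name.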

Controller Configured

The controller manager must be started with --snapshot-registry, --snapshot-push-secret, and --resume-pull-secret flags pointing to your registry.

Controller Configuration Reference

Configure the controller manager deployment with snapshot flags:
- --snapshot-registry=registry.example.com/sandboxes
- --snapshot-registry-insecure=false
- --snapshot-push-secret=registry-snapshot-push-secret
- --resume-pull-secret=registry-pull-secret
| Flag | Default | Description |
| --- | --- | --- |
| --snapshot-registry | "" | Required. OCI registry prefix. Images are stored as <registry>/<sandboxName>-<container>:snap-gen<N>. |
| --snapshot-registry-insecure | false | Enables insecure registry mode for snapshot push. Use only for HTTP or self-signed local registries. |
| --snapshot-push-secret | "" | Kubernetes Secret name for pushing snapshots. Must be kubernetes.io/dockerconfigjson type. |
| --resume-pull-secret | "" | Kubernetes Secret name injected into resumed sandboxes for pulling snapshot images. |
| --image-committer-image | "image-committer:dev" | Image used by commit Jobs. |
| --commit-job-timeout | "10m" | Timeout for commit Jobs. |
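
The image naming scheme `<registry>/<sandboxName>-<container>:snap-gen<N>` can be illustrated with a pair of small helpers. This is a sketch for building or parsing references on the client side; `snapshot_image_ref` and `snapshot_generation` are hypothetical names, and the sandbox and container names in the usage note are made up.

```python
import re

# Matches any reference that ends in a ":snap-gen<N>" tag.
_REF_PATTERN = re.compile(r"^(?P<repo>.+):snap-gen(?P<gen>\d+)$")


def snapshot_image_ref(
    registry: str, sandbox_name: str, container: str, generation: int
) -> str:
    """Compose <registry>/<sandboxName>-<container>:snap-gen<N>."""
    return f"{registry.rstrip('/')}/{sandbox_name}-{container}:snap-gen{generation}"


def snapshot_generation(image_ref: str) -> int:
    """Extract N from a reference ending in :snap-gen<N>."""
    match = _REF_PATTERN.match(image_ref)
    if match is None:
        raise ValueError(f"not a snapshot image reference: {image_ref!r}")
    return int(match.group("gen"))
```

With `--snapshot-registry=registry.example.com/sandboxes`, a sandbox named `sbx-123` with a container named `main` would, under this scheme, produce `registry.example.com/sandboxes/sbx-123-main:snap-gen2` on its second pause.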

Helm Chart Values

The opensandbox-controller Helm chart exposes the snapshot-related controller values directly:
  • controller.snapshot.imageCommitterImage
  • controller.snapshot.commitJobTimeout
  • controller.snapshot.registry
  • controller.snapshot.registryInsecure
  • controller.snapshot.snapshotPushSecret
  • controller.snapshot.resumePullSecret
For the all-in-one opensandbox chart, use the same values under the opensandbox-controller.* prefix.
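
As an illustration, the values above can be turned into `--set` arguments for a Helm command. The helper `helm_set_args` below is hypothetical, and reading the all-in-one prefix as `opensandbox-controller.controller.snapshot.*` is an assumption based on the sentence above; check the chart's values file before relying on it.

```python
from typing import Dict, List, Union


def helm_set_args(
    values: Dict[str, Union[str, bool]], prefix: str = ""
) -> List[str]:
    """Render a flat mapping of chart values as a list of Helm --set arguments."""
    args: List[str] = []
    for key, value in values.items():
        # Helm expects lowercase true/false for booleans.
        rendered = ("true" if value else "false") if isinstance(value, bool) else value
        args += ["--set", f"{prefix}{key}={rendered}"]
    return args


snapshot_values = {
    "controller.snapshot.registry": "registry.example.com/sandboxes",
    "controller.snapshot.registryInsecure": False,
    "controller.snapshot.snapshotPushSecret": "registry-snapshot-push-secret",
    "controller.snapshot.resumePullSecret": "registry-pull-secret",
}
```

Pass `prefix="opensandbox-controller."` when targeting the all-in-one chart. Values containing commas need escaping for `--set`, so a values file is usually the safer route for anything beyond simple strings.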

Usage

Once the controller manager is configured and the server is running, pause and resume work through the standard Lifecycle API with no SDK changes required.
1. Create a sandbox normally

Create a sandbox using the standard API. No special parameters are needed to enable pause/resume support.
import asyncio
from opensandbox import Sandbox

async def main():
    sandbox = await Sandbox.create(
        image="opensandbox/code-interpreter:latest",
    )
    print(f"Sandbox ID: {sandbox.id}")

asyncio.run(main())
2. Pause the sandbox

Call pause() to commit the root filesystem as an OCI snapshot and release cluster resources. The call returns when the sandbox reaches the Paused state.
await sandbox.pause()
print("Sandbox is now paused — cluster resources released")
3. Resume from snapshot

Call resume() using the same sandbox ID. The controller rewrites the pod template to the latest snapshot image and recreates the runtime. The returned object has the same sandbox ID.
resumed = await sandbox.resume()
print(f"Resumed sandbox ID: {resumed.id}")  # Same ID as before
4. Use the sandbox normally

After resume() returns, the sandbox is in Running state with the same filesystem state from before the pause. Running processes and in-memory state are not restored.
result = await resumed.commands.run("ls /workspace")
print(result.logs.stdout)

Full SDK Examples

import asyncio
from opensandbox import Sandbox

async def main():
    # Create
    sandbox = await Sandbox.create(
        image="opensandbox/code-interpreter:latest",
    )
    sandbox_id = sandbox.id

    # Do some work
    await sandbox.commands.run("echo 'hello' > /workspace/output.txt")

    # Pause — releases cluster resources
    await sandbox.pause()
    print(f"Sandbox {sandbox_id} paused")

    # ... time passes, cluster resources are freed ...

    # Resume — restores filesystem from OCI snapshot
    resumed = await sandbox.resume()
    print(f"Sandbox {sandbox_id} resumed")

    # Filesystem state is intact
    result = await resumed.commands.run("cat /workspace/output.txt")
    print(result.logs.stdout)  # "hello"

    await resumed.kill()

asyncio.run(main())
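
If you want the sandbox to be paused even when a task raises, one option is to wrap the work in `try`/`finally`. The sketch below is an assumption about how you might structure client code, not an SDK feature: `run_and_pause` is a hypothetical helper that relies only on the `commands.run()` and `pause()` calls used in the example above.

```python
import asyncio


async def run_and_pause(sandbox, command: str):
    """Run one command, then pause the sandbox whether or not the command raised."""
    try:
        return await sandbox.commands.run(command)
    finally:
        # Pausing in finally keeps the filesystem snapshot and frees cluster
        # resources even if the command fails.
        await sandbox.pause()
```

Whether to pause after a failure is a design choice: it preserves the filesystem for later inspection, at the cost of a snapshot commit on every failed task.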

Multiple Pause/Resume Cycles

Pause and resume can be repeated. Each pause produces a new snapshot image tag (snap-gen1, snap-gen2, and so on), and the controller always uses the latest snapshot for the next resume. You can therefore run a long-lived agent workflow across many pause/resume cycles, with filesystem changes accumulating from one run to the next.
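
When inspecting a registry by hand, note that generation tags must be compared numerically, since `snap-gen10` sorts before `snap-gen9` as a string. The sketch below picks the highest generation from a list of tags; `latest_snapshot_tag` is a hypothetical helper, and equating "highest generation" with "latest" is an assumption drawn from the tag scheme. The controller's own selection is internal and is not reproduced here.

```python
import re
from typing import Iterable, Optional

_TAG_PATTERN = re.compile(r"^snap-gen(\d+)$")


def latest_snapshot_tag(tags: Iterable[str]) -> Optional[str]:
    """Return the snap-gen<N> tag with the largest N, or None if there is none."""
    best_generation = -1
    best_tag = None
    for tag in tags:
        match = _TAG_PATTERN.match(tag)
        if match is None:
            continue  # ignore tags that are not snapshot generations
        generation = int(match.group(1))
        if generation > best_generation:
            best_generation, best_tag = generation, tag
    return best_tag
```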
