OpenSandbox supports pausing and resuming Kubernetes-backed sandboxes without losing filesystem state. When you pause a sandbox, the controller commits its root filesystem as an OCI image to a configured registry, then releases the underlying cluster resources (Pods and pooled allocations). When you resume, the controller reuses the same sandbox ID, rewrites the pod template to use the latest snapshot image, and recreates the runtime. This lets you free cluster capacity between agent tasks while retaining the exact filesystem state produced by prior work.
Pause and resume is a Kubernetes-only feature. It requires the OpenSandbox controller to be deployed and a reachable OCI registry for storing snapshots. The feature is not available in Docker (single-host) mode.
## Overview

| Aspect | Behavior |
|---|---|
| Pause | Creates an internal SandboxSnapshot, commits the running container's root filesystem as an OCI image, then quiesces the sandbox runtime and releases Pods and pooled allocations |
| Resume | Reuses the same BatchSandbox, rewrites its template to the latest snapshot image, and recreates the runtime from that image |
| Sandbox ID | Stable across pause/resume cycles; callers use the same ID throughout the sandbox lifetime |
| Replica support | Currently limited to `BatchSandbox.spec.replicas=1` |
## Sandbox Lifecycle States

The sandbox transitions through both stable and intermediate states during pause and resume:

| State | Type | Description |
|---|---|---|
| Running | Stable | Sandbox is active and processing requests |
| Pausing | Intermediate | Pause in progress; the snapshot commit is coordinated through an internal SandboxSnapshot resource |
| Paused | Stable | Sandbox is paused, the latest rootfs snapshot is ready, and runtime Pods and pooled allocations have been released |
| Resuming | Intermediate | Resume in progress; the controller is rewriting the sandbox template to the latest snapshot image and recreating the runtime |
| Failed | Stable | Operation failed; check the reason and message for details |
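The states in the table above can be modelled as a small state machine, which is handy when writing tooling that reacts to state changes. This is an illustrative sketch only: the transition set is inferred from the state descriptions (for example, that a failed pause or resume lands in `Failed`, which this sketch treats as terminal), and `SandboxState`, `ALLOWED`, `advance`, and `can_pause` are hypothetical names, not part of the OpenSandbox SDK or controller.

```python
from enum import Enum


class SandboxState(Enum):
    """Stable and intermediate states from the lifecycle table."""
    RUNNING = "Running"
    PAUSING = "Pausing"
    PAUSED = "Paused"
    RESUMING = "Resuming"
    FAILED = "Failed"


# Hypothetical transition set inferred from the state descriptions;
# the real controller may allow additional edges.
ALLOWED = {
    SandboxState.RUNNING: {SandboxState.PAUSING},
    SandboxState.PAUSING: {SandboxState.PAUSED, SandboxState.FAILED},
    SandboxState.PAUSED: {SandboxState.RESUMING},
    SandboxState.RESUMING: {SandboxState.RUNNING, SandboxState.FAILED},
    SandboxState.FAILED: set(),  # treated as terminal in this sketch
}


def advance(current: SandboxState, target: SandboxState) -> SandboxState:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def can_pause(state: SandboxState, replicas: int) -> bool:
    """Pause applies to running sandboxes with BatchSandbox.spec.replicas == 1."""
    return state is SandboxState.RUNNING and replicas == 1


if __name__ == "__main__":
    s = SandboxState.RUNNING
    # One full pause/resume cycle.
    for nxt in (SandboxState.PAUSING, SandboxState.PAUSED,
                SandboxState.RESUMING, SandboxState.RUNNING):
        s = advance(s, nxt)
    print(s.value)  # Running
```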
### SandboxSnapshot Internal States

For detailed progress tracking during a pause, inspect the internal SandboxSnapshot resource:

| Phase | Description |
|---|---|
| Pending | Snapshot request accepted; waiting to resolve the source Pod or create the commit Job |
| Committing | Commit Job is running and pushing snapshot images to the registry |
| Succeed | Snapshot is ready and will be used for the next resume |
| Failed | Snapshot creation failed |
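When watching a SandboxSnapshot, it helps to separate phases that are still in progress from terminal ones. The sketch below assumes you have already read the phase string from the resource by some means (for example with `kubectl` or a Kubernetes client); `SnapshotPhase`, `is_terminal`, `usable_for_resume`, and `describe` are hypothetical helpers, and the messages simply restate the table above.

```python
from enum import Enum


class SnapshotPhase(Enum):
    """SandboxSnapshot phases as listed in the table."""
    PENDING = "Pending"
    COMMITTING = "Committing"
    SUCCEED = "Succeed"
    FAILED = "Failed"


def is_terminal(phase: SnapshotPhase) -> bool:
    """Succeed and Failed end the snapshot; Pending and Committing are in progress."""
    return phase in (SnapshotPhase.SUCCEED, SnapshotPhase.FAILED)


def usable_for_resume(phase: SnapshotPhase) -> bool:
    """Only a snapshot in the Succeed phase is used for the next resume."""
    return phase is SnapshotPhase.SUCCEED


def describe(raw_phase: str) -> str:
    """Turn a raw phase string into a short progress message.

    Hypothetical helper; raises ValueError for unknown phase strings.
    """
    phase = SnapshotPhase(raw_phase)
    if phase is SnapshotPhase.PENDING:
        return "waiting to resolve source Pod or create commit Job"
    if phase is SnapshotPhase.COMMITTING:
        return "commit Job pushing snapshot images to registry"
    if usable_for_resume(phase):
        return "snapshot ready for next resume"
    return "snapshot creation failed"
```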
## What Is Preserved

| State | Preserved? |
|---|---|
| Root filesystem contents | ✅ Yes, committed as an OCI image |
| Environment variables | ✅ Yes, from the BatchSandbox template |
| Running processes / memory | ❌ No, process state is not checkpointed |
| Explicit volume mounts | Depends on volume type |
## Key Design Principle

**Controller-level configuration:** the registry URL and push/pull secrets are configured on the Kubernetes controller manager, not in `~/.sandbox.toml`. SDK users and API callers need no code changes to use pause and resume; they call `pause()` and `resume()` on the existing sandbox ID.
## Prerequisites

Before using pause and resume, ensure the following are in place:

1. **Kubernetes runtime.** Your OpenSandbox server must be running in Kubernetes mode with the controller deployed to the cluster.
2. **OCI registry.** An OCI-compatible registry (Docker Hub, GHCR, Harbor, or a private `registry:2` instance) must be reachable from cluster nodes for pushing snapshots and from the kubelet for pulling them on resume.
3. **Registry secrets.** Kubernetes Secrets of type `kubernetes.io/dockerconfigjson` must exist in the sandbox namespace for both push (commit Job) and pull (resumed Pod).
4. **Controller configured.** The controller manager must be started with the `--snapshot-registry`, `--snapshot-push-secret`, and `--resume-pull-secret` flags pointing to your registry.

## Controller Configuration Reference
Configure the controller manager deployment with the following snapshot flags:

| Flag | Default | Description |
|---|---|---|
| `--snapshot-registry` | `""` | Required. OCI registry prefix. Images are stored as `<registry>/<sandboxName>-<container>:snap-gen<N>`. |
| `--snapshot-registry-insecure` | `false` | Enables insecure registry mode for snapshot push. Use only for HTTP or self-signed local registries. |
| `--snapshot-push-secret` | `""` | Kubernetes Secret name for pushing snapshots. Must be of type `kubernetes.io/dockerconfigjson`. |
| `--resume-pull-secret` | `""` | Kubernetes Secret name injected into resumed sandboxes for pulling snapshot images. |
| `--image-committer-image` | `"image-committer:dev"` | Image used by commit Jobs. |
| `--commit-job-timeout` | `"10m"` | Timeout for commit Jobs. |
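The flags above and the `kubernetes.io/dockerconfigjson` Secrets they reference can be generated from one settings object. This is a minimal sketch assuming username/password registry credentials and that the controller accepts flags in the usual Go `--name=value` form; `SnapshotSettings`, `controller_args`, and `dockerconfigjson_secret` are hypothetical helpers, and the registry host, Secret names, and namespace in the usage lines are placeholders. The `.dockerconfigjson` payload follows the standard Docker config layout (an `auths` map keyed by registry host, with a base64 `user:password` string in `auth`).

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class SnapshotSettings:
    """Hypothetical container for the snapshot flags in the table."""
    registry: str                                  # --snapshot-registry (required)
    push_secret: str = ""                          # --snapshot-push-secret
    pull_secret: str = ""                          # --resume-pull-secret
    insecure: bool = False                         # --snapshot-registry-insecure
    committer_image: str = "image-committer:dev"   # --image-committer-image
    commit_timeout: str = "10m"                    # --commit-job-timeout


def controller_args(s: SnapshotSettings) -> list[str]:
    """Build controller manager arguments from the settings."""
    if not s.registry:
        raise ValueError("--snapshot-registry is required")
    args = [
        f"--snapshot-registry={s.registry}",
        f"--image-committer-image={s.committer_image}",
        f"--commit-job-timeout={s.commit_timeout}",
    ]
    if s.insecure:
        # Only for HTTP or self-signed local registries.
        args.append("--snapshot-registry-insecure=true")
    if s.push_secret:
        args.append(f"--snapshot-push-secret={s.push_secret}")
    if s.pull_secret:
        args.append(f"--resume-pull-secret={s.pull_secret}")
    return args


def dockerconfigjson_secret(name: str, namespace: str, registry_host: str,
                            username: str, password: str) -> dict:
    """Return a kubernetes.io/dockerconfigjson Secret manifest as a dict.

    The Secret must live in the sandbox namespace.
    """
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {"auths": {registry_host: {"username": username,
                                        "password": password,
                                        "auth": auth}}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        # Values under .data must themselves be base64-encoded.
        "data": {".dockerconfigjson": base64.b64encode(
            json.dumps(config).encode()).decode()},
    }


if __name__ == "__main__":
    settings = SnapshotSettings(registry="registry.example.com/sandbox-snapshots",
                                push_secret="snapshot-push",
                                pull_secret="snapshot-pull")
    print(controller_args(settings))
    secret = dockerconfigjson_secret("snapshot-push", "sandboxes",
                                     "registry.example.com", "robot", "s3cret")
    print(json.dumps(secret, indent=2))
```

Generating both from the same values keeps the Secret names in the flags and in the manifests from drifting apart.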
### Helm Chart Values

The `opensandbox-controller` Helm chart exposes the snapshot-related controller settings directly as values:

- `controller.snapshot.imageCommitterImage`
- `controller.snapshot.commitJobTimeout`
- `controller.snapshot.registry`
- `controller.snapshot.registryInsecure`
- `controller.snapshot.snapshotPushSecret`
- `controller.snapshot.resumePullSecret`

When installing through the `opensandbox` chart, use the same values under the `opensandbox-controller.*` prefix.
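Because the same keys sit under `controller.snapshot` in the `opensandbox-controller` chart and under `opensandbox-controller.controller.snapshot` in the `opensandbox` chart, one mapping can produce a values override for either chart. This sketch uses the key names from the list above; `snapshot_values` and the example registry and Secret names are hypothetical. It prints JSON to stay dependency-free; since JSON is valid YAML, the output should be usable as a Helm values file.

```python
import json

# Key names from the controller.snapshot.* list.
SNAPSHOT_KEYS = {
    "imageCommitterImage", "commitJobTimeout", "registry",
    "registryInsecure", "snapshotPushSecret", "resumePullSecret",
}


def snapshot_values(settings: dict, umbrella: bool = False) -> dict:
    """Nest snapshot settings under controller.snapshot (hypothetical helper).

    With umbrella=True the result is additionally nested under the
    "opensandbox-controller" key, as the opensandbox chart expects.
    """
    unknown = set(settings) - SNAPSHOT_KEYS
    if unknown:
        raise KeyError(f"unknown snapshot values: {sorted(unknown)}")
    values = {"controller": {"snapshot": dict(settings)}}
    return {"opensandbox-controller": values} if umbrella else values


if __name__ == "__main__":
    overrides = {
        "registry": "registry.example.com/sandbox-snapshots",
        "snapshotPushSecret": "snapshot-push",
        "resumePullSecret": "snapshot-pull",
    }
    print(json.dumps(snapshot_values(overrides, umbrella=True), indent=2))
```

Rejecting unknown keys catches typos such as `pushSecret` that Helm would otherwise silently ignore.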
## Usage

Once the controller manager is configured and the server is running, pause and resume work through the standard Lifecycle API; no SDK changes are required.

1. **Create a sandbox normally.** Create a sandbox using the standard API. No special parameters are needed to enable pause/resume support.
2. **Pause the sandbox.** Call `pause()` to commit the root filesystem as an OCI snapshot and release cluster resources. The call returns when the sandbox reaches the `Paused` state.
3. **Resume from the snapshot.** Call `resume()` with the same sandbox ID. The controller rewrites the pod template to the latest snapshot image and recreates the runtime. The returned object has the same sandbox ID.
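The SDK's `pause()` already returns only once the sandbox is `Paused`. If you drive the Lifecycle API from your own tooling instead, you need an equivalent wait loop. The sketch below assumes a hypothetical `get_state(sandbox_id)` callable returning the current state string (it is not an OpenSandbox SDK function); the 600 s default mirrors the default `--commit-job-timeout` of `10m` but is otherwise arbitrary.

```python
import time
from typing import Callable


def wait_for_state(get_state: Callable[[str], str], sandbox_id: str,
                   target: str, timeout_s: float = 600.0,
                   interval_s: float = 2.0) -> str:
    """Poll until the sandbox reaches `target`; raise on Failed or timeout.

    `get_state` is a hypothetical callable supplied by the caller.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state(sandbox_id)
        if state == target:
            return state
        if state == "Failed":
            raise RuntimeError(f"sandbox {sandbox_id} entered Failed state")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"sandbox {sandbox_id} still {state} after {timeout_s}s")
        time.sleep(interval_s)


if __name__ == "__main__":
    # Fake state source standing in for a real Lifecycle API call.
    states = iter(["Running", "Pausing", "Pausing", "Paused"])
    print(wait_for_state(lambda _id: next(states), "sbx-123", "Paused",
                         interval_s=0.01))  # Paused
```

Treating `Failed` as an immediate error avoids waiting out the full timeout when the snapshot commit has already failed.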
### Multiple Pause/Resume Cycles

Pause and resume can be repeated. Each pause produces a new snapshot image tag (`snap-gen1`, `snap-gen2`, and so on), and the controller always uses the latest snapshot for the next resume. You can therefore run a long-lived agent workflow across many pause/resume cycles, with filesystem changes accumulating from one run to the next.
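The image naming scheme from the `--snapshot-registry` flag (`<registry>/<sandboxName>-<container>:snap-gen<N>`) makes it straightforward to check which image the next resume will use. The sketch below formats a reference and finds the highest generation among tags you have listed yourself (for example through your registry's tag-listing API); `snapshot_ref` and `latest_generation` are hypothetical helpers, and `my-sandbox`/`main` are placeholder names. Generations must be compared numerically, because a plain string sort ranks `snap-gen9` above `snap-gen10`.

```python
import re
from typing import Optional

_TAG = re.compile(r"^snap-gen(\d+)$")


def snapshot_ref(registry: str, sandbox: str, container: str, gen: int) -> str:
    """Format <registry>/<sandboxName>-<container>:snap-gen<N>."""
    if gen < 1:
        raise ValueError("generations start at 1")
    return f"{registry.rstrip('/')}/{sandbox}-{container}:snap-gen{gen}"


def latest_generation(tags: list[str]) -> Optional[int]:
    """Return the highest N among snap-gen<N> tags, ignoring other tags."""
    gens = [int(m.group(1)) for t in tags if (m := _TAG.match(t))]
    return max(gens, default=None)


if __name__ == "__main__":
    tags = ["snap-gen1", "snap-gen2", "snap-gen9", "snap-gen10", "latest"]
    n = latest_generation(tags)
    print(snapshot_ref("registry.example.com/snaps", "my-sandbox", "main", n))
    # registry.example.com/snaps/my-sandbox-main:snap-gen10
```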