Pause And Resume

Sandbox0 can pause an idle sandbox without deleting it. Pause checkpoints the writable root filesystem, releases the runtime pod, and keeps the sandbox identity and durable state. Resume creates a new runtime generation and restores the latest rootfs checkpoint.

Use this page when you need to:

  • pause a sandbox explicitly
  • resume a paused sandbox explicitly
  • understand how ttl, auto_resume, service routes, and SSH interact with pause state
  • inspect whether a sandbox is running or paused

Pause is a runtime lifecycle operation. It checkpoints the writable rootfs and deletes the runtime pod; it does not freeze cgroups or preserve running processes, memory, sockets, PID state, or live REPL sessions.

Self-hosted deployments must run ctld on sandbox nodes for checkpointed pause/resume.

What Persists

| State | Across pause/resume | After sandbox delete or hard_ttl |
| --- | --- | --- |
| Sandbox identity and configuration | Preserved | Deleted |
| Writable root filesystem | Latest checkpoint is restored | Deleted with the sandbox identity |
| Mounted Volumes | Remounted if configured | Preserved until the Volume is deleted |
| Processes, memory, sockets, PID state, live REPL sessions | Not preserved | Deleted |

The rootfs checkpoint covers files written inside the sandbox filesystem, including files outside mounted volumes. It is different from a Sandbox Volume: it is tied to sandbox rootfs lifecycle and does not provide volume sharing or direct Volume file APIs. For named rootfs snapshots, restore, and fork operations, see Snapshot And Restore.

On nodes using the containerd overlayfs snapshotter, ctld checkpoints only the active snapshot upperdir and stores it as a standard OCI layer diff. Other snapshotters, or nodes where the overlayfs upperdir is not readable by ctld, fall back to containerd's built-in diff path.

Pause A Sandbox

Pause a sandbox to release compute while keeping its identity, configuration, services, mounted durable storage, and latest writable rootfs checkpoint available for a later resume.

The pause endpoint accepts the lifecycle transition and returns the current committed state. Rootfs checkpoint upload and runtime pod deletion continue in the background. Until the pause transaction commits, sandbox details continue to report the previous committed runtime state, usually running. Poll sandbox details until status is paused and paused is true before treating the checkpoint as resume-ready.

Runtime access paths do not expose pausing or resuming states. File, context, public service, and SSH requests are linearized to a committed runtime generation: they either use the currently committed runtime or wait for the lifecycle transaction to commit before continuing after resume. Lifecycle work in progress should not be handled as a caller-visible 409 Conflict; conflicts are reserved for real policy or state conflicts such as disabled auto-resume, non-restartable service routes, quota limits, deletion, or paused-only operations.

Automatic pause from ttl is opportunistic. If supported runtime access arrives before an automatic pause reaches its commit point, Sandbox0 cancels that automatic pause, releases the runtime barrier, and continues on the existing runtime generation. Explicit pause requests are not canceled by runtime access; the access requests wait for the pause to commit and then trigger a resume if auto-resume is allowed.

POST /api/v1/sandboxes/{id}/pause

```go
_, err := client.PauseSandbox(ctx, sandbox.ID)
if err != nil {
	log.Fatal(err)
}
fmt.Println("Pause requested")
```

Resume A Sandbox

Resume a sandbox that was paused manually or by TTL expiry.

Resume first restores the writable rootfs checkpoint onto a runtime pod created from the current template image. If that restore fails, Sandbox0 retries once with a runtime pod pinned to the checkpoint's recorded base image digest. Kubernetes pulls and garbage-collects that image through the normal kubelet image lifecycle.

POST /api/v1/sandboxes/{id}/resume

```go
_, err := client.ResumeSandbox(ctx, sandbox.ID)
if err != nil {
	log.Fatal(err)
}
fmt.Println("Resume requested")
```

Inspect Pause State

After requesting a pause or resume, fetch sandbox details to check the lifecycle status.

During an asynchronous pause, paused remains false and status continues to reflect the last committed runtime state, usually running. The paused field becomes true only after the checkpoint upload has completed, the runtime pod has been released, and the sandbox is safe to resume from durable state.

During resume, callers may wait while Sandbox0 creates and initializes a new runtime generation. After the resume transaction commits, status is running, paused is false, and runtime_generation has advanced. If an automatic pause is canceled before commit, status remains running, paused remains false, and runtime_generation does not change.

```go
sb, err := client.GetSandbox(ctx, sandbox.ID)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("paused=%v\n", sb.Paused)
fmt.Printf("status=%s\n", sb.Status)
fmt.Printf("runtime_generation=%d\n", sb.RuntimeGeneration)
```
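
Comparing two reads of sandbox details is enough to tell a committed resume from a canceled automatic pause. The sketch below encodes the rules from this section. `Snapshot`, `Outcome`, and `Classify` are hypothetical names, and the `int64` type for `runtime_generation` is an assumption.

```go
package main

// Snapshot holds the two sandbox-detail fields compared here.
type Snapshot struct {
	Paused            bool
	RuntimeGeneration int64
}

// Outcome describes what happened between two committed snapshots.
type Outcome string

const (
	NowPaused   Outcome = "paused"
	Resumed     Outcome = "running on a new runtime generation"
	SameRuntime Outcome = "running on the same runtime generation"
)

// Classify compares an earlier and a later snapshot of the same sandbox.
func Classify(before, after Snapshot) Outcome {
	switch {
	case after.Paused:
		return NowPaused // the pause transaction committed
	case after.RuntimeGeneration > before.RuntimeGeneration:
		return Resumed // a resume committed and advanced the generation
	default:
		return SameRuntime // no commit, e.g. an automatic pause was canceled
	}
}
```

A `Resumed` result also means that processes and REPL sessions from the earlier generation are gone, so callers should recreate any contexts they rely on.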

TTL And Auto Resume

ttl and hard_ttl interact with pause differently:

| Field | Behavior |
| --- | --- |
| ttl | Runtime soft timeout. When it expires, Sandbox0 checkpoints the writable rootfs, pauses the sandbox, and releases runtime compute. |
| hard_ttl | Sandbox hard timeout. When it expires, Sandbox0 deletes the sandbox identity and durable state, including rootfs checkpoints, even if the sandbox is already paused. |

auto_resume is the sandbox-level gate for whether inbound access can wake a paused sandbox:

  • when auto_resume is true, supported access paths can resume the sandbox automatically
  • when auto_resume is false, you must call resume explicitly

Common auto-resume entrypoints:

  • sandbox runtime APIs such as files and contexts
  • service routes configured with resume: true; public service requests require sandbox auto_resume: true, route resume: true, and a restartable service runtime (cmd or function)
  • SSH access through Sandbox0 SSH Gateway
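
For public service requests, the three conditions listed above combine as a simple conjunction. The predicate below is a sketch of that rule only. The `Sandbox` and `ServiceRoute` types are hypothetical, with fields named after the settings on this page.

```go
package main

// Sandbox carries the sandbox-level auto_resume gate (hypothetical type).
type Sandbox struct {
	AutoResume bool
}

// ServiceRoute carries the route-level settings (hypothetical type).
type ServiceRoute struct {
	Resume  bool
	Runtime string // "cmd", "function", or another service runtime kind
}

// canAutoResumeService reports whether a public service request may wake a
// paused sandbox: sandbox auto_resume, route resume, and a restartable
// service runtime (cmd or function) must all hold.
func canAutoResumeService(sb Sandbox, route ServiceRoute) bool {
	restartable := route.Runtime == "cmd" || route.Runtime == "function"
	return sb.AutoResume && route.Resume && restartable
}
```

When the predicate is false for a paused sandbox, the request cannot wake it and you must call resume explicitly.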

For a paused sandbox, auto-resume creates a new runtime pod for the same sandbox identity and restores the saved rootfs checkpoint before the sandbox is used. Public service URLs remain stable until the sandbox is explicitly deleted or hard_ttl expires.

See Sandbox Services, Sandbox Functions, and SSH for the user-facing flows.

Typical Pattern

For long-running agent workflows, a common pattern is:

  1. claim a sandbox with ttl set
  2. do active work through contexts and files
  3. let idle sandboxes pause automatically
  4. resume on the next user action, SSH session, or service route hit
  5. use hard_ttl to cap the maximum lifetime

Next Steps

  • Contexts: Run REPL and command contexts inside a sandbox and stream process output.
  • Files: Read, write, watch, and manage files inside a sandbox workspace.
  • Snapshot And Restore: Create named restore points and fork paused sandbox rootfs state.