Tags: docker, sandbox, ai-agents, developer-environments, testing

Docker in Sandbox: Run Docker Inside an AI Agent Sandbox with Sandbox0

Sandbox0 Team

Docker in Sandbox for AI agents is no longer a niche runtime feature.

AI agents are moving from code snippets to real developer work.

That changes what the runtime has to provide. A serious coding agent does not only need a shell and a temporary directory. It needs to check out a repository, install dependencies, run integration tests, start databases, build images, preview services, and clean up after itself.

In many real projects, those workflows already depend on Docker.

That is why "Docker in sandbox" is becoming a core capability for agent infrastructure. Vercel recently added Docker support inside Vercel Sandbox. Cloudflare documents Docker-in-Docker for its Sandbox SDK. Modal has alpha support for Docker inside modal.Sandbox. E2B and Runloop both publish Docker-enabled sandbox or devbox templates. Docker's own AI sandbox product frames the same requirement around an isolated Docker daemon for coding agents.

The market signal is straightforward: AI agents need environments that look much closer to a developer machine than to a single code runner.

Sandbox0 Docker in Sandbox is built for that runtime gap: run Docker inside an AI agent sandbox without giving the agent control of a shared host Docker daemon.

With the dins template, a Sandbox0 sandbox runs a Docker daemon inside the sandbox. Agent tools can use familiar commands such as docker run and docker build, along with Docker Compose-style workflows, without depending on the host machine's Docker daemon.

Why Docker Matters for AI Agent Sandboxes

Most production repositories are not tested with one process.

A backend test suite may need PostgreSQL, Redis, Kafka, MinIO, or a browser service. A generated application may need to be built into an image before a deployment pipeline will accept it. A coding agent may need to validate a Dockerfile, run a service image, or reproduce the exact command a human developer would run locally.

Without Docker support, teams often end up with weak substitutes:

  • mock every dependency and miss integration failures
  • give the agent access to shared staging services
  • mount the host Docker socket into an execution container
  • move builds back to CI and lose interactive feedback
  • hand-write one-off service startup scripts for every template

Those options are acceptable for demos. They are brittle for production agent systems.

An agent sandbox with a local Docker daemon gives the agent a more realistic developer environment:

  • start Redis or PostgreSQL next to the test process
  • build a temporary image from generated code
  • run a containerized CLI or language tool
  • validate Dockerfiles before pushing a change
  • exercise Compose-style workflows inside the sandbox
  • keep containers and test dependencies local to the sandbox lifecycle

That last point matters. The goal is not only to make docker run work. The goal is to keep the blast radius of agent-created containers inside the runtime boundary where the agent is already operating.

What Sandbox0 Docker in Sandbox Provides

Sandbox0 ships a built-in Docker in Sandbox template named dins.

When a sandbox is claimed from that template, it starts dockerd as a managed warm process and waits until docker info reports that the daemon is ready. The Docker client and daemon are installed in the default template image, but the default template does not start Docker. The dins template is the runtime profile that turns Docker on.

The important behavior is simple:

  • docker run starts containers inside the sandbox.
  • docker build stores layers in the sandbox Docker data root.
  • Docker state lives under /var/lib/docker.
  • /var/lib/docker is ephemeral sandbox runtime state.
  • Durable source code, generated files, database dumps, and artifacts should live in Sandbox Volumes.

This separation is intentional. Docker layers and throwaway containers are runtime cache. Agent workspaces and outputs are product state.

If a sandbox is deleted, Docker images, containers, layers, and Docker volumes under /var/lib/docker are discarded. If the agent needs to preserve files across cleanup, put those files in a Sandbox Volume and mount that Volume into the sandbox.
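An agent could apply this persistence rule as a path check before it writes an artifact. The sketch below is illustrative only. It assumes the Docker data root is /var/lib/docker, as in the dins template, and that the agent knows its Volume mount points, for example a hypothetical /workspace. Paths outside both are labeled sandbox-local, because only Volume-backed files are described as surviving cleanup. The Durability type and the classify function are hypothetical names.

```go
package main

import (
	"path/filepath"
	"strings"
)

// dockerDataRoot is the Docker data root used by the dins template.
const dockerDataRoot = "/var/lib/docker"

// Durability is a hypothetical label for where a file lives.
type Durability string

const (
	DockerRuntime Durability = "docker-runtime" // discarded with the sandbox
	SandboxLocal  Durability = "sandbox-local"  // not described as persistent
	VolumeBacked  Durability = "volume"         // survives sandbox cleanup
)

// within reports whether path is root itself or lies underneath it.
func within(path, root string) bool {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// classify maps an absolute path to a durability label, given the
// mount points of the Sandbox Volumes attached to this sandbox.
func classify(path string, volumeMounts []string) Durability {
	for _, m := range volumeMounts {
		if within(path, m) {
			return VolumeBacked
		}
	}
	if within(path, dockerDataRoot) {
		return DockerRuntime
	}
	return SandboxLocal
}
```

The check uses filepath.Rel rather than a plain string prefix. A prefix test would treat /workspacex as if it were inside /workspace.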

Example: Run Test Databases Inside the Sandbox

A common use case is integration testing.

Claim a dins sandbox, start Redis and PostgreSQL with Docker, then run tests against 127.0.0.1 from commands inside the same sandbox:

```bash
# Claim a dins sandbox and delete it when this script exits.
SANDBOX_ID="$(s0 -o json sandbox create --template dins --hard-ttl 1800 | jq -r '.ID')"
trap 's0 sandbox delete "$SANDBOX_ID" >/dev/null 2>&1 || true' EXIT

# Start Redis and PostgreSQL inside the sandbox, then poll until both answer.
s0 sandbox exec "$SANDBOX_ID" --no-wait --ttl 1800 -- /bin/sh -lc '
  set -e
  docker rm -f test-redis test-postgres >/dev/null 2>&1 || true
  docker pull redis:7-alpine
  docker pull postgres:16-alpine
  docker run -d --name test-redis -p 6379:6379 redis:7-alpine
  docker run -d --name test-postgres \
    -e POSTGRES_PASSWORD=postgres \
    -e POSTGRES_DB=app_test \
    -p 5432:5432 \
    postgres:16-alpine
  until docker exec test-redis redis-cli ping | grep -q PONG; do sleep 1; done
  until docker exec test-postgres pg_isready -U postgres >/dev/null; do sleep 1; done
'

# Run the test suite against the sandbox-local services.
s0 sandbox exec "$SANDBOX_ID" -- /bin/sh -lc '
  REDIS_URL=redis://127.0.0.1:6379 \
  DATABASE_URL="postgres://postgres:postgres@127.0.0.1:5432/app_test?sslmode=disable" \
  go test ./...
'
```

This is the same shape as a developer running local service dependencies before a test suite, but the services live inside the sandbox instead of on a shared workstation or CI worker.

For agents, that is a cleaner contract. The agent can start what it needs, test against it, and delete the sandbox when the run is over.

Example: Build and Run a Temporary Image

Docker in Sandbox also helps when the agent needs to validate generated container artifacts.

Inside a dins sandbox:

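```bash
# Write a tiny Go program.
cat > main.go <<'EOF'
package main

import "fmt"

func main() {
	fmt.Println("hello from docker in sandbox")
}
EOF

# Build a static binary so it can run in an empty (scratch) image.
CGO_ENABLED=0 go build -o hello main.go

cat > Dockerfile <<'EOF'
FROM scratch
COPY hello /hello
ENTRYPOINT ["/hello"]
EOF

# Build the image with the sandbox-local daemon and run it once.
docker build -t sandbox0-dins-test .
docker run --rm sandbox0-dins-test
```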

The build cache and image layers are useful during the run, but they are not treated as durable application state. If the agent needs to preserve the Dockerfile, source, build logs, SBOM, or exported artifact, write those files into a mounted Sandbox Volume.

Why Not Mount the Host Docker Socket?

Mounting the host Docker socket is the easiest way to make Docker commands work inside a container.

It is also the wrong default for autonomous agents.

The Docker socket is a control plane for the host Docker daemon. A process that can drive that socket can usually create containers, mount host paths, inspect other containers, and modify state outside the intended execution environment. For trusted CI jobs, teams sometimes accept that tradeoff. For agent-generated commands and untrusted code, it weakens the runtime boundary.

Docker in Sandbox should mean "the sandbox has its own Docker daemon," not "the sandbox can control the host daemon."

Sandbox0's dins template starts dockerd inside the sandbox and exposes the sandbox-local Docker socket to processes running there. That keeps agent-created containers attached to the sandbox lifecycle instead of a shared host Docker daemon.

The Operator Boundary: Docker Requires Real Privilege

Running a Docker daemon inside a sandbox is not a harmless checkbox.

Docker needs elevated Linux capabilities, namespaces for process isolation, cgroups, control over networking (by default it creates bridge interfaces and iptables rules), and a writable daemon data directory. In Kubernetes, rootful Docker-in-Docker commonly requires a privileged container. That is why Sandbox0 treats the Docker runtime profile as an operator-controlled system template, not as a raw privilege knob for ordinary team API keys.

The built-in dins template uses a privileged security context and an ephemeral emptyDir mount at /var/lib/docker. The default resource profile is:

| Field | Default |
|---|---|
| Template ID | dins |
| CPU | 1 |
| Memory | 4Gi |
| Ephemeral storage | 20Gi |
| Docker data root | /var/lib/docker |
| Docker data root size limit | 20Gi |

Regular team-owned templates can define normal runtime behavior such as warm processes, but privileged fields such as mainContainer.securityContext and pod.emptyDirMounts require a system-level identity.
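This rule can be modeled as an admission check. A template that sets privileged fields is accepted only when the caller has a system-level identity. The following is a minimal sketch of that policy. The Template struct, the Identity values, and validateTemplate are hypothetical. Only the field names mainContainer.securityContext and pod.emptyDirMounts come from Sandbox0.

```go
package main

import (
	"fmt"
	"strings"
)

// Identity is a hypothetical classification of the caller.
type Identity int

const (
	TeamAPIKey Identity = iota
	SystemIdentity
)

// Template holds only the fields this check cares about; the shapes
// are illustrative, not Sandbox0's template schema.
type Template struct {
	ID              string
	WarmProcesses   []string          // allowed for team templates
	SecurityContext map[string]any    // mainContainer.securityContext
	EmptyDirMounts  map[string]string // pod.emptyDirMounts: path -> size limit
}

// validateTemplate rejects privileged fields unless the caller is a
// system-level identity.
func validateTemplate(t Template, who Identity) error {
	var privileged []string
	if len(t.SecurityContext) > 0 {
		privileged = append(privileged, "mainContainer.securityContext")
	}
	if len(t.EmptyDirMounts) > 0 {
		privileged = append(privileged, "pod.emptyDirMounts")
	}
	if len(privileged) > 0 && who != SystemIdentity {
		return fmt.Errorf("template %q: %s require a system-level identity",
			t.ID, strings.Join(privileged, ", "))
	}
	return nil
}
```

A dins-like template would set a privileged security context and a /var/lib/docker emptyDir limited to 20Gi. Under this check, it passes for a system identity and fails for a team API key. A team template that only declares warm processes passes for either caller.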

For production self-hosted deployments, this distinction matters. If your cluster relies on ordinary container runtime isolation, where sandboxes share the host kernel, privileged Docker-in-Docker has a larger kernel-level blast radius than a normal sandbox. Operators should pair Docker-enabled templates with the isolation model and node placement appropriate for their risk profile, such as dedicated sandbox nodes or VM-backed runtime classes where available.

The product point is not to hide this tradeoff. It is to make Docker support an explicit platform capability with an explicit operational boundary.

Docker State vs. Workspace State

Docker in Sandbox is most useful when the platform is clear about what persists.

There are two different kinds of state:

| State | Examples | Recommended Sandbox0 primitive |
|---|---|---|
| Docker runtime state | image layers, pulled images, containers, Docker volumes | /var/lib/docker ephemeral runtime storage |
| Agent workspace state | source code, generated files, test fixtures, database dumps, reports | Sandbox Volumes |

Treating Docker state as durable by default can be expensive and surprising. Image layers are large. Test databases can grow quickly. Agent-created containers are often intermediate work, not long-term output.

Sandbox0 keeps the default boundary conservative: Docker state is sandbox-local and ephemeral, while durable work belongs in Volumes.

That matches how agent workflows usually behave:

  • use Docker to create the temporary environment
  • run tests or builds
  • copy meaningful outputs into the workspace
  • snapshot, fork, restore, or persist the workspace through Sandbox Volumes
  • delete the sandbox and discard container runtime state

This keeps the durable storage plane focused on files the product actually needs.

How This Compares to Other Agent Sandbox Docker Support

The ecosystem is converging on the same requirement, but the implementations have different tradeoffs.

| Platform | Public Docker-in-sandbox shape | Notable boundary |
|---|---|---|
| Vercel Sandbox | Install Docker, start dockerd, and run containers inside a Firecracker-backed sandbox | Vercel describes sandbox isolation as a Firecracker microVM with its own filesystem and network |
| Cloudflare Sandbox | Docker-in-Docker guide based on rootless Docker | Documents limitations around no iptables, rootless mode, and ephemeral storage |
| Modal Sandboxes | Alpha support for Docker inside modal.Sandbox through an experimental option | Docker state is currently not captured by filesystem snapshots |
| E2B | Docker and Docker Compose template examples | Recommends higher CPU and memory for Docker workloads |
| Runloop | Docker-in-Docker Devbox blueprints | Frames the use case around running project dependencies beside AI agents |
| Docker Sandboxes | AI coding agents run in microVM sandboxes with their own Docker daemon | Emphasizes an isolated Docker Engine separate from the host daemon |
| Sandbox0 | Built-in dins template with sandbox-local dockerd, warm readiness, and ephemeral Docker state | Operator-controlled system template with durable workspace state separated into Sandbox Volumes |

The interesting part is not who can run docker run hello-world.

The interesting part is the runtime contract around Docker:

  • Is Docker state durable or ephemeral?
  • Does the agent get the host Docker socket or a sandbox-local daemon?
  • Can the sandbox run service dependencies beside tests?
  • Can the platform separate temporary container state from durable workspace state?
  • Who is allowed to enable the privilege needed for Docker?
  • How does the network policy model interact with containers started inside the sandbox?

Those are infrastructure questions, not SDK convenience questions.

What Docker in Sandbox Unlocks for Agents

Docker support makes several agent workloads more realistic.

Integration Tests

Agents can run test suites that depend on real services. Instead of mocking PostgreSQL or Redis, the agent can start service containers and run the same test command a human would run locally.

Container Image Validation

Agents can edit a Dockerfile, build the image, run it, inspect logs, and catch broken build steps before opening a pull request or handing work back to CI.

Generated App Preview

When an agent generates a containerized app, it can run the image inside the sandbox and expose the relevant service through Sandbox Services when an external preview is needed.

Tool Compatibility

Many developer tools assume Docker exists. Docker in Sandbox reduces the amount of special-case template work required to support those tools.

Safer Autonomy

The agent can perform Docker-heavy work without controlling a shared host Docker daemon. That does not remove every security consideration, but it gives the platform a clearer runtime boundary to manage.

When Not to Use Docker in Sandbox

Docker is powerful, but it is not always the right primitive.

Use the default Sandbox0 template when the workload only needs normal shell commands, language runtimes, file APIs, or lightweight tools.

Use Docker in Sandbox when the workload actually needs Docker semantics:

  • service containers
  • image builds
  • Compose-style local dependency graphs
  • containerized CLIs
  • Dockerfile validation
  • test flows that are already Docker-native

If you only need a long-running HTTP process, use Sandbox Services. If you need files to persist after sandbox cleanup, use Sandbox Volumes. If you need to protect outbound traffic or inject credentials, use Sandbox Network and credential policies.

Docker is part of the runtime toolbox. It should not become the only way to model every agent workload.

Build Agents Against Real Developer Environments

The practical test for an AI agent runtime is simple:

Can the agent run the commands the repository already expects?

For many teams, those commands include Docker. They start databases, build images, run Compose workflows, and validate containerized applications.

Sandbox0 Docker in Sandbox gives agents that capability inside an isolated, operator-managed sandbox profile. The dins template starts a sandbox-local Docker daemon, keeps Docker runtime state under ephemeral /var/lib/docker, and lets durable workspace state live in Sandbox Volumes.

That is the right shape for production agent infrastructure: real developer workflows, explicit operational boundaries, and no dependency on a shared host Docker daemon.

Read the Docker in Sandbox docs, or start from the Sandbox0 self-hosted configuration guide if you are enabling Docker in Sandbox for your own deployment.

References