Tags: anthropic, claude, deployment, runtime, ai-agents

Claude Code SDK Deployment Modes: Ephemeral, Long-Running, and Hybrid

Sandbox0 Team

If you have been reading about the Claude Code SDK recently, you have probably run into a confusing pile of terms:

  • ephemeral
  • long-running
  • hybrid
  • spawnClaudeCodeProcess
  • "run the SDK in Docker"
  • "separate the control plane from the execution plane"

These are not all talking about the same thing.

That is the real source of confusion. One set of terms describes how long your runtime lives. Another describes where the Claude runtime actually runs. If you keep those as two separate decisions, the architecture becomes much easier to reason about.

This post is about that distinction. It is intentionally conceptual. It does not explain how to implement any one pattern on a specific platform.

First: Claude Code SDK and Claude Agent SDK Are the Same Family

Anthropic now uses the name Claude Agent SDK in its official docs and repositories. The older name Claude Code SDK is still common in developer discussions and search queries, so both names show up in practice.

Anthropic's current Agent SDK docs and hosting guide use the new name throughout, even though the older term remains common informally.

For the purpose of this post, the important point is not the naming change. The important point is what this SDK actually is.

It is not just a thin stateless API wrapper. It is a runtime-oriented agent interface designed around:

  • conversation continuity
  • tool execution
  • working directories
  • persistent session state
  • deployment inside a controlled execution environment

That is why deployment model matters much more here than it would for a normal request-response SDK.

The Two Independent Axes

When people talk about Claude deployment, they often mix up two different questions:

  1. How long does the runtime environment live?
  2. Where does the Claude process run relative to the rest of your application?

Those are two independent axes.

The first axis is about lifecycle.

The second axis is about placement and boundaries.

spawnClaudeCodeProcess belongs to the second axis, not the first.

Axis 1: Deployment Modes

Anthropic describes three common deployment modes: ephemeral, long-running, and hybrid.

The cleanest way to understand them is to ask two questions:

  • Is the container or runtime environment short-lived or long-lived?
  • Is the state short-lived or long-lived?

| Mode | Runtime lifetime | State lifetime | Best fit | Main cost |
| --- | --- | --- | --- | --- |
| ephemeral | Short-lived | Usually short-lived unless you externalize it | One-shot tasks, isolated jobs, batch work | Cold start and repeated setup |
| long-running | Long-lived | Long-lived in the same environment | High-frequency interactive agents, agent servers, chat surfaces | Higher steady-state resource cost |
| hybrid | Short-lived or pausable | Long-lived across runtime restarts | Agents users return to later, multi-step work, intermittent sessions | You must manage state persistence deliberately |
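
The two questions above can be written down as a small lookup. This is a minimal sketch, assuming each lifetime is simplified to "short" or "long"; the type and function names are hypothetical and not part of any SDK.

```typescript
// Hypothetical names for illustration only.
type Lifetime = "short" | "long";
type DeploymentMode = "ephemeral" | "long-running" | "hybrid";

// Map the two lifetime answers onto one of the three named modes.
function classifyMode(runtime: Lifetime, state: Lifetime): DeploymentMode | undefined {
  if (runtime === "short" && state === "short") return "ephemeral";
  if (runtime === "long" && state === "long") return "long-running";
  if (runtime === "short" && state === "long") return "hybrid";
  // A long-lived runtime with short-lived state is not one of the three named modes.
  return undefined;
}
```

Note that an ephemeral setup whose state is externalized answers "long" to the second question, which is exactly the point where it starts to look like hybrid.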

Ephemeral

In an ephemeral deployment, you create a fresh runtime for a task, run the task, and tear the runtime down when it is done.

This is the simplest mental model:

  • new task
  • new runtime
  • do the work
  • delete the runtime

It is a strong fit for:

  • code transformations
  • evaluation jobs
  • fire-and-forget workflows
  • isolated per-request execution

What you gain is simplicity and isolation.

What you pay for is repeated setup. Every run has to recreate the environment, rebuild process state, and reattach anything the agent needs.
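
The four-step loop above can be sketched as follows. The `Runtime` interface and `createRuntime` function are hypothetical stand-ins; in a real deployment they would wrap whatever container or sandbox API you use.

```typescript
// Hypothetical runtime interface for this sketch.
interface Runtime {
  id: string;
  run(task: string): string;
  destroy(): void;
}

let created = 0;
const live = new Set<string>(); // ids of runtimes that currently exist

function createRuntime(): Runtime {
  created += 1;
  const id = `rt-${created}`;
  live.add(id);
  return {
    id,
    run: (task) => `${id} finished: ${task}`,
    destroy: () => {
      live.delete(id);
    },
  };
}

function runEphemeral(task: string): string {
  const runtime = createRuntime(); // new task, new runtime
  try {
    return runtime.run(task); // do the work
  } finally {
    runtime.destroy(); // delete the runtime, even if the task throws
  }
}
```

Every call pays for `createRuntime` again, which is the repeated setup cost described above.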

Long-Running

In a long-running deployment, the runtime stays alive and keeps serving work over time.

That usually means:

  • the same process stays up
  • the same container stays up
  • the same local state stays available in place

This is a strong fit when:

  • users interact with the same agent frequently
  • the agent exposes an HTTP or WebSocket service
  • startup cost is high enough that repeated cold starts are wasteful

What you gain is low latency and in-memory continuity.

What you pay for is a higher operational floor. Long-lived environments accumulate state, consume resources even while mostly idle, and require more care around health, cleanup, and multi-tenant boundaries.
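
A minimal sketch of that tradeoff, assuming an in-process registry that keeps one runtime per session alive: reuse gives in-memory continuity, and idle eviction is the cleanup you now own. All class and method names here are hypothetical.

```typescript
// Hypothetical in-process stand-in for a long-lived agent runtime.
class LongRunningRuntime {
  private history: string[] = []; // in-memory continuity between calls
  lastUsedMs: number;

  constructor(nowMs: number) {
    this.lastUsedMs = nowMs;
  }

  // Returns the number of turns this runtime has seen so far.
  handle(message: string, nowMs: number): number {
    this.history.push(message);
    this.lastUsedMs = nowMs;
    return this.history.length;
  }
}

class RuntimeRegistry {
  private runtimes = new Map<string, LongRunningRuntime>();

  // Reuse the existing runtime for a session; create one only on first contact.
  get(sessionId: string, nowMs: number): LongRunningRuntime {
    let runtime = this.runtimes.get(sessionId);
    if (runtime === undefined) {
      runtime = new LongRunningRuntime(nowMs);
      this.runtimes.set(sessionId, runtime);
    }
    return runtime;
  }

  // Evict runtimes that have been idle longer than maxIdleMs; returns how many were removed.
  evictIdle(nowMs: number, maxIdleMs: number): number {
    let evicted = 0;
    this.runtimes.forEach((runtime, id) => {
      if (nowMs - runtime.lastUsedMs > maxIdleMs) {
        this.runtimes.delete(id);
        evicted += 1;
      }
    });
    return evicted;
  }

  get size(): number {
    return this.runtimes.size;
  }
}
```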

Hybrid

hybrid is the most misunderstood mode.

Many people hear "hybrid" and assume it means "kind of long-running." That is not the useful distinction.

The real idea is:

the runtime can be short-lived, while the important state remains long-lived

That means you can stop, pause, or recreate the execution environment without throwing away the work.

That preserved state can include:

  • conversation history
  • local workspace files
  • caches
  • checkpoints
  • session metadata

Hybrid is a strong fit for agent workloads where users leave and come back later, or where work continues in stages rather than in one continuous burst.

The benefit is obvious: you do not pay the full cost of keeping every runtime alive forever.

The catch is also obvious: state has to be designed as a first-class concern. If your session files, working directory, or coordination state disappear with the container, then you do not actually have a hybrid system. You just have a restarted ephemeral system.
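
The difference between hybrid and "restarted ephemeral" can be shown in a few lines. This is a sketch under stated assumptions: `stateStore` is an in-memory map standing in for durable storage such as a volume or object store, and `HybridRuntime` is a hypothetical class, not an SDK type.

```typescript
interface SessionState {
  history: string[];
  workspaceFiles: Record<string, string>;
}

// Stand-in for durable storage that outlives any single runtime.
const stateStore = new Map<string, string>();

class HybridRuntime {
  private sessionId: string;
  private state: SessionState;

  constructor(sessionId: string) {
    this.sessionId = sessionId;
    const saved = stateStore.get(sessionId);
    // Restore prior state if it exists; otherwise start a fresh session.
    this.state =
      saved !== undefined
        ? (JSON.parse(saved) as SessionState)
        : { history: [], workspaceFiles: {} };
  }

  step(message: string): void {
    this.state.history.push(message);
  }

  writeFile(path: string, content: string): void {
    this.state.workspaceFiles[path] = content;
  }

  readFile(path: string): string | undefined {
    return this.state.workspaceFiles[path];
  }

  get turns(): number {
    return this.state.history.length;
  }

  // Persist state before the runtime is paused or destroyed.
  checkpoint(): void {
    stateStore.set(this.sessionId, JSON.stringify(this.state));
  }
}
```

A runtime that is recreated after `checkpoint()` picks up where the previous one stopped. Anything done after the last checkpoint is lost with the runtime, which is the failure mode described above.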

Why Hybrid Matters More for Agents Than for Ordinary Apps

For a normal stateless web service, restarting the container is usually not a big deal.

For an agent runtime, the local execution environment often carries real working state:

  • cloned repositories
  • downloaded artifacts
  • tool configuration
  • partially completed outputs
  • chat and tool-call history

That is why agent infrastructure quickly runs into a storage and continuity problem that ordinary stateless services can ignore.

In practice, hybrid becomes attractive as soon as you want all of the following:

  • interactive feel
  • lower steady-state cost than keeping everything always on
  • resumability
  • continuity across sessions

Axis 2: Where the Claude Runtime Actually Runs

Now we move to the second axis.

This is not about lifecycle. It is about boundary placement.

There are two broad patterns.

Simple Mode

In the simplest architecture, your application code and the Claude runtime live in the same execution environment.

That usually means:

  • your service starts
  • the SDK is available in the same container or VM
  • Claude runs there directly

Conceptually, the path is: client → your application → Claude runtime, all inside one container or VM.

This is the easiest thing to build.

It is often the right choice when:

  • you are prototyping
  • you have one tenant or a small number of trusted workloads
  • you want the minimum number of moving parts

The tradeoff is that control logic and execution logic are tightly coupled. The place that receives traffic is also the place that hosts the agent runtime.

Control and Execution Separated

In a more split architecture, your application acts as the control plane, while the Claude runtime executes somewhere else.

That "somewhere else" might be:

  • a container
  • a VM
  • a remote sandbox
  • a dedicated worker runtime

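Conceptually, the path is: client → your application (control plane) → separate execution environment → Claude runtime.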

This is the pattern where spawnClaudeCodeProcess becomes relevant.

The point of this model is not just indirection for its own sake. It creates a clearer boundary between:

  • the system that decides what work to run
  • the system that actually runs the work

That matters for:

  • multi-tenant systems
  • isolation
  • scheduling
  • auditability
  • network controls
  • secret handling
  • platform ownership boundaries
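
One way to keep that boundary explicit in code is to have the control plane depend only on a narrow executor interface. The sketch below is illustrative: `Executor`, `localExecutor`, `remoteExecutor`, and `controlPlane` are hypothetical names, and the `send` callback stands in for whatever transport reaches the container, VM, or sandbox.

```typescript
// Hypothetical interface for this sketch: the control plane only submits work.
interface Executor {
  placement: "same-environment" | "separate-environment";
  execute(task: string): string;
}

// Simple mode: the agent runtime lives next to the application code.
const localExecutor: Executor = {
  placement: "same-environment",
  execute: (task) => `local ran: ${task}`,
};

// Separated mode: work crosses a boundary through `send`.
function remoteExecutor(send: (payload: string) => string): Executor {
  return {
    placement: "separate-environment",
    execute: (task) => send(task),
  };
}

// The control plane decides what runs and for whom, not where it runs.
function controlPlane(executor: Executor, tenant: string, task: string): string {
  return executor.execute(`[${tenant}] ${task}`);
}
```

Because `controlPlane` never inspects `placement`, moving from simple mode to separated execution is a change of executor rather than a rewrite of the application.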

What spawnClaudeCodeProcess Actually Changes

spawnClaudeCodeProcess is easy to misread because it sounds like a deployment mode. It is not.

In Anthropic's TypeScript Agent SDK reference, it is presented as a way to launch the Claude runtime in another environment. That is exactly why it belongs to the runtime-boundary discussion rather than the lifecycle discussion.

It does not answer:

  • Should this runtime be ephemeral?
  • Should it be long-running?
  • Should it be hybrid?

It answers a different question:

  • Should the Claude runtime be launched in the same environment as my application, or in a separate execution environment?

That means spawnClaudeCodeProcess changes the runtime boundary, not the lifecycle model.
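
To make "changing the boundary" concrete, here is a sketch of the kind of rewrite a custom spawn hook might perform: taking a launch request meant for the local machine and turning it into a `docker exec` invocation against an already-running container. It deliberately does not call the SDK. `SpawnRequest` and `toDockerExec` are simplified, hypothetical stand-ins, and the field names are assumptions rather than the SDK's actual types; check the TypeScript reference for the real signature.

```typescript
// Simplified stand-in for what a spawn hook receives.
// Field names are assumptions for this sketch, not the SDK's actual types.
interface SpawnRequest {
  command: string;
  args: string[];
  cwd?: string;
  env: Record<string, string | undefined>;
}

// Rewrite a local launch request so the same command runs inside an
// existing Docker container via `docker exec`.
function toDockerExec(req: SpawnRequest, container: string): SpawnRequest {
  const envFlags: string[] = [];
  for (const key of Object.keys(req.env)) {
    const value = req.env[key];
    // Forward only variables that are actually set.
    if (value !== undefined) envFlags.push("-e", `${key}=${value}`);
  }
  const cwdFlags = req.cwd !== undefined ? ["-w", req.cwd] : [];
  return {
    command: "docker",
    // -i keeps stdin open so the caller can stream input to the process.
    args: ["exec", "-i", ...cwdFlags, ...envFlags, container, req.command, ...req.args],
    env: {}, // the variables now travel as -e flags instead
  };
}
```

Nothing in this rewrite says whether the container is created per task, kept alive, or restored from saved state. That is still a lifecycle decision made elsewhere.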

This is the single most important distinction in this whole topic.

The Six Common Combinations

Once you separate the two axes, you do not get three architectures. You get a matrix.

| Lifecycle mode | Simple mode | Control and execution separated |
| --- | --- | --- |
| ephemeral | Fresh app-local runtime per task | Fresh remote execution runtime per task |
| long-running | App-local runtime stays up | Remote execution runtime stays up |
| hybrid | App-local runtime can restart while state persists | Remote execution runtime can restart while state persists |
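
The matrix is simply the cross product of the two axes. A small sketch, with hypothetical type names, makes the independence explicit:

```typescript
const lifecycles = ["ephemeral", "long-running", "hybrid"] as const;
const placements = ["simple", "separated"] as const;

type Lifecycle = (typeof lifecycles)[number];
type Placement = (typeof placements)[number];

interface Architecture {
  lifecycle: Lifecycle;
  placement: Placement;
}

// Every lifecycle can be paired with every placement.
function allCombinations(): Architecture[] {
  const result: Architecture[] = [];
  for (const lifecycle of lifecycles) {
    for (const placement of placements) {
      result.push({ lifecycle, placement });
    }
  }
  return result;
}
```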

Some combinations are more common than others.

ephemeral + simple mode

This is a good prototype architecture.

You keep everything together, launch a runtime for a task, and throw it away afterward. It is easy to reason about and easy to delete.

long-running + simple mode

This is common for internal tools and small agent servers.

It is operationally straightforward early on, but over time it tends to mix app concerns and execution concerns into one place.

ephemeral + separated execution

This is a good fit for highly isolated job-style systems.

The control plane remains stable while each task gets a fresh remote runtime.

hybrid + separated execution

This is one of the most interesting patterns for serious agent platforms.

It gives you:

  • a clean control boundary
  • resumable execution state
  • more control over cost than pure long-running
  • better isolation than putting everything in the same app container

That combination is often where sandbox-based agent platforms become especially compelling.

How to Choose

If you only need a rough rule of thumb, use this one:

| Situation | Usually best starting point |
| --- | --- |
| One-shot jobs, evals, isolated tasks | ephemeral |
| High-frequency interaction with low latency | long-running |
| Users return later and expect continuity | hybrid |
| Small prototype or internal tool | simple mode |
| Multi-tenant platform or strong isolation boundary | control and execution separated |

Another way to say it:

  • choose ephemeral, long-running, or hybrid based on workload continuity
  • choose simple mode or separated execution based on system boundary design
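
The rule of thumb can be sketched as two independent decisions. The `Workload` fields and the `recommend` function are hypothetical, and the tie-break (continuity wins over interaction frequency when both apply) is an assumption of this sketch, not something the table above prescribes.

```typescript
interface Workload {
  usersReturnLater: boolean; // continuity expected across sessions
  highFrequencyInteractive: boolean; // low-latency, frequent interaction
  multiTenantOrStrongIsolation: boolean; // multi-tenant or needs a hard boundary
}

interface Recommendation {
  lifecycle: "ephemeral" | "long-running" | "hybrid";
  placement: "simple" | "separated";
}

function recommend(w: Workload): Recommendation {
  // Axis 1: workload continuity decides the lifecycle.
  // Assumption: if both flags are set, continuity takes priority.
  let lifecycle: Recommendation["lifecycle"] = "ephemeral";
  if (w.usersReturnLater) {
    lifecycle = "hybrid";
  } else if (w.highFrequencyInteractive) {
    lifecycle = "long-running";
  }

  // Axis 2: boundary design decides the placement, independently of axis 1.
  const placement: Recommendation["placement"] = w.multiTenantOrStrongIsolation
    ? "separated"
    : "simple";

  return { lifecycle, placement };
}
```

Treat the output as a starting point, in the same spirit as the table: a real decision would also weigh cost, startup latency, and operational maturity.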

What This Means for Sandbox Runtimes

Once a Claude-based agent moves beyond a toy deployment, the runtime underneath it starts to matter a lot.

If you want to support these deployment models cleanly, the underlying sandbox runtime usually needs more than "run a container and exec a command."

It needs to handle concerns like:

  • persistent workspace state
  • resumable session state
  • predictable startup latency
  • isolated execution boundaries
  • network control
  • secret handling outside untrusted agent code
  • a clean distinction between control and execution

Those requirements are not unique to Anthropic. They are the natural infrastructure consequences of giving agents real tools, real filesystems, and real continuity.

The Short Version

If you remember only three things, make them these:

  1. ephemeral, long-running, and hybrid describe lifecycle
  2. simple mode versus separated execution describes runtime boundary placement
  3. spawnClaudeCodeProcess is about the second, not the first

That is the conceptual model that keeps the architecture clear.

The implementation question comes afterward.

Later, we will publish a dedicated guide on how to map these patterns onto a sandbox runtime in practice.

If you want adjacent context in the meantime, read Persistent Storage for AI Agent Sandboxes for the storage layer behind resumable agent work.