
Claude Code SDK Deployment Modes: Ephemeral, Long-Running, and Hybrid

Sandbox0 Team · April 8, 2026

If you have been reading about the Claude Code SDK recently, you have probably run into a confusing pile of terms:

ephemeral
long-running
hybrid
spawnClaudeCodeProcess
"run the SDK in Docker"
"separate the control plane from the execution plane"

These are not all talking about the same thing.

That is the real source of confusion. One set of terms describes how long your runtime lives. Another describes where the Claude runtime actually runs. If you keep those as two separate decisions, the architecture becomes much easier to reason about.

This post is about that distinction. It is intentionally conceptual. It does not explain how to implement any one pattern on a specific platform.

First: Claude Code SDK and Claude Agent SDK Are the Same Family

Anthropic now uses the name Claude Agent SDK in its official docs and repositories. The older name Claude Code SDK is still common in developer discussions and search queries, so both names show up in practice.

Anthropic's current Agent SDK docs and hosting guide use the new name consistently, even though the older term remains common in informal use.

For the purpose of this post, the important point is not the naming change. The important point is what this SDK actually is.

It is not just a thin stateless API wrapper. It is a runtime-oriented agent interface designed around:

conversation continuity
tool execution
working directories
persistent session state
deployment inside a controlled execution environment

That is why deployment model matters much more here than it would for a normal request-response SDK.

The Two Independent Axes

When people talk about Claude deployment, they often mix up two different questions:

How long does the runtime environment live?
Where does the Claude process run relative to the rest of your application?

Those are two independent axes.

The first axis is about lifecycle.

The second axis is about placement and boundaries.

`spawnClaudeCodeProcess` belongs to the second axis, not the first.
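
One way to keep the two axes apart is to model them as two independent fields. The sketch below is illustrative only: `Lifecycle`, `Placement`, `DeploymentChoice`, and `withPlacement` are hypothetical names invented for this post, not SDK types.

```typescript
// Hypothetical names for illustration; none of these come from the SDK.
type Lifecycle = "ephemeral" | "long-running" | "hybrid";
type Placement = "simple" | "separated";

interface DeploymentChoice {
  lifecycle: Lifecycle; // axis 1: how long the runtime lives
  placement: Placement; // axis 2: where the Claude process runs
}

// Changing one axis returns a new choice and leaves the other axis untouched.
function withPlacement(choice: DeploymentChoice, placement: Placement): DeploymentChoice {
  return { ...choice, placement };
}

const localPrototype: DeploymentChoice = { lifecycle: "ephemeral", placement: "simple" };
const isolatedVariant = withPlacement(localPrototype, "separated");
```

Here `isolatedVariant` is still ephemeral. Only its placement changed, which is the sense in which the axes are independent.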

Axis 1: Deployment Modes

Anthropic describes three common deployment modes: ephemeral, long-running, and hybrid.

The cleanest way to understand them is to ask two questions:

Is the container or runtime environment short-lived or long-lived?
Is the state short-lived or long-lived?

| Mode | Runtime lifetime | State lifetime | Best fit | Main cost |
| --- | --- | --- | --- | --- |
| `ephemeral` | Short-lived | Usually short-lived unless you externalize it | One-shot tasks, isolated jobs, batch work | Cold start and repeated setup |
| `long-running` | Long-lived | Long-lived in the same environment | High-frequency interactive agents, agent servers, chat surfaces | Higher steady-state resource cost |
| `hybrid` | Short-lived or pausable | Long-lived across runtime restarts | Agents users return to later, multi-step work, intermittent sessions | You must manage state persistence deliberately |

Ephemeral

In an ephemeral deployment, you create a fresh runtime for a task, run the task, and tear the runtime down when it is done.

This is the simplest mental model:

new task
new runtime
do the work
delete the runtime

It is a strong fit for:

code transformations
evaluation jobs
fire-and-forget workflows
isolated per-request execution

What you gain is simplicity and isolation.

What you pay for is repeated setup. Every run has to recreate the environment, rebuild process state, and reattach anything the agent needs.
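
A minimal sketch of that create, run, tear-down cycle follows. Everything in it is an assumption made for illustration: `EphemeralRuntime`, `createRuntime`, and `runEphemeral` are hypothetical, and the runtime is an in-memory stand-in. A real one would wrap a container or sandbox and would be asynchronous.

```typescript
// Hypothetical runtime handle, kept synchronous and in-memory for brevity.
interface EphemeralRuntime {
  id: string;
  run(task: string): string;
  destroy(): void;
}

let createdCount = 0;
const destroyedIds: string[] = [];

// Stand-in factory that records creation and teardown.
function createRuntime(): EphemeralRuntime {
  createdCount += 1;
  const id = `rt-${createdCount}`;
  return {
    id,
    run: (task) => `${id} finished: ${task}`,
    destroy: () => {
      destroyedIds.push(id);
    },
  };
}

// One task, one runtime.
function runEphemeral(task: string): string {
  const runtime = createRuntime(); // setup cost is paid on every call
  try {
    return runtime.run(task);
  } finally {
    runtime.destroy(); // teardown happens even if the task throws
  }
}
```

Each call to `runEphemeral` gets a fresh runtime ID, and nothing from one task is visible to the next.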

Long-Running

In a long-running deployment, the runtime stays alive and keeps serving work over time.

That usually means:

the same process stays up
the same container stays up
the same local state stays available in place

This is a strong fit when:

users interact with the same agent frequently
the agent exposes an HTTP or WebSocket service
startup cost is high enough that repeated cold starts are wasteful

What you gain is low latency and in-memory continuity.

What you pay for is a higher operational floor. Long-lived environments accumulate state, consume resources even while mostly idle, and require more care around health, cleanup, and multi-tenant boundaries.
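
For contrast, here is a sketch of the long-running shape. The runtime is created once and reused, so its in-memory state carries across requests. `LongRunningRuntime` and `agentServer` are hypothetical names for this post, and the class is an in-memory stand-in rather than a real agent process.

```typescript
// Hypothetical long-lived runtime: created once, reused for every request.
class LongRunningRuntime {
  // In-memory state that lives exactly as long as this object does.
  private history: string[] = [];

  handle(message: string): string {
    this.history.push(message);
    return `turn ${this.history.length}: ${message}`;
  }

  // Long-lived environments accumulate state, so cleanup has to be explicit.
  reset(): void {
    this.history = [];
  }
}

const agentServer = new LongRunningRuntime(); // startup cost is paid once
```

Repeated calls to `agentServer.handle` see earlier turns without any reload step. That is the latency benefit, and the need for `reset` is the operational cost.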

Hybrid

`hybrid` is the most misunderstood mode.

Many people hear "hybrid" and assume it means "kind of long-running." That is not the useful distinction.

The real idea is:

the runtime can be short-lived, while the important state remains long-lived

That means you can stop, pause, or recreate the execution environment without throwing away the work.

That preserved state can include:

conversation history
local workspace files
caches
checkpoints
session metadata

Hybrid is a strong fit for agent workloads where users leave and come back later, or where work continues in stages rather than in one continuous burst.

The benefit is obvious: you do not pay the full cost of keeping every runtime alive forever.

The catch is also obvious: state has to be designed as a first-class concern. If your session files, working directory, or coordination state disappear with the container, then you do not actually have a hybrid system. You just have a restarted ephemeral system.
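
The sketch below illustrates that point under stated assumptions. `SessionState`, `durableStore`, and `HybridRuntime` are hypothetical, and the `Map` stands in for whatever durable storage you actually use (a volume, object store, or database). The runtime copies state in on start and out on stop, so anything not saved before shutdown is lost.

```typescript
// Hypothetical session state that must outlive any single runtime.
interface SessionState {
  history: string[];
  workspaceFiles: Record<string, string>;
}

// Stand-in for durable storage that survives runtime restarts.
const durableStore = new Map<string, SessionState>();

function copyState(s: SessionState): SessionState {
  return { history: [...s.history], workspaceFiles: { ...s.workspaceFiles } };
}

class HybridRuntime {
  private readonly sessionId: string;
  private state: SessionState;

  constructor(sessionId: string) {
    this.sessionId = sessionId;
    // Resume from persisted state if it exists; otherwise start fresh.
    const saved = durableStore.get(sessionId);
    this.state = saved ? copyState(saved) : { history: [], workspaceFiles: {} };
  }

  // Records a turn and returns the number of turns in this session so far.
  handle(message: string): number {
    this.state.history.push(message);
    return this.state.history.length;
  }

  // Persist before the runtime is paused or destroyed.
  stop(): void {
    durableStore.set(this.sessionId, copyState(this.state));
  }
}
```

A runtime that calls `stop()` can be discarded and later replaced by a new `HybridRuntime` with the same session ID. One that is destroyed without saving behaves like a restarted ephemeral system.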

Why Hybrid Matters More for Agents Than for Ordinary Apps

For a normal stateless web service, restarting the container is usually not a big deal.

For an agent runtime, the local execution environment often carries real working state:

cloned repositories
downloaded artifacts
tool configuration
partially completed outputs
chat and tool-call history

That is why agent infrastructure quickly runs into a storage and continuity problem that ordinary stateless services can ignore.

In practice, hybrid becomes attractive as soon as you want all of the following:

interactive feel
lower steady-state cost than keeping everything always on
resumability
continuity across sessions

Axis 2: Where the Claude Runtime Actually Runs

Now we move to the second axis.

This is not about lifecycle. It is about boundary placement.

There are two broad patterns.

Simple Mode

In the simplest architecture, your application code and the Claude runtime live in the same execution environment.

That usually means:

your service starts
the SDK is available in the same container or VM
Claude runs there directly

Conceptually, the application and the Claude runtime sit inside one box: a single container or VM.

This is the easiest thing to build.

It is often the right choice when:

you are prototyping
you have one tenant or a small number of trusted workloads
you want the minimum number of moving parts

The tradeoff is that control logic and execution logic are tightly coupled. The place that receives traffic is also the place that hosts the agent runtime.

Control and Execution Separated

In a more split architecture, your application acts as the control plane, while the Claude runtime executes somewhere else.

That "somewhere else" might be:

a container
a VM
a remote sandbox
a dedicated worker runtime

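Conceptually, the application sits in one environment as the control plane and launches the Claude runtime in a second, separate environment.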

This is the pattern where `spawnClaudeCodeProcess` becomes relevant.

The point of this model is not just indirection for its own sake. It creates a clearer boundary between:

the system that decides what work to run
the system that actually runs the work

That matters for:

multi-tenant systems
isolation
scheduling
auditability
network controls
secret handling
platform ownership boundaries
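
As a rough sketch of that boundary, the control plane below only decides and records, and all work goes through an `Executor` interface. `Executor`, `SandboxExecutor`, and `ControlPlane` are hypothetical names for this post. The executor is an in-memory stand-in for a container, VM, or remote sandbox.

```typescript
// Hypothetical boundary: the control plane only ever sees this interface.
interface Executor {
  execute(task: string): string;
}

// Stand-in for a remote sandbox; records the tasks it ran.
class SandboxExecutor implements Executor {
  readonly log: string[] = [];

  execute(task: string): string {
    this.log.push(task);
    return `done(${task})`;
  }
}

// The control plane decides what runs and where; it never runs the work itself.
class ControlPlane {
  private executors = new Map<string, Executor>();
  private audit: string[] = [];

  register(tenant: string, executor: Executor): void {
    this.executors.set(tenant, executor);
  }

  submit(tenant: string, task: string): string {
    const executor = this.executors.get(tenant);
    if (!executor) throw new Error(`no executor for tenant ${tenant}`);
    this.audit.push(`${tenant} submitted ${task}`); // auditability lives here
    return executor.execute(task);
  }

  auditTrail(): readonly string[] {
    return this.audit;
  }
}
```

Giving each tenant its own executor keeps one tenant's work out of another's environment, while scheduling and audit stay in one place.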

What `spawnClaudeCodeProcess` Actually Changes

`spawnClaudeCodeProcess` is easy to misread because it sounds like a deployment mode. It is not.

In Anthropic's TypeScript Agent SDK reference, it is presented as a way to launch the Claude runtime in another environment. That is exactly why it belongs to the runtime-boundary discussion rather than the lifecycle discussion.

It does not answer:

Should this runtime be ephemeral?
Should it be long-running?
Should it be hybrid?

It answers a different question:

Should the Claude runtime be launched in the same environment as my application, or in a separate execution environment?

That means `spawnClaudeCodeProcess` changes the runtime boundary, not the lifecycle model.
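
To make the idea concrete without reproducing the SDK's API, the sketch below uses a hypothetical `SpawnFn` type. It is not the signature of `spawnClaudeCodeProcess`, so consult the SDK reference for the real one. `ProcessHandle`, `spawnLocal`, `spawnRemote`, and `runOnce` are likewise invented for illustration.

```typescript
// Hypothetical shapes for illustration only. This is NOT the SDK's
// spawnClaudeCodeProcess signature.
interface ProcessHandle {
  location: "local" | "remote";
  send(input: string): string;
  kill(): void;
}

type SpawnFn = () => ProcessHandle;

// Two interchangeable spawners: same interface, different placement.
const spawnLocal: SpawnFn = () => ({
  location: "local",
  send: (input) => `local:${input}`,
  kill: () => {},
});

const spawnRemote: SpawnFn = () => ({
  location: "remote",
  send: (input) => `remote:${input}`,
  kill: () => {},
});

// Lifecycle logic (ephemeral here) is written once and is unaware of placement.
function runOnce(spawn: SpawnFn, input: string): string {
  const proc = spawn();
  try {
    return proc.send(input);
  } finally {
    proc.kill();
  }
}
```

Swapping `spawnLocal` for `spawnRemote` moves the process across the boundary while `runOnce` stays the same, so the lifecycle decision is unaffected.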

This is the single most important distinction in this whole topic.

The Six Common Combinations

Once you separate the two axes, you do not get three architectures. You get a matrix.

| Lifecycle mode | Simple mode | Control and execution separated |
| --- | --- | --- |
| `ephemeral` | Fresh app-local runtime per task | Fresh remote execution runtime per task |
| `long-running` | App-local runtime stays up | Remote execution runtime stays up |
| `hybrid` | App-local runtime can restart while state persists | Remote execution runtime can restart while state persists |
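
The matrix is simply the cross product of the two axes, which a few lines can enumerate. The array and variable names here are illustrative.

```typescript
const lifecycles = ["ephemeral", "long-running", "hybrid"] as const;
const placements = ["simple", "separated"] as const;

// Pair every lifecycle mode with every placement option.
const combinations: string[] = [];
for (const lifecycle of lifecycles) {
  for (const placement of placements) {
    combinations.push(`${lifecycle} + ${placement}`);
  }
}
```

Three lifecycle modes times two placements yields six entries, from `ephemeral + simple` through `hybrid + separated`.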

Some combinations are more common than others. Four are worth calling out.

`ephemeral` + simple mode

This is a good prototype architecture.

You keep everything together, launch a runtime for a task, and throw it away afterward. It is easy to reason about and easy to delete.

`long-running` + simple mode

This is common for internal tools and small agent servers.

It is operationally straightforward early on, but over time it tends to mix app concerns and execution concerns into one place.

`ephemeral` + separated execution

This is a good fit for highly isolated job-style systems.

The control plane remains stable while each task gets a fresh remote runtime.

`hybrid` + separated execution

This is one of the most interesting patterns for serious agent platforms.

It gives you:

a clean control boundary
resumable execution state
more control over cost than pure long-running
better isolation than putting everything in the same app container

That combination is often where sandbox-based agent platforms become especially compelling.

How to Choose

If you only need a rough rule of thumb, use this one:

| Situation | Usually best starting point |
| --- | --- |
| One-shot jobs, evals, isolated tasks | `ephemeral` |
| High-frequency interaction with low latency | `long-running` |
| Users return later and expect continuity | `hybrid` |
| Small prototype or internal tool | simple mode |
| Multi-tenant platform or strong isolation boundary | control and execution separated |

Another way to say it:

choose ephemeral, long-running, or hybrid based on workload continuity
choose simple mode or separated execution based on system boundary design
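
The rule of thumb can be written as two independent decisions. This is only a sketch: `Workload` and `suggest` are hypothetical, and checking "users return later" before "high-frequency interaction" is an assumption of this example, not something the rule of thumb prescribes.

```typescript
// Hypothetical description of a workload; the field names are illustrative.
interface Workload {
  usersReturnLater: boolean;
  highFrequencyInteraction: boolean;
  multiTenantOrStrongIsolation: boolean;
}

interface Suggestion {
  lifecycle: "ephemeral" | "long-running" | "hybrid";
  placement: "simple" | "separated";
}

function suggest(w: Workload): Suggestion {
  // Axis 1: decided by workload continuity only.
  // Assumption: continuity across sessions takes priority over raw frequency.
  let lifecycle: Suggestion["lifecycle"] = "ephemeral";
  if (w.usersReturnLater) {
    lifecycle = "hybrid";
  } else if (w.highFrequencyInteraction) {
    lifecycle = "long-running";
  }

  // Axis 2: decided by system boundary design only.
  const placement: Suggestion["placement"] = w.multiTenantOrStrongIsolation
    ? "separated"
    : "simple";

  return { lifecycle, placement };
}
```

Neither branch reads the other's inputs, which mirrors the claim that the two choices are made separately.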

What This Means for Sandbox Runtimes

Once a Claude-based agent moves beyond a toy deployment, the runtime underneath it starts to matter a lot.

If you want to support these deployment models cleanly, the underlying sandbox runtime usually needs more than "run a container and exec a command."

It needs to handle concerns like:

persistent workspace state
resumable session state
predictable startup latency
isolated execution boundaries
network control
secret handling outside untrusted agent code
a clean distinction between control and execution

Those requirements are not unique to Anthropic. They are the natural infrastructure consequences of giving agents real tools, real filesystems, and real continuity.

The Short Version

If you remember only three things, make them these:

ephemeral, long-running, and hybrid describe lifecycle
simple mode versus separated execution describes runtime boundary placement
`spawnClaudeCodeProcess` is about the second, not the first

That is the conceptual model that keeps the architecture clear.

The implementation question comes afterward.

Later, we will publish a dedicated guide on how to map these patterns onto a sandbox runtime in practice.

If you want adjacent context in the meantime, read Persistent Storage for AI Agent Sandboxes for the storage layer behind resumable agent work.