Claude Code SDK Deployment Modes: Ephemeral, Long-Running, and Hybrid
If you have been reading about the Claude Code SDK recently, you have probably run into a confusing pile of terms:
- ephemeral
- long-running
- hybrid
- spawnClaudeCodeProcess
- "run the SDK in Docker"
- "separate the control plane from the execution plane"
These are not all talking about the same thing.
That is the real source of confusion. One set of terms describes how long your runtime lives. Another describes where the Claude runtime actually runs. If you keep those as two separate decisions, the architecture becomes much easier to reason about.
This post is about that distinction. It is intentionally conceptual. It does not explain how to implement any one pattern on a specific platform.
First: Claude Code SDK and Claude Agent SDK Are the Same Family
Anthropic now uses the name Claude Agent SDK in its official docs and repositories. The older name Claude Code SDK is still common in developer discussions and search queries, so both names show up in practice.
Anthropic's current Agent SDK docs and hosting guide use the new name throughout, even though the older term is still widely used informally.
For the purpose of this post, the important point is not the naming change. The important point is what this SDK actually is.
It is not just a thin stateless API wrapper. It is a runtime-oriented agent interface designed around:
- conversation continuity
- tool execution
- working directories
- persistent session state
- deployment inside a controlled execution environment
That is why deployment model matters much more here than it would for a normal request-response SDK.
The Two Independent Axes
When people talk about Claude deployment, they often mix up two different questions:
- How long does the runtime environment live?
- Where does the Claude process run relative to the rest of your application?
Those are two independent axes.
The first axis is about lifecycle.
The second axis is about placement and boundaries.
spawnClaudeCodeProcess belongs to the second axis, not the first.
Axis 1: Deployment Modes
Anthropic describes three common deployment modes: ephemeral, long-running, and hybrid.
The cleanest way to understand them is to ask two questions:
- Is the container or runtime environment short-lived or long-lived?
- Is the state short-lived or long-lived?
| Mode | Runtime lifetime | State lifetime | Best fit | Main cost |
|---|---|---|---|---|
| ephemeral | Short-lived | Usually short-lived unless you externalize it | One-shot tasks, isolated jobs, batch work | Cold start and repeated setup |
| long-running | Long-lived | Long-lived in the same environment | High-frequency interactive agents, agent servers, chat surfaces | Higher steady-state resource cost |
| hybrid | Short-lived or pausable | Long-lived across runtime restarts | Agents users return to later, multi-step work, intermittent sessions | You must manage state persistence deliberately |
Ephemeral
In an ephemeral deployment, you create a fresh runtime for a task, run the task, and tear the runtime down when it is done.
This is the simplest mental model:
- new task
- new runtime
- do the work
- delete the runtime
It is a strong fit for:
- code transformations
- evaluation jobs
- fire-and-forget workflows
- isolated per-request execution
What you gain is simplicity and isolation.
What you pay for is repeated setup. Every run has to recreate the environment, rebuild process state, and reattach anything the agent needs.
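The ephemeral loop described in this section (new task, new runtime, do the work, delete the runtime) can be sketched as a create/run/destroy wrapper. This is a minimal TypeScript sketch, not SDK code: `Runtime`, `createRuntime`, `destroyRuntime`, and `runEphemeral` are hypothetical names, and the in-memory object only stands in for a real container or sandbox.

```typescript
// Hypothetical stand-in for a container or sandbox; not part of any SDK.
interface Runtime {
  id: string;
  alive: boolean;
  run(task: string): string;
}

let nextId = 0;
const destroyed: string[] = [];

function createRuntime(): Runtime {
  nextId += 1;
  const id = `rt-${nextId}`;
  return {
    id,
    alive: true,
    run(task: string): string {
      return `${id} finished: ${task}`;
    },
  };
}

function destroyRuntime(rt: Runtime): void {
  rt.alive = false;
  destroyed.push(rt.id);
}

// Ephemeral: every task gets a fresh runtime that is torn down afterward.
function runEphemeral(task: string): string {
  const rt = createRuntime();
  try {
    return rt.run(task);
  } finally {
    // Teardown happens even if the task throws.
    destroyRuntime(rt);
  }
}

const first = runEphemeral("format repo");
const second = runEphemeral("run evals");
```

Each call pays the setup cost again, which is the trade this mode makes in exchange for isolation: nothing from the first task is visible to the second.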
Long-Running
In a long-running deployment, the runtime stays alive and keeps serving work over time.
That usually means:
- the same process stays up
- the same container stays up
- the same local state stays available in place
This is a strong fit when:
- users interact with the same agent frequently
- the agent exposes an HTTP or WebSocket service
- startup cost is high enough that repeated cold starts are wasteful
What you gain is low latency and in-memory continuity.
What you pay for is a higher operational floor. Long-lived environments accumulate state, consume resources even while mostly idle, and require more care around health, cleanup, and multi-tenant boundaries.
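As a contrast with the ephemeral case, a long-running deployment can be sketched as a registry that starts a runtime once and keeps reusing it. This is an illustrative sketch under stated assumptions: `LongRunningRuntime`, `getRuntime`, and the `coldStarts` counter are hypothetical, and the in-memory array stands in for whatever process state a real runtime would hold.

```typescript
// Hypothetical long-lived runtime; the array stands in for in-memory process state.
class LongRunningRuntime {
  private history: string[] = [];

  // Records a message and returns how many turns are held in memory.
  handle(message: string): number {
    this.history.push(message);
    return this.history.length;
  }
}

const runtimes = new Map<string, LongRunningRuntime>();
let coldStarts = 0;

// Reuse the runtime for an agent if one is already up; start one only on first use.
function getRuntime(agentId: string): LongRunningRuntime {
  let rt = runtimes.get(agentId);
  if (!rt) {
    coldStarts += 1;
    rt = new LongRunningRuntime();
    runtimes.set(agentId, rt);
  }
  return rt;
}

getRuntime("agent-a").handle("hello");
const turns = getRuntime("agent-a").handle("continue");
```

The second call finds the same instance, so the history survives without any persistence layer. The cost is that every entry in `runtimes` keeps consuming resources until something evicts it.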
Hybrid
hybrid is the most misunderstood mode.
Many people hear "hybrid" and assume it means "kind of long-running." That is not the useful distinction.
The real idea is:
the runtime can be short-lived, while the important state remains long-lived
That means you can stop, pause, or recreate the execution environment without throwing away the work.
That preserved state can include:
- conversation history
- local workspace files
- caches
- checkpoints
- session metadata
Hybrid is a strong fit for agent workloads where users leave and come back later, or where work continues in stages rather than in one continuous burst.
The benefit is obvious: you do not pay the full cost of keeping every runtime alive forever.
The catch is also obvious: state has to be designed as a first-class concern. If your session files, working directory, or coordination state disappear with the container, then you do not actually have a hybrid system. You just have a restarted ephemeral system.
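The "short-lived runtime, long-lived state" idea can be sketched as a restore/work/persist cycle. Everything here is an assumption made for illustration: `SessionState`, `durableStore`, `saveState`, `loadState`, and `runTurn` are hypothetical names, and the `Map` stands in for a volume, object store, or database that outlives the container.

```typescript
// Hypothetical session state that must outlive any single runtime.
interface SessionState {
  history: string[];
  workspaceFiles: Record<string, string>;
}

// Stand-in for durable storage (a volume, object store, or database).
const durableStore = new Map<string, string>();

function saveState(sessionId: string, state: SessionState): void {
  durableStore.set(sessionId, JSON.stringify(state));
}

function loadState(sessionId: string): SessionState {
  const raw = durableStore.get(sessionId);
  return raw
    ? (JSON.parse(raw) as SessionState)
    : { history: [], workspaceFiles: {} };
}

// Each call models one short-lived runtime: restore, work, persist, exit.
function runTurn(sessionId: string, message: string): SessionState {
  const state = loadState(sessionId); // fresh runtime, restored state
  state.history.push(message);
  state.workspaceFiles["notes.txt"] = state.history.join("\n");
  saveState(sessionId, state); // persist before the runtime goes away
  return state;
}

runTurn("s1", "clone the repo");
const resumed = runTurn("s1", "continue the refactor");
```

If `saveState` were skipped, the second turn would start from an empty state, which is exactly the "restarted ephemeral system" failure described above.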
Why Hybrid Matters More for Agents Than for Ordinary Apps
For a normal stateless web service, restarting the container is usually not a big deal.
For an agent runtime, the local execution environment often carries real working state:
- cloned repositories
- downloaded artifacts
- tool configuration
- partially completed outputs
- chat and tool-call history
That is why agent infrastructure quickly runs into a storage and continuity problem that ordinary stateless services can ignore.
In practice, hybrid becomes attractive as soon as you want all of the following:
- interactive feel
- lower steady-state cost than keeping everything always on
- resumability
- continuity across sessions
Axis 2: Where the Claude Runtime Actually Runs
Now we move to the second axis.
This is not about lifecycle. It is about boundary placement.
There are two broad patterns.
Simple Mode
In the simplest architecture, your application code and the Claude runtime live in the same execution environment.
That usually means:
- your service starts
- the SDK is available in the same container or VM
- Claude runs there directly
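Conceptually, a request arrives at your application, and the agent runs in the same container or VM that received it.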
This is the easiest thing to build.
It is often the right choice when:
- you are prototyping
- you have one tenant or a small number of trusted workloads
- you want the minimum number of moving parts
The tradeoff is that control logic and execution logic are tightly coupled. The place that receives traffic is also the place that hosts the agent runtime.
Control and Execution Separated
In a more split architecture, your application acts as the control plane, while the Claude runtime executes somewhere else.
That "somewhere else" might be:
- a container
- a VM
- a remote sandbox
- a dedicated worker runtime
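Conceptually, your application decides what to run and hands the work to a Claude runtime in a separate execution environment.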
This is the pattern where spawnClaudeCodeProcess becomes relevant.
The point of this model is not just indirection for its own sake. It creates a clearer boundary between:
- the system that decides what work to run
- the system that actually runs the work
That matters for:
- multi-tenant systems
- isolation
- scheduling
- auditability
- network controls
- secret handling
- platform ownership boundaries
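One way to picture the two placement patterns is as two implementations of the same interface, with the control plane choosing between them. This is a conceptual sketch only: `Executor`, `localExecutor`, `remoteExecutor`, and `controlPlane` are hypothetical names, and the `dispatched` array stands in for a network call to a container, VM, or sandbox.

```typescript
// Hypothetical boundary between deciding what to run and running it.
interface Executor {
  readonly location: "in-process" | "remote";
  execute(prompt: string): string;
}

// Simple mode: the agent runs where the application runs.
const localExecutor: Executor = {
  location: "in-process",
  execute: (prompt) => `local result for: ${prompt}`,
};

// Separated mode: the application only dispatches; a worker elsewhere runs the agent.
const dispatched: string[] = [];
const remoteExecutor: Executor = {
  location: "remote",
  execute: (prompt) => {
    dispatched.push(prompt); // stand-in for sending work across the boundary
    return `remote result for: ${prompt}`;
  },
};

// The control plane decides where work runs; untrusted tenants cross the boundary.
function controlPlane(tenantTrusted: boolean, prompt: string): string {
  const executor = tenantTrusted ? localExecutor : remoteExecutor;
  return executor.execute(prompt);
}

const trusted = controlPlane(true, "summarize");
const untrusted = controlPlane(false, "summarize");
```

Because the control plane depends only on the interface, concerns such as isolation, network controls, and secret handling can be enforced at the point where work crosses into the remote executor.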
What spawnClaudeCodeProcess Actually Changes
spawnClaudeCodeProcess is easy to misread because it sounds like a deployment mode. It is not.
In Anthropic's TypeScript Agent SDK reference, it is presented as a way to launch the Claude runtime in another environment. That is exactly why it belongs to the runtime-boundary discussion rather than the lifecycle discussion.
It does not answer:
- Should this runtime be ephemeral?
- Should it be long-running?
- Should it be hybrid?
It answers a different question:
- Should the Claude runtime be launched in the same environment as my application, or in a separate execution environment?
That means spawnClaudeCodeProcess changes the runtime boundary, not the lifecycle model.
This is the single most important distinction in this whole topic.
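To make the distinction concrete, here is a sketch of a pluggable spawn hook. The types below are hypothetical and deliberately simplified; they are not the SDK's actual `spawnClaudeCodeProcess` signature, which should be taken from the TypeScript reference. The command names `agent-runtime` and `sandbox-run` are also invented for illustration.

```typescript
// Hypothetical, simplified types; NOT the SDK's real spawnClaudeCodeProcess signature.
interface ProcessHandle {
  where: string;
  commandLine: string;
}
type SpawnHook = (command: string, args: string[]) => ProcessHandle;

// Boundary choice 1: start the process next to the application.
const spawnLocally: SpawnHook = (command, args) => ({
  where: "app container",
  commandLine: [command, ...args].join(" "),
});

// Boundary choice 2: start it in a separate environment (container, VM, sandbox).
// `sandbox-run` is an invented launcher name used only for this sketch.
const spawnInSandbox: SpawnHook = (command, args) => ({
  where: "remote sandbox",
  commandLine: ["sandbox-run", command, ...args].join(" "),
});

// Lifecycle logic is written once and does not care which hook it receives.
function startAgent(spawn: SpawnHook): ProcessHandle {
  // `agent-runtime` is a placeholder command, not a real CLI.
  return spawn("agent-runtime", ["--session", "demo"]);
}

const local = startAgent(spawnLocally);
const sandboxed = startAgent(spawnInSandbox);
```

Swapping the hook changes where the process lands, while `startAgent` stays the same. Whether that process is ephemeral, long-running, or hybrid is decided elsewhere.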
The Six Common Combinations
Once you separate the two axes, you do not get three architectures. You get a matrix.
| Lifecycle mode | Simple mode | Control and execution separated |
|---|---|---|
| ephemeral | Fresh app-local runtime per task | Fresh remote execution runtime per task |
| long-running | App-local runtime stays up | Remote execution runtime stays up |
| hybrid | App-local runtime can restart while state persists | Remote execution runtime can restart while state persists |
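The matrix is simply the cross product of the two axes, which a few lines of TypeScript can express. The type and variable names here (`Lifecycle`, `Placement`, `Deployment`, `combinations`) are illustrative and do not come from the SDK.

```typescript
const lifecycles = ["ephemeral", "long-running", "hybrid"] as const;
const placements = ["simple", "separated"] as const;

type Lifecycle = (typeof lifecycles)[number];
type Placement = (typeof placements)[number];

// A deployment is one independent choice on each axis.
interface Deployment {
  lifecycle: Lifecycle;
  placement: Placement;
}

// Cross product of the two axes: 3 lifecycle modes x 2 placements.
const combinations: Deployment[] = [];
for (const lifecycle of lifecycles) {
  for (const placement of placements) {
    combinations.push({ lifecycle, placement });
  }
}

const labels = combinations.map((c) => `${c.lifecycle} + ${c.placement}`);
```

Modeling the two choices as separate fields makes it hard to confuse them: changing `placement` never forces a change to `lifecycle`, and vice versa.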
Some combinations are more common than others.
ephemeral + simple mode
This is a good prototype architecture.
You keep everything together, launch a runtime for a task, and throw it away afterward. It is easy to reason about and easy to delete.
long-running + simple mode
This is common for internal tools and small agent servers.
It is operationally straightforward early on, but over time it tends to mix app concerns and execution concerns into one place.
ephemeral + separated execution
This is a good fit for highly isolated job-style systems.
The control plane remains stable while each task gets a fresh remote runtime.
hybrid + separated execution
This is one of the most interesting patterns for serious agent platforms.
It gives you:
- a clean control boundary
- resumable execution state
- more control over cost than pure long-running
- better isolation than putting everything in the same app container
That combination is often where sandbox-based agent platforms become especially compelling.
How to Choose
If you only need a rough rule of thumb, use this one:
| Situation | Usually best starting point |
|---|---|
| One-shot jobs, evals, isolated tasks | ephemeral |
| High-frequency interaction with low latency | long-running |
| Users return later and expect continuity | hybrid |
| Small prototype or internal tool | simple mode |
| Multi-tenant platform or strong isolation boundary | control and execution separated |
Another way to say it:
- choose ephemeral, long-running, or hybrid based on workload continuity
- choose simple mode or separated execution based on system boundary design
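The rule of thumb can be written down as two independent functions, one per axis. This is a rough sketch, not a decision procedure from Anthropic: the `Workload` fields are hypothetical, and the priority order (continuity first, then interaction frequency) is an assumption about how to break ties.

```typescript
type Lifecycle = "ephemeral" | "long-running" | "hybrid";
type Placement = "simple" | "separated";

// Hypothetical workload description; field names are illustrative.
interface Workload {
  usersReturnLater: boolean; // continuity expected across sessions
  highFrequencyInteractive: boolean; // low latency matters on every turn
  multiTenantOrUntrusted: boolean; // strong isolation boundary needed
}

// Axis 1: chosen from workload continuity.
function chooseLifecycle(w: Workload): Lifecycle {
  if (w.usersReturnLater) return "hybrid";
  if (w.highFrequencyInteractive) return "long-running";
  return "ephemeral"; // one-shot jobs, evals, isolated tasks
}

// Axis 2: chosen from system boundary design.
function choosePlacement(w: Workload): Placement {
  return w.multiTenantOrUntrusted ? "separated" : "simple";
}

const evalJob: Workload = {
  usersReturnLater: false,
  highFrequencyInteractive: false,
  multiTenantOrUntrusted: false,
};
const platform: Workload = {
  usersReturnLater: true,
  highFrequencyInteractive: true,
  multiTenantOrUntrusted: true,
};
```

Neither function reads the other's inputs, which mirrors the point of the table: the lifecycle answer and the placement answer come from different questions.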
What This Means for Sandbox Runtimes
Once a Claude-based agent moves beyond a toy deployment, the runtime underneath it starts to matter a lot.
If you want to support these deployment models cleanly, the underlying sandbox runtime usually needs more than "run a container and exec a command."
It needs to handle concerns like:
- persistent workspace state
- resumable session state
- predictable startup latency
- isolated execution boundaries
- network control
- secret handling outside untrusted agent code
- a clean distinction between control and execution
Those requirements are not unique to Anthropic. They are the natural infrastructure consequences of giving agents real tools, real filesystems, and real continuity.
The Short Version
If you remember only three things, make them these:
- ephemeral, long-running, and hybrid describe lifecycle
- simple mode versus separated execution describes runtime boundary placement
- spawnClaudeCodeProcess is about the second, not the first
That is the conceptual model that keeps the architecture clear.
The implementation question comes afterward.
Later, we will publish a dedicated guide on how to map these patterns onto a sandbox runtime in practice.
If you want adjacent context in the meantime, read Persistent Storage for AI Agent Sandboxes for the storage layer behind resumable agent work.