Tags: security, credential, egress, ai-agents

API Key Security for AI Agents: How to Keep Secrets Out of the Sandbox

Sandbox0 Team

Every AI agent that calls an external API needs credentials. GitHub tokens, OpenAI keys, database passwords — something has to authenticate the request. The question isn't whether your agent needs credentials. It's whether the agent itself needs to hold them.

Most setups answer that question the easy way: pass the key as an environment variable or inject it at startup. The agent reads it, uses it, and the key is now living inside an agent-controlled process. Whatever code runs in that sandbox — including every tool call, every model output, every prompt that arrives from the outside — has access to it.

Why Giving Agents Your API Keys Is a Problem#

The risks with credentials in the agent environment compound quickly:

Prompt injection can exfiltrate keys directly. An attacker who controls content the agent processes — a document, a web page, a tool result — can craft instructions that cause the agent to include the credential in its output, pass it to another tool, or log it somewhere observable.

Credentials appear in places you don't expect. Debug output, observability traces, error messages — all of these can capture environment variables. Keys passed this way often end up in third-party platforms before anyone notices.

Blast radius is hard to contain. If an agent is compromised, the key it holds may have permissions far beyond what that task required. Rotating it means finding every agent that holds a copy.

Rotation is a coordination problem. Keys stored in sandbox environments don't rotate automatically. If you have a pool of pre-warmed sandboxes holding a credential in memory, updating that credential is a deployment event, not a configuration change.

The Current Best Practice: Phantom Tokens#

The community has converged on a pattern called the Phantom Token approach (also called a local credential proxy). Here's how it works:

  1. The agent starts with a fake, random token — a long hex string that looks like a real API key
  2. The agent's SDK is redirected to a localhost proxy via a base URL environment variable
  3. When the agent makes a request, it goes to the local proxy with the phantom token
  4. The proxy validates the token, strips it, injects the real credential, and forwards the request upstream

This is genuinely better than giving agents raw credentials. The real key never appears in the agent's environment variables or process memory.

But the phantom token approach has a constraint: the agent code must know to connect to the proxy. This typically means redirecting the SDK's base URL (OPENAI_BASE_URL=http://localhost:8080/v1, for example). That works for HTTP APIs that support base URL overrides — it doesn't work transparently for every protocol, and it requires per-agent configuration that a compromised agent could in principle bypass.
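To make the pattern concrete, here is a minimal sketch of the credential swap a phantom-token proxy performs. All names here are illustrative (`PHANTOM_TOKEN`, `rewrite_auth`, the key values); real implementations also handle forwarding, logging, and scoping.

```python
import secrets

# Illustrative sketch of the Phantom Token pattern: the agent is handed
# only PHANTOM_TOKEN; the local proxy swaps it for the real key.
PHANTOM_TOKEN = secrets.token_hex(32)    # fake key living in the agent's env
REAL_API_KEY = "sk-real-key-from-vault"  # never enters the agent's environment

def rewrite_auth(headers: dict) -> dict:
    """Validate the phantom token and substitute the real credential."""
    if headers.get("Authorization") != f"Bearer {PHANTOM_TOKEN}":
        raise PermissionError("unknown phantom token")
    out = dict(headers)
    out["Authorization"] = f"Bearer {REAL_API_KEY}"
    return out

# The agent's request carries only the phantom token...
agent_headers = {"Authorization": f"Bearer {PHANTOM_TOKEN}"}
# ...and the proxy forwards it upstream with the real key.
upstream = rewrite_auth(agent_headers)
```

Note the residual exposure: the phantom token itself still sits in the agent's process, which is exactly the constraint the next section addresses.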

Credential Injection at the Network Layer#

Sandbox0 takes a different approach. Instead of running a proxy that the agent connects to, credentials are injected by netd — a node-level daemon that manages network policy for every sandbox on the host.

Here's what happens when an agent makes an authenticated API call:

  1. The agent sends an ordinary HTTPS request to api.github.com with no Authorization header
  2. iptables TPROXY rules redirect the connection to netd before it leaves the node
  3. netd matches the destination against the credential rules configured for that sandbox
  4. netd resolves the bound credential source, renders the header template, and injects the Authorization header into the outbound request
  5. The authenticated request is forwarded to the real destination

The agent never wrote an auth header. The agent never held a token. The agent doesn't know a proxy is involved. The connection was intercepted in the kernel and the credential injected in transit by netd, transparent to the process.

This is the difference from the phantom token pattern: there is no fake credential to steal. There is nothing in the agent's process environment that represents the real key, or a stand-in for it.
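Steps 3 and 4 above can be sketched as a match-and-inject function. The rule shape and field names here are assumptions for illustration, not netd's actual data model (which is configured as shown in the next section):

```python
from dataclasses import dataclass, field

@dataclass
class CredentialRule:
    """Illustrative stand-in for a configured credential rule."""
    domains: list
    port: int
    header_template: str        # e.g. "Bearer {{token}}"
    secret: dict = field(default_factory=dict)  # resolved from the source

def inject(rules, host, port, headers):
    """Match the destination against credential rules; render and inject."""
    for rule in rules:
        if host in rule.domains and port == rule.port:
            value = rule.header_template
            for k, v in rule.secret.items():
                value = value.replace("{{%s}}" % k, v)
            return dict(headers, Authorization=value)
    return headers  # no rule matched: request passes through unchanged

rules = [CredentialRule(["api.github.com"], 443,
                        "Bearer {{token}}", {"token": "ghp_example"})]
# The agent's request carried no Authorization header:
out = inject(rules, "api.github.com", 443, {"Accept": "application/json"})
# out["Authorization"] == "Bearer ghp_example"
```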

What the Configuration Looks Like#

Define a credential source once — outside any sandbox:

```bash
# Create the source (this is the only place the real token lives)
s0 credential source create github-source \
  --type static_headers \
  --value token=ghp_yourtokenhere
```

Then configure the sandbox network policy to bind and apply it:

```yaml
mode: block-all
egress:
  trafficRules:
    - name: allow-github-api
      action: allow
      domains:
        - api.github.com
      ports:
        - port: 443
          protocol: tcp
  credentialRules:
    - name: github-auth
      credentialRef: gh-token
      protocol: https
      domains:
        - api.github.com
      ports:
        - port: 443
          protocol: tcp
      failurePolicy: fail-closed
  credentialBindings:
    - ref: gh-token
      sourceRef: github-source
      projection:
        type: http_headers
        httpHeaders:
          headers:
            - name: Authorization
              valueTemplate: "Bearer {{token}}"
```

The agent code makes a plain GET https://api.github.com/repos/.... The request arrives at GitHub with a valid Authorization header. The agent process holds no token, real or phantom.
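Concretely, the agent-side code is just an unauthenticated request. The repo path below is a placeholder, and the request is only constructed, not sent:

```python
import urllib.request

# Hypothetical agent code: a plain GET with no credentials configured.
# Inside a sandbox, netd would add the Authorization header in transit.
req = urllib.request.Request("https://api.github.com/repos/example/example")
assert not req.has_header("Authorization")  # the agent holds no token
```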

Protocol Support Beyond HTTP#

Because the injection happens at the network layer — not inside an HTTP library — it works across protocols that wouldn't be reachable from an application-level proxy.

HTTP and HTTPS: Headers are injected into the outbound request. For HTTPS, netd performs TLS terminate-reoriginate, injecting headers before re-establishing the TLS connection to the upstream.

SOCKS5: When the upstream requires username/password authentication, netd rewrites the SOCKS5 handshake — adding the username/password exchange if the client didn't request it, replacing credentials if it did.
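The username/password exchange netd adds is the RFC 1929 subnegotiation. A sketch of the frame it would send on the agent's behalf (netd itself is not Python; this just shows the wire format):

```python
def socks5_userpass_request(username: str, password: str) -> bytes:
    """RFC 1929 username/password subnegotiation request:
    VER (0x01) | ULEN | UNAME | PLEN | PASSWD."""
    u, p = username.encode(), password.encode()
    if len(u) > 255 or len(p) > 255:
        raise ValueError("SOCKS5 limits each field to 255 bytes")
    return bytes([0x01, len(u)]) + u + bytes([len(p)]) + p

frame = socks5_userpass_request("agent", "s3cret")
# frame == b"\x01\x05agent\x06s3cret"
```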

MQTT: netd reads and rewrites the CONNECT packet, injecting username and password fields before forwarding to the broker.

Redis: netd prepends an AUTH command to the upstream connection before forwarding the client's first command. The client-side code connects with no auth; the upstream sees a properly authenticated session.
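On the wire, prepending AUTH means emitting one extra RESP command before the client's first bytes. An illustrative encoding (not netd's implementation):

```python
def resp_auth(password: str) -> bytes:
    """RESP encoding of `AUTH <password>`, as prepended to the upstream
    connection before the client's first command is forwarded."""
    pw = password.encode()
    return (b"*2\r\n$4\r\nAUTH\r\n"
            b"$" + str(len(pw)).encode() + b"\r\n" + pw + b"\r\n")

# Upstream sees AUTH first, then the client's own (unauthenticated) command:
wire = resp_auth("hunter2") + b"*1\r\n$4\r\nPING\r\n"
```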

gRPC: Header injection works the same as HTTPS, since gRPC runs over HTTP/2.

The agent code connects to each service with no auth configuration. netd handles authentication per-connection, per-destination, per-protocol.

Rotation Is a Single Operation#

Because credentials live in the source — not in the sandbox — rotating a key doesn't touch any running sandbox. Update the credential source, and every sandbox that references it picks up the new value on the next resolution.

Resolved credentials are cached in memory at the netd level, keyed by sandbox, auth reference, destination, and protocol. The cachePolicy.ttl field on a binding controls how long a resolved value stays cached. A shorter TTL means rotations propagate faster; a longer TTL reduces resolution latency for high-frequency calls.
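The cache behavior described above can be sketched as a TTL-keyed lookup. Class and key names here are assumptions for illustration, not netd internals:

```python
import time

class CredentialCache:
    """Resolved-credential cache keyed by (sandbox, ref, destination,
    protocol); entries expire after a TTL. Illustrative sketch only."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (value, expiry)

    def get(self, key, resolve):
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit and hit[1] > now:
            return hit[0]      # fresh: skip resolution entirely
        value = resolve()      # expired or absent: re-resolve the source,
        self._entries[key] = (value, now + self.ttl)  # picking up rotations
        return value

cache = CredentialCache(ttl_seconds=60)
key = ("sandbox-1", "gh-token", "api.github.com:443", "https")
token = cache.get(key, lambda: "ghp_v1")
```

Within the TTL, a rotated source value is not yet visible; after expiry, the next request re-resolves and picks it up.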

Failure Is Closed by Default#

If credential resolution fails — the source was deleted, the network to the resolver is down, permissions were revoked — the default behavior is to block the request, not pass it through unauthenticated.

This is controlled by failurePolicy: fail-closed on the credential rule. A misconfigured or missing credential produces a clear connection error, not a silent request that goes out without authentication and fails later with a confusing 401.

fail-open is available for cases where unauthenticated access is acceptable when credentials are unavailable, but fail-closed is the default for a reason: in production agent systems, an auth regression that silently degrades to unauthenticated is harder to catch than one that fails loudly.
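As toy decision logic (illustrative only, not netd's code), the two policies differ in exactly one branch:

```python
def on_resolution_failure(policy: str) -> str:
    """What happens when a credential cannot be resolved."""
    if policy == "fail-closed":
        # Default: block the connection with a clear, immediate error.
        raise ConnectionRefusedError("credential unresolved; request blocked")
    if policy == "fail-open":
        # Opt-in: forward the request without an Authorization header.
        return "forward-unauthenticated"
    raise ValueError(f"unknown failurePolicy: {policy!r}")
```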

FAQ#

How is this different from the Phantom Token Pattern?

The Phantom Token Pattern still places a credential — a fake one — in the agent's process environment, and requires the agent's SDK to connect to a localhost proxy. Sandbox0's approach uses kernel-level traffic interception (iptables TPROXY) so the agent makes a completely normal network connection to the real destination. No fake token, no localhost proxy, no per-agent SDK configuration.

Does the agent code need to be modified?

No. The agent connects to the real destination with no auth configuration. Credential injection is invisible to the agent process.

What happens if the credential source is unavailable during a request?

With the default fail-closed policy, the request is blocked and the agent receives a connection error. With fail-open, the request is forwarded without authentication.

Can I use this in self-hosted deployments?

Yes. The credential system is part of the open-source netd and manager components. It works the same way in self-hosted single-cluster deployments and in multi-cluster SaaS configurations.

How quickly do credential rotations take effect?

Immediately for new connections. Cached resolutions expire according to the cachePolicy.ttl you configure on the binding. Setting a short TTL (e.g., 1m) ensures rapid propagation; the default has no TTL and cached entries persist until the netd process restarts.
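Sketched as configuration, with field placement mirroring the binding shown earlier (treat the exact schema as illustrative), a one-minute TTL would look like:

```yaml
credentialBindings:
  - ref: gh-token
    sourceRef: github-source
    cachePolicy:
      ttl: 1m          # cached resolutions expire after one minute
    projection:
      type: http_headers
      httpHeaders:
        headers:
          - name: Authorization
            valueTemplate: "Bearer {{token}}"
```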


The credential system — sources, bindings, and egress auth rules — is documented in the Credential section of the Sandbox0 docs. For the network policy model that controls which traffic is allowed before credentials are considered, see Sandbox Network.