How We Sandbox AI Agents in Production

Matt Greathouse / A.I. Christianson

A production-grade, security-first system for running AI agents: per-agent isolation, proxy-only egress, deterministic filespace sync, and auditable execution.

TL;DR

Running AI agents in production means untrusted code, real files, and real network access. We built a sandboxed compute system with per-agent isolation, NetworkPolicy-enforced egress, deterministic filespace sync, strict timeouts, and full audit trails. This post walks through the threat model, the architecture, and the mechanics of how we keep powerful agent capabilities safe in production. If you want to see the real implementation, the OSS code is here:

https://github.com/gobii-ai/gobii-platform

We also publish the minimal sandbox compute supervisor used inside the pods:

https://github.com/gobii-ai/sandbox-compute-server

1) What “agents in prod” actually means

The moment agents touch the real world, the threat surface explodes: untrusted code, arbitrary URLs, filesystem mutation, and long‑lived state. If you treat that like a standard container workload, you’re betting your infrastructure on “nothing ever goes wrong.” The core security problem is not a single exploit. It is a chain of small weaknesses: weak isolation, leaky egress, non‑deterministic filesync, and missing audit trails.

Security here is not a single feature. It’s a system. An isolation boundary without network controls is insufficient. Network controls without audit logs are insufficient. Sync without determinism is insufficient. Every layer has to hold.

2) Constraint stack (why this is hard)

These were the non‑negotiables for production:

  • Per‑agent isolation
  • Proxy‑only egress, fail‑closed
  • Deterministic filespace sync
  • Strict timeouts and stdout/stderr caps
  • Full audit trail
  • Cost/latency ceilings that still let the product scale

If you get one of these wrong, the system becomes unsafe or unusable. The difficult part is not building any single component. It is making the whole stack hold under adversarial inputs.

3) Why give agents real capabilities at all?

“Demo agents” can do toy tasks. Production agents need real capabilities:

  • Full browser automation for real websites (auth, JS, dynamic flows)
  • MCP tools and MCP servers for real integrations
  • File manipulation for durable outputs and handoff
  • Code execution for data transformation and automation

We don’t grant these capabilities for novelty. We grant them because real work requires them, then we put them inside a sandbox by default.

[Figure: production tasks requiring browser, MCP, file, and code capabilities route into a sandbox with guardrails. Capabilities are powerful by necessity; safety comes from the sandboxed boundary.]

def route_capability(capability: str) -> str:
    # Anything that can run untrusted code or touch untrusted data
    # routes into the sandbox; only platform internals stay trusted.
    if capability in {"browser", "files", "code_exec", "mcp_server"}:
        return "sandbox"
    return "trusted"

4) Threat model + design goals

We explicitly designed against the risk classes that appear in modern agent systems:

  • Prompt injection and indirect prompt injection
  • Tool abuse and excessive agency
  • Data exfiltration via outbound network access
  • Lateral movement through external services

The security model therefore focuses on:

  • Isolation: agent code never runs in the trusted worker process
  • Egress control: proxy‑only egress, enforced by policy
  • Deterministic sync: file conflicts resolve in a predictable way
  • Auditability: every tool call is logged with a params hash

These risks are documented in the major LLM security frameworks: OWASP's LLM Top 10 lists prompt injection and excessive agency, and MITRE ATLAS documents real-world prompt injection and data exfiltration patterns [7][8]. That is exactly the surface we constrain.

5) Architecture overview

We split the system into a trusted control plane and untrusted per‑agent compute. Kubernetes gives us the orchestration surface; a sandboxed runtime provides the kernel boundary. In GKE, sandboxed pods are requested via runtimeClassName: gvisor [1][4].

[Figure: control plane, per-agent sandbox pod, egress proxy, filespace, and metadata database. The control plane orchestrates sessions and sync; per-agent pods execute untrusted work behind an egress proxy.]

  • Control plane orchestrates sessions, selects proxies, and syncs files.
  • Sandbox pods execute tools, code, and MCP servers.
  • Egress proxy pods are the only allowed route to the internet.

6) Execution path (tool call end‑to‑end)

When an agent calls a tool, the control plane ensures a session exists, routes the call into the sandbox, and syncs workspace changes back to the filespace. The same flow is used for run_command, python_exec, file creation, and MCP tool execution.

[Figure: an agent tool request routes through the control plane to the sandbox supervisor, executes there, and optionally syncs to the filespace on completion.]

def execute_tool(agent, tool_name, params, sync_on_tool_call=True):
    # Every tool call runs inside the agent's sandbox session.
    session = ensure_session(agent)
    result = sandbox.tool_request(session, tool_name, params)
    if result.ok and sync_on_tool_call:
        # Push workspace changes back to the durable filespace.
        sync_filespace_push(agent, session)
    return result

Tool execution is bounded by timeouts and stdout/stderr caps to prevent resource exhaustion. In our system those limits are centralized and enforced at the sandbox boundary.
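
As an illustration of what enforcement at that boundary looks like (the constants and helper name here are hypothetical, not our exact values):

import subprocess

TOOL_TIMEOUT_S = 120        # hypothetical per-call ceiling
MAX_OUTPUT_BYTES = 256_000  # hypothetical stdout/stderr cap

def run_bounded(cmd: list[str]) -> tuple[int, bytes, bytes]:
    # Kill the process if it exceeds the timeout; truncate captured output
    # so a chatty tool cannot exhaust memory or log storage.
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=TOOL_TIMEOUT_S)
    except subprocess.TimeoutExpired:
        return (124, b"", b"timeout exceeded")
    return (proc.returncode,
            proc.stdout[:MAX_OUTPUT_BYTES],
            proc.stderr[:MAX_OUTPUT_BYTES])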

7) Isolation boundary: gVisor userspace‑kernel sandbox

We use a userspace kernel boundary so that no system call is passed through directly to the host kernel. In gVisor, the Sentry intercepts syscalls and the Gofer mediates filesystem access, sharply reducing host kernel exposure [2][3].

apiVersion: v1
kind: Pod
metadata:
  name: sandbox-agent-<agent_id>
spec:
  runtimeClassName: gvisor
  serviceAccountName: sandbox-sa
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: sandbox-supervisor
      image: sandbox-supervisor:latest
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: ["ALL"]

We also apply RuntimeDefault seccomp profiles (the pod-level securityContext above) to further reduce syscall surface [6].

8) Network egress: policy‑enforced, fail‑closed

Our network model is simple: sandbox pods can only talk to the per‑agent egress proxy. Everything else is denied by policy. This is enforced by Kubernetes NetworkPolicy, which implements default‑deny egress with explicit allow rules [5].

[Figure: sandbox pods can reach the egress proxy; direct internet egress is denied. Egress is policy‑enforced: the per‑agent proxy is the only reachable destination.]

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-egress-only
spec:
  podSelector:
    matchLabels:
      app: sandbox-agent
  policyTypes: [Egress]
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: sandbox-egress-proxy
    # DNS to kube-dns/coredns only; the label may vary by distribution
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

DNS resolution still works because the policy explicitly allows egress to kube‑dns/coredns on TCP/UDP 53, as shown above; everything else is denied.

Because egress is default‑deny, direct access to metadata endpoints is blocked. The proxy is the only path out.
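
From inside the pod, this is invisible to well‑behaved clients and fatal to everything else. A minimal sketch, assuming a hypothetical EGRESS_PROXY_URL environment variable injected by the control plane:

import os
import urllib.request

# Hypothetical: the control plane injects the per-agent proxy URL at pod creation.
PROXY_URL = os.environ.get("EGRESS_PROXY_URL", "http://sandbox-egress-proxy:3128")

def proxied_opener() -> urllib.request.OpenerDirector:
    # Route both schemes through the proxy. A client that skips the proxy
    # doesn't get a policy error; its packets simply never leave the pod.
    handler = urllib.request.ProxyHandler({"http": PROXY_URL, "https": PROXY_URL})
    return urllib.request.build_opener(handler)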

9) MCP servers run inside the sandbox

User/org MCP servers run inside the sandbox pod alongside sandboxed tools. Platform MCP servers remain in the trusted worker process. This cleanly splits the untrusted extension surface from the trusted core.
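
A minimal sketch of that routing decision (the field names are illustrative):

def mcp_execution_target(server) -> str:
    # Platform-owned MCP servers are part of the trusted core; anything
    # supplied by a user or org is untrusted and runs in the sandbox.
    return "trusted-worker" if server.owner == "platform" else "sandbox"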

10) Filespace sync: deterministic, conflict‑safe

We treat the filespace as a shared state layer with last‑writer‑wins conflict resolution. If the agent’s workspace changes after the last sync timestamp, it wins; otherwise the filespace wins. Deletions propagate only if they’re newer than the last known file version. We also normalize paths and disallow traversal to keep the workspace boundary intact.

[Figure: filespace sync with push, pull, and deterministic last‑writer‑wins conflict resolution.]

def push_sync(agent, session, since):
    # Collect workspace files that changed after the last sync timestamp.
    changes = scan_workspace_changes(since)
    response = sandbox.sync(direction="push", changes=changes)
    # Apply accepted changes to the filespace and record the new timestamp.
    apply_filespace_push(agent, response.changes, response.sync_timestamp)
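
The conflict rule itself is small enough to state in code. A sketch, with illustrative timestamp parameters:

def resolve_conflict(workspace_mtime: float, last_sync_ts: float) -> str:
    # Last-writer-wins: a workspace change newer than the last sync wins;
    # otherwise the filespace copy is authoritative.
    return "workspace" if workspace_mtime > last_sync_ts else "filespace"

def should_propagate_delete(delete_ts: float, last_known_version_ts: float) -> bool:
    # Deletions only propagate when newer than the last known file version.
    return delete_ts > last_known_version_ts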

We hard‑cap workspace size and fail early if a write exceeds the limit. That prevents a single agent from consuming unbounded storage.
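
A sketch of the early-failure check (the cap value is hypothetical):

MAX_WORKSPACE_BYTES = 2 * 1024**3  # hypothetical 2 GiB cap

class WorkspaceCapExceeded(Exception):
    pass

def check_write(current_usage: int, incoming: int) -> None:
    # Fail before writing, not after: a rejected write should never
    # leave the workspace over its cap.
    if current_usage + incoming > MAX_WORKSPACE_BYTES:
        raise WorkspaceCapExceeded(f"write of {incoming} bytes exceeds cap")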

11) Session lifecycle (warm → idle → snapshot → resume)

Sandbox sessions are long‑lived, but compute only runs while they’re active. When a session goes idle:

  1. Sync workspace to filespace
  2. Snapshot disk
  3. Stop pod

On resume:

  1. Restore snapshot
  2. Pull filespace
  3. Start supervisor

[Figure: session lifecycle with deploy, idle TTL, sync, snapshot, stop, and resume. Idle sessions snapshot and stop; resume restores state and syncs back in.]
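
In pseudocode, the lifecycle reduces to two small procedures (helper names are illustrative; the ordering is the point):

def on_idle(agent, session):
    # Durable state first, then snapshot, then release compute.
    sync_filespace_push(agent, session)
    snapshot_disk(session)
    stop_pod(session)

def on_resume(agent, session):
    restore_snapshot(session)
    sync_filespace_pull(agent, session)
    start_supervisor(session)

Syncing before snapshotting means that even if a snapshot is lost, the durable filespace still holds the latest files; the failure degrades to a slower cold resume rather than data loss.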

12) Security invariants (non‑negotiables)

  • No direct egress from sandbox pods
  • No privileged containers
  • All tool calls logged with params hashes
  • Workspace size hard cap
  • No GPUs/TPUs are exposed to agent code, by design

Inference happens outside the cluster, so sandbox workloads remain CPU‑only. That shrinks the hardware attack surface while keeping agent execution simple and auditable.

13) Auditability and forensic traceability

We log tool invocations with a deterministic hash of parameters to preserve auditability without leaking secrets. That gives us a durable, queryable event stream for incident response.

Sandbox tool_request agent=<id> tool=<name> params_hash=<sha256> duration_ms=<n> exit_code=<n>
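
A sketch of the deterministic hash (assuming params are JSON-serializable):

import hashlib
import json

def params_hash(params: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) makes the hash
    # stable across processes; raw parameter values never reach the log.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()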

14) External dependencies are still part of the perimeter

Any external service the sandbox can reach becomes part of your security boundary (databases, APIs, storage drivers). We minimize those surfaces and apply least‑privilege policies, because isolation is meaningless if the dependencies are wide‑open.

15) Failure modes and edge cases

We design for failure explicitly:

  • Proxy outage → fail‑closed, no direct egress
  • Sync conflicts → deterministic last‑writer‑wins resolution
  • Large outputs → stdout/stderr caps
  • Path traversal attempts → rejected at the workspace boundary (sketched below)

These are the cases that quietly break naive sandbox designs.
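
The path‑traversal rejection, for example, is a single normalization‑and‑containment check. A sketch (the mount point and helper name are illustrative):

from pathlib import Path

WORKSPACE_ROOT = Path("/workspace")  # hypothetical mount point

def safe_resolve(relative: str) -> Path:
    # Normalize first, then verify the result is still under the root;
    # "../" sequences, symlinks, and absolute paths all fail this check.
    candidate = (WORKSPACE_ROOT / relative).resolve()
    if not candidate.is_relative_to(WORKSPACE_ROOT):
        raise ValueError(f"path escapes workspace: {relative}")
    return candidate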

16) Tradeoffs

Sandboxing isn’t free:

  • Syscall‑heavy workloads cost more (userspace kernel overhead)
  • Privileged workloads aren’t compatible
  • Certain kernel features are unavailable by design

We accept those tradeoffs intentionally, because the alternative is an unbounded attack surface. gVisor is explicit about this tradeoff profile and about where it is (and isn’t) the right boundary [2].

17) What’s next

  • Tighter policy enforcement
  • Faster resume paths
  • Deeper audit trails

References

  1. GKE sandbox pods (runtimeClassName: gvisor):
    https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods
  2. gVisor overview (userspace kernel model):
    https://gvisor.dev/docs/
  3. gVisor security model (no syscalls passed through directly):
    https://gvisor.dev/docs/architecture_guide/security/
  4. Kubernetes RuntimeClass (per‑pod runtime selection):
    https://kubernetes.io/docs/concepts/containers/runtime-class
  5. Kubernetes NetworkPolicy (default‑deny egress model):
    https://kubernetes.io/docs/concepts/services-networking/network-policies/
  6. Kubernetes seccomp (RuntimeDefault profiles):
    https://kubernetes.io/docs/reference/node/seccomp/
  7. OWASP Top 10 for LLM Applications (prompt injection, excessive agency, plugin risks):
    https://owasp.org/www-project-top-10-for-large-language-model-applications/
  8. MITRE ATLAS / Generative AI security risks (real‑world AI attack cases):
    https://www.mitre.org/news-insights/news-release/mitre-and-microsoft-collaborate-address-generative-ai-security-risks
  9. Diagram tooling: Mermaid (also backed by Open Core Ventures, like Garak):
    https://mermaid.ai/
  10. Open Core Ventures (background reference):
    https://www.opencoreventures.com/
  11. Minimal sandbox compute server (pod supervisor):
    https://github.com/gobii-ai/sandbox-compute-server
