
Agent sandboxing and isolation

Scope: running model-generated code and tool calls without trusting them. An agent that executes code is running untrusted input by definition, so the question is which isolation boundary matches the trust level. This page covers the isolation spectrum (process, container, syscall interception, microVM), where container defaults hold and where they do not, and the agent-specific pattern of reversible workspaces. Sandboxing enforces the agent threat model, is gated by the control plane, and reuses the platform isolation described in security and multi-tenancy.

This page describes defensive isolation. CVEs are referenced for context; configs are reference-only.

flowchart LR
  TRUST["Trust level of code"] --> PROC["Process<br/>(no isolation)"]
  PROC --> CONT["Container<br/>(namespaces, seccomp, caps, MAC)"]
  CONT --> GV["gVisor<br/>(syscall interception)"]
  GV --> VM["Firecracker / Kata<br/>(microVM)"]
  CONT -.->|"trusted workloads"| OKT["Adequate"]
  GV -.->|"untrusted code"| OKU["Adequate"]
  VM -.->|"untrusted multi-tenant"| OKU

Overview

The single worst sandboxing mistake is running agent-generated code on the host. Everything else is choosing how strong the boundary needs to be. Container defaults are stronger than commonly assumed for trusted workloads, but for untrusted multi-tenant code they are insufficient, and the "advanced" container features are one misconfiguration away from host compromise. When the code is genuinely untrusted, move up the spectrum to syscall interception or a virtual-machine boundary. [1]

Core knowledge

The isolation spectrum

  • Process: no isolation. Never for agent-generated code.
  • Container: Linux namespaces plus a seccomp profile, dropped capabilities, and a mandatory-access-control profile (AppArmor or SELinux). The right answer for trusted first-party workloads.
  • gVisor: a user-space kernel intercepts syscalls, so the container talks to a sandbox rather than the host kernel directly. Its Sentry needs on the order of 53 host syscalls (68 with networking) against roughly 350 for a normal container, an eighty-percent cut in kernel attack surface. [2]
  • microVM (Firecracker, Kata): a real but minimal VM boundary. Firecracker boots in about 125 ms with a few megabytes of overhead per microVM and is around 50k lines of Rust, against roughly two million lines of C for a general-purpose emulator, a much smaller and more auditable trusted computing base. [3]
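The eighty-percent figure in the gVisor bullet can be checked directly from the quoted counts. The sketch below takes those approximate counts as given; the constant and function names are illustrative only.

```python
# Approximate host-syscall counts quoted in the gVisor bullet above.
NORMAL_CONTAINER = 350
SENTRY_BASE = 53
SENTRY_WITH_NETWORKING = 68


def reduction(exposed: int, baseline: int = NORMAL_CONTAINER) -> float:
    """Fraction of the baseline syscall surface that is no longer reachable."""
    return 1 - exposed / baseline


print(f"without networking: {reduction(SENTRY_BASE):.0%}")             # 85%
print(f"with networking:    {reduction(SENTRY_WITH_NETWORKING):.0%}")  # 81%
```

Both configurations land at or slightly above the rounded eighty-percent figure.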

Match the boundary to the trust level: containers for trusted code, gVisor or a microVM for untrusted or multi-tenant code.
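One way to keep that matching rule from drifting is to encode it where the control plane picks a runtime. The following is a minimal sketch, not a definitive policy: the `Boundary` enum and `minimum_boundary` function are hypothetical names, and choosing gVisor rather than a microVM for the middle cases is an assumption drawn from the flowchart above.

```python
from enum import Enum


class Boundary(Enum):
    CONTAINER = "container"
    GVISOR = "gvisor"
    MICROVM = "microvm"


def minimum_boundary(trusted: bool, multi_tenant: bool) -> Boundary:
    """Return the weakest boundary this page treats as adequate.

    Agent-generated code should always be passed as trusted=False.
    """
    if trusted and not multi_tenant:
        return Boundary.CONTAINER   # trusted first-party workload
    if not trusted and multi_tenant:
        return Boundary.MICROVM     # untrusted multi-tenant code
    # Assumption: untrusted single-tenant or trusted multi-tenant code
    # gets syscall interception as a floor; a microVM is also acceptable.
    return Boundary.GVISOR


print(minimum_boundary(trusted=False, multi_tenant=True).value)
```

Note that process-level execution is deliberately absent from the enum, so the policy cannot return it.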

Why container defaults are not enough for untrusted code

Container isolation rests on six mechanisms: mount namespaces with OverlayFS, FUSE, mount propagation modes, mandatory access control, seccomp, and cgroups. The defaults enforce in a useful order (seccomp, then capabilities, then MAC, then the kernel), and a default seccomp profile already blocks dozens of dangerous syscalls. The danger is in the features people turn on: a mounted Docker socket is a direct path to host root, bidirectional mount propagation plus an over-broad capability lets a container shadow host paths, and OverlayFS bugs have repeatedly yielded root. [1]
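The risky features named above are easy to check for mechanically before a sandbox is launched. Here is a hedged sketch of such a pre-flight check. The `spec` dictionary shape (`mounts`, `source`, `propagation`, `cap_add`) is a hypothetical format invented for illustration and does not correspond to any runtime's schema, and the socket and capability lists are examples rather than an exhaustive set.

```python
# Illustrative lists only; extend them for the runtimes actually in use.
RUNTIME_SOCKETS = {
    "/var/run/docker.sock",
    "/run/docker.sock",
    "/run/containerd/containerd.sock",
}
BROAD_CAPABILITIES = {"SYS_ADMIN", "ALL"}
BIDIRECTIONAL = {"shared", "rshared"}


def risky_settings(spec: dict) -> list:
    """Return human-readable findings for a hypothetical sandbox spec."""
    findings = []
    mounts = spec.get("mounts", [])
    # Normalise "CAP_SYS_ADMIN" and "sys_admin" to "SYS_ADMIN".
    caps = {c.upper().removeprefix("CAP_") for c in spec.get("cap_add", [])}

    for mount in mounts:
        if mount.get("source") in RUNTIME_SOCKETS:
            findings.append(f"runtime socket mounted: {mount['source']}")

    has_bidirectional = any(m.get("propagation") in BIDIRECTIONAL for m in mounts)
    if has_bidirectional and caps & BROAD_CAPABILITIES:
        findings.append("bidirectional mount propagation with a broad capability")

    return findings


example = {
    "mounts": [
        {"source": "/var/run/docker.sock"},
        {"source": "/data", "propagation": "rshared"},
    ],
    "cap_add": ["CAP_SYS_ADMIN"],
}
for finding in risky_settings(example):
    print(finding)
```

A check like this catches misconfiguration only. It does nothing about kernel bugs such as the OverlayFS class, which is why the stronger boundaries exist.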

The CVE record makes the point. CVE-2023-0386 (an OverlayFS privilege escalation) reached the CISA Known Exploited Vulnerabilities catalog; the GameOver(lay) pair (CVE-2023-2640 and CVE-2023-32629) affected a large share of Ubuntu cloud workloads; and a runc flaw allowed an AppArmor bypass via a symlinked path. The operating takeaway is blunt: the shared kernel is not a security boundary for untrusted code. [4]

Harden what you do run

  • Run rootless where possible, drop all capabilities and add back only what is needed, and keep a restrictive seccomp and MAC profile.
  • Apply Kubernetes Pod Security Standards (the restricted profile) and reject privileged GPU containers.
  • Add Landlock for unprivileged, process-scoped filesystem restriction inside the sandbox.
  • Cut egress by default; an exfiltration channel is one leg of the lethal trifecta.
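As a reference-only illustration of the list above, the sketch below assembles a `docker run` argument list with several of these controls switched on. It only builds the arguments and never launches anything. The function name, the numeric limits, and the `runsc` default (which assumes gVisor is installed and registered as a Docker runtime) are assumptions for the example. Rootless mode is a daemon-level choice and is not expressed here; `--user` only sets a non-root user inside the container.

```python
from typing import List, Optional


def hardened_run_args(
    image: str,
    command: List[str],
    runtime: Optional[str] = "runsc",       # assumption: gVisor registered as "runsc"
    seccomp_profile: Optional[str] = None,  # None keeps Docker's default profile
) -> List[str]:
    """Build (but do not execute) a docker run invocation with tight defaults."""
    args = [
        "docker", "run", "--rm",
        "--user=65534:65534",                # non-root UID/GID inside the container
        "--cap-drop=ALL",                    # add back only what is needed
        "--security-opt=no-new-privileges",  # block setuid-style escalation
        "--network=none",                    # default-deny egress
        "--read-only",                       # immutable root filesystem
        "--pids-limit=128",                  # illustrative limit
        "--memory=512m",                     # illustrative limit
    ]
    if seccomp_profile:
        args.append(f"--security-opt=seccomp={seccomp_profile}")
    if runtime:
        args.append(f"--runtime={runtime}")
    return args + [image] + command


print(" ".join(hardened_run_args("python:3.12-slim", ["python", "-c", "print(1)"])))
```

The MAC profile is left at the runtime default in this sketch. Landlock rules would be applied by the process inside the sandbox rather than on this command line.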

Reversible workspaces for agents

Beyond confining the blast radius, agents benefit from making actions undoable. Running each agent or sub-agent in a reversible workspace, for example a git worktree per task, means its filesystem changes can be inspected and discarded rather than trusted. Combined with a per-call sandbox for code execution and a hard time limit, this turns "the agent ran some code" from an irreversible event into a transaction the control plane can approve, audit, or roll back. [5]
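The worktree-per-task pattern can be sketched as a context manager that discards the workspace unless the changes are explicitly approved. This is a minimal illustration under stated assumptions: the `Workspace` and `task_worktree` names, the sibling-directory layout, and the `agent/<task_id>` branch naming are all hypothetical. The `run` parameter exists so the commands can be inspected or stubbed; by default it calls the real `git` CLI.

```python
import subprocess
from contextlib import contextmanager
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Workspace:
    path: Path
    branch: str
    keep: bool = False  # set True once the control plane approves the changes


@contextmanager
def task_worktree(repo: str, task_id: str, base: str = "HEAD", run=subprocess.run):
    """Create a git worktree for one task; remove it on exit unless kept."""
    repo_path = Path(repo)
    ws = Workspace(
        path=repo_path.parent / f"{repo_path.name}-agent-{task_id}",
        branch=f"agent/{task_id}",
    )
    git = ["git", "-C", str(repo_path)]
    run(git + ["worktree", "add", "-b", ws.branch, str(ws.path), base], check=True)
    try:
        yield ws
    finally:
        if not ws.keep:
            # Roll back: drop the working tree and the throwaway branch.
            run(git + ["worktree", "remove", "--force", str(ws.path)], check=True)
            run(git + ["branch", "-D", ws.branch], check=True)
```

A caller would wrap each agent task in `with task_worktree(repo, task_id) as ws:`, point the sandboxed execution at `ws.path`, and set `ws.keep = True` only after review. The worktree makes filesystem changes reversible; it is not itself an isolation boundary.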

Don't-miss checklist

  • Never execute agent-generated code on the host; isolate every code path.
  • Containers for trusted code; gVisor or a microVM for untrusted or multi-tenant code.
  • Never mount the container runtime socket into an agent sandbox.
  • Rootless, least-capability, seccomp and MAC profiles on; Pod Security Standards restricted.
  • Default-deny egress; time-limit and resource-cap every execution.
  • Run agents in reversible workspaces so filesystem changes can be discarded.
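The time-limit and resource-cap item can be illustrated with a small wrapper, intended to run inside the sandbox boundary (for example in a supervisor process), never as a substitute for it. This is a POSIX-only sketch: `run_capped` and its default limits are illustrative, and only a wall-clock timeout and a CPU-time limit are applied.

```python
import resource
import subprocess
import sys


def _limit_cpu(cpu_seconds: int):
    """Return a hook that caps CPU time in the child before it execs."""
    def apply():
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
    return apply


def run_capped(argv, wall_seconds: float = 5.0, cpu_seconds: int = 2) -> dict:
    """Run argv with a wall-clock timeout and a CPU-time cap."""
    try:
        proc = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=wall_seconds,               # child is killed on expiry
            preexec_fn=_limit_cpu(cpu_seconds),
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "returncode": None, "stdout": ""}
    status = "ok" if proc.returncode == 0 else "failed"
    return {"status": status, "returncode": proc.returncode, "stdout": proc.stdout}


if __name__ == "__main__":
    print(run_capped([sys.executable, "-c", "print('hello')"])["status"])
```

The wall-clock timeout catches code that sleeps or blocks, while the CPU limit catches busy loops even if the timeout is set generously. Memory and process-count caps are usually better enforced by the container or VM layer through cgroups.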

Failure modes

  • Host execution. Code runs outside any sandbox; one malicious tool call owns the machine.
  • Socket mount. The runtime socket is exposed to the agent, granting host root.
  • Propagation misconfig. Bidirectional mounts plus a broad capability let the container reach host paths.
  • Shared-kernel trust. Untrusted code on a plain container; a kernel CVE escapes to the host.
  • Open egress. The sandbox can reach the internet, completing an exfiltration path.
  • Irreversible actions. No reversible workspace; a bad action cannot be rolled back.

Open questions & validation

  • gVisor and microVMs add latency and some compatibility limits; measure the overhead on the target tool workload.
  • Validate the sandbox by attempting the known escape classes (socket, propagation, OverlayFS) against it in a test environment.
  • Confidential-computing isolation for agent workloads on shared GPUs is an evolving option (see security and multi-tenancy).
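Validation can start with cheap negative checks run from inside the sandbox in a test environment, before any attempt at the harder escape classes. The probe below is a sketch under stated assumptions: the function names and socket paths are illustrative, and it covers only the runtime-socket and open-egress failure modes, not mount propagation or OverlayFS.

```python
import socket
from pathlib import Path

# Illustrative default paths; adjust for the runtime in use.
DEFAULT_SOCKETS = ("/var/run/docker.sock", "/run/containerd/containerd.sock")


def egress_blocked(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if an outbound TCP connection cannot be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False
    except OSError:  # refused, unreachable, timed out, or DNS failure
        return True


def visible_runtime_sockets(candidates=DEFAULT_SOCKETS) -> list:
    """Return any runtime socket paths that exist inside the sandbox."""
    return [path for path in candidates if Path(path).exists()]
```

Inside a correctly configured sandbox, `egress_blocked` should return `True` for an external test endpoint and `visible_runtime_sockets()` should return an empty list. Either check failing is a release blocker, while both passing says nothing about kernel-level escapes.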

References

  • gVisor (application kernel for containers): https://gvisor.dev/
  • Firecracker microVM: https://firecracker-microvm.github.io/
  • Firecracker: Lightweight Virtualization for Serverless Applications (NSDI 2020): https://www.usenix.org/conference/nsdi20/presentation/agache
  • Kata Containers: https://katacontainers.io/
  • Linux seccomp BPF: https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html
  • Landlock LSM: https://landlock.io/
  • Kubernetes Pod Security Standards: https://kubernetes.io/docs/concepts/security/pod-security-standards/
  • CVE-2023-0386 (OverlayFS privilege escalation): https://nvd.nist.gov/vuln/detail/CVE-2023-0386
  • CVE-2023-2640 (GameOver(lay)): https://nvd.nist.gov/vuln/detail/CVE-2023-2640

Related: Agent threat model · Prompt-injection defense · Policy, guardrails & governance · Orchestration & control plane · Security & multi-tenancy (platform) · Agentic systems


  1. Container defaults protect trusted workloads adequately but are insufficient for untrusted multi-tenant code; the six mechanisms are mount namespaces with OverlayFS, FUSE, mount propagation, MAC, seccomp, and cgroups, enforced seccomp-then-capabilities-then-MAC-then-kernel. The high-risk features are runtime-socket mounts, bidirectional mount propagation with broad capabilities, and OverlayFS bugs. 

  2. gVisor's Sentry intercepts syscalls in user space and needs roughly 53 host syscalls (68 with networking) versus about 350 for a normal container, around an eighty-percent reduction in kernel attack surface. 

  3. Firecracker boots in about 125 ms with a few megabytes per microVM and is roughly 50k lines of Rust, a far smaller trusted computing base than a general-purpose emulator. 

  4. CVE-2023-0386 (OverlayFS) reached the CISA KEV catalog; GameOver(lay) (CVE-2023-2640 / CVE-2023-32629) affected a large share of Ubuntu cloud workloads; a runc flaw enabled an AppArmor bypass via a symlinked path. The shared kernel is not a boundary for untrusted code. 

  5. Running each agent or sub-agent in a reversible workspace (for example a git worktree per task), with a per-call execution sandbox and a hard time limit, makes filesystem changes inspectable and discardable rather than trusted.