Why AI Agents Need Servers, Not Sandboxes

Sandboxes cap out at 4-8 vCPUs, with no root and time-limited sessions. Agents doing real work need full machines.

Your AI agent just cloned a monorepo, installed 400 npm packages, started Postgres, ran 2,000 tests, and pushed a fix to GitHub.

Try doing that in a sandbox.

The sandbox ceiling

Most agent compute platforms give you a container dressed up as something fancier. They call it a sandbox, a microVM, a secure execution environment. The constraints are the same:

  • 4-8 vCPU cap. Your agent can't parallelize a heavy build.
  • No root access. Can't install system packages, configure services, or touch systemd.
  • No Docker. The agent runs inside a container — it can't run containers itself.
  • No SSH. Can't drop in to debug when something goes wrong.
  • Session limits. 1-24 hours, then everything is gone.

These limits exist because the platform shares infrastructure across tenants. Root would be a security disaster. Docker-in-Docker would be a resource nightmare.

For simple tasks — run a script, check an output — sandboxes work. But agents are getting more capable, and their tasks are getting heavier.
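Before committing to a heavy task, an agent can probe its environment for exactly these ceilings. A minimal sketch (the dictionary keys are my own naming, not any platform's API):

```python
import os
import shutil

def probe():
    """Report capabilities that distinguish a sandbox from a full server."""
    return {
        "vcpus": os.cpu_count(),                            # parallelism ceiling
        "root": os.geteuid() == 0,                          # can we apt install?
        "docker_cli": shutil.which("docker") is not None,   # can we run containers?
        "sshd": shutil.which("sshd") is not None,           # can anyone drop in?
    }

print(probe())
```

If `root` is false and `docker_cli` is missing, steps like installing system packages or starting services will fail no matter how the task is phrased.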

What agents actually do

An AI coding agent doesn't just execute a script. It:

  1. Clones your repo — needs git, SSH keys, maybe a GitHub token
  2. Installs dependencies — npm, pip, apt packages, system libraries
  3. Starts services — Postgres, Redis for integration tests
  4. Builds the project — webpack, cargo, go build — CPU and memory hungry
  5. Runs tests — sometimes thousands, sometimes in parallel
  6. Pushes code — git commit, git push, opens a PR

Each step is a system-level operation. Not a function call. A real process running on a real machine.
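The steps above can be sketched as a runner that launches each one as a real OS process. The repo URL, branch name, and per-step commands below are placeholders, not a real project — substitute whatever the target repo actually uses:

```python
import subprocess

def run_step(name, cmd, cwd=None):
    """Run one pipeline step as a real OS process, failing fast on error."""
    print(f"[agent] {name}: {' '.join(cmd)}")
    return subprocess.run(cmd, cwd=cwd, check=True)

# Hypothetical pipeline mirroring the six steps; every command here is
# a system-level operation, not a function call inside a runtime.
PIPELINE = [
    ("clone",    ["git", "clone", "git@github.com:you/project.git", "work"]),
    ("install",  ["npm", "ci"]),
    ("services", ["docker", "compose", "up", "-d", "postgres", "redis"]),
    ("build",    ["npm", "run", "build"]),
    ("test",     ["npm", "test"]),
    ("push",     ["git", "push", "origin", "agent/fix"]),
]

def run_pipeline(steps):
    for name, cmd in steps:
        run_step(name, cmd, cwd=None if name == "clone" else "work")
```

Note that three of the six steps (`install` of apt-level deps, `services`, and anything touching SSH keys) assume root or a Docker daemon — exactly what the sandbox ceiling rules out.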

The server advantage

A full server gives the agent the same environment a developer has:

gibil create --name agent-task --repo github.com/you/project --ttl 60

Now the agent has:

  • Root access — install anything, configure anything
  • Real networking — a public IP, real ports
  • Docker — run services alongside the code
  • SSH — the agent (or you) can drop in anytime
  • No resource caps — pick the server size you need

The agent works on a real machine. When it's done, the machine disappears. No cleanup, no stale state, no forgotten servers running up a bill.

The tradeoff: boot time

Sandboxes are faster to start. A Firecracker microVM boots in ~150ms. A gibil server takes 30-120 seconds.

For a 5-second script execution, the sandbox wins. For a 30-minute test-fix-test cycle, boot time is noise.
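The arithmetic behind that claim, using the illustrative numbers above (~150ms microVM boot, 90s as a midpoint of the 30-120s server range):

```python
def boot_overhead(boot_s, task_s):
    """Fraction of total wall time spent booting."""
    return boot_s / (boot_s + task_s)

print(f"sandbox, 5s script:  {boot_overhead(0.15, 5):.1%}")      # ~2.9%
print(f"server, 5s script:   {boot_overhead(90, 5):.1%}")        # ~94.7%
print(f"server, 30min cycle: {boot_overhead(90, 30 * 60):.1%}")  # ~4.8%
```

A 95% overhead on a 5-second script is disqualifying; a 5% overhead on a 30-minute cycle is noise.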

The question isn't "which boots faster." It's "what does the agent need to do?" If the answer involves Docker, SSH, root, or serious CPU — the sandbox ceiling will hit before the task finishes.

When to use what

Use a sandbox when:

  • The task is a single script execution (under 5 minutes)
  • No Docker, no system packages, no services needed
  • Latency matters more than capability

Use a server when:

  • The agent needs to build, test, or deploy
  • Docker services are part of the workflow
  • The task runs for more than a few minutes
  • You need SSH for debugging
  • The agent needs root
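The two checklists reduce to a rule of thumb, sketched here (the parameter names and the 5-minute threshold are my framing of the lists above, not any platform's API):

```python
def choose_runtime(needs_docker=False, needs_root=False,
                   needs_ssh=False, est_minutes=1.0):
    """Capability needs trump boot latency: any server-only requirement,
    or a task longer than a few minutes, rules out a sandbox."""
    if needs_docker or needs_root or needs_ssh or est_minutes > 5:
        return "server"
    return "sandbox"

print(choose_runtime(est_minutes=2))                     # sandbox
print(choose_runtime(needs_docker=True, est_minutes=2))  # server
```

Note the asymmetry: latency only decides the question when no capability requirement has already decided it.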

Most AI agent workloads — code generation, test execution, CI, infrastructure testing — are server workloads. The tooling is catching up.