VMs vs Sandboxes vs Cloud Sessions: Picking the Right Compute for AI Agents

Four architectural shapes for the box your agent runs in. Each makes a different tradeoff.

When an AI agent (or a human) needs a Linux box to run a build, a test suite, or a long-lived dev session, the question isn't "which product is best" but "what shape of box do I need." Four shapes are on the table.

Shape 1: MicroVM sandbox (E2B, Fly Sprites)

Firecracker microVMs with sub-second cold start and SDK-first APIs. E2B is used by Cursor, Perplexity, and Hugging Face to run user-submitted code snippets in isolation. Fly Sprites (launched January 2026) is a separate Fly.io product targeting persistent agent state on top of Firecracker.

Best for: Embedding code execution into a product. Sub-second cold start matters when you're spinning up hundreds of short-lived sandboxes per minute. E2B supports Docker-in-Docker and SSH; each sandbox tops out at 8 vCPU / 8 GB, and sessions are capped at 1 hour (Hobby) or 24 hours (Pro).

Wrong for: A box that needs to live for a working day or a working week. The 24-hour cap is a hard ceiling.
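
The limits above can be turned into a quick pre-flight check. The sketch below is hypothetical: the function name and the tier table are made up for illustration, and the numbers are simply the ones quoted in this post (8 vCPU / 8 GB, 1 hour on Hobby, 24 hours on Pro), not values read from E2B's API.

```python
from typing import Optional

# Limits as quoted in this post; verify against E2B's current plans before relying on them.
E2B_MAX_VCPUS = 8
E2B_MAX_RAM_GB = 8
E2B_SESSION_CAP_HOURS = {"Hobby": 1, "Pro": 24}  # ordered from shortest cap to longest


def e2b_tier_for(hours: float, vcpus: int, ram_gb: int) -> Optional[str]:
    """Return the smallest tier whose session cap covers the job, or None if none fits."""
    if vcpus > E2B_MAX_VCPUS or ram_gb > E2B_MAX_RAM_GB:
        return None  # over the resource ceiling on every tier
    for tier, cap_hours in E2B_SESSION_CAP_HOURS.items():
        if hours <= cap_hours:
            return tier
    return None  # longer than the 24-hour cap


print(e2b_tier_for(0.5, 2, 4))  # a short snippet run
print(e2b_tier_for(8, 4, 8))    # a working day
print(e2b_tier_for(40, 4, 8))   # a working week
```

A `None` result is the signal to look at one of the longer-lived shapes below.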

Shape 2: Container sandbox (Daytona)

A managed sandbox runtime built on Docker-style containers with shared-kernel isolation. Sub-100ms cold start, indefinite session lifetime, SDK-first API plus an MCP server.

Best for: Tasks where shared-kernel isolation is acceptable, where the agent needs to run for a long time, and where SDK-first integration fits the architecture. Cold start is fast, and the price is similar to E2B's (~$0.17/hr at 2 vCPU / 4 GB).

Wrong for: Workloads that need stronger isolation than a container can offer, or that depend on a guaranteed resource ceiling, which Daytona doesn't publish.
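
To make the hourly price concrete, here is a rough cost estimate. The ~$0.17/hr rate is the figure quoted above for 2 vCPU / 4 GB. The usage patterns (an 8-hour day, 22 working days, a 30-day always-on month) are assumptions for illustration, and a real bill depends on the provider's current pricing and billing granularity.

```python
HOURLY_RATE_USD = 0.17  # approximate rate quoted above for 2 vCPU / 4 GB


def session_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Estimated cost in USD for a sandbox kept running for `hours`."""
    return round(hours * rate, 2)


working_day = session_cost(8)            # one 8-hour session
working_month = session_cost(8 * 22)     # 22 working days of 8 hours each
always_on_month = session_cost(24 * 30)  # never shut down for 30 days

print(f"8-hour day:       ${working_day:.2f}")
print(f"22 working days:  ${working_month:.2f}")
print(f"Always on, 30 d:  ${always_on_month:.2f}")
```

At this rate the gap between "on during working hours" and "always on" is roughly a factor of four, which is why session lifetime matters as much as the hourly price.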

Shape 3: Managed VM (Anthropic cloud sessions)

A 4 vCPU / 16 GB / 30 GB Ubuntu VM with Docker, PostgreSQL, Redis, and a full toolchain pre-installed. Included in the Claude subscription. Reached via Claude Code's --remote flag or the web interface.

Best for: Claude Code users whose work fits in the resource ceiling. It costs nothing beyond the subscription, it's already there, and the pre-installed stack covers most cases.

Wrong for: Workloads beyond 4 vCPU / 16 GB, parallel sessions beyond Anthropic's daily Routines cap, code that can't run on Anthropic infrastructure for compliance reasons, or workflows using agents other than Claude Code.

Shape 4: Full VM (Gibil)

A real Hetzner VM with a dedicated kernel, public IP, Docker, SSH, real systemd, and any package you need. Forged in 30–90 seconds, kept alive for as long as you set the TTL (15 minutes to 30 days, extendable), destroyed when you say. Reached via CLI, MCP, the agent skill, the VS Code extension, or a Sandcastle provider. It is a substrate that any agent or workflow plugs into.

Best for: Sessions that need to live for minutes to days, that need Docker services (Postgres + Redis + your app running simultaneously), that need more than 4 vCPU / 16 GB, or where the code has to stay in your own Hetzner account. Agent-agnostic — any MCP-compatible agent works.

gibil create --name workstation --repo github.com/you/project --ttl 90m
# → Your agent connects via MCP and starts working
# → Full root, Docker, SSH, any Hetzner instance size
gibil destroy workstation
# → Gone, no orphaned resources
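
The create-then-destroy pattern above can be wrapped so the box is torn down even if the work in between fails. This is a minimal sketch, not an official Gibil client: it assumes the `gibil` CLI is on your PATH and uses only the subcommands and flags shown above (`create --name --repo --ttl` and `destroy NAME`). The `ephemeral_box` helper and its `run` parameter are hypothetical names introduced for illustration.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List


def create_cmd(name: str, repo: str, ttl: str) -> List[str]:
    # Mirrors: gibil create --name NAME --repo REPO --ttl TTL
    return ["gibil", "create", "--name", name, "--repo", repo, "--ttl", ttl]


def destroy_cmd(name: str) -> List[str]:
    # Mirrors: gibil destroy NAME
    return ["gibil", "destroy", name]


def _run_checked(cmd: List[str]) -> None:
    # Raises CalledProcessError if the CLI exits non-zero.
    subprocess.run(cmd, check=True)


@contextmanager
def ephemeral_box(
    name: str,
    repo: str,
    ttl: str = "90m",
    run: Callable[[List[str]], None] = _run_checked,
) -> Iterator[str]:
    """Create a box, yield its name, and destroy it on exit, even if the body raises."""
    run(create_cmd(name, repo, ttl))
    try:
        yield name
    finally:
        run(destroy_cmd(name))


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    with ephemeral_box("workstation", "github.com/you/project", run=print) as box:
        print(f"working on {box}")
```

Passing `run=print` makes this a dry run; leaving the default executes the real CLI. The TTL still acts as a backstop if the process running this script is killed before the `finally` block executes.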

How to choose

| Your situation | Shape |
| --- | --- |
| Embedding code execution into a product | MicroVM sandbox (E2B, Sprites) |
| Container-grade isolation is fine, indefinite sessions | Container sandbox (Daytona) |
| Claude Code user, fits in 4 vCPU / 16 GB | Managed VM (Anthropic cloud sessions) |
| Need more compute, longer lifetime, BYOC, or agent-agnostic | Full VM (Gibil) |
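
The table can also be read as a small decision procedure. The sketch below is one hypothetical encoding: the `Workload` fields and the order in which the rules are checked are assumptions made for illustration, and the thresholds are the ones quoted in this post.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    embeds_code_execution: bool = False  # short-lived sandboxes inside a product
    shared_kernel_ok: bool = False       # container-grade isolation is acceptable
    uses_claude_code: bool = False
    vcpus: int = 2
    ram_gb: int = 4
    needs_own_infra: bool = False        # code must stay in your own cloud account (BYOC)


def pick_shape(w: Workload) -> str:
    # Of the four shapes described here, only the full VM runs in your own account.
    if w.needs_own_infra:
        return "Full VM (Gibil)"
    if w.embeds_code_execution:
        return "MicroVM sandbox (E2B, Sprites)"
    if w.uses_claude_code and w.vcpus <= 4 and w.ram_gb <= 16:
        return "Managed VM (Anthropic cloud sessions)"
    if w.shared_kernel_ok:
        return "Container sandbox (Daytona)"
    # Fallback: more compute, longer lifetime, or a non-Claude agent needing VM isolation.
    return "Full VM (Gibil)"


print(pick_shape(Workload(embeds_code_execution=True)))
print(pick_shape(Workload(uses_claude_code=True, vcpus=4, ram_gb=16)))
print(pick_shape(Workload(uses_claude_code=True, vcpus=8, ram_gb=32)))
```

Real workloads rarely match a single row, so treat the ordering as a starting point and reorder the checks to reflect which constraint is non-negotiable for you.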

The starting point

Anthropic's cloud sessions are the right default if you use Claude Code and your work fits: they're included in the subscription, they're built in, and they cover most cases.

Gibil is the box you reach for when you hit the ceiling: more compute, longer sessions, your own infrastructure, or any agent rather than one vendor's. MCP in, SSH in, Sandcastle in, the agent skill in — same box underneath.