Sandcastle, on Gibil: AFK Agents on Real Disposable Servers

A new Sandcastle sandbox provider that runs your Claude Code agents on fresh Hetzner servers instead of local Docker. Same orchestration, different box.

You kick off a Sandcastle run before bed. Claude Code is going to refactor a chunk of your auth layer, run the test suite, and merge the result onto a branch you'll review in the morning. Five minutes in, the agent decides it needs to bring up a Postgres container to run integration tests. Docker spins up. The build pulls 800 MB of dependencies. Your laptop fan does its impression of a hairdryer. You close the lid.

You wake up to a half-finished diff, a hung Docker network, and the agent quietly stuck because port 5432 was already taken on your machine. The orchestration was fine. The box was the problem.

This post is about that second part — the box — and a small package we just shipped that swaps it out.

Sandcastle does the orchestration. Gibil does the box.

Sandcastle is Matt Pocock's TypeScript library for running AI coding agents in isolated sandboxes. It's good at the things gibil deliberately doesn't do: an iteration loop with a completion signal, a branch strategy that knows how to merge commits back, lifecycle hooks for running pnpm install before the agent starts, and idle timeouts for when an agent goes quiet. It's the missing piece for letting an agent actually run AFK.

What Sandcastle delegates is the box itself. Where does the agent actually run? Out of the box you get Docker, Podman, Vercel sandbox, and Daytona. All four are reasonable. None of them gives you a full VM with its own kernel, full root, and BYOC (your code on your account).

Gibil is a real Linux server. Forged on Hetzner, gone when the TTL expires, and billed at a fraction of a cent for the agent run you just kicked off. We packaged it as a Sandcastle provider:

npm install --save-dev @gibil/sandcastle-provider @ai-hero/sandcastle

- import { docker } from "@ai-hero/sandcastle/sandboxes/docker";
+ import { gibil } from "@gibil/sandcastle-provider";

  await run({
    agent: claudeCode("claude-opus-4-7"),
-   sandbox: docker(),
+   sandbox: gibil({ ttl: "30m" }),
    promptFile: ".sandcastle/prompt.md",
  });

That's the whole integration. Same run(), same hooks, same agent loop. The agent stops running on your laptop and starts running on a fresh server that vanishes when it's done.

The bind-mount versus isolated cut

Sandcastle's provider model has two shapes, and the difference matters more than it looks.

A bind-mount provider — Docker, Podman — runs the agent inside a container, but the container shares parts of your host filesystem. That's how the worktree gets in and how commits get back: it's literally your directory, just visible from a different process. Fast (no copying), elegant for the local case, but the host shows through. The agent's pnpm install populates ~/.npm. Its containers leave images on your daemon. A docker compose down without the right flags leaves volumes behind.

An isolated provider — Vercel sandbox, and now gibil — has no shared filesystem with the host. The worktree gets uploaded into the sandbox before the run. Files come back via explicit copyFileOut. When the sandbox is destroyed, it's actually destroyed: no images, no volumes, no leftover state on your machine.

For a 30-second lint or a unit test, bind-mount is probably what you want. Local, instant, fine. For an AFK agent that you're going to leave running for 20 minutes while it does real work — installs dependencies, runs Docker-in-Docker, builds binaries, occasionally breaks something — the isolated shape is what actually matches the threat model.

Sandcastle calls this seam createIsolatedSandboxProvider. The contract is small: implement create, exec, copyIn, copyFileOut, close, return a handle. The provider is roughly 200 lines of TypeScript and shells out to gibil, ssh, scp, and tar — no ssh2 library, no execa, just system tools.
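To make the shape of that contract concrete, here is a minimal sketch. The interface names and signatures (ExecResult, SandboxHandle, IsolatedProvider) are assumptions made for illustration, not Sandcastle's actual types, and the fake provider records calls in memory instead of talking to a server.

```typescript
// Hypothetical shapes for illustration; Sandcastle's real types may differ.
interface ExecResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

interface SandboxHandle {
  exec(command: string): Promise<ExecResult>;
  copyIn(localPath: string, remotePath: string): Promise<void>;
  copyFileOut(remotePath: string, localPath: string): Promise<void>;
  close(): Promise<void>;
}

interface IsolatedProvider {
  create(): Promise<SandboxHandle>;
}

// In-memory stand-in: logs each call instead of touching a real server.
function fakeProvider(log: string[]): IsolatedProvider {
  return {
    async create(): Promise<SandboxHandle> {
      log.push("create");
      let closed = false;
      const guard = (): void => {
        if (closed) throw new Error("sandbox is closed");
      };
      const handle: SandboxHandle = {
        async exec(command: string): Promise<ExecResult> {
          guard();
          log.push(`exec ${command}`);
          return { stdout: "", stderr: "", exitCode: 0 };
        },
        async copyIn(localPath: string, remotePath: string): Promise<void> {
          guard();
          log.push(`copyIn ${localPath} -> ${remotePath}`);
        },
        async copyFileOut(remotePath: string, localPath: string): Promise<void> {
          guard();
          log.push(`copyFileOut ${remotePath} -> ${localPath}`);
        },
        async close(): Promise<void> {
          closed = true;
          log.push("close");
        },
      };
      return handle;
    },
  };
}
```

A real provider would replace each log.push with a call out to gibil, ssh, scp, or tar. The point of the sketch is only that the surface is small enough to fake in a few dozen lines.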

A real run, end to end

Install gibil and configure your Hetzner token if you haven't already:

npm install -g gibil
gibil init

Then in your project:

mkdir -p .sandcastle
npx sandcastle init
npm install --save-dev @gibil/sandcastle-provider

Edit .sandcastle/main.ts:

import { run, claudeCode } from "@ai-hero/sandcastle";
import { gibil } from "@gibil/sandcastle-provider";

await run({
  agent: claudeCode("claude-opus-4-7"),
  sandbox: gibil({ ttl: "30m" }),
  promptFile: ".sandcastle/prompt.md",
  hooks: {
    sandbox: {
      onSandboxReady: [{ command: "pnpm install" }],
    },
  },
});

Run it:

npx tsx .sandcastle/main.ts

What happens:

The provider calls gibil create --json. A fresh Ubuntu server is forged on Hetzner. This takes about 30 to 90 seconds, depending on location and whether you passed a --repo.
Sandcastle uploads your local worktree to /root/workspace on the server (a tar pipe through ssh, so file permissions and directory structure are preserved).
The onSandboxReady hook runs pnpm install on the server.
The agent starts. Sandcastle streams the agent's output back, line by line, via the SSH connection. Tool calls, edits, test runs — you see them as they happen.
The agent emits its completion signal. Sandcastle commits the changes according to your configured branch strategy.
The provider calls gibil destroy --json. The server is gone. No leftover Docker images, no node_modules taking up space, no orphaned VPS billing you next month.
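
The upload step is the least obvious of these. Here is a rough sketch of how a tar pipe through ssh could be assembled, assuming a root login and the /root/workspace target mentioned above. The helper names (shellQuote, buildUploadCommand) are made up for illustration, and the real provider may build its command differently.

```typescript
// Hypothetical helper: wrap a value in single quotes for a POSIX shell,
// escaping any single quotes it already contains.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Builds: tar (pack local dir to stdout) | ssh (unpack on the server).
// -p on the receiving side asks tar to preserve file permissions.
function buildUploadCommand(localDir: string, host: string, remoteDir: string): string {
  const target = "root@" + host; // assumption: the server accepts root over ssh
  const remote =
    `mkdir -p ${shellQuote(remoteDir)} && tar -C ${shellQuote(remoteDir)} -xpf -`;
  const pack = `tar -C ${shellQuote(localDir)} -cf - .`;
  const unpack = `ssh ${shellQuote(target)} ${shellQuote(remote)}`;
  return `${pack} | ${unpack}`;
}

// 203.0.113.10 is a documentation-only address, not a real server.
console.log(buildUploadCommand(".", "203.0.113.10", "/root/workspace"));
```

Streaming the archive through the pipe avoids writing a temporary tarball on either side, which is presumably why this shape is attractive for a provider that only shells out to system tools.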

If anything fails midway, the close() hook still runs — Sandcastle wraps everything in Effect.ensuring, so a thrown exception during the agent loop still tears down the server. You don't end up with a forgotten machine.
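
The guarantee is easier to see in miniature. The sketch below uses a plain try/finally as a stand-in for Effect.ensuring; it is not Sandcastle's code, and the Sandbox interface and withSandbox helper are hypothetical names introduced for the example.

```typescript
// Hypothetical minimal sandbox shape, for illustration only.
interface Sandbox {
  exec(command: string): Promise<void>;
  close(): Promise<void>;
}

// Creates a sandbox, runs the body, and always closes the sandbox afterwards.
async function withSandbox<T>(
  create: () => Promise<Sandbox>,
  body: (sandbox: Sandbox) => Promise<T>,
): Promise<T> {
  const sandbox = await create();
  try {
    return await body(sandbox);
  } finally {
    // Runs on success and on failure, so the server is torn down either way.
    await sandbox.close();
  }
}
```

If the body throws, the error still propagates to the caller, but only after close() has run. What this does not cover is the process itself being killed, which is the case a TTL is there for.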

Why the box matters once the agent gets ambitious

The pitch that's always made for ephemeral environments is "clean every time." That part is real, but it understates the case for AFK agents specifically.

When you're driving the agent yourself, you notice when something goes sideways. The agent tries to delete a file outside the project, you stop it. The agent's docker run collides with a service you have running locally, you see the error and intervene. Your attention is the safety net.

When you're asleep, your attention isn't doing anything. The agent's blast radius is whatever the sandbox lets it touch. With a bind-mount provider, that's whatever's reachable through the bind — your home directory, your Docker daemon, your loopback ports. With an isolated server, it's a server. When the fire goes out, the salamander moves on. There's nothing left to clean up because the thing the agent was operating on doesn't exist anymore.

This is the part that's hard to feel until you've watched an agent confidently run rm -rf node_modules && pnpm install in the wrong directory at 3am. After that, "fresh server every time" stops sounding like marketing copy.

Reuse one server across multiple runs

Cold starts matter. A fresh Hetzner server takes 30 to 90 seconds to be ready. If you run an implement-then-review workflow, you don't want to pay that twice.

Sandcastle has a createSandbox API for exactly this — create the sandbox once, run multiple agents inside it. Works the same with gibil:

import { createSandbox, claudeCode } from "@ai-hero/sandcastle";
import { gibil } from "@gibil/sandcastle-provider";

await using sandbox = await createSandbox({
  branch: "agent/fix-payments-flow",
  sandbox: gibil({ ttl: "1h" }),
  hooks: {
    sandbox: {
      onSandboxReady: [{ command: "pnpm install" }],
    },
  },
});

const impl = await sandbox.run({
  agent: claudeCode("claude-opus-4-7"),
  promptFile: ".sandcastle/implement.md",
  maxIterations: 5,
});

const review = await sandbox.run({
  agent: claudeCode("claude-sonnet-4-6"),
  prompt: "Review the changes on this branch and fix any issues.",
});

Both runs hit the same server. pnpm install runs once. The dependency cache, the build artifacts, and the .next/ directory are all there for the second run. Commits accumulate on agent/fix-payments-flow. The await using declaration disposes the sandbox automatically when the enclosing scope exits, which calls sandbox.close() and, through it, gibil destroy.

This pattern also makes it cheap to chain more agents. An implement step, a review step, a test-fixing step, a final lint pass — four agents on the same server, one provisioning cost, one teardown.
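
A back-of-the-envelope calculation shows what sharing buys. The step durations below are invented, and the 60-second cold start is just a midpoint of the 30 to 90 second range quoted above; pipelineSeconds is a hypothetical helper, not part of either package.

```typescript
// Total wall-clock seconds for a pipeline of agent steps.
// shared = true: one server for all steps; false: a fresh server per step.
function pipelineSeconds(
  stepSeconds: number[],
  coldStartSeconds: number,
  shared: boolean,
): number {
  const work = stepSeconds.reduce((sum, s) => sum + s, 0);
  const provisions = shared ? Math.min(1, stepSeconds.length) : stepSeconds.length;
  return work + provisions * coldStartSeconds;
}

// Invented durations: implement, review, test-fixing, lint.
const steps = [600, 300, 240, 60];
console.log(pipelineSeconds(steps, 60, true)); // 1260
console.log(pipelineSeconds(steps, 60, false)); // 1440
```

Under those assumptions the shared server saves three cold starts, or three minutes on a 20-minute pipeline. The larger win is usually the one the arithmetic leaves out: the second agent starts with dependencies already installed.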

What it costs

Hetzner billing applies. The default ARM type (cax11) is the cheapest; see the configuration docs for the full pricing table. A typical 5-to-10-minute agent run costs well under a cent. With ttl: "30m" set, the server is destroyed after 30 minutes, so the bill is capped at 30 minutes of whichever server type you picked, even if the agent loop hangs.
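
The cap can be written down as a small calculation. This sketch assumes linear, pro-rata billing and only understands the two TTL forms used in this post ("30m", "1h"). The hourly rate is a placeholder, not a quoted Hetzner price, and real billing granularity may differ, so treat the output as an estimate. Both function names are hypothetical.

```typescript
// Converts a TTL like "30m" or "1h" to minutes.
// Assumption: only the m and h suffixes seen in this post are handled.
function ttlToMinutes(ttl: string): number {
  const match = /^(\d+)([mh])$/.exec(ttl);
  if (!match) throw new Error(`unsupported ttl: ${ttl}`);
  const value = Number(match[1]);
  return match[2] === "h" ? value * 60 : value;
}

// Estimated cost of a run, capped by the TTL.
// Assumption: billing is linear in time, which may not match the real invoice.
function estimateRunCost(hourlyRate: number, runMinutes: number, ttl: string): number {
  const billedMinutes = Math.min(runMinutes, ttlToMinutes(ttl));
  return hourlyRate * (billedMinutes / 60);
}

const placeholderHourlyRate = 0.006; // hypothetical, check the pricing table
console.log(estimateRunCost(placeholderHourlyRate, 10, "30m")); // a normal run
console.log(estimateRunCost(placeholderHourlyRate, 240, "30m")); // a hung loop, capped
```

The second call is the interesting one: a loop that would have run for four hours is billed as if it ran for thirty minutes, because the server no longer exists after that.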

Authenticated gibil users (gibil auth login) automatically get plan limits and usage tracking layered on top.

gibil list

NAME                  IP              TTL REMAINING    STATUS
agent-fix-payments    65.108.xxx.xxx  18m              running

If the orchestration crashes in a way that bypasses cleanup, gibil list is how you find the orphan. gibil destroy --all deals with it.
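
If you want to check for orphans from a script, one option is to parse that table. The sketch below assumes the column layout shown in the sample above, with columns separated by two or more spaces. The CLI may well offer a machine-readable format that would be more robust; that is not something this post confirms. ServerRow and parseGibilList are hypothetical names.

```typescript
interface ServerRow {
  name: string;
  ip: string;
  ttlRemaining: string;
  status: string;
}

// Parses the human-readable table, skipping the header row.
// Assumption: columns are separated by runs of two or more spaces.
function parseGibilList(output: string): ServerRow[] {
  const lines = output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return lines.slice(1).map((line) => {
    const [name = "", ip = "", ttlRemaining = "", status = ""] = line.split(/\s{2,}/);
    return { name, ip, ttlRemaining, status };
  });
}

const sample = [
  "NAME                  IP              TTL REMAINING    STATUS",
  "agent-fix-payments    65.108.xxx.xxx  18m              running",
].join("\n");

for (const row of parseGibilList(sample)) {
  console.log(`${row.name} is ${row.status} with ${row.ttlRemaining} left`);
}
```

Anything still listed as running after your orchestration has exited is a candidate for gibil destroy.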

What's still rough

Cold start. A fresh server takes 30 to 90 seconds. Docker locally takes 1 to 2 seconds. If your workflow is rapid one-shot agent runs, a local provider is faster. If you're running long sessions, AFK work, or multi-agent pipelines, the cold start amortizes.

stderr is buffered, not streamed. Sandcastle's onLine callback fires for stdout in real time. We capture stderr in full and return it in the ExecResult, but it doesn't stream. This matches the Vercel reference provider's behavior. If you rely on watching stderr live to debug a hung agent, this is a known sharp edge.
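
The difference between the two streams looks roughly like this. It is a simplified model, not the provider's source: OutputCollector is a hypothetical class, and it takes string chunks directly rather than reading from an ssh process.

```typescript
// Streams stdout line by line through a callback; buffers stderr until the end.
class OutputCollector {
  private pending = "";
  private stderrChunks: string[] = [];
  private onLine: (line: string) => void;

  constructor(onLine: (line: string) => void) {
    this.onLine = onLine;
  }

  // Chunks can split a line anywhere, so keep the unfinished tail for later.
  pushStdout(chunk: string): void {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    this.pending = parts.pop() ?? "";
    for (const line of parts) this.onLine(line);
  }

  // stderr is only accumulated; nothing is emitted while the command runs.
  pushStderr(chunk: string): void {
    this.stderrChunks.push(chunk);
  }

  // Flushes a trailing stdout line without a newline and returns all of stderr.
  finish(): { stderr: string } {
    if (this.pending.length > 0) {
      this.onLine(this.pending);
      this.pending = "";
    }
    return { stderr: this.stderrChunks.join("") };
  }
}
```

Streaming stderr would mean giving it the same line-splitting treatment as stdout, plus deciding how the two streams interleave in the callback.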

interactiveExec isn't implemented yet. Sandcastle's optional interactive method (for TUI-style agents) is on the to-do list. If you need a shell into a running server, gibil ssh <name> works directly today.

The host needs ssh, scp, and tar on $PATH. Preinstalled on macOS and most Linux. Windows users need WSL. We chose system tools over an ssh2 library dependency to keep the integration small and easy to audit.

What's next

The provider is at v0.1. The seam works, the lifecycle is clean, the integration is a 200-line file you can read in five minutes. The roadmap is straightforward: stream stderr line-by-line, implement interactiveExec, add a --snapshot option that uses gibil's upcoming server snapshots to drop cold-start time below 15 seconds. None of those change the public API.

If you want to use it now:

npm install -g gibil
gibil init
npm install --save-dev @gibil/sandcastle-provider @ai-hero/sandcastle

Then write a .sandcastle/main.ts, point sandbox at gibil(), and you're done.

The integration source is in integrations/sandcastle/ inside the gibil repo — copy it, vendor it, file PRs. The full guide is at /docs/guides/sandcastle. Sandcastle's own docs are at github.com/mattpocock/sandcastle.

Sandcastle handles the orchestration we'd never have built ourselves. Gibil handles the box that everyone keeps trying to fake with containers. Together they let your agent run somewhere that isn't your laptop, on infrastructure you actually trust, for less than a cent a run.

Forge. Run. Burn.