Run Your Test Suite on a Disposable Server

We ran the full test suites of Zod (3,575 tests) and Fastify (2,209 tests) on ephemeral ARM servers. Here's what we learned.

Your CI pipeline takes 8 minutes. You push, wait, get a red build, fix a typo, push again. Another 8 minutes. By the time tests pass, you've lost 30 minutes to feedback loops.

What if you could run your full test suite on a fresh machine in one command, get results in your terminal, and throw the machine away?

The workflow

gibil create --name tests --repo github.com/you/project --ttl 20
gibil run tests "cd /root/project && pnpm install && pnpm test"
gibil destroy tests
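If the run fails or you hit Ctrl-C halfway through, it's easy to skip the destroy step. Below is a minimal sketch of a wrapper that always cleans up. It uses only the three gibil subcommands shown above. It also assumes that `gibil run` exits non-zero when the remote command fails, which is an assumption and not documented behavior. The function name `run_suite_once` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: create a server, run the suite, always destroy it.
# Assumes `gibil run` propagates the remote command's exit status.
run_suite_once() (
  # Parentheses: the body runs in a subshell, so its traps stay local.
  name="$1"
  repo="$2"
  ttl="${3:-20}"
  cmd="${4:-cd /root/project && pnpm install && pnpm test}"

  gibil create --name "$name" --repo "$repo" --ttl "$ttl" || exit 1

  # From here on, destroy the server however this subshell exits.
  trap 'gibil destroy "$name"' EXIT
  # Turn Ctrl-C / SIGTERM into a normal exit so the EXIT trap still fires.
  trap 'exit 130' INT
  trap 'exit 143' TERM

  status=0
  gibil run "$name" "$cmd" || status=$?
  exit "$status"
)
```

Usage: `run_suite_once tests github.com/you/project 20`. Because the body runs in a subshell, the traps don't leak into your interactive shell. The server's TTL is still the backstop if your own machine dies mid-run.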

That's it. A real Ubuntu server with your repo cloned, deps installed, tests executed. When you're done, the server disappears.

We benchmarked it on real projects

We picked three well-known open-source repos and ran their full test suites on gibil servers (ARM cax11, 2 vCPU, 4 GB RAM).

Repo	Tests	Passed	Test Duration	Server Boot
`lukeed/clsx`	32	32	3ms	128s
`colinhacks/zod`	3,575	3,574	86s	135s
`fastify/fastify`	2,209	2,206	120s	127s

Zod's one failure was a flaky ReDoS timeout test that's known to be slow on ARM. Fastify's three skipped tests are intentionally skipped upstream. Nothing broke because of the server.

The boot time is the cost. Around two minutes from gibil create to a server that's SSH-ready with your repo cloned and Node installed. After that, you're running on a clean machine at full speed.
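To put numbers on that, the awk pass below works through the benchmark table. For each repo it adds boot time to test time, then reports what share of the end-to-end wall clock went to booting. It uses only the figures above, and `boot_share` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Rows: repo, test duration (s), server boot (s), copied from the table above.
boot_share() {
  awk '{
    total = $2 + $3
    # Print: repo, end-to-end seconds, percent of that spent booting.
    printf "%s %.0f %.0f\n", $1, total, 100 * $3 / total
  }' <<'EOF'
clsx 0.003 128
zod 86 135
fastify 120 127
EOF
}
```

Running `boot_share` prints `clsx 128 100`, `zod 221 61`, and `fastify 247 51`. Booting accounts for about 61% of Zod's end-to-end time and just over half of Fastify's. You pay that overhead once per server, so a rerun on a live server skips it.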

Why this is better than running locally

No state pollution. Your local machine has cached node_modules, stale env vars, a Postgres container from last Tuesday. A gibil server has nothing except what cloud-init installed 90 seconds ago. If a test passes here, it's real.

No resource contention. Running Fastify's 2,209 tests takes 2 minutes and pegs 2 vCPUs the whole time. On your laptop, that's your browser stuttering. On a disposable server, you don't notice.

Reproducible from zero. Every server starts from the same Ubuntu 24.04 image. No drift, no "it worked yesterday." If it fails, you know exactly what changed: your code.

Why this is better than waiting for CI

CI is the source of truth, and it should stay that way. But CI feedback loops are slow. Push, wait for a runner, wait for checkout, wait for install, wait for tests, read the log.

With gibil, you get the same clean-environment guarantee without waiting in a queue. The server is yours the moment it boots. You can run tests, SSH in to debug failures, tweak code, and run again without rebuilding from scratch.

# Run tests, see a failure
gibil run tests "pnpm test" --json

# SSH in and debug interactively
gibil ssh tests

# Fix and re-run without re-creating the server
gibil run tests "pnpm test"

The server stays alive for the duration of your TTL. Use it like a scratch pad that happens to be a full Linux machine.

For AI agents, this is the killer use case

If you're using Claude Code, Cursor, or any coding agent with MCP support, gibil becomes the agent's remote workspace. The agent can forge a server, run commands, read results, iterate, and burn the server when done.

We tested the full loop: an agent forged a server, cloned lukeed/clsx, read the source code, wrote 5 new tests, ran the full suite, and got 37/37 passing. The agent never touched the local machine.

create_server({ repo: "https://github.com/lukeed/clsx", ttl: 15 })
vm_bash({ command: "npm test" })         → 32/32 passed
vm_write({ path: "/root/project/test/agent.js", content: "..." })
vm_bash({ command: "npm test" })         → 37/37 passed
destroy_server({ name: "..." })

The agent gets root access on a real machine. It can install packages, run Docker, start services, hit real endpoints. No sandbox restrictions, no 5-minute session limits, no 4 vCPU ceiling.

The trade-off

Boot time. Two minutes is too slow for a quick jest --watch cycle. Gibil isn't replacing your local test runner for fast iteration on a single file.

It's replacing the "let me run the full suite before I push" step. The "does this actually work on a clean machine" check. The "run tests across 4 servers in parallel to cut CI time" pattern.

# Parallel test sharding across 4 servers
gibil create --fleet 4 --name shard --repo github.com/you/project --ttl 20
gibil run shard-1 "pnpm test -- --shard=1/4" &
gibil run shard-2 "pnpm test -- --shard=2/4" &
gibil run shard-3 "pnpm test -- --shard=3/4" &
gibil run shard-4 "pnpm test -- --shard=4/4" &
wait
gibil destroy --all
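The snippet above has one catch. A bare `wait` returns 0 once all background jobs finish, even if a shard failed. The sketch below records each job's PID and waits on each one individually, so the helper returns non-zero if any shard fails. It makes several assumptions:

- the fleet uses the `shard-1` through `shard-N` naming shown above;
- `gibil run` propagates exit codes;
- the test runner accepts `--shard=i/N`, as Jest and Vitest do.

To be safe about the working directory and dependencies, the remote command mirrors the one-command workflow at the top. `run_shards` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run shards 1..N in parallel, fail if any shard fails.
# Assumes servers shard-1..shard-N already exist (gibil create --fleet N).
run_shards() {
  local total="$1" pids="" i=1 failed=0 pid
  while [ "$i" -le "$total" ]; do
    # Install deps and run this server's slice of the suite in the background.
    gibil run "shard-$i" "cd /root/project && pnpm install && pnpm test -- --shard=$i/$total" &
    pids="$pids $!"
    i=$((i + 1))
  done

  for pid in $pids; do
    # `wait PID` returns that job's exit status, unlike a bare `wait`.
    wait "$pid" || failed=1
  done
  return "$failed"
}
```

Call `run_shards 4` between the create and destroy steps. Keep its exit status so your shell or CI still sees a red build when a single shard fails.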

Four servers at ~$0.007/hr each. The test phase drops to roughly a quarter of the single-server time (the ~2-minute boot doesn't shrink), and a 20-minute run on all four costs less than a penny.
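Here is the cost math spelled out. This is a minimal sketch that assumes the ~$0.007/hr rate above and billing only for the minutes the servers exist. If your provider rounds up to whole hours, pass 60 as the minutes instead. `fleet_cost` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: servers x hourly rate (USD) x minutes / 60.
# Assumes per-minute billing; pass 60 if billing rounds up to whole hours.
fleet_cost() {
  awk -v n="$1" -v rate="$2" -v minutes="$3" \
    'BEGIN { printf "%.4f\n", n * rate * minutes / 60 }'
}
```

`fleet_cost 4 0.007 20` prints `0.0093`, just under a penny. `fleet_cost 4 0.007 60` prints `0.0280`.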

Try it

npm install -g gibil
gibil init
gibil create --name test-run --repo github.com/you/project --ttl 15
gibil run test-run "cd /root/project && pnpm install && pnpm test"
gibil destroy test-run

Your test suite. A fresh server. Real results. Then it's gone.