Parallel Test Sharding
Split a test suite across multiple servers with fleet mode
Split a large test suite across multiple servers. Each shard runs independently. Results come back as structured data.
Drive it with any MCP-capable agent or by hand with fleet mode. Examples use Claude because that's what we test against.
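The test runner decides what "shard N/8" means. Runners that accept `--shard=<index>/<count>` (Vitest and Jest among them) deterministically give each shard its own slice of the test files. The helper below is a hypothetical round-robin split, not either runner's actual algorithm. It only illustrates why shards can run independently: the slices are disjoint and together cover the whole suite.

```typescript
// Hypothetical helper, for illustration only: real runners pick their own split.
// It shows the property sharding relies on: shards are disjoint and together
// cover every test file, so no file runs twice and none is skipped.
function shardFiles(files: string[], index: number, count: number): string[] {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError("count must be a positive integer");
  }
  if (!Number.isInteger(index) || index < 1 || index > count) {
    throw new RangeError("index must be an integer in 1..count");
  }
  // Sort first so every server derives the same assignment from the same file list.
  const sorted = [...files].sort();
  // Round-robin: the i-th sorted file belongs to shard (i % count) + 1.
  return sorted.filter((_, i) => i % count === index - 1);
}

const suite = ["b.test.ts", "a.test.ts", "c.test.ts", "d.test.ts"];
console.log(shardFiles(suite, 1, 2)); // [ 'a.test.ts', 'c.test.ts' ]
```

Because the split is computed from the file list alone, each server can work out its own slice without talking to the others.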
With your agent (MCP)
With gibil's MCP server wired up (see setup), prompt your agent:
Split my test suite across 8 VMs, run shard N/8 on each, and aggregate the
results. Tell me which shard failed and burn the boxes when done.
The orchestrator forges the boxes in parallel, dispatches each shard in the background, then polls:
// forge shard-1..shard-8, all at once
create_server({ name: "shard-1", repo: "github.com/you/project", ttl: 30 })
// ...shard-2 through shard-8...
// dispatch each shard, collect job IDs
vm_bash({ server: "shard-1", command: "pnpm test -- --shard=1/8", background: true }) // → { job_id: "j-..." }
// ...shard-2 through shard-8...
// poll each job to completion, then aggregate
vm_job_status({ job_id: "j-..." })
destroy_server({ name: "shard-1" }) // ...and the rest
By hand (CLI)
Fleet mode (--fleet) forges N servers in one command:
gibil create --name shard --fleet 8 --repo github.com/you/project --ttl 30m --json
{
"fleet_id": "fleet-a1b2c3d4",
"instances": [
{ "name": "shard-1-a3f", "ip": "65.21.x.x" },
{ "name": "shard-2-a3f", "ip": "65.21.x.y" }
]
}
# run shards in parallel
gibil run shard-1-a3f "cd /root/project && pnpm test -- --shard=1/8" --json > shard-1.json &
gibil run shard-2-a3f "cd /root/project && pnpm test -- --shard=2/8" --json > shard-2.json &
# ... shard-3 through shard-8, each backgrounded with & ...
wait
# aggregate results, burn the fleet
gibil destroy --all
Real numbers
A test suite that takes 40 minutes on one machine finishes in ~5 minutes across 8 evenly balanced shards, plus ~90 seconds for server boot.
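Here is the arithmetic behind those numbers as a small sketch. It assumes all servers boot in parallel, so the ~90 seconds is paid once rather than per shard. The function names are illustrative. The second function reflects the usual caveat that the slowest shard sets the finish time.

```typescript
// Back-of-the-envelope wall-clock estimate for the numbers above.
// Assumes every server boots in parallel (~90 s once, not per shard).
function estimateEvenSplit(serialMinutes: number, shards: number, bootSeconds: number): number {
  if (!Number.isInteger(shards) || shards < 1) throw new RangeError("shards must be a positive integer");
  // Perfect balance: each shard gets exactly serial / shards minutes of tests.
  return serialMinutes / shards + bootSeconds / 60;
}

// Real splits are uneven: the slowest shard decides when the run finishes.
function estimateFromShardTimes(shardMinutes: number[], bootSeconds: number): number {
  if (shardMinutes.length === 0) throw new RangeError("need at least one shard time");
  return Math.max(...shardMinutes) + bootSeconds / 60;
}

console.log(estimateEvenSplit(40, 8, 90)); // 6.5
// The same 40 minutes of tests, split unevenly across 8 shards:
console.log(estimateFromShardTimes([4, 5, 7, 5, 6, 5, 4, 4], 90)); // 8.5
```

Treat the even-split figure as a lower bound. Rebalancing slow test files across shards moves you toward it.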
Eight cax11 servers on Hetzner cost ~$0.06/hr total with BYOC. Fleet mode caps at 20 servers per command; for more than 20 shards, run multiple fleet commands.
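With the 20-server cap, you can plan the number of `gibil create --fleet` calls up front. This is a minimal sketch. `fleetBatches`, `batchStartIndices` and `fleetCommands` are hypothetical helpers, and the per-batch `--name` values are illustrative rather than a naming scheme gibil requires. The flags are the ones from the CLI example.

```typescript
// Fleet mode caps at 20 servers per command.
const FLEET_CAP = 20;

// Split a shard count into fleet-sized batches, e.g. 30 -> [20, 10].
function fleetBatches(totalShards: number, cap: number = FLEET_CAP): number[] {
  if (!Number.isInteger(totalShards) || totalShards < 1) {
    throw new RangeError("totalShards must be a positive integer");
  }
  const batches: number[] = [];
  for (let remaining = totalShards; remaining > 0; remaining -= cap) {
    batches.push(Math.min(cap, remaining));
  }
  return batches;
}

// First global shard index (1-based) for each batch, so batch 2 starts at 21.
function batchStartIndices(batches: number[]): number[] {
  const starts: number[] = [];
  let next = 1;
  for (const size of batches) {
    starts.push(next);
    next += size;
  }
  return starts;
}

// One create command per batch; the --name values are hypothetical.
function fleetCommands(totalShards: number, repo: string): string[] {
  return fleetBatches(totalShards).map(
    (size, i) => `gibil create --name shard-batch${i + 1} --fleet ${size} --repo ${repo} --ttl 30m --json`,
  );
}

console.log(fleetBatches(30)); // [ 20, 10 ]
console.log(batchStartIndices(fleetBatches(30))); // [ 1, 21 ]
```

Shard indices stay global across batches. With 30 shards, the second fleet's servers run `--shard=21/30` through `--shard=30/30`, not 1 to 10.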
Why gibil
- Clean state per shard: no shared state, no test pollution between shards
- Parallel creation: all servers forge simultaneously, not sequentially
- Structured output: parsed results on every MCP call and --json on every command
- Cheap at scale: BYOC, so you pay your cloud provider directly. On Hetzner cax11 the cost is pennies; on Vultr you trade some price for APAC proximity.
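The MCP example walks through a fan-out: forge every box at once, dispatch each shard in the background, poll, then burn. That flow can be sketched as ordinary async code. This is a hypothetical orchestrator, not gibil's or Claude's implementation. `ShardTools` stands in for whatever issues the `create_server`, `vm_bash`, `vm_job_status` and `destroy_server` tool calls, and the `JobStatus` shape is an assumption, not the documented response.

```typescript
// Assumed job status shape; gibil's actual vm_job_status response may differ.
interface JobStatus {
  state: "running" | "done";
  exitCode?: number;
}

// Hypothetical wrappers around the MCP tools used in the example.
interface ShardTools {
  createServer(name: string): Promise<void>;
  runInBackground(server: string, command: string): Promise<string>; // resolves to a job ID
  jobStatus(jobId: string): Promise<JobStatus>;
  destroyServer(name: string): Promise<void>;
}

interface ShardResult {
  shard: number;
  exitCode: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runShards(tools: ShardTools, total: number, pollMs = 5000): Promise<ShardResult[]> {
  if (!Number.isInteger(total) || total < 1) throw new RangeError("total must be a positive integer");
  const names = Array.from({ length: total }, (_, i) => `shard-${i + 1}`);
  try {
    // Forge every box concurrently rather than one after another.
    await Promise.all(names.map((name) => tools.createServer(name)));
    // Dispatch each shard in the background and keep its job ID.
    const jobIds = await Promise.all(
      names.map((name, i) => tools.runInBackground(name, `pnpm test -- --shard=${i + 1}/${total}`)),
    );
    // Poll each job until it reports done; a missing exit code counts as a failure.
    return await Promise.all(
      jobIds.map(async (jobId, i) => {
        for (;;) {
          const status = await tools.jobStatus(jobId);
          if (status.state === "done") return { shard: i + 1, exitCode: status.exitCode ?? 1 };
          await sleep(pollMs);
        }
      }),
    );
  } finally {
    // Burn the boxes whether the run passed, failed, or threw.
    await Promise.allSettled(names.map((name) => tools.destroyServer(name)));
  }
}

// Aggregate: which shards failed?
function failedShards(results: ShardResult[]): number[] {
  return results.filter((r) => r.exitCode !== 0).map((r) => r.shard);
}
```

In practice the agent issues these tool calls itself. The sketch only spells out the control flow: creation and dispatch are concurrent, and cleanup sits in `finally` so a failed shard never leaves servers running until their TTL expires.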
Fleet mode also works for benchmarking multiple configurations, testing dependency upgrades in parallel, or any workflow where you need N isolated environments at once.
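For any of these fleet workflows, you can script the per-server commands from the `--json` output of `gibil create --fleet` instead of typing one `gibil run` per server. This is a minimal sketch with some assumptions: the output has the shape shown in the CLI example, and instances are listed in shard order. `shardRunCommands` is a hypothetical helper, and the `/root/project` path and `pnpm test` command come from the example.

```typescript
// Only the fields used here are typed; the real output may carry more.
interface FleetInstance {
  name: string;
  ip: string;
}
interface FleetOutput {
  fleet_id: string;
  instances: FleetInstance[];
}

// Build one `gibil run` command per instance, assigning shard i/N by list position.
// Instance names are interpolated unescaped, which is fine for names like shard-1-a3f.
function shardRunCommands(fleetJson: string, testCmd = "pnpm test"): string[] {
  const fleet: FleetOutput = JSON.parse(fleetJson);
  const total = fleet.instances.length;
  return fleet.instances.map(
    (inst, i) => `gibil run ${inst.name} "cd /root/project && ${testCmd} -- --shard=${i + 1}/${total}" --json`,
  );
}
```

Swap `testCmd` for a benchmark or an upgrade check and the same loop covers the other workflows above.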
Next steps
- AI Agent via MCP: wire up the MCP server
- Fleet Mode Guide: deep dive on fleet workflows
- Run Agents in Parallel: N branches, N agents at once
- PR Review: test multiple dependency upgrades
New to Vultr? Get $300 in free credits through our referral link. Gibil gets a kickback that helps fund development.