Gibil

Parallel Test Sharding

Split a test suite across multiple servers with fleet mode

Split a large test suite across multiple servers. Each shard runs independently. Results come back as structured data.

Drive it with any MCP-capable agent or by hand with fleet mode. Examples use Claude because that's what we test against.

With your agent (MCP)

With gibil's MCP server wired up (see setup), give your agent a prompt like:

Split my test suite across 8 VMs, run shard N/8 on each, and aggregate the
results. Tell me which shard failed and burn the boxes when done.

The orchestrator forges the boxes (in parallel), dispatches each shard in the background, then polls:

// forge shard-1..shard-8, all at once
create_server({ name: "shard-1", repo: "github.com/you/project", ttl: 30 })
// ...shard-2 through shard-8...

// dispatch each shard, collect job IDs
vm_bash({ server: "shard-1", command: "pnpm test -- --shard=1/8", background: true })  // → { job_id: "j-..." }
// ...shard-2 through shard-8...

// poll each job to completion, then aggregate
vm_job_status({ job_id: "j-..." })

destroy_server({ name: "shard-1" })  // ...and the rest
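The forge, dispatch, poll, and destroy sequence above can be sketched as one orchestration function. This is a minimal sketch, not gibil's implementation: the `ShardClient` interface is hypothetical, its method names only mirror the MCP tools shown above, and the `done` and `exit_code` fields on the job status are assumptions, since the page does not show what `vm_job_status` returns.

```typescript
// Hypothetical client: method names mirror the MCP tools above, but the
// argument and return types (especially `done` / `exit_code`) are assumptions.
interface ShardClient {
  createServer(args: { name: string; repo: string; ttl: number }): Promise<void>;
  vmBash(args: { server: string; command: string; background: boolean }): Promise<{ job_id: string }>;
  vmJobStatus(args: { job_id: string }): Promise<{ done: boolean; exit_code?: number }>;
  destroyServer(args: { name: string }): Promise<void>;
}

interface ShardResult {
  shard: string;
  exitCode: number;
}

// Build the test command for one shard, e.g. "pnpm test -- --shard=3/8".
function shardCommand(index: number, total: number): string {
  return `pnpm test -- --shard=${index}/${total}`;
}

async function runShards(
  client: ShardClient,
  repo: string,
  total: number,
  pollMs: number = 5000,
): Promise<ShardResult[]> {
  const names = Array.from({ length: total }, (_, i) => `shard-${i + 1}`);
  try {
    // Forge every box at once rather than one after another.
    await Promise.all(names.map((name) => client.createServer({ name, repo, ttl: 30 })));

    // Dispatch each shard in the background and keep its job ID.
    const jobs = await Promise.all(
      names.map(async (server, i) => {
        const { job_id } = await client.vmBash({
          server,
          command: shardCommand(i + 1, total),
          background: true,
        });
        return { server, job_id };
      }),
    );

    // Poll each job until it reports completion.
    return await Promise.all(
      jobs.map(async ({ server, job_id }) => {
        for (;;) {
          const status = await client.vmJobStatus({ job_id });
          if (status.done) {
            // A finished job with no exit code is treated as a failure.
            return { shard: server, exitCode: status.exit_code ?? 1 };
          }
          await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
        }
      }),
    );
  } finally {
    // Burn the boxes even if a step above threw; ignore individual destroy errors.
    await Promise.all(names.map((name) => client.destroyServer({ name }).catch(() => undefined)));
  }
}
```

Filtering the returned list for `exitCode !== 0` answers the "which shard failed" part of the prompt. Putting the destroy step in `finally` is a design choice: a failed dispatch should not leave billable servers running until their TTL expires.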

By hand (CLI)

Fleet mode (--fleet) forges N servers in one command:

gibil create --name shard --fleet 8 --repo github.com/you/project --ttl 30m --json
{
  "fleet_id": "fleet-a1b2c3d4",
  "instances": [
    { "name": "shard-1-a3f", "ip": "65.21.x.x" },
    { "name": "shard-2-a3f", "ip": "65.21.x.y" }
  ]
}
# run shards in parallel: background each command, then wait for all of them
gibil run shard-1-a3f "cd /root/project && pnpm test -- --shard=1/8" --json > shard-1.json &
gibil run shard-2-a3f "cd /root/project && pnpm test -- --shard=2/8" --json > shard-2.json &
# ... same for shards 3 through 8 ...
wait

# aggregate results, burn the fleet
gibil destroy --all
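Because fleet instance names carry a generated suffix (`shard-1-a3f`), typing the eight `gibil run` lines by hand is error-prone. The sketch below derives them from the `--json` output of `gibil create`. It relies only on the `fleet_id` and `instances[].name` fields shown in the sample output; that instances are listed in shard order, and that the project lives in `/root/project`, are assumptions taken from the example rather than documented guarantees.

```typescript
// Shape copied from the sample `gibil create --fleet ... --json` output above.
// The real output may carry more fields; only `name` is used here.
interface FleetInstance {
  name: string;
  ip: string;
}

interface FleetOutput {
  fleet_id: string;
  instances: FleetInstance[];
}

// Turn fleet JSON into one `gibil run` command per instance.
// Assumption: array position determines the shard index (first instance = shard 1).
function shardCommands(fleetJson: string, projectDir: string = "/root/project"): string[] {
  const fleet = JSON.parse(fleetJson) as FleetOutput;
  const total = fleet.instances.length;
  return fleet.instances.map(
    (instance, i) =>
      `gibil run ${instance.name} "cd ${projectDir} && pnpm test -- --shard=${i + 1}/${total}" --json`,
  );
}
```

The shard total comes from the number of instances actually returned, so the commands stay consistent if fewer servers come up than were requested.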

Real numbers

A test suite that takes 40 minutes on one machine finishes in ~5 minutes across 8 shards, plus ~90 seconds for server boot.
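The arithmetic behind that figure, as a rough sketch. It assumes the suite splits evenly across shards and that all servers boot concurrently, so boot time is paid once; real suites with uneven shards will be bounded by the slowest one.

```typescript
// Estimated end-to-end minutes: even split of the serial runtime plus one boot.
function estimateWallClockMinutes(serialMinutes: number, shards: number, bootSeconds: number): number {
  if (shards < 1) {
    throw new RangeError("shards must be at least 1");
  }
  return serialMinutes / shards + bootSeconds / 60;
}

// 40 / 8 = 5 minutes of tests, plus 90 / 60 = 1.5 minutes of boot.
console.log(estimateWallClockMinutes(40, 8, 90)); // prints 6.5
```

So the 40-minute suite takes about 6.5 minutes end to end, roughly a 6x speedup rather than the ideal 8x, because boot time does not shrink as shards are added.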

Eight cax11 servers on Hetzner cost ~$0.06/hr total with BYOC. Fleet mode caps at 20 servers per command. For more than 20 shards, run two or more fleet commands.
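Splitting a larger run into fleet commands that respect the 20-server cap is simple chunking. The helper below is an illustrative sketch; `fleetBatches` is a hypothetical name, and how each batch should be named on the command line is not covered on this page.

```typescript
// Per-command cap stated above.
const FLEET_CAP = 20;

// Return the --fleet size for each command needed to reach `totalServers`.
function fleetBatches(totalServers: number, cap: number = FLEET_CAP): number[] {
  if (!Number.isInteger(totalServers) || totalServers < 0) {
    throw new RangeError("totalServers must be a non-negative integer");
  }
  if (!Number.isInteger(cap) || cap < 1) {
    throw new RangeError("cap must be a positive integer");
  }
  const batches: number[] = [];
  for (let remaining = totalServers; remaining > 0; remaining -= cap) {
    batches.push(Math.min(remaining, cap));
  }
  return batches;
}

console.log(fleetBatches(32)); // prints [ 20, 12 ]
```

For 32 shards this yields one fleet of 20 and one of 12. The shard indices passed to the test runner would still run from 1 to 32 across both fleets.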

Why gibil

  • Clean state per shard: no shared state, no test pollution between shards
  • Parallel creation: all servers forge simultaneously, not sequentially
  • Structured output: parsed results on every MCP call and --json on every command
  • Cheap at scale: BYOC, you pay your cloud directly. On Hetzner cax11 the cost is pennies; on Vultr you trade some price for APAC proximity.

Fleet mode also works for benchmarking multiple configurations, testing dependency upgrades in parallel, or any workflow where you need N isolated environments at once.

Next steps

New to Vultr? Get $300 in free credits. (Referral link: Gibil gets a kickback that helps fund development.)
