GibilGibil

PR Review with Test Results

Review pull requests with actual test results, not static analysis

An agent checks out a PR branch, builds the project, runs the full test suite, and reviews with real pass/fail data, not static analysis.

Works with any MCP-capable agent: Claude Code, Cursor, Cline. Examples use Claude because that's what we test against.

With your agent (MCP)

With gibil's MCP server wired up (setup), hand your orchestrator the PR:

Review PR #42 on github.com/you/project. Check it out on a fresh VM, run the
full suite, and give me a review backed by actual test results, not a read-through.

The agent forges a box, checks out the PR ref, and runs the suite via MCP tools:

create_server({ name: "review", repo: "github.com/you/project", ttl: 30 })
vm_bash({ server: "review", command: "git fetch origin pull/42/head:pr-42 && git checkout pr-42" })
vm_bash({ server: "review", command: "pnpm install && pnpm test" })   // → { exit_code: 0, stdout: "..." }
destroy_server({ name: "review" })

The review then says "I ran the tests and they pass," not "the code looks correct."

By hand (CLI)

gibil create --name review --repo github.com/you/project --ttl 30m --json
gibil run review "cd /root/project && git fetch origin pull/42/head:pr-42 && git checkout pr-42" --json
gibil run review "cd /root/project && pnpm install && pnpm test" --json
# → {"stdout": "...", "stderr": "", "exit_code": 0}
gibil destroy review

Variation: dependency upgrade impact

Forge several VMs and test multiple upgrades in parallel, and report which ones break the build. Via MCP, the orchestrator fans out create_server calls; by hand, use fleet mode:

gibil create --name upgrade --fleet 3 --repo github.com/you/project --ttl 30m --json

gibil run upgrade-1-abc "cd /root/project && npm install react@19 && pnpm test" --json
gibil run upgrade-2-abc "cd /root/project && npm install next@15 && pnpm test" --json
gibil run upgrade-3-abc "cd /root/project && npm install typescript@6 && pnpm test" --json

gibil destroy --all

Each upgrade runs on a clean server. No cross-contamination.

Why gibil

  • Evidence-based reviews: the agent ran the code, not just read it
  • Clean server: test results reflect the PR in isolation, not leftover state
  • Fan-out for comparisons: test N upgrades or N consumer projects in parallel

Set GITHUB_TOKEN to let the agent push branches and open PRs directly from the server. Private repos also need it for the initial clone. See Remote PR Workflow.

Next steps

On this page