Gibil

Agent Code-Test Loop

An AI agent writes code, runs the tests, reads the failures, fixes the code, and repeats. Expect five to twenty iterations on a real server before everything passes.

This is the core use case. It works today with zero caveats.

Workflow

# Forge a server with your repo cloned and ready
gibil create --name pr-42 --repo github.com/you/project --ttl 60

# Agent loops: run tests → parse output → fix code → repeat
gibil run pr-42 "cd /root/project && pnpm test" --json
# → {"stdout": "3 failed, 39 passed", "stderr": "", "exit_code": 1}

# Agent fixes code via MCP or gibil run, then runs again
gibil run pr-42 "cd /root/project && pnpm test" --json
# → {"stdout": "42 passed", "stderr": "", "exit_code": 0}

# Done — burn it
gibil destroy pr-42
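
The loop above can also be driven from a script. What follows is a minimal Python sketch, not gibil's own client: it assumes only the `gibil run <name> "<command>" --json` invocation and the JSON shape shown above, and that the JSON is printed on the process's standard output. The names `run_on_server`, `code_test_loop`, and the `fix` callback are hypothetical, introduced here for illustration; `fix` stands in for whatever the agent does to edit the code.

```python
import json
import subprocess
from typing import Callable

# Shape assumed from the examples above:
# {"stdout": "...", "stderr": "...", "exit_code": 0}
Result = dict


def run_on_server(name: str, command: str) -> Result:
    """Run a command via `gibil run ... --json` and decode the JSON it prints.

    Assumption: the JSON document is written to the CLI's standard output.
    """
    proc = subprocess.run(
        ["gibil", "run", name, command, "--json"],
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


def code_test_loop(
    run: Callable[[str], Result],
    fix: Callable[[Result], None],
    test_command: str = "cd /root/project && pnpm test",
    max_iterations: int = 20,
) -> int:
    """Run tests, hand each failing result to `fix`, and repeat.

    Returns the number of test runs it took to pass,
    or -1 if the iteration budget ran out first.
    """
    for attempt in range(1, max_iterations + 1):
        result = run(test_command)
        if result["exit_code"] == 0:
            return attempt
        fix(result)  # hypothetical hook: the agent edits code here
    return -1
```

A caller might wire it up as `code_test_loop(lambda cmd: run_on_server("pr-42", cmd), fix=my_agent_fix)`, where `my_agent_fix` is the agent's own edit step. Passing the runner in as a function keeps the loop testable without a live server.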

What the agent gets

A real Linux server with:

  • Your repo cloned to /root/project
  • Runtime installed (Node, Python, or Go via .gibil.yml)
  • Root access — no permission issues
  • SSH for the full TTL window

The --json flag returns structured output the agent can parse without regex:

{
  "stdout": "3 failed, 39 passed",
  "stderr": "",
  "exit_code": 1
}
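
As an illustration of parsing that output without regex, the sketch below decodes the JSON into a small typed record and decides pass or fail from `exit_code` alone. It assumes the three fields shown above are always present; `RunResult` and `parse_run_result` are hypothetical names, not part of gibil.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    stdout: str
    stderr: str
    exit_code: int

    @property
    def passed(self) -> bool:
        # Rely on the exit code rather than matching text in stdout.
        return self.exit_code == 0


def parse_run_result(raw: str) -> RunResult:
    """Decode one `--json` response into a RunResult.

    Raises KeyError if any of the three assumed fields is missing.
    """
    data = json.loads(raw)
    return RunResult(
        stdout=data["stdout"],
        stderr=data["stderr"],
        exit_code=int(data["exit_code"]),
    )
```

The agent can then branch on `result.passed` and forward `result.stdout` and `result.stderr` to its own failure analysis.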

Why gibil

  • Long-lived session — the server stays up for the full TTL, not just one command
  • Clean state — fresh Ubuntu 24.04, no leftover artifacts from previous runs
  • Machine-readable: --json on every command
  • Auto-cleanup — TTL burns the server when the agent is done (or forgets)

For MCP-native agents such as Claude Code or Cursor (or any other MCP-compatible agent), pair this with MCP mode: the agent gets typed tools (vm_bash, vm_write) instead of shell strings.
