Agent interface

CooperBench supports multiple agent frameworks through a unified interface. You can use built-in agents or register custom ones.

Available agents

CooperBench includes the following built-in agents:

mini_swe_agent (default) - Lightweight SWE-agent implementation
mini_swe_agent_v2 - Enhanced version with improved tooling
swe_agent - Full SWE-agent framework
openhands_sdk - OpenHands agent SDK

Using agents

Specify agent in run()

from cooperbench import run

# Use mini_swe_agent (default)
run(
    run_name="default_agent",
    subset="lite",
    agent="mini_swe_agent",
)

# Use SWE-agent
run(
    run_name="swe_agent_test",
    subset="lite",
    agent="swe_agent",
    model_name="gpt-4o",
)

# Use OpenHands
run(
    run_name="openhands_test",
    subset="lite",
    agent="openhands_sdk",
)

List available agents

from cooperbench.agents.registry import list_agents

agents = list_agents()
print("Available agents:")
for agent in agents:
    print(f"  - {agent}")

Output:

Available agents:
  - mini_swe_agent
  - mini_swe_agent_v2
  - openhands_sdk
  - swe_agent

Agent interface

All agents implement the AgentRunner protocol:

from __future__ import annotations

from typing import Protocol


class AgentRunner(Protocol):
    """Protocol for agent framework adapters."""

    def run(
        self,
        task: dict,
        image: str,
        timeout: int = 3600,
        **kwargs,
    ) -> AgentResult:  # AgentResult is described in the next section
        """Run the agent on a task.

        Args:
            task: Task specification with problem_statement, feature_id, etc.
            image: Docker image for the environment
            timeout: Maximum execution time in seconds
            **kwargs: Agent-specific configuration

        Returns:
            AgentResult with status, patch, cost, steps, etc.
        """
        ...

AgentResult structure

class AgentResult:
    """Result from an agent run."""
    status: str              # "Submitted", "Failed", "Error"
    patch: str               # Generated code changes (unified diff)
    cost: float              # API cost in USD
    steps: int               # Number of agent steps
    log: str                 # Execution log
    error: str | None        # Error message if failed
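Because AgentRunner is a typing.Protocol, conformance is structural: any class with a matching run method qualifies, and no base class is required. The sketch below illustrates this under stated assumptions. AgentResult is modeled as a dataclass, and @runtime_checkable is added here so isinstance() works, since the protocol above does not show it. NoOpRunner is a hypothetical runner that never calls a model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class AgentResult:
    # Assumption: modeled as a dataclass with the fields listed above
    status: str  # "Submitted", "Failed", or "Error"
    patch: str
    cost: float
    steps: int
    log: str
    error: str | None = None


@runtime_checkable  # added for this sketch so isinstance() works on the protocol
class AgentRunner(Protocol):
    def run(self, task: dict, image: str, timeout: int = 3600, **kwargs) -> AgentResult: ...


class NoOpRunner:
    """Hypothetical runner that submits an empty patch without calling a model."""

    def run(self, task: dict, image: str, timeout: int = 3600, **kwargs) -> AgentResult:
        return AgentResult(
            status="Submitted",
            patch="",
            cost=0.0,
            steps=0,
            log=f"no-op run for feature {task['feature_id']} in {image}",
        )


runner = NoOpRunner()
print(isinstance(runner, AgentRunner))  # True, with no inheritance from AgentRunner
result = runner.run({"feature_id": 1}, image="example/image:latest")
print(result.status, result.cost)  # Submitted 0.0
```

A runtime-checkable isinstance() only confirms that a run attribute exists. It does not compare signatures, so a static type checker such as mypy is the stricter test.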

Creating custom agents

Basic custom agent

from dataclasses import dataclass

from cooperbench.agents.registry import register


# Mirrors the AgentResult structure shown above
@dataclass
class AgentResult:
    status: str
    patch: str
    cost: float
    steps: int
    log: str
    error: str | None = None

@register("my_agent")
class MyAgentRunner:
    """Custom agent implementation."""

    def __init__(self, model_name: str = "gpt-4o", **kwargs):
        self.model_name = model_name
        self.config = kwargs

    def run(
        self,
        task: dict,
        image: str,
        timeout: int = 3600,
        **kwargs,
    ) -> AgentResult:
        """Run the agent on a task."""
        problem_statement = task["problem_statement"]
        feature_id = task["feature_id"]

        # Your agent implementation here
        # ...

        return AgentResult(
            status="Submitted",
            patch="diff --git a/file.py...",
            cost=0.50,
            steps=10,
            log="Agent execution log...",
            error=None,
        )
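This page does not show how @register, list_agents(), and get_runner() are implemented. A decorator-based registry is a common way to build this kind of API. The sketch below is a hypothetical reimplementation, not CooperBench's code, and raising an error on duplicate names is a choice made only for the sketch.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical module-level registry: agent name -> runner class
_REGISTRY: dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a runner class under `name`."""
    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"agent {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls  # return the class unchanged so it stays usable directly
    return decorator


def list_agents() -> list[str]:
    return sorted(_REGISTRY)


def get_runner(name: str, **kwargs: Any) -> Any:
    """Instantiate the registered runner, forwarding config as keyword arguments."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown agent {name!r}; available: {list_agents()}") from None
    return cls(**kwargs)


@register("my_agent")
class MyAgentRunner:
    def __init__(self, model_name: str = "gpt-4o", **kwargs: Any):
        self.model_name = model_name
        self.config = kwargs


runner = get_runner("my_agent", model_name="gpt-4o-mini", temperature=0.2)
print(list_agents())  # ['my_agent']
print(runner.model_name, runner.config)  # gpt-4o-mini {'temperature': 0.2}
```

In this sketch, forwarding **kwargs to the constructor lets agent-specific options such as temperature reach the runner without the registry needing to know about them.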

Use custom agent

from cooperbench import run

run(
    run_name="custom_agent_test",
    subset="lite",
    agent="my_agent",
    model_name="gpt-4o",
)

Register external agents

You can also register agents through an environment variable that holds a comma-separated list of module paths:

export COOPERBENCH_EXTERNAL_AGENTS="mypackage.agents.custom_agent,otherpackage.agent"

Then use:

from cooperbench import run

run(
    run_name="external_agent",
    subset="lite",
    agent="custom_agent",  # Assumes @register("custom_agent") in mypackage.agents.custom_agent
)
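One plausible mechanism, assumed here rather than confirmed by this page, is that CooperBench imports each listed module so the @register decorators run at import time. The hypothetical load_external_agents() below sketches that approach with importlib. In the usage line, standard-library modules stand in for real agent packages.

```python
from __future__ import annotations

import importlib
import os
from collections.abc import Mapping
from types import ModuleType

ENV_VAR = "COOPERBENCH_EXTERNAL_AGENTS"


def load_external_agents(environ: Mapping[str, str] | None = None) -> list[ModuleType]:
    """Import each comma-separated module path so its @register decorators run."""
    env = os.environ if environ is None else environ
    modules = []
    for path in env.get(ENV_VAR, "").split(","):
        path = path.strip()
        if path:  # tolerate trailing commas and extra spaces
            modules.append(importlib.import_module(path))
    return modules


# Standard-library modules stand in for real agent packages here
loaded = load_external_agents({ENV_VAR: "json, collections.abc,"})
print([m.__name__ for m in loaded])  # ['json', 'collections.abc']
```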

Agent configuration

Pass agent-specific config

You can pass additional configuration to agents:

from cooperbench.agents.registry import get_runner

# Get agent with custom config
runner = get_runner(
    "mini_swe_agent",
    model_name="gpt-4o",
    temperature=0.7,
    max_tokens=4000,
)

Use config file

For complex configurations, use a config file:

from cooperbench import run

run(
    run_name="configured_agent",
    subset="lite",
    agent="swe_agent",
    agent_config="config/swe_agent_custom.yaml",
)
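The format of config/swe_agent_custom.yaml is not documented here. If you keep a shared config file and want a few per-run overrides on top of it, a recursive merge is one option. The sketch assumes the file has already been parsed into a dict, for example with PyYAML's yaml.safe_load. The keys shown are hypothetical.

```python
from __future__ import annotations

import copy
from typing import Any


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `overrides` wins, merging nested dicts key by key."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical structure; in practice `base` might come from yaml.safe_load(...)
base = {"agent": {"model": "gpt-4o", "temperature": 0.0}, "max_steps": 50}
overrides = {"agent": {"temperature": 0.7}}

merged = deep_merge(base, overrides)
print(merged)  # {'agent': {'model': 'gpt-4o', 'temperature': 0.7}, 'max_steps': 50}
```

Copying instead of mutating keeps the shared base config intact when several runs derive from it.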

Agent task specification

The task parameter passed to agents contains:

{
    "problem_statement": "Implement feature X...",  # Feature description
    "feature_id": 1,                                # Which feature to implement
    "repo": "llama_index_task",                     # Repository name
    "task_id": 1,                                   # Task ID
    "redis_url": "redis://localhost:6379",          # For messaging (coop mode)
    "git_enabled": False,                           # Git collaboration enabled
    "messaging_enabled": True,                      # Messaging enabled
}
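For type-checked custom agents, the task dict can be described with a TypedDict. The field types below are inferred from the example values above. TaskSpec and describe() are hypothetical names. If some fields turn out to be absent in solo runs, mark them with typing.NotRequired.

```python
from typing import TypedDict


class TaskSpec(TypedDict):
    """Hypothetical type for the task dict; field types inferred from the example."""
    problem_statement: str
    feature_id: int
    repo: str
    task_id: int
    redis_url: str
    git_enabled: bool
    messaging_enabled: bool


def describe(task: TaskSpec) -> str:
    # Summarize which collaboration channels are active for this task
    flags = (("git", task["git_enabled"]), ("messaging", task["messaging_enabled"]))
    channels = ", ".join(name for name, on in flags if on) or "no collaboration channels"
    return f"{task['repo']}#{task['task_id']} feature {task['feature_id']}: {channels}"


task: TaskSpec = {
    "problem_statement": "Implement feature X...",
    "feature_id": 1,
    "repo": "llama_index_task",
    "task_id": 1,
    "redis_url": "redis://localhost:6379",
    "git_enabled": False,
    "messaging_enabled": True,
}
print(describe(task))  # llama_index_task#1 feature 1: messaging
```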

Working with agent results

Access agent logs

import json
from pathlib import Path
from cooperbench import discover_runs

runs = discover_runs(run_name="my_experiment")

for run in runs:
    log_dir = Path(run["log_dir"])

    # Read result.json
    with open(log_dir / "result.json") as f:
        result = json.load(f)

    if run["setting"] == "coop":
        # Cooperative mode: two agent results
        for agent_id, agent_result in result["results"].items():
            print(f"Agent {agent_id}:")
            print(f"  Status: {agent_result['status']}")
            print(f"  Cost: ${agent_result['cost']:.2f}")
            print(f"  Steps: {agent_result['steps']}")

            # Read agent log
            log_file = log_dir / f"{agent_id}.log"
            if log_file.exists():
                log_content = log_file.read_text()
                print(f"  Log preview: {log_content[:100]}...")
    else:
        # Solo mode: single agent result
        print(f"Status: {result['result']['status']}")
        print(f"Cost: ${result['result']['cost']:.2f}")
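To compare experiments by spend, the two result.json layouts read above can be collapsed into one number per run. This hypothetical run_cost() helper relies only on the keys used in that loop. The sample values are made up.

```python
def run_cost(result: dict, setting: str) -> float:
    """Total API cost (USD) recorded in one result.json, for coop or solo runs."""
    if setting == "coop":
        # Coop: one entry per agent under "results"
        return sum(agent["cost"] for agent in result["results"].values())
    # Solo: a single entry under "result"
    return result["result"]["cost"]


coop = {"results": {"agent1": {"cost": 0.25}, "agent2": {"cost": 0.5}}}
solo = {"result": {"cost": 0.4}}
print(run_cost(coop, "coop"), run_cost(solo, "solo"))  # 0.75 0.4
```

Inside the loop, accumulating run_cost(result, run["setting"]) gives the total for the whole experiment.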

Extract patches

from pathlib import Path
from cooperbench import discover_runs


def count_changed_lines(patch: str) -> int:
    """Count added/removed lines in a unified diff, skipping the ---/+++ file headers."""
    return sum(
        1
        for line in patch.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


runs = discover_runs(run_name="my_experiment")

for run in runs:
    log_dir = Path(run["log_dir"])

    if run["setting"] == "coop":
        # Read both agent patches
        patch1 = (log_dir / "agent1.patch").read_text()
        patch2 = (log_dir / "agent2.patch").read_text()
        print(f"Agent 1 changed {count_changed_lines(patch1)} lines")
        print(f"Agent 2 changed {count_changed_lines(patch2)} lines")
    else:
        # Read solo patch
        patch = (log_dir / "solo.patch").read_text()
        print(f"Agent changed {count_changed_lines(patch)} lines")
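In cooperative mode, the files that both agents modified are where merge conflicts are most likely. The hypothetical files_touched() helper below reads the diff --git a/... b/... headers of each patch and intersects the results. It assumes paths contain no spaces, which keeps the header parsing simple.

```python
def files_touched(patch: str) -> set[str]:
    """Paths modified by a unified diff, read from its `diff --git a/X b/Y` headers."""
    files = set()
    for line in patch.splitlines():
        if line.startswith("diff --git a/"):
            # Header format: diff --git a/<old path> b/<new path>; keep the new path
            _, _, new_path = line.partition(" b/")
            files.add(new_path)
    return files


patch1 = (
    "diff --git a/src/models.py b/src/models.py\n"
    "--- a/src/models.py\n"
    "+++ b/src/models.py\n"
    "@@ -1 +1 @@\n"
    "-x = 1\n"
    "+x = 2\n"
)
patch2 = (
    "diff --git a/src/models.py b/src/models.py\n"
    "--- a/src/models.py\n"
    "+++ b/src/models.py\n"
    "@@ -5 +5 @@\n"
    "-y = 1\n"
    "+y = 3\n"
    "diff --git a/src/api.py b/src/api.py\n"
    "--- a/src/api.py\n"
    "+++ b/src/api.py\n"
    "@@ -1 +1 @@\n"
    "-pass\n"
    "+return None\n"
)
overlap = files_touched(patch1) & files_touched(patch2)
print(sorted(overlap))  # ['src/models.py']
```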

Advanced agent features

Cooperative mode features

When running in cooperative mode (setting="coop"), agents have access to:

Messaging

Agents can send messages to each other:

# In agent implementation
self.send_message(
    to_agent="agent2",
    message="I've implemented the database models",
)
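Beyond the redis_url field in the task specification, this page does not show the transport behind send_message(). The sketch below imitates the delivery semantics with a hypothetical in-memory mailbox, giving each agent one FIFO inbox that is drained on read, so the flow can run without Redis. It also takes an explicit from_agent argument, which the call above does not show. With Redis, a list per agent driven by RPUSH and LPOP would give similar FIFO behavior.

```python
from __future__ import annotations

from collections import defaultdict, deque


class InMemoryMailbox:
    """Hypothetical stand-in for a Redis-backed channel: one FIFO inbox per agent."""

    def __init__(self) -> None:
        self._inboxes: defaultdict[str, deque[tuple[str, str]]] = defaultdict(deque)

    def send_message(self, from_agent: str, to_agent: str, message: str) -> None:
        self._inboxes[to_agent].append((from_agent, message))

    def receive_all(self, agent_id: str) -> list[tuple[str, str]]:
        # Drain the inbox so each message is delivered exactly once
        inbox = self._inboxes[agent_id]
        messages = list(inbox)
        inbox.clear()
        return messages


mailbox = InMemoryMailbox()
mailbox.send_message("agent1", "agent2", "I've implemented the database models")
first = mailbox.receive_all("agent2")
second = mailbox.receive_all("agent2")
print(first)   # [('agent1', "I've implemented the database models")]
print(second)  # []
```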

Git collaboration

When git_enabled=True, agents can:

Push changes: git push origin feature-branch
Pull updates: git pull origin feature-branch
Merge branches: git merge other-branch
View history: git log

Environment access

Agents run in Docker containers with:

Full repository access
Language runtimes (Python, Node.js, etc.)
Git (if enabled)
Redis client (if messaging enabled)

Best practices

Choose the right agent

mini_swe_agent: Fast, lightweight, good for most tasks
mini_swe_agent_v2: Enhanced tooling, better for complex tasks
swe_agent: Full-featured, best for maximum capability
openhands_sdk: Alternative framework with different strengths

Optimize costs

from cooperbench import run

# Use cheaper models for simple tasks
run(
    run_name="cost_optimized",
    subset="lite",
    model_name="vertex_ai/gemini-3-flash-preview",  # Cheaper than GPT-4
)

Debug agent issues

import json
from pathlib import Path

from cooperbench import run

# Run a single task to see detailed output
run(
    run_name="debug_run",
    repo="llama_index_task",
    task_id=1,
    features=[1, 2],
    agent="mini_swe_agent",
)

# Check logs
log_dir = Path("logs/debug_run/coop/llama_index_task/1/f1_f2")
with open(log_dir / "result.json") as f:
    result = json.load(f)

# Print the error (or None) reported by each agent
for agent_id, agent_result in result["results"].items():
    print(agent_id, agent_result["error"])
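When a run covers many tasks, opening result.json files one by one gets tedious. The hypothetical find_errors() helper below walks a log directory and handles both the coop ("results") and solo ("result") layouts used on this page. It reports every agent whose error field is set. The "solo" label for single-agent runs is an assumption of the sketch. The demo writes a fake result.json into a temporary directory.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def find_errors(log_root: Path) -> list[tuple[str, str, str]]:
    """Return (result.json path, agent id, error) for every agent that reported an error."""
    errors = []
    for result_file in sorted(log_root.rglob("result.json")):
        result = json.loads(result_file.read_text())
        # Coop results keep one entry per agent; solo results keep a single "result"
        agents = result.get("results") or {"solo": result.get("result", {})}
        for agent_id, agent_result in agents.items():
            if agent_result.get("error"):
                errors.append((str(result_file), agent_id, agent_result["error"]))
    return errors


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "coop" / "llama_index_task" / "1" / "f1_f2"
    run_dir.mkdir(parents=True)
    fake = {"results": {"agent1": {"error": None}, "agent2": {"error": "Timeout"}}}
    (run_dir / "result.json").write_text(json.dumps(fake))
    found = find_errors(Path(tmp))

print([(agent, err) for _, agent, err in found])  # [('agent2', 'Timeout')]
```

Against real output, call find_errors(Path("logs/debug_run")) and print each tuple.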

Related functions

run() - Execute tasks with agents
list_agents() - List available agents
get_runner() - Get agent instance
