CooperBench supports multiple agent frameworks through a unified interface. You can use built-in agents or register custom ones.
Available agents
CooperBench includes the following built-in agents:
- mini_swe_agent (default) - Lightweight SWE-agent implementation
- mini_swe_agent_v2 - Enhanced version with improved tooling
- swe_agent - Full SWE-agent framework
- openhands_sdk - OpenHands agent SDK
Using agents
Specify agent in run()
from cooperbench import run

# Use mini_swe_agent (default)
run(
    run_name="default_agent",
    subset="lite",
    agent="mini_swe_agent",
)

# Use SWE-agent
run(
    run_name="swe_agent_test",
    subset="lite",
    agent="swe_agent",
    model_name="gpt-4o",
)

# Use OpenHands
run(
    run_name="openhands_test",
    subset="lite",
    agent="openhands_sdk",
)
List available agents
from cooperbench.agents.registry import list_agents
agents = list_agents()
print("Available agents:")
for agent in agents:
    print(f" - {agent}")
Available agents:
- mini_swe_agent
- mini_swe_agent_v2
- openhands_sdk
- swe_agent
Agent interface
All agents implement the AgentRunner protocol:
class AgentRunner(Protocol):
    """Protocol for agent framework adapters."""

    def run(
        self,
        task: dict,
        image: str,
        timeout: int = 3600,
        **kwargs,
    ) -> AgentResult:
        """Run the agent on a task.

        Args:
            task: Task specification with problem_statement, feature_id, etc.
            image: Docker image for the environment
            timeout: Maximum execution time in seconds
            **kwargs: Agent-specific configuration

        Returns:
            AgentResult with status, patch, cost, steps, etc.
        """
        ...
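Because AgentRunner is a Protocol, an adapter only needs a matching run() method; it does not have to inherit from anything. The following is a minimal, self-contained sketch of that idea. The `@runtime_checkable` decorator, `EchoRunner`, and `NotARunner` are assumptions added for illustration and are not part of CooperBench; the local `AgentResult` mirrors the fields documented on this page.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@dataclass
class AgentResult:
    # Local copy of the documented fields so the sketch runs on its own.
    status: str
    patch: str
    cost: float
    steps: int
    log: str
    error: Optional[str] = None


@runtime_checkable  # assumption: added only so isinstance() can be used below
class AgentRunner(Protocol):
    def run(self, task: dict, image: str, timeout: int = 3600, **kwargs) -> AgentResult: ...


class EchoRunner:
    """Hypothetical adapter: satisfies the protocol without subclassing it."""

    def run(self, task: dict, image: str, timeout: int = 3600, **kwargs) -> AgentResult:
        return AgentResult(status="Submitted", patch="", cost=0.0, steps=0, log=f"ran in {image}")


class NotARunner:
    """Hypothetical class with no run() method."""


print(isinstance(EchoRunner(), AgentRunner))  # True
print(isinstance(NotARunner(), AgentRunner))  # False
```

Note that a runtime isinstance() check on a protocol only verifies that a `run` attribute exists; argument and return types are checked by a static type checker, not at runtime.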
AgentResult structure
class AgentResult:
    """Result from an agent run."""

    status: str        # "Submitted", "Failed", "Error"
    patch: str         # Generated code changes (unified diff)
    cost: float        # API cost in USD
    steps: int         # Number of agent steps
    log: str           # Execution log
    error: str | None  # Error message if failed
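As a rough illustration of how these fields might be consumed, the sketch below models the result as a dataclass (an assumption; the page does not say how AgentResult is implemented) and adds a hypothetical `summarize` helper. The sample values are made up.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AgentResult:
    # Assumption: a dataclass with the documented fields.
    status: str
    patch: str
    cost: float
    steps: int
    log: str
    error: Optional[str] = None


def summarize(result: AgentResult) -> str:
    """Hypothetical helper: one-line summary of a result."""
    outcome = "ok" if result.status == "Submitted" and result.error is None else "needs attention"
    return f"{result.status} ({outcome}): {result.steps} steps, ${result.cost:.2f}"


result = AgentResult(status="Submitted", patch="", cost=0.5, steps=10, log="")
print(summarize(result))           # Submitted (ok): 10 steps, $0.50
print(json.dumps(asdict(result)))  # JSON-serializable view of the same fields
```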
Creating custom agents
Basic custom agent
from cooperbench.agents.registry import register
from dataclasses import dataclass

@dataclass
class AgentResult:
    status: str
    patch: str
    cost: float
    steps: int
    log: str
    error: str | None = None


@register("my_agent")
class MyAgentRunner:
    """Custom agent implementation."""

    def __init__(self, model_name: str = "gpt-4o", **kwargs):
        self.model_name = model_name
        self.config = kwargs

    def run(
        self,
        task: dict,
        image: str,
        timeout: int = 3600,
        **kwargs,
    ) -> AgentResult:
        """Run the agent on a task."""
        problem_statement = task["problem_statement"]
        feature_id = task["feature_id"]

        # Your agent implementation here
        # ...

        return AgentResult(
            status="Submitted",
            patch="diff --git a/file.py...",
            cost=0.50,
            steps=10,
            log="Agent execution log...",
            error=None,
        )
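To make the registration step less opaque, here is a minimal sketch of how a decorator-based registry could work. The names `register`, `get_runner`, and `list_agents` come from this page, but the bodies below are assumptions written for illustration and are not CooperBench's actual implementation.

```python
from typing import Callable, Dict, List

# Hypothetical module-level registry mapping agent names to runner classes.
_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"agent {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


def list_agents() -> List[str]:
    """Return registered agent names in sorted order."""
    return sorted(_REGISTRY)


def get_runner(name: str, **kwargs):
    """Instantiate the runner registered under `name`, forwarding kwargs."""
    if name not in _REGISTRY:
        raise KeyError(f"unknown agent {name!r}; available: {list_agents()}")
    return _REGISTRY[name](**kwargs)


@register("my_agent")
class MyAgentRunner:
    def __init__(self, model_name: str = "gpt-4o", **kwargs):
        self.model_name = model_name
        self.config = kwargs


runner = get_runner("my_agent", model_name="gpt-4o", temperature=0.7)
print(list_agents())   # ['my_agent']
print(runner.config)   # {'temperature': 0.7}
```

In a design like this, registration happens as a side effect of importing the module that defines the class, which is why the module must be imported before the agent name can be resolved.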
Use custom agent
from cooperbench import run

run(
    run_name="custom_agent_test",
    subset="lite",
    agent="my_agent",
    model_name="gpt-4o",
)
Register external agents
You can also register agents through the COOPERBENCH_EXTERNAL_AGENTS environment variable, set to a comma-separated list of module paths:
export COOPERBENCH_EXTERNAL_AGENTS="mypackage.agents.custom_agent,otherpackage.agent"
Then use:
from cooperbench import run

run(
    run_name="external_agent",
    subset="lite",
    agent="custom_agent",  # Assumes @register("custom_agent") in mypackage.agents.custom_agent
)
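One plausible way such a variable could be consumed is to split it on commas and import each module so that its @register decorators run. The sketch below assumes that behavior; `load_external_agents` and the `DEMO_EXTERNAL_AGENTS` variable are hypothetical names, and standard-library modules stand in for real agent packages so the example runs anywhere.

```python
import importlib
import os
from typing import List


def load_external_agents(env_var: str = "COOPERBENCH_EXTERNAL_AGENTS") -> List[str]:
    """Hypothetical loader: import every module path listed in `env_var`."""
    raw = os.environ.get(env_var, "")
    loaded = []
    for module_path in (part.strip() for part in raw.split(",")):
        if not module_path:
            continue  # tolerate empty entries and trailing commas
        importlib.import_module(module_path)  # importing triggers any @register calls
        loaded.append(module_path)
    return loaded


# Demo with stdlib modules under a separate, made-up variable name.
os.environ["DEMO_EXTERNAL_AGENTS"] = "json, string,"
loaded = load_external_agents("DEMO_EXTERNAL_AGENTS")
print(loaded)  # ['json', 'string']
```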
Agent configuration
Pass agent-specific config
You can pass additional configuration to agents:
from cooperbench.agents.registry import get_runner
# Get agent with custom config
runner = get_runner(
    "mini_swe_agent",
    model_name="gpt-4o",
    temperature=0.7,
    max_tokens=4000,
)
Use config file
For complex configurations, use a config file:
from cooperbench import run

run(
    run_name="configured_agent",
    subset="lite",
    agent="swe_agent",
    agent_config="config/swe_agent_custom.yaml",
)
Agent task specification
The task parameter passed to agents contains:
{
    "problem_statement": "Implement feature X...",  # Feature description
    "feature_id": 1,                                # Which feature to implement
    "repo": "llama_index_task",                     # Repository name
    "task_id": 1,                                   # Task ID
    "redis_url": "redis://localhost:6379",          # For messaging (coop mode)
    "git_enabled": False,                           # Git collaboration enabled
    "messaging_enabled": True,                      # Messaging enabled
}
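A custom agent may want to check the task dictionary before starting work. The sketch below is one way to do that; the `AgentTask` type, the choice of which keys count as required, and the `check_task` helper are all assumptions for illustration, since this page does not state which keys are guaranteed to be present.

```python
from typing import List, TypedDict


class AgentTask(TypedDict, total=False):
    # Keys taken from the task example above; total=False because
    # it is an assumption which of them are always present.
    problem_statement: str
    feature_id: int
    repo: str
    task_id: int
    redis_url: str
    git_enabled: bool
    messaging_enabled: bool


# Assumption: these four are treated as required in this sketch.
REQUIRED_KEYS = ("problem_statement", "feature_id", "repo", "task_id")


def check_task(task: dict) -> List[str]:
    """Hypothetical validator: return a list of problems (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in task]
    if task.get("messaging_enabled") and not task.get("redis_url"):
        problems.append("messaging_enabled is set but redis_url is empty")
    return problems


task: AgentTask = {
    "problem_statement": "Implement feature X...",
    "feature_id": 1,
    "repo": "llama_index_task",
    "task_id": 1,
    "messaging_enabled": True,
}
print(check_task(task))  # ['messaging_enabled is set but redis_url is empty']
```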
Working with agent results
Access agent logs
import json
from pathlib import Path
from cooperbench import discover_runs
runs = discover_runs(run_name="my_experiment")
for run in runs:
    log_dir = Path(run["log_dir"])

    # Read result.json
    with open(log_dir / "result.json") as f:
        result = json.load(f)

    if run["setting"] == "coop":
        # Cooperative mode: two agent results
        for agent_id, agent_result in result["results"].items():
            print(f"Agent {agent_id}:")
            print(f" Status: {agent_result['status']}")
            print(f" Cost: ${agent_result['cost']:.2f}")
            print(f" Steps: {agent_result['steps']}")

            # Read agent log
            log_file = log_dir / f"{agent_id}.log"
            if log_file.exists():
                log_content = log_file.read_text()
                print(f" Log preview: {log_content[:100]}...")
    else:
        # Solo mode: single agent result
        print(f"Status: {result['result']['status']}")
        print(f"Cost: ${result['result']['cost']:.2f}")
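Since coop and solo runs store results under different keys ("results" versus "result"), it can help to normalize the two shapes before aggregating. The following sketch assumes only the result.json layout shown above; `iter_agent_results` and `total_cost` are hypothetical helpers, and the sample dictionaries contain made-up values.

```python
from typing import Iterator, Tuple


def iter_agent_results(result: dict, setting: str) -> Iterator[Tuple[str, dict]]:
    """Yield (agent_id, agent_result) pairs for either run setting."""
    if setting == "coop":
        yield from result["results"].items()
    else:
        yield "solo", result["result"]


def total_cost(result: dict, setting: str) -> float:
    """Sum the API cost over every agent in the run."""
    return sum(agent_result["cost"] for _, agent_result in iter_agent_results(result, setting))


# Made-up sample data in the same shape as result.json.
coop_result = {
    "results": {
        "agent1": {"status": "Submitted", "cost": 0.25, "steps": 10},
        "agent2": {"status": "Submitted", "cost": 0.5, "steps": 12},
    }
}
solo_result = {"result": {"status": "Submitted", "cost": 0.5, "steps": 7}}

print(total_cost(coop_result, "coop"))  # 0.75
print(total_cost(solo_result, "solo"))  # 0.5
```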
Access agent patches
from pathlib import Path
from cooperbench import discover_runs
runs = discover_runs(run_name="my_experiment")
for run in runs:
    log_dir = Path(run["log_dir"])

    if run["setting"] == "coop":
        # Read both agent patches
        patch1 = (log_dir / "agent1.patch").read_text()
        patch2 = (log_dir / "agent2.patch").read_text()
        # Patch length includes headers and context lines, not only changes
        print(f"Agent 1 patch: {len(patch1.splitlines())} lines")
        print(f"Agent 2 patch: {len(patch2.splitlines())} lines")
    else:
        # Read solo patch
        patch = (log_dir / "solo.patch").read_text()
        print(f"Agent patch: {len(patch.splitlines())} lines")
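Counting every line of a patch also counts file headers, hunk headers, and unchanged context. To estimate how many lines an agent actually added or removed, a sketch like the following can be used. `diff_stats` is a hypothetical helper, and the sample patch is made up; it assumes the patch is a standard unified diff.

```python
from typing import Tuple


def diff_stats(patch: str) -> Tuple[int, int]:
    """Return (added, removed) line counts for a unified diff."""
    added = removed = 0
    for line in patch.splitlines():
        if line.startswith("+++") or line.startswith("---"):
            continue  # file headers, not content changes
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


sample_patch = """diff --git a/file.py b/file.py
--- a/file.py
+++ b/file.py
@@ -1,2 +1,3 @@
 import os
-x = 1
+x = 2
+y = 3
"""
print(diff_stats(sample_patch))  # (2, 1)
```

This is a heuristic: a changed line whose own content begins with "--" or "++" would be mistaken for a file header and skipped.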
Advanced agent features
Cooperative mode features
When running in cooperative mode (setting="coop"), agents have access to:
Messaging
Agents can send messages to each other:
# In agent implementation
self.send_message(
    to_agent="agent2",
    message="I've implemented the database models",
)
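For local experimentation without a Redis server, the messaging call above can be imitated with an in-memory stand-in. Everything in this sketch is an assumption: the `InMemoryMailbox` class, the `from_agent` parameter, and the `receive_messages` method are hypothetical and do not describe CooperBench's Redis-backed channel; only the `send_message(to_agent=..., message=...)` call shape is taken from this page.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class InMemoryMailbox:
    """Hypothetical stand-in for a message channel shared by two agents."""

    def __init__(self) -> None:
        # One FIFO queue per recipient.
        self._queues: Dict[str, Deque[dict]] = defaultdict(deque)

    def send_message(self, to_agent: str, message: str, from_agent: str = "agent1") -> None:
        self._queues[to_agent].append({"from": from_agent, "message": message})

    def receive_messages(self, agent_id: str) -> List[dict]:
        """Return and clear all pending messages for `agent_id`."""
        queue = self._queues[agent_id]
        messages = list(queue)
        queue.clear()
        return messages


mailbox = InMemoryMailbox()
mailbox.send_message(to_agent="agent2", message="I've implemented the database models")
first_read = mailbox.receive_messages("agent2")
second_read = mailbox.receive_messages("agent2")
print(first_read)   # [{'from': 'agent1', 'message': "I've implemented the database models"}]
print(second_read)  # []
```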
Git collaboration
When git_enabled=True, agents can:
- Push changes: git push origin feature-branch
- Pull updates: git pull origin feature-branch
- Merge branches: git merge other-branch
- View history: git log
Environment access
Agents run in Docker containers with:
- Full repository access
- Language runtimes (Python, Node.js, etc.)
- Git (if enabled)
- Redis client (if messaging enabled)
Best practices
Choose the right agent
- mini_swe_agent: Fast, lightweight, good for most tasks
- mini_swe_agent_v2: Enhanced tooling, better for complex tasks
- swe_agent: Full-featured, best for maximum capability
- openhands_sdk: Alternative framework with different strengths
Optimize costs
from cooperbench import run

# Use cheaper models for simple tasks
run(
    run_name="cost_optimized",
    subset="lite",
    model_name="vertex_ai/gemini-3-flash-preview",  # Cheaper than GPT-4
)
Debug agent issues
from cooperbench import run

# Run a single task to see detailed output
run(
    run_name="debug_run",
    repo="llama_index_task",
    task_id=1,
    features=[1, 2],
    agent="mini_swe_agent",
)
# Check logs
import json
from pathlib import Path
log_dir = Path("logs/debug_run/coop/llama_index_task/1/f1_f2")
with open(log_dir / "result.json") as f:
    result = json.load(f)
print(result["results"]["agent1"]["error"])
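When a run has several agents, it can be quicker to list only the ones that did not submit. The sketch below assumes the result.json layout used on this page ("results" for coop, "result" for solo) and treats any status other than "Submitted" as a failure; `failed_agents` is a hypothetical helper and the sample data, including the error text, is made up.

```python
from typing import Dict


def failed_agents(result: dict) -> Dict[str, str]:
    """Map agent id to error message for every agent that did not submit."""
    # Coop runs keep per-agent results under "results"; solo runs use "result".
    per_agent = result.get("results") or {"solo": result.get("result", {})}
    return {
        agent_id: agent_result.get("error") or "no error message recorded"
        for agent_id, agent_result in per_agent.items()
        if agent_result.get("status") != "Submitted"
    }


# Made-up sample in the shape of a coop result.json.
sample = {
    "results": {
        "agent1": {"status": "Error", "error": "example failure message"},
        "agent2": {"status": "Submitted", "error": None},
    }
}
print(failed_agents(sample))  # {'agent1': 'example failure message'}
```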