CooperBench supports three execution backends for running agent tasks and evaluations in isolated environments.

Overview

Backends determine where agent tasks and test evaluations run:
  • Modal - Cloud execution (default, easiest)
  • Docker - Local containers (no cloud account needed)
  • GCP - Google Cloud Platform VMs (scalable, custom infrastructure)

Selecting a backend

Use the --backend flag with the run and eval commands:
cooperbench run --backend modal
cooperbench run --backend docker
cooperbench run --backend gcp
cooperbench eval -n my-experiment --backend modal
cooperbench eval -n my-experiment --backend docker  
cooperbench eval -n my-experiment --backend gcp
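If you drive CooperBench from a Python script, the same invocations can be assembled as argument lists. This is a sketch under assumptions, not part of CooperBench: build_command is a hypothetical helper, and it reuses only flags that appear on this page (-n for the experiment name and --backend).

```python
# Hypothetical helper (not part of CooperBench): assembles argv lists for the
# `cooperbench run` / `cooperbench eval` invocations shown on this page.
import shlex

BACKENDS = ("modal", "docker", "gcp")


def build_command(subcommand, backend, experiment=None):
    """Return the argv list for `cooperbench <subcommand> ... --backend <backend>`."""
    if subcommand not in ("run", "eval"):
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {BACKENDS}")
    argv = ["cooperbench", subcommand]
    if experiment is not None:
        argv += ["-n", experiment]  # experiment name, as in `eval -n my-experiment`
    argv += ["--backend", backend]
    return argv


if __name__ == "__main__":
    # Print a shell-quoted version; pass the list to subprocess.run() to execute it.
    print(shlex.join(build_command("eval", "docker", experiment="my-experiment")))
```

Passing a list rather than a single string to subprocess.run avoids shell-quoting problems with unusual experiment names.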

Modal (cloud)

Overview

Modal is a serverless cloud platform that runs tasks in ephemeral containers.

Pros:
  • No setup required
  • Scales automatically
  • Fast cold starts
  • Pay only for usage
Cons:
  • Requires Modal account
  • Internet connection required
  • Costs money (free tier available)

Setup

  1. Create a Modal account: visit modal.com and sign up.
  2. Install Modal:
    pip install modal
    
  3. Authenticate:
    modal token new
    
    This opens your browser to authenticate.
  4. Run tasks:
    cooperbench run --backend modal
    

Configuration

No additional configuration is needed. Modal reads its credentials from ~/.modal.toml, which modal token new writes.

Example usage

# Run benchmark on Modal
cooperbench run -s lite --backend modal

# Evaluate on Modal
cooperbench eval -n my-experiment --backend modal

# High concurrency (Modal scales automatically)
cooperbench run -c 100 --backend modal

Docker (local)

Overview

The Docker backend runs tasks in local containers using your machine’s resources.

Pros:
  • No cloud account needed
  • Works offline
  • No usage costs
  • Full control
Cons:
  • Limited by local resources
  • Must have Docker installed
  • Slower than cloud for large workloads

Setup

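  1. Install Docker (the command below uses Homebrew on macOS; on Linux, install Docker Engine from your distribution or Docker's official packages):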
    brew install --cask docker
    
  2. Start the Docker daemon: make sure Docker Desktop is running, or on Linux run:
    sudo systemctl start docker
    
  3. Verify installation:
    docker --version
    docker ps
    
  4. Run tasks:
    cooperbench run --backend docker
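The verification step can also be scripted, for example as a preflight before a long docker-backend run. This sketch relies only on the fact that docker ps exits with a non-zero status when the CLI cannot reach the daemon; docker_daemon_ready is a hypothetical helper, not a CooperBench function.

```python
# Sketch: check that the Docker daemon is reachable before a docker-backend run.
# `docker ps` exits non-zero when the CLI cannot reach the daemon, so the
# return code is used as the signal.
import subprocess


def docker_daemon_ready(executable="docker", timeout=15):
    """Return True if `<executable> ps` succeeds, False otherwise."""
    try:
        result = subprocess.run(
            [executable, "ps"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # CLI missing or not executable, or the daemon hung
    return result.returncode == 0


if __name__ == "__main__":
    if not docker_daemon_ready():
        print("Cannot connect to Docker daemon: start Docker Desktop or run `sudo systemctl start docker`")
```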
    

Configuration

Optionally, set this environment variable to choose the container CLI:
export MSWEA_DOCKER_EXECUTABLE=docker  # Default
For Podman or other Docker-compatible tools:
export MSWEA_DOCKER_EXECUTABLE=podman
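The variable holds an executable name. The sketch below shows one plausible way a launcher could resolve it and confirm the binary is on PATH; resolve_container_cli is hypothetical, and falling back to docker when the variable is unset or empty is an assumption based on the documented default.

```python
# Sketch (assumed logic, not CooperBench's code): resolve the container CLI
# named by MSWEA_DOCKER_EXECUTABLE, defaulting to "docker".
import os
import shutil


def resolve_container_cli(env=None):
    """Return (name, full_path_or_None) for the configured container CLI."""
    env = os.environ if env is None else env
    name = env.get("MSWEA_DOCKER_EXECUTABLE") or "docker"  # empty value -> default
    return name, shutil.which(name)  # which() returns None if not on PATH


if __name__ == "__main__":
    name, path = resolve_container_cli()
    print(f"{name}: {path or 'not found on PATH'}")
```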

Example usage

# Run benchmark locally
cooperbench run -s lite --backend docker

# Evaluate locally
cooperbench eval -n my-experiment --backend docker

# Lower concurrency for local resources
cooperbench run -c 5 --backend docker

Performance tips

  • Limit concurrency: Use -c 2 or -c 5 to avoid overwhelming your machine
  • Increase Docker resources: In Docker Desktop, allocate more CPUs/memory
  • Clean up containers: Run docker system prune periodically
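One way to pick a starting -c value is to scale it to your CPU count. The two-CPUs-per-task ratio below is an assumption for illustration, not a CooperBench recommendation, and local_concurrency is a hypothetical helper; tune the ratio to the CPU and memory your tasks actually use.

```python
# Hypothetical heuristic (not from CooperBench): clamp a requested -c value so
# that a docker-backend run does not oversubscribe the local machine.
import os


def local_concurrency(requested, cpus=None, per_task_cpus=2):
    """Return min(requested, cpus // per_task_cpus), never less than 1.

    per_task_cpus=2 is an illustrative assumption, not a measured value.
    """
    if cpus is None:
        cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    ceiling = max(1, cpus // per_task_cpus)
    return max(1, min(requested, ceiling))


if __name__ == "__main__":
    print(f"cooperbench run -c {local_concurrency(5)} --backend docker")
```

For example, on an 8-core machine local_concurrency(100, cpus=8) returns 4, so you would start with cooperbench run -c 4 --backend docker.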

GCP (Google Cloud Platform)

Overview

The GCP backend runs tasks on Google Cloud VMs using Cloud Batch.

Pros:
  • Highly scalable
  • Custom machine types
  • Integrates with existing GCP infrastructure
  • More control than Modal
Cons:
  • Requires GCP account and billing
  • More complex setup
  • Need to manage quotas and resources

Setup

  1. Run configuration wizard:
    cooperbench config gcp
    
    This interactive wizard:
    • Checks for gcloud CLI
    • Authenticates with GCP
    • Configures project, region, zone
    • Validates API access
    See cooperbench config for details.
  2. Enable required APIs: The following APIs must be enabled in your GCP project:
  3. Set up billing: Ensure your project has billing enabled.
  4. Run tasks:
    cooperbench run --backend gcp
    

Configuration

Configuration is stored in ~/.config/cooperbench/config.json:
{
  "gcp_project_id": "my-project-123",
  "gcp_region": "us-central1",
  "gcp_zone": "us-central1-a",
  "gcp_bucket": "cooperbench-eval-my-project-123"
}
You can also override settings with environment variables; for example, to set the project:
export GOOGLE_CLOUD_PROJECT=my-project-123
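The precedence described above (the environment variable overrides the file) can be sketched as follows. The key names come from the example config; apply_env_overrides and load_gcp_settings are hypothetical helpers, and CooperBench's actual resolution logic may handle more variables than the one shown.

```python
# Sketch (assumed logic, not CooperBench's implementation): read the GCP
# settings file and let GOOGLE_CLOUD_PROJECT override gcp_project_id.
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "cooperbench" / "config.json"


def apply_env_overrides(settings, env):
    """Return a copy of settings with environment overrides applied."""
    merged = dict(settings)
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if project:
        merged["gcp_project_id"] = project  # environment wins over the file
    return merged


def load_gcp_settings(path=CONFIG_PATH, env=None):
    """Load config.json (if present) and apply overrides from env or os.environ."""
    env = os.environ if env is None else env
    path = Path(path)
    settings = json.loads(path.read_text()) if path.is_file() else {}
    return apply_env_overrides(settings, env)


if __name__ == "__main__":
    settings = load_gcp_settings()
    print(settings.get("gcp_project_id", "no project configured; run `cooperbench config gcp`"))
```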

Example usage

# Run benchmark on GCP
cooperbench run -s lite --backend gcp

# Evaluate on GCP Batch
cooperbench eval -n my-experiment --backend gcp

# High concurrency (GCP scales well)
cooperbench run -c 100 --backend gcp

Performance tips

  • Choose a nearby region: use a region close to you or to your data
  • Check quotas: GCP has default quotas; request increases if needed
  • Monitor costs: Use GCP billing console to track spending

Choosing a backend

| Factor | Modal | Docker | GCP |
|---|---|---|---|
| Setup complexity | Easy | Easy | Medium |
| Cost | Pay-per-use | Free | Pay-per-use |
| Scalability | High | Low | High |
| Internet required | Yes | No | Yes |
| Requires account | Yes | No | Yes |
| Best for | Quick experiments | Local dev/testing | Production workloads |

Recommendations

For quick experiments: Use Modal. Minimal setup, fast, scales automatically.
cooperbench run --backend modal
For local development: Use Docker. No cloud needed, works offline.
cooperbench run -c 5 --backend docker
For production or large-scale experiments: Use GCP. It offers more control and integrates with existing infrastructure.
cooperbench run -c 100 --backend gcp
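The comparison table and recommendations reduce to a simple decision rule. suggest_backend is a hypothetical reading aid, not a CooperBench API, and it simplifies the account requirement to a single flag even though Modal and GCP need separate accounts.

```python
# Hypothetical reading aid (not a CooperBench API) that encodes the backend
# comparison table and the recommendations on this page.
def suggest_backend(offline=False, has_cloud_account=True, production=False):
    """Return "modal", "docker", or "gcp"."""
    if offline or not has_cloud_account:
        return "docker"  # the only backend needing neither internet nor an account
    if production:
        return "gcp"  # production or large-scale workloads
    return "modal"  # quick experiments: least setup, scales automatically


if __name__ == "__main__":
    print(suggest_backend(production=True))
```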

Backend-specific features

Modal

  • Automatic retries on failure
  • Distributed tracing in the Modal dashboard
  • GPU support (if configured)

Docker

  • Use custom Docker images
  • Mount local volumes for debugging
  • Offline operation

GCP

  • Custom machine types
  • Persistent disk support
  • VPC networking
  • Integrate with Cloud Storage, BigQuery, etc.

Troubleshooting

Modal issues

“Not authenticated”:
modal token new
“Rate limited”: Reduce concurrency with -c flag.

Docker issues

“Cannot connect to Docker daemon”:
sudo systemctl start docker  # Linux
# Or start Docker Desktop manually
“Out of disk space”:
docker system prune -a

GCP issues

“API not enabled”: Enable the required APIs in the GCP Console.
“Insufficient quota”: Request a quota increase in the GCP Console.
“Authentication failed”:
gcloud auth application-default login
