SWE-bench is a benchmark of real-world GitHub issues drawn from popular Python repositories. Each task gives the agent a repository snapshot and a problem statement; the agent must produce a patch that fixes the issue. mini-swe-agent ships two scripts for running on SWE-bench: `mini-extra swebench` for large-scale parallel evaluation and `mini-extra swebench-single` for interactive debugging on a single instance.
Running the benchmark
- Batch mode
- Single instance (debugging)
Results are written to a `preds.json` file as each instance completes.

| Flag | Description |
|---|---|
| `-o`, `--output` | Output directory for trajectories and `preds.json` |
| `-m`, `--model` | Model to use (e.g., `anthropic/claude-sonnet-4-5-20250929`) |
| `-c`, `--config` | Path to a config file, filename, or `key=value` pair. Defaults to the built-in `swebench.yaml`. If you set this flag, the default is not loaded automatically, so include it explicitly: `-c swebench.yaml -c model.model_kwargs.temperature=0.5` |
| `-w`, `--workers` | Number of parallel worker threads (default: 1) |

| Flag | Description |
|---|---|
| `--subset` | SWE-bench subset: `lite`, `verified`, `full`, `multimodal`, `multilingual`, `smith`, `rebench`, or a path to a local dataset (default: `lite`) |
| `--split` | Dataset split, e.g. `dev` or `test` (default: `dev`) |
| `--slice` | Slice of instances to run, e.g. `0:5` for the first five |
| `--filter` | Filter instance IDs by regex |
| `--shuffle` | Shuffle the instance order before running (default: false) |
| `--redo-existing` | Re-run instances that already have entries in `preds.json` (default: false) |
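
As a rough illustration of how `--filter` and `--slice` narrow a run, the sketch below applies a regex and then a Python-style slice to a list of instance IDs. The helper `select_instances` is hypothetical, and two details are assumptions rather than documented behavior: that the regex is matched from the start of the ID, and that filtering happens before slicing.

```python
import re


def select_instances(instance_ids, filter_spec="", slice_spec=""):
    """Hypothetical helper: filter IDs by regex, then apply a 'start:stop' slice."""
    # Assumption: the regex is anchored at the start of the ID (re.match).
    selected = [i for i in instance_ids if re.match(filter_spec, i)]
    if slice_spec:
        # "0:5" -> slice(0, 5); empty parts such as ":3" become None.
        parts = [int(p) if p else None for p in slice_spec.split(":")]
        selected = selected[slice(*parts)]
    return selected


# Example IDs for illustration only.
ids = [
    "django__django-11099",
    "django__django-11133",
    "sympy__sympy-20590",
    "django__django-11179",
]
print(select_instances(ids, filter_spec=r"django__", slice_spec="0:2"))
```

With these inputs the filter keeps the three Django IDs and the slice keeps the first two of them. If the tool slices before filtering, the same flags would select a different set, so verify the behavior on a small run first.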

| Flag | Description |
|---|---|
| `--environment-class` | Environment backend to use. Recommended values: `docker` or `singularity` |
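
For scripted or repeated runs, the flags in the tables above can be assembled programmatically. The following is a minimal sketch, not part of mini-swe-agent: the helper `build_swebench_command` and the output directory `runs/demo` are hypothetical, while every flag comes from the tables above.

```python
import shlex


def build_swebench_command(output, model, subset="lite", split="dev",
                           slice_spec=None, workers=1):
    """Assemble an argv list for `mini-extra swebench` (hypothetical helper)."""
    argv = [
        "mini-extra", "swebench",
        "--subset", subset,
        "--split", split,
        "-o", output,
        "-m", model,
        "-w", str(workers),
    ]
    if slice_spec:  # e.g. "0:5" for the first five instances
        argv += ["--slice", slice_spec]
    return argv


argv = build_swebench_command(
    output="runs/demo",  # hypothetical output directory
    model="anthropic/claude-sonnet-4-5-20250929",
    subset="verified",
    split="test",
    slice_spec="0:5",
    workers=4,
)
print(shlex.join(argv))
# To launch the run, pass the list to subprocess.run(argv, check=True).
```

Building the command as a list avoids shell-quoting problems when values such as a `--filter` regex contain special characters.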
Evaluating results
After a batch run completes, the output directory contains a `preds.json` file. You can evaluate it using SWE-bench’s free cloud service or a local installation.
- Cloud-based (sb-cli)
- Local evaluation
Install sb-cli and get a token
FAQ
Can I set global cost limits?
Yes. Use the `MSWEA_GLOBAL_CALL_LIMIT` and `MSWEA_GLOBAL_COST_LIMIT` environment variables, or set them in the global config file. See configuration for details.
What happens to uncompleted tasks when I abort with KeyboardInterrupt?
Rerun the same command: instances without an entry in `preds.json` are picked up again. However, check `preds.json` for entries with `KeyboardInterrupt` as the model patch: these were saved in an aborted state and should be removed or rerun with `--redo-existing`.
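
One way to find and drop aborted entries is a small script like the following. It is a sketch under the assumption that `preds.json` is a JSON object keyed by instance ID whose values carry the patch under a `model_patch` key (the usual SWE-bench prediction layout). The function name is hypothetical, so check your own file before relying on it.

```python
import json
from pathlib import Path


def drop_aborted_predictions(preds_path):
    """Remove entries whose patch is the literal string 'KeyboardInterrupt'.

    Assumes preds.json maps instance IDs to dicts with a 'model_patch' key.
    Returns the list of removed instance IDs.
    """
    path = Path(preds_path)
    preds = json.loads(path.read_text())
    aborted = [
        instance_id
        for instance_id, entry in preds.items()
        if str(entry.get("model_patch", "")).strip() == "KeyboardInterrupt"
    ]
    for instance_id in aborted:
        del preds[instance_id]
    # Write the cleaned predictions back in place.
    path.write_text(json.dumps(preds, indent=2))
    return aborted
```

Back up `preds.json` before running a script like this, since it rewrites the file in place.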
Certain tasks are stuck even though I deleted their trajectories
Completed instances are tracked by their entries in `preds.json`, not by the presence of trajectory files. Remove the relevant entries from `preds.json` directly, then rerun.
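
Removing entries by hand works, but for many instances a short script is less error-prone. The sketch below deletes every entry whose instance ID matches a regex. It assumes `preds.json` is a JSON object keyed by instance ID, and `forget_instances` is a hypothetical name.

```python
import json
import re
from pathlib import Path


def forget_instances(preds_path, id_pattern):
    """Delete preds.json entries whose instance ID matches id_pattern.

    Assumes the file is a JSON object keyed by instance ID.
    Returns the removed IDs in sorted order.
    """
    path = Path(preds_path)
    preds = json.loads(path.read_text())
    removed = sorted(i for i in preds if re.search(id_pattern, i))
    for instance_id in removed:
        preds.pop(instance_id)
    path.write_text(json.dumps(preds, indent=2))
    return removed
```

For example, `forget_instances("preds.json", r"^django__")` would drop all entries whose ID starts with `django__`.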
How can I run on a custom or different dataset?
Yes, any dataset for which `datasets.load_dataset(path, split=split)` works can be used. Pass the path with `--subset /path/to/your/dataset`.
Some instances are stuck at 'initializing task' for a long time
If this is caused by `docker pull` timeouts, increase the timeout via `environment.pull_timeout` in your config (default is 120 seconds).
I'm having Docker issues
First check that Docker is running with `docker ps`, then test that your user has access to it.
Docker isn't available on my HPC cluster (Singularity/Apptainer)
Pass `--environment-class singularity` on the command line, or set the environment class in your config file.
Can I run a startup command in the environment?
Yes. Set `run.env_startup_command` in your config. The command is rendered with Jinja2 using the instance variables as template context. This is useful for environments such as bubblewrap that don’t pre-install dependencies.
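
To preview how such a command expands, you can render it yourself with Jinja2. This is a sketch, not the tool's internal code: the template string is a made-up example, and it assumes that dataset fields such as `instance_id` and `base_commit` are among the variables passed to the template.

```python
from jinja2 import Template

# Hypothetical startup command; {{ ... }} placeholders are filled per instance.
startup_command = "echo 'preparing {{ instance_id }}' && git checkout {{ base_commit }}"

# Example instance with made-up values; real instances come from the dataset.
instance = {
    "instance_id": "example__repo-1",
    "base_commit": "abc1234",
}

rendered = Template(startup_command).render(**instance)
print(rendered)
```

Rendering the template locally is a cheap way to catch a misspelled variable name before starting a long batch run.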