auto-harness reads environment variables for secrets and runtime overrides that should not be checked into version control. The canonical list lives in .env.example at the repo root. Copy it to .env before your first run, then fill in the values you need:
cp .env.example .env
Most variables are optional and only apply to specific benchmarks. The sections below explain which variables are required for each benchmark and what they control.

LLM API keys

The correct key is determined automatically from the agent_model value in experiment_config.yaml. If the model name starts with "gemini", GEMINI_API_KEY is required. If it starts with "claude", ANTHROPIC_API_KEY is required. All other models fall back to OPENAI_API_KEY.
OPENAI_API_KEY
string
OpenAI API key. Required when agent_model is an OpenAI model (anything that does not start with "gemini" or "claude"). Also forwarded as LITELLM_API_KEY inside BIRD-Interact runs when LITELLM_API_KEY is not set separately.
ANTHROPIC_API_KEY
string
Anthropic API key. Required when agent_model starts with "claude" (e.g., "anthropic/claude-sonnet-4-20250514").
GEMINI_API_KEY
string
Google Gemini API key. Required when agent_model starts with "gemini".
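The prefix rules above can be sketched as a small lookup. This is an illustrative sketch, not the harness's actual code: the function name required_api_key is hypothetical, and stripping a provider prefix such as "anthropic/" before the check is an assumption made so that the documented example model name resolves to ANTHROPIC_API_KEY.

```python
def required_api_key(agent_model: str) -> str:
    """Return the name of the API key variable implied by the documented rules."""
    # Assumption: a leading provider prefix ("anthropic/...") is ignored.
    # The real harness may resolve prefixed model names differently.
    name = agent_model.rsplit("/", 1)[-1]
    if name.startswith("gemini"):
        return "GEMINI_API_KEY"
    if name.startswith("claude"):
        return "ANTHROPIC_API_KEY"
    # Every other model name falls back to the OpenAI key.
    return "OPENAI_API_KEY"
```

Running a check like this before a long experiment makes a missing key fail fast instead of partway through a run.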

Sandbox provider keys

These variables are only required for Terminal-Bench when env_provider is set to "e2b" or "daytona". The "docker" provider requires no API key.
E2B_API_KEY
string
E2B sandbox API key. Required when env_provider: "e2b" in experiment_config.yaml.
DAYTONA_API_KEY
string
Daytona API key. Required when env_provider: "daytona" in experiment_config.yaml.
Only one sandbox provider key is needed per experiment. Set E2B_API_KEY for E2B, DAYTONA_API_KEY for Daytona, or neither if you are using the local Docker provider.
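As a rough illustration of the rule above, the following sketch reports which sandbox key, if any, is still missing for a given env_provider. The helper name missing_sandbox_key and the PROVIDER_KEYS mapping are hypothetical and are not part of auto-harness.

```python
from typing import Mapping, Optional

# Provider-to-variable mapping taken from the descriptions above.
PROVIDER_KEYS = {"e2b": "E2B_API_KEY", "daytona": "DAYTONA_API_KEY"}


def missing_sandbox_key(env_provider: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the name of the unset key for this provider, or None if nothing is missing."""
    key = PROVIDER_KEYS.get(env_provider)
    if key is None:
        # "docker" (and anything else not in the mapping) needs no API key.
        return None
    return None if env.get(key) else key
```

Passing os.environ as env checks the current shell; passing a plain dict keeps the helper easy to test.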

Runtime control

These variables let you override experiment configuration at runtime without editing experiment_config.yaml. They are primarily used by the benchmark runner internally, but you can also set them manually for ad-hoc runs.
AGENT_MODEL
string
Fallback agent model. TauBenchRunner and TerminalBenchRunner read this variable when agent_model is not set in experiment_config.yaml. If neither is set, the model defaults to "gpt-5.4".
AGENT_REASONING_EFFORT
string
Override the reasoning effort level. Set automatically from reasoning_effort in experiment_config.yaml before each run. Accepted values: "low", "medium", "high".
HARNESS_SAVE_TRACE
string
Controls whether TerminalBenchRunner copies agent traces to workspace/traces/. Set to "0" to disable trace saving. The runner sets this automatically to "0" for non-train splits (test and baseline all-tasks runs), preventing the coding agent from reading test-split traces.
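The lookup order for AGENT_MODEL and the "0" switch for HARNESS_SAVE_TRACE might look roughly like this. Both function names are hypothetical, and treating an unset HARNESS_SAVE_TRACE as "save traces" is an assumption; the text above only states that "0" disables saving.

```python
from typing import Mapping


def resolve_agent_model(config: Mapping[str, str], env: Mapping[str, str]) -> str:
    """Config value first, then the AGENT_MODEL variable, then the documented default."""
    return config.get("agent_model") or env.get("AGENT_MODEL") or "gpt-5.4"


def trace_saving_enabled(env: Mapping[str, str]) -> bool:
    """Traces are saved unless HARNESS_SAVE_TRACE is exactly "0"."""
    # Assumption: any other value, including unset, leaves trace saving on.
    return env.get("HARNESS_SAVE_TRACE") != "0"
```

For an ad-hoc run, this means a value exported in the shell only takes effect when the corresponding key is absent from experiment_config.yaml.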

tau-bench data directory

TAU2_DATA_DIR
string
Path to the directory where tau-bench data is stored. Defaults to ./tau2_data (relative to the auto-harness repo root). prepare.py clones the tau2-bench repository into this directory on first run if it is not already present. Override this to share the data directory across multiple experiments.
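A minimal sketch of how the data directory could be resolved, assuming the override is used verbatim and the default is joined onto the repo root. The function name tau2_data_dir and the repo_root parameter are illustrative only.

```python
from pathlib import Path
from typing import Mapping


def tau2_data_dir(env: Mapping[str, str], repo_root: Path) -> Path:
    """Return the tau-bench data directory, honoring TAU2_DATA_DIR when set."""
    override = env.get("TAU2_DATA_DIR")
    if override:
        # A shared location, for example one directory reused by several experiments.
        return Path(override)
    # Documented default: ./tau2_data relative to the auto-harness repo root.
    return repo_root / "tau2_data"
```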

BIRD-Interact overrides

These variables mirror the advanced override keys in experiment_config.yaml. Set them as environment variables if you prefer not to hard-code paths in the config file.
BIRD_REPO
string
Absolute path to an existing BIRD-Interact repo root or BIRD-Interact-ADK directory. Takes effect when bird_repo is not set in experiment_config.yaml.
BIRD_PYTHON_BIN
string
Absolute path to a Python interpreter with BIRD-Interact-ADK dependencies installed. Takes effect when bird_python_bin is not set in experiment_config.yaml.

BIRD-Interact service ports

These variables are set automatically by BirdInteractRunner from the values in experiment_config.yaml. You can also set them directly in your shell to override defaults without editing the config.
SYSTEM_AGENT_PORT
string
Port for the BIRD-Interact system agent service. Defaults to 6100.
USER_SIM_PORT
string
Port for the user simulator service. Defaults to 6101.
DB_ENV_PORT
string
Port for the database environment service. Defaults to 6102.
DATASET
string
BIRD-Interact dataset size passed to the orchestrator. Set automatically from dataset in experiment_config.yaml. Accepted values: "lite" (300 tasks) or "full" (600 tasks).
PATIENCE
string
Maximum clarification turns per task in c-interact mode. Set automatically from patience in experiment_config.yaml. Defaults to 3.
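The defaults and accepted values listed above can be collected in one place, as in the sketch below. It is not BirdInteractRunner's implementation: the function name bird_service_settings is hypothetical, and because no default for DATASET is documented here, the sketch leaves it as None when unset.

```python
from typing import Any, Dict, Mapping

# Documented default ports for the three BIRD-Interact services.
DEFAULT_PORTS = {"SYSTEM_AGENT_PORT": 6100, "USER_SIM_PORT": 6101, "DB_ENV_PORT": 6102}


def bird_service_settings(env: Mapping[str, str]) -> Dict[str, Any]:
    """Read the service variables from env, applying the documented defaults."""
    settings: Dict[str, Any] = {
        name: int(env.get(name, default)) for name, default in DEFAULT_PORTS.items()
    }
    settings["PATIENCE"] = int(env.get("PATIENCE", 3))
    dataset = env.get("DATASET")
    if dataset is not None and dataset not in ("lite", "full"):
        raise ValueError(f'DATASET must be "lite" or "full", got {dataset!r}')
    settings["DATASET"] = dataset  # None when unset; no default is assumed here
    return settings
```

Environment values always arrive as strings, which is why the ports and PATIENCE are converted with int() before use.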

Postgres connection

These variables configure the Postgres connection used by BIRD-Interact. prepare.py provisions a Docker container with these defaults on first run. Override them only when pointing at an existing Postgres instance.
PG_HOST
string
Postgres host. Defaults to 127.0.0.1.
PG_PORT
string
Postgres port. Defaults to 5432.
PG_USER
string
Postgres username. Defaults to root.
PG_PASSWORD
string
Postgres password. Defaults to 123123.
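To show how the four variables fit together, the sketch below assembles a standard postgresql:// connection URI from them using the documented defaults. The function name postgres_dsn and the database argument are hypothetical; this page does not say which database name BIRD-Interact connects to.

```python
from typing import Mapping
from urllib.parse import quote


def postgres_dsn(env: Mapping[str, str], database: str) -> str:
    """Build a postgresql:// URI from PG_* variables, falling back to the documented defaults."""
    host = env.get("PG_HOST", "127.0.0.1")
    port = env.get("PG_PORT", "5432")
    user = env.get("PG_USER", "root")
    password = env.get("PG_PASSWORD", "123123")
    # Percent-encode credentials so characters such as "@" or ":" stay valid in the URI.
    return f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{database}"
```

When pointing at an existing Postgres instance, only the variables that differ from the defaults need to be set.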

Required variables by benchmark

Terminal-Bench

| Variable | Required | Notes |
| --- | --- | --- |
| OPENAI_API_KEY or ANTHROPIC_API_KEY or GEMINI_API_KEY | Yes | Match to your agent_model |
| E2B_API_KEY | When env_provider: "e2b" | |
| DAYTONA_API_KEY | When env_provider: "daytona" | |

tau-bench

| Variable | Required | Notes |
| --- | --- | --- |
| OPENAI_API_KEY or ANTHROPIC_API_KEY or GEMINI_API_KEY | Yes | Match to your agent_model |
| TAU2_DATA_DIR | No | Defaults to ./tau2_data; data is auto-cloned |

BIRD-Interact

| Variable | Required | Notes |
| --- | --- | --- |
| OPENAI_API_KEY or ANTHROPIC_API_KEY or GEMINI_API_KEY | Yes | Match to your agent_model |
| BIRD_REPO | No | Auto-provisioned into ./bird_interact_adk/ |
| BIRD_PYTHON_BIN | No | Auto-resolved from venv inside ADK |
| PG_HOST, PG_PORT, PG_USER, PG_PASSWORD | No | Auto-provisioned Docker container |

Example .env file

# LLM API keys — set whichever your agent_model needs
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...

# Terminal-Bench sandbox provider (set one)
E2B_API_KEY=e2b_...
DAYTONA_API_KEY=...
