BIRD-Interact is an interactive text-to-SQL benchmark in which an agent must complete multi-turn database tasks (querying, inserting, updating, and deleting records) against a live Postgres database. Unlike static SQL benchmarks that grade a single generated query, BIRD-Interact simulates a real conversation: a user simulator sends requests, the agent issues SQL through the BIRD-Interact-ADK services, and correctness is judged on the final database state. auto-harness integrates this via BirdInteractRunner, which spawns the three required ADK services and drives orchestrator.runner to completion.
Agent interface
The BIRD-Interact agent is a Google ADK agent served as a FastAPI service. auto-harness wraps agent/agent.py in agent/helpers/bird_interact/bird_service.py and exposes it at system_agent_port (default 6100). The BIRD orchestrator routes user simulator messages to this service, and the service calls build_agent() in agent/agent.py to produce responses.
The two interaction modes control the agent’s conversational strategy:
| Mode | Description |
|---|---|
| a-interact | Autonomous tool-using SQL agent; acts on requests directly |
| c-interact | Clarification-first conversational SQL agent; asks before acting |
a-interact.
BirdInteractRunner
BirdInteractRunner in benchmark.py manages the full lifecycle: resolving the ADK directory and Python interpreter, starting the three services, invoking orchestrator.runner, parsing results, and copying traces.
Constructor
Split file
The train/test split is stored at bird_data/task_split.json (the SPLIT_FILE class constant). prepare.py generates it during the baseline run using a 70/30 stratified shuffle with seed=42.
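A 70/30 stratified shuffle with a fixed seed can be sketched as follows. This is a minimal illustration, not the code in prepare.py: the function name stratified_split, the stratum_of callback, and the choice of stratification key are all assumptions, since the page does not say which task attribute the split is stratified on.

```python
import random
from collections import defaultdict


def stratified_split(task_ids, stratum_of, train_frac=0.7, seed=42):
    """Split task IDs into train/test, keeping each stratum near train_frac.

    `stratum_of` is a hypothetical callback mapping a task ID to its
    stratum (for example, the database the task targets).
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible
    by_stratum = defaultdict(list)
    for task_id in task_ids:
        by_stratum[stratum_of(task_id)].append(task_id)

    train, test = [], []
    # Sort strata and members so the result does not depend on input order.
    for stratum in sorted(by_stratum):
        members = sorted(by_stratum[stratum])
        rng.shuffle(members)
        cut = round(len(members) * train_frac)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return {"train": sorted(train), "test": sorted(test)}
```

Because the shuffle uses a seeded local generator and sorted inputs, calling the function twice on the same task IDs yields the same split, which is the property a persisted split file relies on.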
Datasets
| Dataset | Tasks | Path |
|---|---|---|
| lite | 300 | bird_interact_adk/bird-interact-lite/bird_interact_data.jsonl |
| full | 600 | bird_interact_adk/bird-interact-full/bird_interact_data.jsonl |
The 3-service architecture
Each BirdInteractRunner.run() call starts three FastAPI services via uvicorn, waits for each to pass a /health check, runs the orchestrator, then terminates all services:
| Service | Module | Default port | Description |
|---|---|---|---|
| System agent | agent.helpers.bird_interact.bird_service | 6100 | Serves agent/agent.py as a FastAPI endpoint |
| User simulator | user_simulator.server | 6101 | Drives the conversation from the BIRD-Interact-ADK |
| DB environment | db_environment.server | 6102 | Manages the Postgres session |
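The start, health-check, and terminate cycle described above might look roughly like the sketch below. The module paths and ports are taken from the table; everything else is an assumption made for illustration, including the helper names (uvicorn_command, wait_for_health, running_services), the ":app" attribute each module is assumed to expose, and the timeout values. It is not the BirdInteractRunner implementation.

```python
import contextlib
import subprocess
import sys
import time
import urllib.error
import urllib.request

# Module paths and default ports come from the table above.
SERVICES = {
    "system_agent": ("agent.helpers.bird_interact.bird_service", 6100),
    "user_simulator": ("user_simulator.server", 6101),
    "db_environment": ("db_environment.server", 6102),
}


def uvicorn_command(module, port, python_bin=sys.executable, app_attr="app"):
    """Build the argv that serves `module:app_attr` on localhost.

    `app_attr="app"` is an assumption about how each module names its app.
    """
    return [python_bin, "-m", "uvicorn", f"{module}:{app_attr}",
            "--host", "127.0.0.1", "--port", str(port)]


def wait_for_health(port, timeout=30.0, interval=0.5):
    """Poll GET /health until it returns 200 or the timeout expires."""
    url = f"http://127.0.0.1:{port}/health"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service is not accepting connections yet
        time.sleep(interval)
    return False


@contextlib.contextmanager
def running_services(python_bin=sys.executable):
    """Start all services, yield once healthy, always terminate on exit."""
    procs = []
    try:
        for name, (module, port) in SERVICES.items():
            procs.append(subprocess.Popen(uvicorn_command(module, port, python_bin)))
            if not wait_for_health(port):
                raise RuntimeError(f"{name} failed its /health check on port {port}")
        yield procs
    finally:
        for proc in procs:
            proc.terminate()
```

Wrapping the orchestrator call in `with running_services(): ...` keeps cleanup in one place, so the services are terminated even if a health check or the orchestrator itself raises.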
Trace management
After each train-split run, the runner copies per-task traces into the workspace. The workspace/traces/baseline/ directory holds the immutable first-run traces and is never overwritten. Only train-split traces are saved.
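The write-once baseline rule can be illustrated with a short sketch. Only workspace/traces/baseline/ and the train-only rule come from this page; the function name save_traces, the "latest" directory for the most recent run, and the assumption that a run's traces sit in a single source directory are hypothetical.

```python
import shutil
from pathlib import Path


def save_traces(run_traces, workspace, split, run_name="latest"):
    """Copy per-task traces into the workspace; the baseline is write-once."""
    if split != "train":
        return []  # only train-split traces are saved

    run_traces = Path(run_traces)
    traces_root = Path(workspace) / "traces"
    written = []

    baseline = traces_root / "baseline"
    if not baseline.exists():
        # First run: freeze a copy that later runs never touch.
        shutil.copytree(run_traces, baseline)
        written.append(baseline)

    # `run_name` is a hypothetical directory holding the most recent run.
    current = traces_root / run_name
    if current.exists():
        shutil.rmtree(current)
    shutil.copytree(run_traces, current)
    written.append(current)
    return written
```

Checking for the baseline directory before copying is what keeps the first-run traces stable as a reference point for later comparisons.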
Auto-provisioning with prepare.py
prepare.py handles everything automatically on first run:
Creates an isolated venv
Creates .venv-adk inside bird_interact_adk/ with the ADK’s dependencies (google-adk, psycopg2, etc.) isolated from the main project.
Advanced users can skip auto-provisioning by setting bird_repo and bird_python_bin in experiment_config.yaml to point at an existing BIRD-Interact-ADK install.
Ground truth access
The public BIRD-Interact dataset ships without gold SQL to prevent data leakage. You must request it before the baseline run will produce meaningful scores.
Email for ground truth
Email bird.bench25@gmail.com with subject [bird-interact-lite GT&Test Cases].
Merge the ground truth
Run the combine_public_with_gt.py script that prepare.py prints when it detects missing ground truth, passing the .jsonl file you receive.
When prepare.py detects missing ground truth, it prints the exact merge command to run, so you do not need to locate the script manually.
Configuration
Uncomment and edit the BIRD-INTERACT block in experiment_config.yaml:
ANTHROPIC_API_KEY (or OPENAI_API_KEY / GEMINI_API_KEY, depending on the configured model)
git-lfs (for the HuggingFace dataset).
Known caveats
Editing agent/agent.py
The starting template at agent/templates/bird_interact.py is a faithful copy of the stock BIRD-Interact-ADK system agent. The coding agent can improve:
AINTERACT_INSTRUCTION: the system prompt for autonomous (a-interact) mode
CINTERACT_INSTRUCTION: the system prompt for conversational (c-interact) mode
build_agent(): how the model and ADK session are configured per mode