gating.py implements the multi-step quality gate that must pass before each agent iteration is recorded as a success. It enforces a file-edit guard, re-runs an eval suite on the training split, runs the full benchmark on the test split, and promotes newly-passing tasks into the suite. The gate is typically invoked automatically by the coding agent, but can also be run directly from the command line.
Gate pipeline overview
When run_gate is called, it executes up to four steps in order:
| Step | Name | What it checks |
|---|---|---|
| 0 | File guard | No tracked files outside ALLOWED_AGENT_FILES were modified |
| 1 | Eval suite | Re-runs workspace/suite.json tasks; pass rate ≥ threshold |
| 2 | Full benchmark | Runs the test split; val_score ≥ best value in results.tsv |
| 3 | Suite promotion | Newly-passing train tasks are added to workspace/suite.json |
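The ordering in the table can be sketched as a short-circuiting loop. This is an illustrative sketch, not the gating.py implementation: run_steps and the step callables are hypothetical names, and it assumes the gate stops at the first failing step, which the "up to four steps" wording suggests but does not confirm.

```python
from typing import Callable, List, Tuple

# Hypothetical type: each step is a (name, check) pair, where check()
# returns True when the step passes.
Step = Tuple[str, Callable[[], bool]]


def run_steps(steps: List[Step]) -> int:
    """Run steps in order; return 0 if all pass, 1 at the first failure."""
    for name, check in steps:
        if not check():
            print(f"gate failed at step: {name}")
            return 1  # later steps are not executed
    return 0


# Placeholder checks standing in for the real gate logic.
demo_steps = [
    ("file guard", lambda: True),
    ("eval suite", lambda: 0.9 >= 0.8),        # pass rate vs. threshold
    ("full benchmark", lambda: 0.42 >= 0.40),  # val_score vs. best so far
    ("suite promotion", lambda: True),
]
exit_code = run_steps(demo_steps)
```

Returning an integer rather than a boolean mirrors the documented 0/1 return convention, so the result can be passed straight to a process exit code.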
Constants
Everything under workspace/ is gitignored and therefore invisible to git. The file guard only inspects files that git can see, so edits to workspace/learnings.md and similar files are always permitted.
run_gate
Runner used for the eval suite (Step 1) and suite promotion (Step 3). Should be configured for the training split.
Runner used for the full benchmark (Step 2). Should be configured for the test split.
Returns: 0 on success (all steps passed), 1 on any failure.
Example
file_guard_violations
Returns the files visible to git that were modified outside ALLOWED_AGENT_FILES.
Always inspects:
- git diff-index --name-only HEAD: files in the working tree that differ from HEAD.
- git ls-files --others --exclude-standard: untracked files not covered by .gitignore.
When check_last_commit=True, also inspects the diff of HEAD vs HEAD~1. This is used by record.py to catch agents that commit forbidden files before invoking record.
When True, additionally checks whether the most recent commit (HEAD vs HEAD~1) touched any files outside the allowlist. Silently skipped when there is no parent commit.
Returns: [] if there are no violations, or if git is unavailable (a one-time warning is printed to stderr in that case).
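The two always-on git queries and the allowlist filter described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the gating.py source: the function names and the contents of ALLOWED are hypothetical, the check_last_commit branch is omitted, and the warning here is printed on every failure rather than once.

```python
import subprocess
import sys
from typing import Iterable, List

# Hypothetical allowlist; the real ALLOWED_AGENT_FILES is defined in gating.py.
ALLOWED = frozenset({"agent/agent.py"})


def _git_lines(args: List[str]) -> List[str]:
    """Run a git command and return its non-empty output lines."""
    out = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return [line for line in out.stdout.splitlines() if line.strip()]


def filter_violations(paths: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated paths that are not in the allowlist."""
    allowed_set = set(allowed)
    return sorted({p for p in paths if p not in allowed_set})


def guard_violations_sketch(allowed: Iterable[str] = ALLOWED) -> List[str]:
    """Collect changed and untracked files, then filter by the allowlist."""
    try:
        changed = _git_lines(["diff-index", "--name-only", "HEAD"])
        untracked = _git_lines(["ls-files", "--others", "--exclude-standard"])
    except (OSError, subprocess.CalledProcessError):
        # git is missing or this is not a repository: report no violations.
        print("warning: git unavailable; skipping file guard", file=sys.stderr)
        return []
    return filter_violations(changed + untracked, allowed)
```

Keeping the filter separate from the subprocess calls makes the allowlist logic testable without a git repository.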
file_guard_enabled
True by default. The file guard is disabled only by explicit opt-out in experiment_config.yaml.
To disable the guard, add this to experiment_config.yaml:
The guard is disabled only by one of these values: false, no, off, 0, "" (case-insensitive). Any other value, including a missing key, null, or unrecognized strings, leaves the guard on. This conservative default means a typo will not silently disable a safety check.
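The opt-out rule above can be sketched as a small parser. This is an assumption-laden illustration rather than the actual implementation: is_guard_enabled, _MISSING, and _DISABLE_TOKENS are hypothetical names, and the sketch takes the already-parsed config value as input because the config key name is not shown here.

```python
# Sentinel that distinguishes "key absent" from an explicit null (None).
_MISSING = object()

# Values that explicitly disable the guard, compared case-insensitively.
_DISABLE_TOKENS = {"false", "no", "off", "0", ""}


def is_guard_enabled(raw=_MISSING) -> bool:
    """Return False only for an explicit opt-out value; True otherwise."""
    if raw is _MISSING or raw is None:
        return True  # missing key or null leaves the guard on
    # str() covers YAML values already parsed to bool or int (False, 0).
    return str(raw).strip().lower() not in _DISABLE_TOKENS
```

Because the check is an allowlist of disabling tokens rather than an allowlist of enabling ones, a misspelled value such as "flase" keeps the guard active.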
Returns: True if the file guard is active, False if explicitly disabled.
load_suite
Loads workspace/suite.json.
Returns: A dict with the following structure. Returns a default empty suite if the file does not exist.
Task IDs currently in the regression suite.
Minimum pass rate required to pass Step 1. Default 0.8 (80%).
Per-task rewards from the most recent Step 1 run.
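Loading with a default fallback might look like the sketch below. The key names tasks, threshold, and last_results are hypothetical placeholders for the three fields described above, and load_suite_sketch is an illustrative name; the actual keys are defined by gating.py.

```python
import copy
import json
from pathlib import Path

# Hypothetical key names standing in for the three documented fields.
DEFAULT_SUITE = {"tasks": [], "threshold": 0.8, "last_results": {}}


def load_suite_sketch(path: str = "workspace/suite.json") -> dict:
    """Return the parsed suite, or a fresh default suite if the file is absent."""
    p = Path(path)
    if not p.exists():
        # Deep copy so callers can mutate the result without touching the default.
        return copy.deepcopy(DEFAULT_SUITE)
    with p.open(encoding="utf-8") as f:
        return json.load(f)
```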
save_suite
Writes the suite to workspace/suite.json.
The suite dict, as returned by load_suite (and typically modified in-place by run_gate).
best_val_score
Returns the highest val_score recorded in workspace/results.tsv.
Returns: The maximum val_score as a float, or None if results.tsv does not exist or contains no data rows.
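The lookup described above can be sketched with the standard csv module. This sketch assumes results.tsv is tab-separated with a header row that contains a val_score column; the function name is hypothetical and the real file may carry other columns.

```python
import csv
from pathlib import Path
from typing import Optional


def best_val_score_sketch(path: str = "workspace/results.tsv") -> Optional[float]:
    """Return the maximum val_score, or None if the file or its data rows are absent."""
    p = Path(path)
    if not p.exists():
        return None
    with p.open(newline="", encoding="utf-8") as f:
        rows = csv.DictReader(f, delimiter="\t")
        # Skip rows whose val_score cell is empty or missing.
        scores = [float(r["val_score"]) for r in rows if r.get("val_score")]
    return max(scores) if scores else None
```

Returning None instead of 0.0 for an empty history lets the caller tell "no previous result" apart from a real score of zero.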
load_config
Loads experiment_config.yaml from the current working directory.
Returns: A dict of the parsed YAML contents, or {} if the file does not exist.
CLI usage
Running gating.py directly reads experiment_config.yaml, constructs the appropriate train and gate runners for the configured benchmark, and runs all gate steps:
The exit code matches the return value of run_gate: 0 if all steps passed, 1 on any failure.