The batch_runner.py module lets you run many agent tasks in parallel by spawning a pool of AIAgent instances across multiple worker processes. It is purpose-built for generating training trajectories from a large prompt dataset: each prompt gets its own isolated agent, its own VM sandbox, and its own tool-usage statistics. Results are checkpointed after every batch, so interrupted runs can be safely resumed.
When to Use Batch Runner vs. Interactive CLI
| Scenario | Use |
|---|---|
| Exploratory, interactive work | gauss (interactive CLI) |
| Single-prompt automation or scripting | AIAgent.chat() / run_conversation() directly |
| Processing hundreds or thousands of prompts | batch_runner.py |
| Generating fine-tuning trajectories | batch_runner.py |
| Parallel evaluation benchmarks | batch_runner.py |
Installation
The batch runner is included with the base gauss-agent package. No additional extras are required for basic use.
The gauss-agent entry point (from pyproject.toml) maps to run_agent:main, so it does not launch the batch runner. Invoke batch_runner.py directly as a script instead.
Input Format
The batch runner reads a JSONL file: one JSON object per line. Each line must contain at minimum a "prompt" field. The recognized fields are:
| Field | Type | Description |
|---|---|---|
| prompt | str | Required. The task description sent to the agent. |
| image | str | Container image override for this task's sandbox (Docker, Modal, Singularity, or Daytona). |
| docker_image | str | Alias for image. |
| cwd | str | Working directory override for the task's terminal environment. |
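As a rough illustration of this input format, the sketch below writes a small JSONL file and reads it back, skipping lines that are invalid JSON or lack a "prompt" key. The helper names write_prompts and read_prompts are hypothetical and are not part of batch_runner.py; the skip logic only approximates what the runner is documented to do.

```python
import json
from pathlib import Path


def write_prompts(path, tasks):
    """Write one JSON object per line; every task must carry a "prompt"."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for task in tasks:
            prompt = task.get("prompt")
            if not isinstance(prompt, str) or not prompt.strip():
                raise ValueError(f"task is missing a non-empty 'prompt': {task!r}")
            fh.write(json.dumps(task, ensure_ascii=False) + "\n")


def read_prompts(path):
    """Return (valid_entries, skipped_count), skipping unusable lines."""
    valid, skipped = [], 0
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # blank lines are ignored in this sketch, not counted
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                skipped += 1  # invalid JSON
                continue
            if isinstance(entry, dict) and "prompt" in entry:
                valid.append(entry)
            else:
                skipped += 1  # no "prompt" key
    return valid, skipped
```

Running read_prompts over a dataset before a large run is a cheap way to see how many lines would be dropped.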
Lines with missing "prompt" fields or invalid JSON are skipped with a warning; the run continues.
Output Format
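All output is written to data/<run_name>/. That directory holds the per-batch batch_*.jsonl files, checkpoint.json, the combined trajectories.jsonl, and statistics.json.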
trajectories.jsonl — per-entry schema
Each line in trajectories.jsonl is a JSON object.
tool_stats and tool_error_counts always include all possible tools with zero defaults, ensuring a consistent schema for loading into HuggingFace Datasets or Apache Arrow/Parquet without schema mismatch errors.
Entries are automatically discarded if the agent produced zero reasoning across all assistant turns (no <REASONING_SCRATCHPAD> and no native thinking tokens). These samples are logged and counted in the summary but not written to trajectories.jsonl.
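The zero-default behaviour for tool_stats and tool_error_counts can be pictured with the sketch below. It assumes, purely for illustration, that each per-tool value is a single integer and that ALL_TOOLS stands in for the runner's master tool list; the real value shapes and tool names may differ.

```python
# Hypothetical stand-in for the runner's master list of tool names.
ALL_TOOLS = ["terminal", "web_search", "execute_code"]


def normalize_counts(observed, all_tools=ALL_TOOLS):
    """Return a dict with exactly one key per known tool, defaulting to 0.

    Keys in `observed` that are not in `all_tools` are dropped in this sketch.
    """
    return {name: int(observed.get(name, 0)) for name in all_tools}
```

Because every row ends up with the same keys in the same order, columnar loaders do not have to reconcile differing schemas between rows.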
statistics.json
CLI Reference
Required Arguments
Path to the JSONL input file. Each line must have a "prompt" key.
Number of prompts processed per batch. Each batch runs its prompts sequentially inside a single worker process. Multiple batches execute in parallel across --num_workers processes.
Identifier for this run. Determines the output directory (data/<run_name>/) and the checkpoint file name. Reuse the same name with --resume to continue an interrupted run.
Model and Provider Arguments
Model identifier in OpenRouter format passed to every AIAgent instance.
API key for the model provider. Falls back to OPENROUTER_API_KEY (or provider-specific env vars) when not set.
Base URL for the LLM API.
Maximum tool-calling iterations per prompt (maps to AIAgent.max_iterations). Keep this low (10–20) for batch generation to control cost; the interactive CLI default is 90.
Maximum tokens per model response. Uses the model's native default when not set.
OpenRouter reasoning effort level. Accepted values: "xhigh", "high", "medium", "low", "minimal", "none". Defaults to "medium" when not specified.
Completely disable reasoning/thinking tokens. Equivalent to --reasoning_effort=none. Takes precedence over --reasoning_effort.
Concurrency Arguments
Number of parallel worker processes (using multiprocessing.Pool). Each worker handles one batch at a time. Set based on available CPU cores and API rate limits. Higher values increase throughput but also API concurrency.
Toolset Distribution Arguments
Named toolset distribution used to sample which toolsets each prompt receives. Each prompt gets an independently sampled subset. List available distributions with --list_distributions.
Print all available toolset distributions and their descriptions, then exit.
Resume and Checkpointing Arguments
Resume from a previous interrupted run. The runner scans all batch_*.jsonl files for completed prompts by matching prompt text content (not just indices), then rebuilds the batch list with only the remaining prompts.
Process only the first N samples from the dataset. Useful for quick test runs before committing to a full dataset.
Logging and Output Arguments
Enable verbose logging in worker processes. Prints full tracebacks on errors and shows per-prompt toolset selection.
Number of characters to show in log previews for tool arguments and responses.
A system prompt injected into each agent during execution but not saved to output trajectories. Use this for task-framing instructions that should not appear in training data.
OpenRouter Provider Routing Arguments
Comma-separated list of OpenRouter providers to allow (e.g. "anthropic,google").
Comma-separated list of OpenRouter providers to exclude (e.g. "together,deepinfra").
Comma-separated provider preference order (e.g. "anthropic,openai,google").
Sort providers by "price", "throughput", or "latency".
Prefill Arguments
Path to a JSON file containing an array of prefill messages ([{"role": "user", "content": "..."}, ...]). These messages are prepended to every conversation for few-shot priming.
Usage Examples
Resume an interrupted run
The runner scans data/my_run/batch_*.jsonl for completed prompts using content-based matching and processes only the remainder.
Programmatic Use
You can also drive BatchRunner directly from Python for tighter integration with your pipeline.
Each prompt is handled by _process_single_prompt(), which instantiates a fresh AIAgent with skip_context_files=True and skip_memory=True hardcoded. Both are always set in batch mode to prevent user-specific files from appearing in trajectories. These are internal defaults and are not exposed as BatchRunner constructor parameters.
Trajectory Format from Code
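After a run, the combined trajectories in data/<run_name>/trajectories.jsonl can be loaded in Python with any JSON Lines reader, one json.loads() call per line.
Concurrency Model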
The batch runner uses Python's multiprocessing.Pool (not threads) for parallelism. Each worker is a separate OS process with its own memory space.
Tune batch_size and num_workers so that total concurrent API calls stay within your rate limit.
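The sketch below models this arrangement in miniature: prompts are split into batches, each batch is handled sequentially by one pool worker, and the number of prompts in flight at once is bounded by the smaller of the worker count and the batch count. All function names are hypothetical, process_batch is a placeholder rather than a real agent call, and subagent delegation (which could add further API calls) is ignored.

```python
import multiprocessing as mp


def make_batches(prompts, batch_size):
    """Split prompts into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]


def process_batch(batch):
    """Placeholder worker: handles its prompts one after another."""
    return [len(prompt) for prompt in batch]  # stand-in for running an agent


def peak_concurrency(num_prompts, batch_size, num_workers):
    """Upper bound on prompts being processed simultaneously in this model."""
    num_batches = -(-num_prompts // batch_size)  # ceiling division
    return min(num_workers, num_batches)


def run(prompts, batch_size, num_workers):
    """Map batches onto a process pool; results come back in batch order."""
    batches = make_batches(prompts, batch_size)
    with mp.Pool(processes=num_workers) as pool:
        return pool.map(process_batch, batches)
```

When trying this out, call run(...) from under an `if __name__ == "__main__":` guard so that it also works on platforms that start worker processes by spawning. Under this model, raising batch_size alone does not increase concurrency; it only changes how much work is lost if a batch is interrupted.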
_last_resolved_tool_names is a process-global in model_tools.py. When subagent delegation (delegate_tool.py) is used inside a batch worker, spawned subagents may overwrite this global. Subsequent execute_code calls in the same worker process may then fail with missing tool import errors. Avoid toolsets that trigger subagent delegation in batch runs, or set --max_turns low enough that delegation is unlikely to occur.
Checkpointing and Fault Tolerance
The checkpoint file at data/<run_name>/checkpoint.json is updated incrementally: after each batch completes, not only at the end of the full run. This means:
- A crash mid-run loses at most one batch worth of work.
- On --resume, the runner performs content-based matching: it scans all batch_*.jsonl files and extracts the human prompt text from completed conversations. This is more robust than index-based matching and correctly handles dataset re-ordering or index shifts between runs.
Prompts that fail (for example, with an error raised from run_conversation()) are not written to the batch output file, so they remain eligible for retry on resume.
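Content-based resume can be sketched as follows. This is an approximation under stated assumptions, not the runner's actual code: the helper names are hypothetical, and each saved entry is assumed to hold a "conversations" list of messages shaped like {"from": "human", "value": "..."}.

```python
import json
from pathlib import Path


def completed_prompts(run_dir):
    """Collect the human prompt text from every saved batch entry."""
    done = set()
    for batch_file in sorted(Path(run_dir).glob("batch_*.jsonl")):
        with batch_file.open(encoding="utf-8") as fh:
            for line in fh:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # tolerate blank or partially written lines
                if not isinstance(entry, dict):
                    continue
                for msg in entry.get("conversations", []):
                    # Assumed message shape: {"from": "human", "value": "..."}
                    if msg.get("from") == "human":
                        done.add(msg.get("value", "").strip())
                        break  # the first human turn is treated as the prompt
    return done


def remaining_prompts(dataset, run_dir):
    """Keep dataset entries whose prompt text has not been completed yet."""
    done = completed_prompts(run_dir)
    return [item for item in dataset if item["prompt"].strip() not in done]
```

Because matching is on prompt text rather than position, reordering the dataset between runs does not cause completed prompts to be repeated. One consequence of this sketch is that duplicate prompts in the dataset are all treated as done once any copy completes.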
Trajectory Saving and Quality Filters
The batch runner handles all trajectory serialization itself via agent._convert_to_trajectory_format(). It always passes save_trajectories=False to each AIAgent instance to avoid double-writing.
Quality filters applied before writing to batch_*.jsonl:
- No-reasoning filter: Trajectories where zero assistant turns contain reasoning (no <REASONING_SCRATCHPAD> tag and no native thinking tokens) are discarded. The count appears in the run summary under “Samples discarded (zero reasoning)”.
- Invalid tool name filter: At combine time, entries containing tool names not in the master TOOL_TO_TOOLSET_MAP are filtered out. These result from model hallucinations and would break downstream schema validation.
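The two filters above might look roughly like the sketch below. It is illustrative only: the message shape (role and content keys), the "reasoning" field used to represent native thinking tokens, and both function names are assumptions, and known_tools stands in for the keys of TOOL_TO_TOOLSET_MAP.

```python
SCRATCHPAD_TAG = "<REASONING_SCRATCHPAD>"


def has_reasoning(messages):
    """True if any assistant turn shows scratchpad or native reasoning."""
    for msg in messages:
        if msg.get("role") != "assistant":
            continue
        if SCRATCHPAD_TAG in (msg.get("content") or ""):
            return True
        if msg.get("reasoning"):  # hypothetical field for native thinking tokens
            return True
    return False


def has_only_known_tools(tool_names, known_tools):
    """True if every tool name used by an entry is in the known set."""
    return all(name in known_tools for name in tool_names)
```

An entry would be kept only if has_reasoning returns True for its messages and has_only_known_tools returns True for the tool names it references.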