Environment Setup

API Keys

HyperAgents uses LiteLLM to route requests to multiple LLM providers. You need API keys for whichever providers you plan to use. Create a .env file in the repository root:

.env

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=AIza...

All three keys are optional: set only the keys for the providers you actually use. Because the default model is openai/gpt-4o, OPENAI_API_KEY is required unless you override the model with --model.
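As a rough illustration of this rule, the sketch below maps a LiteLLM-style provider/model-id string to the environment variable it needs and reports whether that variable is set. The helper names (`required_key`, `missing_key`) and the `PROVIDER_KEYS` mapping are illustrative assumptions, not code from the repository.

```python
import os
from typing import Mapping, Optional

# Assumed mapping from provider prefix to the key names shown in the .env example.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}

DEFAULT_MODEL = "openai/gpt-4o"  # default model named in this guide


def required_key(model: str = DEFAULT_MODEL) -> str:
    """Return the env var name needed for a 'provider/model-id' string."""
    provider, sep, model_id = model.partition("/")
    if not sep or not model_id:
        raise ValueError(f"expected 'provider/model-id', got {model!r}")
    if provider not in PROVIDER_KEYS:
        raise ValueError(f"unknown provider {provider!r}")
    return PROVIDER_KEYS[provider]


def missing_key(model: str = DEFAULT_MODEL,
                env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the missing key, or None if it is set and non-empty."""
    env = os.environ if env is None else env
    name = required_key(model)
    return None if env.get(name) else name
```

Calling `missing_key()` with no arguments checks the default model, so it returns `"OPENAI_API_KEY"` when that variable is unset in the current environment.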

System Requirements

HyperAgents requires Python 3.12 and several system-level packages. These instructions are for Fedora/RHEL-based systems using dnf.

# Python 3.12 development headers
sudo dnf install -y python3.12-devel

# Build tools and domain-specific dependencies
sudo dnf install -y graphviz graphviz-devel cmake ninja-build bzip2-devel zlib-devel ncurses-devel libffi-devel

HyperAgents executes model-generated code inside Docker containers. Docker must be installed and the current user must have permission to run Docker commands before proceeding.
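Before moving on, it can help to confirm both prerequisites from a script. The following is a minimal preflight sketch, not part of the repository: `python_ok` and `docker_ok` are hypothetical helpers, and the Docker check assumes that `docker info` exits with a non-zero status when the daemon is unreachable or the user lacks permission, as recent Docker releases do.

```python
import shutil
import subprocess
import sys


def python_ok(version=None) -> bool:
    """True if the interpreter is Python 3.12, the version this guide targets."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) == (3, 12)


def docker_ok(timeout: float = 10.0) -> bool:
    """True if the docker CLI is on PATH and `docker info` succeeds."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Python 3.12:", python_ok())
    print("Docker usable:", docker_ok())
```

If `docker_ok()` returns False even though Docker is installed, the usual cause is that the current user cannot reach the Docker daemon.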

Python Environment

# Create and activate a virtual environment
python3.12 -m venv venv_nat
source venv_nat/bin/activate

# Install runtime and dev dependencies
pip install -r requirements.txt
pip install -r requirements_dev.txt

# Build the Docker image used for sandboxed evaluation
docker build --network=host -t hyperagents .

# Initialize the starting agent checkpoints
bash ./setup_initial.sh

Python Dependencies

`requirements.txt` — Runtime

Package	Version	Purpose
`requests`	2.32.4	HTTP client for API calls
`dotenv`	0.9.9	Loads `.env` file into environment variables
`tqdm`	4.67.1	Progress bars for evaluation loops
`backoff`	2.2.1	Exponential retry on LLM API failures
`matplotlib`	3.10.3	Plotting generation progress
`docker`	7.1.0	Python SDK for managing Docker containers
`datasets`	3.6.0	HuggingFace datasets (used by several domains)
`GitPython`	3.1.44	Git operations for patch management
`litellm`	1.74.9	Unified LLM API across OpenAI, Anthropic, Gemini
`pandas`	2.3.2	Result aggregation and analysis
`sympy`	1.14.0	Symbolic math (used by `imo_grading` / `imo_proof`)
`hydra-core`	1.3.2	Config management for Balrog domains
`gym` / `gymnasium`	0.23.0 / 1.2.0	RL environment interfaces for Balrog
`rsl-rl-lib`	2.2.4	Reinforcement learning training for Genesis domain
`tensorboard`	2.20.0	Training metrics logging for Genesis
`Genesis`	git	Physics simulation engine for robotics domain
`Minigrid`	git	Grid-world environments for `balrog_babyai`
`minihack`	git	NetHack-based environments for `balrog_minihack`
`baba-is-ai`	git	Baba Is You environment for `balrog_babaisai`

`requirements_dev.txt` — Analysis & Visualization

Package	Version	Purpose
`networkx`	3.5	Archive graph construction and traversal
`pygraphviz`	1.14	Rendering archive lineage graphs (requires `graphviz` system package)
`plotly`	6.1.2	Interactive HTML plots
`scikit-learn`	1.7.0	Score-proportional parent selection utilities

requirements_dev.txt packages are only needed if you intend to run the analysis scripts in analysis/. They are not required for running generate_loop.py.

Supported LLM Models

Models are defined as constants in agent/llm.py and passed to LiteLLM. The string format is provider/model-id.

Constant	Model Identifier	Provider
`CLAUDE_MODEL`	`anthropic/claude-sonnet-4-5-20250929`	Anthropic
`CLAUDE_HAIKU_MODEL`	`anthropic/claude-3-haiku-20240307`	Anthropic (4096 token limit)
`CLAUDE_35NEW_MODEL`	`anthropic/claude-3-5-sonnet-20241022`	Anthropic
`OPENAI_MODEL`	`openai/gpt-4o`	OpenAI (default)
`OPENAI_MINI_MODEL`	`openai/gpt-4o-mini`	OpenAI
`OPENAI_O3_MODEL`	`openai/o3`	OpenAI
`OPENAI_O3MINI_MODEL`	`openai/o3-mini`	OpenAI
`OPENAI_O4MINI_MODEL`	`openai/o4-mini`	OpenAI
`OPENAI_GPT52_MODEL`	`openai/gpt-5.2`	OpenAI
`OPENAI_GPT5_MODEL`	`openai/gpt-5`	OpenAI (no temperature param)
`OPENAI_GPT5MINI_MODEL`	`openai/gpt-5-mini`	OpenAI (no temperature param)
`GEMINI_3_MODEL`	`gemini/gemini-3-pro-preview`	Google
`GEMINI_MODEL`	`gemini/gemini-2.5-pro`	Google
`GEMINI_FLASH_MODEL`	`gemini/gemini-2.5-flash`	Google

gpt-5 and gpt-5-mini do not accept a temperature parameter; LiteLLM skips it automatically. All GPT-5 family models use max_completion_tokens instead of max_tokens. Claude Haiku is capped at 4096 output tokens regardless of the global MAX_TOKENS setting.
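The three rules above can be sketched as a small function that builds per-model request parameters. This is an illustrative assumption about how such logic could look, not the code in agent/llm.py: `build_params`, `NO_TEMPERATURE`, and `HAIKU_MAX_TOKENS` are hypothetical names, and the 16,384 default is the global MAX_TOKENS value documented for this project.

```python
MAX_TOKENS = 16384        # global limit documented for agent/llm.py
HAIKU_MAX_TOKENS = 4096   # Claude 3 Haiku output cap from the table above

# Models listed above as not accepting a temperature parameter.
NO_TEMPERATURE = {"openai/gpt-5", "openai/gpt-5-mini"}


def build_params(model: str, temperature: float = 0.0,
                 max_tokens: int = MAX_TOKENS) -> dict:
    """Return keyword arguments for a completion call (hypothetical helper)."""
    params = {"model": model}

    # Claude Haiku is capped regardless of the global setting.
    if model == "anthropic/claude-3-haiku-20240307":
        max_tokens = min(max_tokens, HAIKU_MAX_TOKENS)

    # GPT-5 family models take max_completion_tokens instead of max_tokens.
    if model.startswith("openai/gpt-5"):
        params["max_completion_tokens"] = max_tokens
    else:
        params["max_tokens"] = max_tokens

    # Omit temperature only for the models that reject it.
    if model not in NO_TEMPERATURE:
        params["temperature"] = temperature
    return params
```

In this sketch `openai/gpt-5.2` still receives a temperature, because the table marks only gpt-5 and gpt-5-mini as lacking that parameter.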

Setting the Model

The meta-agent model is configured via the --model argument in run_meta_agent.py. Pass the full LiteLLM model identifier:

python run_meta_agent.py --model anthropic/claude-3-5-sonnet-20241022 ...

When running via generate_loop.py, the model argument is forwarded automatically. For the polyglot domain specifically, the loop hardcodes claude-3-5-sonnet-20241022 for fair comparison with the DGM baseline.

The global token limit is 16,384 (MAX_TOKENS in agent/llm.py). Failed API calls are retried with exponential backoff for up to 600 seconds, with a maximum interval of 60 seconds between retries.
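To get a feel for what the retry limits mean in practice, the sketch below computes an idealized wait schedule under the two stated limits. It assumes a 1-second initial delay that doubles each attempt, and it ignores both jitter and the time spent inside each API call. Those assumptions are not taken from the repository, and the `backoff` package applies random jitter by default, so real waits will typically differ. `retry_delays` is a hypothetical helper.

```python
def retry_delays(max_time: float = 600.0, max_interval: float = 60.0,
                 base: float = 1.0, factor: float = 2.0) -> list:
    """List the waits between retries until the total would exceed max_time."""
    delays = []
    elapsed = 0.0
    delay = base
    while True:
        wait = min(delay, max_interval)  # cap each individual wait
        if elapsed + wait > max_time:    # stop once the time budget is spent
            break
        delays.append(wait)
        elapsed += wait
        delay *= factor                  # exponential growth
    return delays


if __name__ == "__main__":
    schedule = retry_delays()
    print(len(schedule), "retries, total wait", sum(schedule), "seconds")
```

Under these assumptions the waits grow 1, 2, 4, 8, 16, 32 seconds and then stay at the 60-second cap, giving 14 retries before the 600-second budget runs out.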
