The Genesis domain evaluates agents that generate reward functions for reinforcement learning (RL) controllers. The agent receives a task description and must write a Python reward function that, when used to train an RL policy, produces the desired locomotion behavior in the Genesis physics simulator.
What It Evaluates
Genesis tests reward-function engineering ability. The agent's output (a reward function) is not judged directly; instead, it is used to train an RL policy via rsl-rl, and the resulting policy's behavior is evaluated in simulation. The primary metric is average_fitness, a normalized score from 0 to 1 that measures how well the trained policy executes the task.
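To make the idea of "writing a reward function" concrete, here is a minimal, hypothetical sketch. The exact signature and state fields the Genesis domain passes to the agent's function are not documented here, so RobotState, compute_reward, the field names, the weights, and the 0.3 m target_height are all illustrative assumptions, not the domain's actual interface. It simply shows the usual shape: a reward term for the desired behavior minus penalties for undesired states.

```python
from dataclasses import dataclass


@dataclass
class RobotState:
    """Hypothetical per-environment state; field names are illustrative only."""
    base_lin_vel_x: float  # forward velocity of the robot base (m/s)
    base_height: float     # height of the base above the ground (m)
    roll: float            # body roll angle (radians)
    pitch: float           # body pitch angle (radians)


def compute_reward(state: RobotState, target_height: float = 0.3) -> float:
    """Hypothetical reward: favor forward motion, penalize tilt and height error.

    target_height and the penalty weights are assumed values, not domain defaults.
    """
    forward_term = state.base_lin_vel_x
    tilt_penalty = state.roll ** 2 + state.pitch ** 2
    height_penalty = (state.base_height - target_height) ** 2
    return forward_term - 0.5 * tilt_penalty - 1.0 * height_penalty


# An upright robot at the target height moving at 0.5 m/s earns exactly 0.5.
print(compute_reward(RobotState(0.5, 0.3, 0.0, 0.0)))
```

In the real training loop, a function like this would typically be evaluated on batched tensors across all parallel environments rather than one state at a time.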
Evaluation Setup
- Total simulation time: 20 seconds per evaluation
- Episode duration: 4.0 seconds (200 steps at dt = 0.02 s)
- Parallel environments: 4096 simulated simultaneously
- Fitness score range: 0 (worst) to 1 (best)
- RL training: 101 policy update iterations before evaluation
- Early termination: an episode ends early if the robot falls (roll or pitch > 10°)
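The episode length follows directly from the numbers above: 4.0 s / 0.02 s = 200 steps. The sketch below encodes that arithmetic together with the early-termination rule. It assumes the 10° limit applies to the absolute roll and pitch angles and that the time limit triggers once the step counter reaches 200; should_terminate and the constant names are hypothetical, not taken from the domain's code.

```python
import math

DT = 0.02               # control timestep in seconds
EPISODE_DURATION = 4.0  # seconds per episode
MAX_TILT_DEG = 10.0     # roll/pitch threshold for early termination (degrees)

# 4.0 / 0.02 = 200 steps; round() guards against floating-point error in the division.
MAX_EPISODE_STEPS = round(EPISODE_DURATION / DT)


def should_terminate(roll_rad: float, pitch_rad: float, step: int) -> bool:
    """Return True if the episode ends: the robot tipped over or hit the time limit.

    Assumes the tilt limit is on the absolute angle (a fall in either direction).
    """
    fell = (abs(math.degrees(roll_rad)) > MAX_TILT_DEG
            or abs(math.degrees(pitch_rad)) > MAX_TILT_DEG)
    timed_out = step >= MAX_EPISODE_STEPS
    return fell or timed_out


print(MAX_EPISODE_STEPS)  # 200
```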
The Three Environments
- go2walking
- go2walkback
- go2hop
Go2WalkingCommand-v0: the Unitree Go2 robot must learn to walk forward at a commanded speed.
- Task: Go2WalkingCommand-v0/speed
- Linear velocity range: [0.2, 0.8] m/s in the x direction
- Default episodes: 6
- Domain: genesis_go2walking
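For the walking task, a reward typically rewards matching the commanded forward speed. The sketch below samples a command from the documented [0.2, 0.8] m/s range and scores tracking with an exponential kernel, exp(-error² / sigma). This kernel and sigma = 0.25 are a common choice in legged-locomotion RL setups, used here as an assumption rather than the domain's actual reward; sample_command and tracking_reward are hypothetical names.

```python
import math
import random

# Commanded forward speed range for Go2WalkingCommand-v0 (m/s, x direction).
CMD_VEL_MIN, CMD_VEL_MAX = 0.2, 0.8


def sample_command(rng: random.Random) -> float:
    """Sample a commanded forward speed uniformly from the task's range."""
    return rng.uniform(CMD_VEL_MIN, CMD_VEL_MAX)


def tracking_reward(actual_vel_x: float, cmd_vel_x: float, sigma: float = 0.25) -> float:
    """Exponential-kernel velocity tracking: 1.0 at a perfect match, decaying toward 0.

    sigma (assumed value) controls how quickly the reward falls off with speed error.
    """
    error = actual_vel_x - cmd_vel_x
    return math.exp(-(error ** 2) / sigma)


rng = random.Random(0)
cmd = sample_command(rng)
print(round(cmd, 3), tracking_reward(cmd, cmd))  # perfect tracking scores 1.0
```

A bounded kernel like this keeps the tracking term in [0, 1], which makes it easier to balance against penalty terms than a raw (unbounded) velocity error.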
Requirements
Install PyTorch
Check your CUDA version first:
Install Genesis
Setup and Run
num_workers Constraint
The Genesis domain always runs with num_workers set to 1.
Hydra Configuration
The config file is at domains/genesis/config/config.yaml. Key sections:
Output Structure
Outputs are written to <output_dir>/<env_name>/<task_name>/ and include:
- chat_history_*.md: agent conversation log
- rl_eval_<episode_idx>/: evaluation results (JSON log, eval_100.mp4 video)
- rl_train_<episode_idx>/: training artifacts (model checkpoints model_0.pt and model_100.pt, TensorBoard events, config pickle)
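When post-processing a run, it can help to compute artifact locations from the documented layout instead of hard-coding them. The sketch below does that with pathlib. The helper names episode_artifacts and chat_histories are hypothetical, and the example directory names are placeholders; only the directory pattern and file names come from the layout described above.

```python
from pathlib import Path


def episode_artifacts(output_dir: str, env_name: str, task_name: str, episode_idx: int) -> dict:
    """Build the expected artifact paths for one episode from the documented layout."""
    root = Path(output_dir) / env_name / task_name
    eval_dir = root / f"rl_eval_{episode_idx}"
    train_dir = root / f"rl_train_{episode_idx}"
    return {
        "eval_dir": eval_dir,
        "eval_video": eval_dir / "eval_100.mp4",
        "train_dir": train_dir,
        "first_checkpoint": train_dir / "model_0.pt",
        "final_checkpoint": train_dir / "model_100.pt",
    }


def chat_histories(output_dir: str, env_name: str, task_name: str) -> list:
    """List the chat_history_*.md logs that exist under the task directory."""
    root = Path(output_dir) / env_name / task_name
    return sorted(root.glob("chat_history_*.md"))


# Placeholder names; substitute the actual output_dir, env_name, and task_name of a run.
paths = episode_artifacts("outputs", "my_env", "my_task", 0)
print(paths["final_checkpoint"])
```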
Domain Properties
| Property | Value |
|---|---|
| Score key | average_fitness |
| Splits | train only |
| Eval subset | full dataset |
| Ensemble supported | No |
| Staged eval samples | 3 out of 6 (50%) |
| num_workers | Always 1 |
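The staged-evaluation row is simple arithmetic: 50% of 6 default episodes gives 3 staged samples. The sketch below reproduces that and shows average_fitness as a plain mean of per-episode fitness scores. Both the rounding rule (ceiling) and the assumption that average_fitness is an unweighted arithmetic mean are illustrative guesses, not confirmed behavior of the domain; the function names are hypothetical.

```python
import math


def staged_sample_count(total_episodes: int, fraction: float = 0.5) -> int:
    """Number of episodes in a staged evaluation (assumes rounding up)."""
    return math.ceil(total_episodes * fraction)


def average_fitness(episode_scores: list) -> float:
    """Assumed aggregation: unweighted mean of per-episode fitness scores in [0, 1]."""
    if not episode_scores:
        raise ValueError("need at least one episode score")
    return sum(episode_scores) / len(episode_scores)


print(staged_sample_count(6))         # 3
print(average_fitness([0.25, 0.75]))  # 0.5
```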