RLlib is a scalable reinforcement learning library built on Ray. Foundation environments integrate with RLlib through the RLlibEnvWrapper class, which subclasses MultiAgentEnv and exposes separate observation and action spaces for worker agents and the social planner.
Installation
The two-level curriculum experiments in the AI Economist paper were run on a 16-CPU, 60 GB machine on Google Cloud Platform (n1-standard-16) with 15 rollout workers and 1 trainer worker.
Environment wrapper
The RLlibEnvWrapper in tutorials/rllib/env_wrapper.py wraps any Foundation environment to make it compatible with RLlib’s MultiAgentEnv interface. It constructs the observation and action spaces for both the agents and the planner.
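To make the interface concrete, here is a toy stand-in that mimics the dict-based contract a MultiAgentEnv follows: observations, rewards, and infos are keyed by agent id, and the done dict carries an "__all__" entry. This is not the RLlibEnvWrapper itself and imports nothing from Ray. The class name, the observation contents, and the use of "0".."n-1" for workers and "p" for the planner are assumptions made for illustration.

```python
from typing import Dict, Tuple


class ToyMultiAgentEnv:
    """Hypothetical stand-in showing the dict-in, dict-out multi-agent contract."""

    def __init__(self, n_agents: int = 2, episode_length: int = 3):
        # Assumed id scheme: numeric strings for workers, "p" for the planner.
        self.agent_ids = [str(i) for i in range(n_agents)] + ["p"]
        self.episode_length = episode_length
        self.t = 0

    def _obs(self) -> Dict[str, dict]:
        # One observation per agent id; real observations are much richer.
        return {aid: {"time": self.t} for aid in self.agent_ids}

    def reset(self) -> Dict[str, dict]:
        self.t = 0
        return self._obs()

    def step(self, action_dict: Dict[str, int]) -> Tuple[dict, dict, dict, dict]:
        missing = set(self.agent_ids) - set(action_dict)
        if missing:
            raise KeyError(f"missing actions for: {sorted(missing)}")
        self.t += 1
        rewards = {aid: 0.0 for aid in self.agent_ids}
        # "__all__" signals that the episode is over for every agent.
        dones = {"__all__": self.t >= self.episode_length}
        infos = {aid: {} for aid in self.agent_ids}
        return self._obs(), rewards, dones, infos


env = ToyMultiAgentEnv(n_agents=2, episode_length=2)
obs = env.reset()
actions = {aid: 0 for aid in obs}  # action 0 for everyone
obs, rewards, dones, infos = env.step(actions)
```

The real wrapper additionally builds proper observation and action spaces for each agent type; the sketch only shows the shape of the data that flows through reset and step.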
Action space and multi_action_mode
The multi_action_mode environment parameter controls how each agent’s action space is structured:
The planner’s action space is controlled separately by multi_action_mode_planner. Set these in your environment configuration YAML:
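As a rough illustration of what the two modes mean for the size of an action space, the sketch below assumes that single-action mode flattens every component's actions into one discrete choice with a shared no-op, while multi-action mode gives one discrete choice per component, each with its own no-op. The function name and the component action counts are hypothetical; check the wrapper source for the exact construction.

```python
from typing import Dict, List, Union

NOOP = 1  # one extra "do nothing" choice (assumed)


def action_space_shape(
    component_actions: Dict[str, int], multi_action_mode: bool
) -> Union[int, List[int]]:
    """Return the size of a single discrete space, or a list of per-component sizes."""
    if multi_action_mode:
        # One sub-action per component per step, each with its own no-op.
        return [n + NOOP for n in component_actions.values()]
    # One action per step, chosen across all components, plus a shared no-op.
    return sum(component_actions.values()) + NOOP


# Hypothetical action counts per component, for illustration only.
components = {"Build": 1, "Move": 4}
single = action_space_shape(components, multi_action_mode=False)
multi = action_space_shape(components, multi_action_mode=True)
```

Under these assumptions, an integer result would correspond to a single discrete space and a list to a multi-discrete space.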
Registering the environment and launching training
The training script in tutorials/rllib/training_script.py sets up a PPO trainer that uses the RLlibEnvWrapper as its environment.
config.yaml:
~/ray_results. Checkpoints and dense logs go into phase1/ckpts/ and phase1/dense_logs/.
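A multi-agent RLlib trainer needs to know which policy controls which agent id. The sketch below builds the kind of dict that RLlib accepts under its "multiagent" config key, assuming two policies named "a" (shared by all workers) and "p" (the planner), with numeric ids for workers. The policy names, the id scheme, and the placeholder space strings are assumptions for illustration, not values taken from the training script.

```python
from typing import Any, Dict, Tuple


def policy_mapping_fn(agent_id: Any) -> str:
    """Route numeric worker ids to policy "a" and everything else to policy "p"."""
    return "a" if str(agent_id).isdigit() else "p"


def build_multiagent_config(
    agent_spaces: Tuple[Any, Any], planner_spaces: Tuple[Any, Any]
) -> Dict[str, Any]:
    """Assemble a multi-agent config from (observation_space, action_space) pairs."""
    obs_a, act_a = agent_spaces
    obs_p, act_p = planner_spaces
    return {
        # Each entry is (policy class or None for default, obs space, action space, overrides).
        "policies": {
            "a": (None, obs_a, act_a, {}),
            "p": (None, obs_p, act_p, {}),
        },
        "policies_to_train": ["a", "p"],
        "policy_mapping_fn": policy_mapping_fn,
    }


# Placeholder strings stand in for the spaces the wrapper would expose.
multiagent = build_multiagent_config(
    ("agent_obs_space", "agent_act_space"),
    ("planner_obs_space", "planner_act_space"),
)
```

With Ray installed, the environment itself would typically be registered through ray.tune.registry.register_env, and a dict like this passed to the trainer alongside the environment name.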
Two-level curriculum training
The two-level curriculum approach from the AI Economist paper staggers agent and planner learning to stabilize training in the non-stationary multi-agent environment.
Phase one: agents only
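One of the settings in this phase, labor-cost annealing, can be pictured as a weight that rises from 0 toward 1 and scales the labor cost. The sketch below assumes an exponentially saturating curve with energy_warmup_constant as its time constant. The function name, the notion of "progress", and the exact formula are assumptions; what counts as progress depends on energy_warmup_method, so treat this as a picture of the idea rather than Foundation's implementation.

```python
import math


def energy_weight(progress: float, warmup_constant: float) -> float:
    """Fraction of the full labor cost applied after `progress` units of training."""
    if warmup_constant <= 0:
        # Assumed convention: a non-positive constant disables the warmup.
        return 1.0
    return 1.0 - math.exp(-progress / warmup_constant)


# Illustrative values only: the weight after 0, 1x, and 5x the warmup constant.
weights = [energy_weight(p, warmup_constant=1000.0) for p in (0.0, 1000.0, 5000.0)]
```

A larger constant stretches the ramp, which keeps labor cheap for longer while agents are still discovering rewarding behavior.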
Disable taxes
Set disable_taxes: true on the PeriodicBracketTax component. This trains workers in a free market so they develop robust labor and trading policies before tax dynamics are introduced.
Anneal labor costs
Use energy_warmup_constant and energy_warmup_method to gradually ramp up labor costs. This prevents early convergence to a do-nothing policy, which can occur when labor costs are high and rewards are still low.
Phase two: agents and planner
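One of the settings in this phase is an entropy schedule for the planner. RLlib expresses such schedules as a list of [timestep, value] breakpoints. The sketch below shows one way to read a schedule like that, using linear interpolation between breakpoints and holding the end values outside them. It is a standalone illustration, not RLlib's own schedule class, and the breakpoint values are hypothetical.

```python
from typing import Sequence


def piecewise_linear(schedule: Sequence[Sequence[float]], t: float) -> float:
    """Interpolate linearly between [timestep, value] breakpoints sorted by timestep."""
    if t <= schedule[0][0]:
        # Simplification: hold the first value before the first breakpoint.
        return schedule[0][1]
    for (t0, v0), (t1, v1) in zip(schedule, schedule[1:]):
        if t0 <= t < t1:
            frac = (t - t0) / (t1 - t0)
            return v0 + frac * (v1 - v0)
    # Past the last breakpoint, hold the final value.
    return schedule[-1][1]


# Hypothetical schedule: high entropy bonus at the start, decaying to a small value.
entropy_coeff_schedule = [[0, 1.0], [100, 0.0]]
halfway = piecewise_linear(entropy_coeff_schedule, 50)
```

A schedule that starts high and decays slowly keeps the planner's tax choices varied at first, which matches the stated goal of giving agents time to adapt before the planner starts optimizing.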
Restore agent weights from phase one
Set restore_tf_weights_agents in the general section to the checkpoint path produced at the end of phase one.
Enable tax annealing
Add tax_annealing_schedule to PeriodicBracketTax to prevent the planner from setting destructive tax rates during early exploration.
Schedule planner entropy
Use entropy_coeff_schedule for the planner policy to keep entropy high initially, giving agents time to adapt to diverse tax settings before the planner begins to optimize.
Custom models
The tutorials/rllib/tf_models.py file provides two registered TensorFlow models for use with RLlib:
| Model name | Description |
|---|---|
| random | Samples actions uniformly at random. Used for the planner during phase one. |
| keras_conv_lstm | Combines convolutional layers (spatial), fully-connected layers (non-spatial), and an LSTM (historical) for structured observations. Used in the paper. |
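The table suggests that the planner's model changes between curriculum phases. A small helper can make that choice explicit when assembling per-policy model settings. RLlib selects a registered model through the "custom_model" key of a policy's model config; the helper name is hypothetical, and using keras_conv_lstm for the planner in phase two is an assumption based on the table's "Used in the paper" note.

```python
from typing import Dict


def planner_model_config(phase: int) -> Dict[str, str]:
    """Pick the planner's registered model name for a curriculum phase (1 or 2)."""
    if phase == 1:
        # Phase one trains agents only, so the planner acts at random.
        return {"custom_model": "random"}
    if phase == 2:
        # Assumed: the planner learns with the same architecture as the agents.
        return {"custom_model": "keras_conv_lstm"}
    raise ValueError(f"unknown curriculum phase: {phase}")


agent_model = {"custom_model": "keras_conv_lstm"}
planner_model_phase1 = planner_model_config(1)
planner_model_phase2 = planner_model_config(2)
```

Both names must already be registered with RLlib before the trainer is built, which is what importing tf_models.py is for.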
Visualizing results
The full interactive tutorial is available on Colab: multi_agent_training_with_rllib.ipynb