The One-Step Economy is a deliberately minimal scenario that distills the essential tax-and-labor dynamics of the full Gather-Trade-Build simulation into just two timesteps. It is designed for rapid reinforcement learning experimentation and theoretical analysis of optimal tax policy.

How it works

The scenario runs with an episode_length of 2:
  1. Step 1 — Tax setting: The planner agent sets marginal tax bracket rates via the PeriodicBracketTax component.
  2. Step 2 — Labor selection: Each mobile agent selects how much labor to supply via the SimpleLabor component. Each agent’s optimal labor depends on its skill level and the tax rates, but not on the choices of other agents.
Because agents make analytically tractable decisions, this scenario is well suited to studying how tax schedules affect labor supply and income distribution without the confounding dynamics of resource gathering, trading, or spatial navigation.
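To illustrate why the labor decision is analytically tractable, here is a minimal sketch of one agent's choice under the "coin_minus_labor_cost" utility. It makes simplifying assumptions that are not part of the scenario: a single flat tax rate instead of marginal brackets, no redistribution of tax revenue, and a hypothetical grid of labor levels. The function names are illustrative, not library APIs.

```python
def utility(labor, skill, tax_rate, labor_cost=1.0, labor_exponent=2.0):
    """Post-tax income minus labor cost, assuming a flat tax (a simplification)."""
    income = skill * labor  # income = skill x labor, as in SimpleLabor
    return (1.0 - tax_rate) * income - labor_cost * labor ** labor_exponent


def best_discrete_labor(skill, tax_rate, labor_levels):
    """Pick the utility-maximizing level from a hypothetical discrete grid."""
    return max(labor_levels, key=lambda labor: utility(labor, skill, tax_rate))


def closed_form_labor(skill, tax_rate, labor_cost=1.0, labor_exponent=2.0):
    """Continuous optimum from the first-order condition (needs labor_exponent > 1)."""
    base = (1.0 - tax_rate) * skill / (labor_cost * labor_exponent)
    return base ** (1.0 / (labor_exponent - 1.0))


# Hypothetical grid of labor levels: 0.0, 0.5, ..., 10.0
labor_levels = [i * 0.5 for i in range(21)]

for tax_rate in (0.0, 0.5):
    discrete = best_discrete_labor(4.0, tax_rate, labor_levels)
    continuous = closed_form_labor(4.0, tax_rate)
    print(f"tax={tax_rate}: discrete={discrete}, closed form={continuous}")
```

Under these assumptions, a higher tax rate lowers the optimal labor supply, and the result depends only on the agent's own skill and the tax rate.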

Scenario name

one-step-economy
Defined in:
ai_economist/foundation/scenarios/one_step_economy/one_step_economy.py
The registered class is OneStepEconomy, which extends BaseEnvironment. It uses:
  • Agent types: BasicMobileAgent, BasicPlanner
  • Required entities: Coin

Intended components

This scenario is designed to be paired with:
| Component | Role |
| --- | --- |
| PeriodicBracketTax | Planner sets marginal tax rates at the start of each period |
| SimpleLabor | Agents choose a discrete labor level; income = skill × labor |
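As a reminder of what "marginal" rates mean, the sketch below computes the tax owed on an income when each rate applies only to the slice of income inside its bracket. The bracket cutoffs and rates are hypothetical, and the function is an illustration rather than the PeriodicBracketTax implementation.

```python
def marginal_bracket_tax(income, bracket_cutoffs, bracket_rates):
    """Tax owed when each rate applies only to income inside its bracket.

    bracket_cutoffs holds the lower edge of each bracket in ascending order;
    the last bracket is unbounded above.
    """
    assert len(bracket_cutoffs) == len(bracket_rates)
    tax = 0.0
    for i, (lower, rate) in enumerate(zip(bracket_cutoffs, bracket_rates)):
        is_last = i + 1 == len(bracket_cutoffs)
        upper = float("inf") if is_last else bracket_cutoffs[i + 1]
        # Portion of the income that falls between this bracket's edges
        taxable = max(0.0, min(income, upper) - lower)
        tax += rate * taxable
    return tax


# Hypothetical schedule: 10% up to 10, 20% from 10 to 40, 50% above 40
cutoffs = [0.0, 10.0, 40.0]
rates = [0.1, 0.2, 0.5]
print(marginal_bracket_tax(25.0, cutoffs, rates))
```

With this schedule, an income of 25 is taxed at 10% on the first 10 units and 20% on the remaining 15.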

Key parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| agent_reward_type | str | "coin_minus_labor_cost" | Utility function for mobile agents. Options: "coin_minus_labor_cost", "isoelastic_coin_minus_labor" |
| isoelastic_eta | float | 0.23 | Shape parameter for the isoelastic utility function (used when agent_reward_type="isoelastic_coin_minus_labor") |
| labor_exponent | float | 2.0 | Exponent in the "coin_minus_labor_cost" utility function |
| labor_cost | float | 1.0 | Coefficient weighting the cost of labor |
| planner_reward_type | str | "inv_income_weighted_utility" | Social welfare function for the planner. Options: "inv_income_weighted_utility", "coin_eq_times_productivity", "inv_income_weighted_coin_endowment" |
| mixing_weight_gini_vs_coin | float | 0 | Weight on productivity relative to equality for "coin_eq_times_productivity" (0 = equality and productivity weighted equally, 1 = productivity only) |

Instantiating the environment

import ai_economist.foundation as foundation

env_config = {
    "scenario_name": "one-step-economy",

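    # Must be 2: step 1 = tax setting, step 2 = labor choice
    "episode_length": 2,
    "n_agents": 10,
    # Assumption: the base environment expects a 2-element world_size;
    # this scenario has no spatial dynamics, so a 1x1 world is used here
    "world_size": [1, 1],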

    # Agent reward
    "agent_reward_type": "coin_minus_labor_cost",
    "labor_exponent": 2.0,
    "labor_cost": 1.0,

    # Planner reward
    "planner_reward_type": "inv_income_weighted_utility",

    # Components
    "components": [
        {"PeriodicBracketTax": {
            "period": 1,
            "bracket_spacing": "us-federal",
            "usd_scaling": 1000.0,
            "disable_taxes": False
        }},
        {"SimpleLabor": {
            "mask_first_step": True,
            "payment_max_skill_multiplier": 3,
            "pareto_param": 4.0,
        }},
    ],
}

env = foundation.make_env_instance(**env_config)
obs = env.reset()

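# Step through one episode
# env.step returns `done` as a dict; the "__all__" key flags the end of the episode
done = {"__all__": False}
while not done["__all__"]:
    actions = {}  # placeholder: supply actions for each agent and the planner
    obs, rewards, done, info = env.step(actions)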

Observations

On each step, agents observe:
  • Their inventory (Coin).
  • The planner observes the equality and normalized_per_capita_productivity metrics, both derived from the agents' aggregate coin endowments.
From generate_observations in one_step_economy.py:
coin_endowments = np.array(
    [agent.total_endowment("Coin") for agent in self.world.agents]
)
equality = social_metrics.get_equality(coin_endowments)
productivity = social_metrics.get_productivity(coin_endowments)
normalized_per_capita_productivity = productivity / self.num_agents / 1000
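The two metrics in this snippet can be approximated without the library. The sketch below assumes that equality is 1 minus a Gini index rescaled so that maximal inequality gives 1, and that productivity is the sum of coin endowments; the exact formulas in social_metrics may differ, and the function names here are illustrative.

```python
def gini(endowments):
    """Gini index from pairwise absolute differences, rescaled by n / (n - 1)."""
    n = len(endowments)
    total = sum(endowments)
    if n < 2 or total <= 0:
        return 0.0
    pairwise = sum(abs(a - b) for a in endowments for b in endowments)
    unscaled = pairwise / (2 * n * total)
    # Assumption: rescale so one agent holding everything gives exactly 1
    return unscaled * n / (n - 1)


def equality(endowments):
    """1 = perfectly equal endowments, 0 = one agent holds everything."""
    return 1.0 - gini(endowments)


def productivity(endowments):
    """Total coin across all agents."""
    return sum(endowments)


def normalized_per_capita_productivity(endowments):
    """Same scaling as the snippet above: productivity / num_agents / 1000."""
    return productivity(endowments) / len(endowments) / 1000


for coins in ([5.0, 5.0, 5.0, 5.0], [0.0, 0.0, 0.0, 20.0]):
    print(coins, equality(coins), normalized_per_capita_productivity(coins))
```

Both example endowments have the same productivity, so they differ only in the equality term.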

Reward structure

Mobile agents receive utility based on agent_reward_type:
  • "coin_minus_labor_cost": coin - labor_cost * labor^labor_exponent
  • "isoelastic_coin_minus_labor": isoelastic utility over coin minus labor cost, shaped by isoelastic_eta
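The two agent utilities listed above can be sketched as plain functions. The first follows the formula given in the list. The second assumes the standard isoelastic (CRRA) form for the coin term and a linear labor cost, which may not match the library's implementation exactly.

```python
import math


def coin_minus_labor_cost(coin, labor, labor_exponent=2.0, labor_cost=1.0):
    """coin - labor_cost * labor^labor_exponent, as listed above."""
    return coin - labor_cost * labor ** labor_exponent


def isoelastic_coin_minus_labor(coin, labor, isoelastic_eta=0.23, labor_cost=1.0):
    """Concave utility over coin minus a labor cost.

    Assumption: standard CRRA form for the coin term and a linear labor cost.
    """
    if isoelastic_eta == 1.0:
        util_coin = math.log(coin)  # limiting case; requires coin > 0
    else:
        util_coin = (coin ** (1.0 - isoelastic_eta) - 1.0) / (1.0 - isoelastic_eta)
    return util_coin - labor_cost * labor


print(coin_minus_labor_cost(10.0, 2.0))
print(isoelastic_coin_minus_labor(10.0, 2.0))
```

A larger isoelastic_eta makes the coin term more concave, so each additional coin adds less utility to an agent who already holds many.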
The planner receives social welfare according to planner_reward_type:
| Type | Description |
| --- | --- |
| "inv_income_weighted_utility" | Weighted average of agent utilities, with higher weight on lower-income agents |
| "coin_eq_times_productivity" | Product of equality (1 − Gini) and total productivity |
| "inv_income_weighted_coin_endowment" | Inverse-income-weighted average coin endowment |

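The planner reward types in the table above can be sketched as follows. Several details are assumptions rather than confirmed library behavior: the weights are taken as 1 / coin with a floor of 1 to avoid division by zero, the mixing weight is applied as a linear blend between the equality term and 1, and any per-capita scaling is omitted. The function names mirror the option strings for readability only.

```python
def inverse_income_weights(coin_endowments):
    """Normalized weights proportional to 1 / coin (assumed floor of 1 coin)."""
    raw = [1.0 / max(coin, 1.0) for coin in coin_endowments]
    total = sum(raw)
    return [w / total for w in raw]


def inv_income_weighted_utility(coin_endowments, utilities):
    """Weighted average of utilities; poorer agents get larger weights."""
    weights = inverse_income_weights(coin_endowments)
    return sum(w * u for w, u in zip(weights, utilities))


def inv_income_weighted_coin_endowment(coin_endowments):
    """Same weights, applied to the coin endowments themselves."""
    weights = inverse_income_weights(coin_endowments)
    return sum(w * c for w, c in zip(weights, coin_endowments))


def coin_eq_times_productivity(equality, productivity, mixing_weight=0.0):
    """Equality times productivity; mixing_weight=1 ignores equality (assumed blend)."""
    equality_term = (1.0 - mixing_weight) * equality + mixing_weight
    return equality_term * productivity


print(inv_income_weighted_utility([1.0, 4.0], [10.0, 20.0]))
print(coin_eq_times_productivity(0.5, 100.0))
print(coin_eq_times_productivity(0.5, 100.0, mixing_weight=1.0))
```

In the first call the poorer agent's utility dominates the average, which is the behavior the table describes for "inv_income_weighted_utility".
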
Use in research

This scenario was introduced in:
The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning (arXiv:2108.02755)
Stephan Zheng, Alexander Trott, Sunil Srinivasa, David C. Parkes, Richard Socher.
It enables tractable two-level RL experiments in which the planner's optimal tax policy can be derived analytically and compared with learned policies.
For training with RLlib, see the two-level curriculum training guide, which uses this scenario to demonstrate curriculum learning.