One-Step Economy

The One-Step Economy is a deliberately minimal scenario that distills the essential tax-and-labor dynamics of the full Gather-Trade-Build simulation into just two timesteps. It is designed for rapid reinforcement learning experimentation and theoretical analysis of optimal tax policy.

How it works

The scenario runs with an episode_length of 2:

Step 1 — Tax setting: The planner agent sets marginal tax bracket rates via the PeriodicBracketTax component.
Step 2 — Labor selection: Each mobile agent selects how much labor to supply via the SimpleLabor component. Each agent’s optimal labor depends on its skill level and the tax rates, but not on the choices of other agents.

Because agents make analytically tractable decisions, this scenario is well suited to studying how tax schedules affect labor supply and income distribution without the confounding dynamics of resource gathering, trading, or spatial navigation.
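
To make the independence of each agent's decision concrete, the following is a minimal sketch of a best-response computation, not the scenario's own code. It assumes the `"coin_minus_labor_cost"` utility, income = skill × labor, and a marginal bracket tax, and it ignores any redistribution of tax revenue. The bracket cutoffs, rates, labor levels, and the helper names `tax_due` and `best_labor` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def tax_due(income: float, cutoffs: Sequence[float], rates: Sequence[float]) -> float:
    """Marginal bracket tax: rates[i] applies to income above cutoffs[i],
    up to cutoffs[i + 1] (or without limit for the last bracket)."""
    tax = 0.0
    for i, rate in enumerate(rates):
        lower = cutoffs[i]
        upper = cutoffs[i + 1] if i + 1 < len(cutoffs) else float("inf")
        if income > lower:
            tax += rate * (min(income, upper) - lower)
    return tax


def best_labor(
    skill: float,
    labor_levels: Sequence[float],
    cutoffs: Sequence[float],
    rates: Sequence[float],
    labor_cost: float = 1.0,
    labor_exponent: float = 2.0,
) -> float:
    """Brute-force the discrete labor level that maximizes post-tax utility."""

    def utility(labor: float) -> float:
        income = skill * labor
        post_tax_income = income - tax_due(income, cutoffs, rates)
        return post_tax_income - labor_cost * labor ** labor_exponent

    return max(labor_levels, key=utility)


# Hypothetical two-bracket schedule and discrete labor levels 0..10.
cutoffs = [0.0, 10.0]
levels = list(range(0, 11))

no_tax = best_labor(skill=8.0, labor_levels=levels, cutoffs=cutoffs, rates=[0.0, 0.0])
taxed = best_labor(skill=8.0, labor_levels=levels, cutoffs=cutoffs, rates=[0.25, 0.5])
print(no_tax, taxed)  # 4 2
```

In this toy setting, raising the marginal rates lowers the chosen labor from 4 to 2. The result depends only on the agent's own skill and the tax schedule, which is the property that keeps the scenario tractable.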

Scenario name

one-step-economy

Defined in:

ai_economist/foundation/scenarios/one_step_economy/one_step_economy.py

The registered class is OneStepEconomy, which extends BaseEnvironment. It uses:

Agent types: BasicMobileAgent, BasicPlanner
Required entities: Coin

Intended components

This scenario is designed to be paired with:

| Component | Role |
| --- | --- |
| `PeriodicBracketTax` | Planner sets marginal tax rates at the start of each period |
| `SimpleLabor` | Agents choose a discrete labor level; income = `skill × labor` |

Key parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `agent_reward_type` | str | `"coin_minus_labor_cost"` | Utility function for mobile agents. Options: `"coin_minus_labor_cost"`, `"isoelastic_coin_minus_labor"` |
| `isoelastic_eta` | float | `0.23` | Shape parameter for the isoelastic utility function (used when `agent_reward_type="isoelastic_coin_minus_labor"`) |
| `labor_exponent` | float | `2.0` | Exponent in the `"coin_minus_labor_cost"` utility function |
| `labor_cost` | float | `1.0` | Coefficient weighting the cost of labor |
| `planner_reward_type` | str | `"inv_income_weighted_utility"` | Social welfare function for the planner. Options: `"inv_income_weighted_utility"`, `"coin_eq_times_productivity"`, `"inv_income_weighted_coin_endowment"` |
| `mixing_weight_gini_vs_coin` | float | `0` | Weight on productivity vs. equality for `"coin_eq_times_productivity"` (0 = equality and productivity weighted equally, 1 = productivity only) |

Instantiating the environment

import ai_economist.foundation as foundation

env_config = {
    "scenario_name": "one-step-economy",

    # Must be 2: step 1 = tax setting, step 2 = labor choice
    "episode_length": 2,
    "n_agents": 10,

    # Agent reward
    "agent_reward_type": "coin_minus_labor_cost",
    "labor_exponent": 2.0,
    "labor_cost": 1.0,

    # Planner reward
    "planner_reward_type": "inv_income_weighted_utility",

    # Components
    "components": [
        {"PeriodicBracketTax": {
            "period": 1,
            "bracket_spacing": "us-federal",
            "usd_scaling": 1000.0,
            "disable_taxes": False
        }},
        {"SimpleLabor": {
            "mask_first_step": True,
            "payment_max_skill_multiplier": 3,
            "pareto_param": 4.0,
        }},
    ],
}

env = foundation.make_env_instance(**env_config)
obs = env.reset()

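# Step through one episode.
# env.step returns `done` as a dict; the "__all__" key marks the end of the episode.
done = {"__all__": False}
while not done["__all__"]:
    actions = {}  # supply actions for each agent
    obs, rewards, done, info = env.step(actions)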

Observations

On each step, agents observe:

Their inventory (Coin).
The normalized_per_capita_productivity and equality metrics, computed from all agents' coin endowments (available to the planner).
The planner observes aggregate coin endowments and the derived equality and productivity metrics.

From generate_observations in one_step_economy.py:

coin_endowments = np.array(
    [agent.total_endowment("Coin") for agent in self.world.agents]
)
equality = social_metrics.get_equality(coin_endowments)
productivity = social_metrics.get_productivity(coin_endowments)
normalized_per_capita_productivity = productivity / self.num_agents / 1000
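
As a rough illustration of what these metrics measure, the sketch below computes a Gini-based equality score and total productivity with NumPy. It is a stand-in written under stated assumptions, not the library's `social_metrics` module: equality is taken to be exactly 1 − Gini, and the real `get_equality` may normalize differently. The helper names `gini`, `equality`, and `productivity` are hypothetical.

```python
import numpy as np


def gini(endowments) -> float:
    """Gini coefficient of non-negative endowments (0 = perfectly equal)."""
    x = np.sort(np.asarray(endowments, dtype=float))
    n = x.size
    total = x.sum()
    if total == 0:
        return 0.0  # everyone holds nothing, so treat as perfectly equal
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


def equality(endowments) -> float:
    """Assumed definition: equality = 1 - Gini."""
    return 1.0 - gini(endowments)


def productivity(endowments) -> float:
    """Total coin across all agents."""
    return float(np.sum(endowments))


coin_endowments = np.array([10.0, 10.0, 10.0, 10.0])
num_agents = len(coin_endowments)

eq = equality(coin_endowments)
prod = productivity(coin_endowments)
# Same scaling as in the excerpt above: per-capita, then divided by 1000.
normalized_per_capita_productivity = prod / num_agents / 1000
```

With identical endowments the equality score is 1; concentrating all coin in a single agent pushes it toward 0.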

Reward structure

Mobile agents receive utility based on agent_reward_type:

"coin_minus_labor_cost": coin - labor_cost * labor^labor_exponent
"isoelastic_coin_minus_labor": isoelastic utility over coin minus labor cost, shaped by isoelastic_eta

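The two agent utilities can be sketched as plain Python functions. The first follows the formula listed above. The second is an assumption: it uses the standard isoelastic (CRRA) form for coin and a labor cost that is linear in labor, which may differ from the library's exact implementation. Function and argument names mirror the configuration keys but are illustrative only.

```python
def coin_minus_labor_cost(
    coin: float, labor: float, labor_cost: float = 1.0, labor_exponent: float = 2.0
) -> float:
    """coin - labor_cost * labor^labor_exponent, as listed above."""
    return coin - labor_cost * labor ** labor_exponent


def isoelastic_coin_minus_labor(
    coin: float, labor: float, isoelastic_eta: float = 0.23, labor_cost: float = 1.0
) -> float:
    """Assumed CRRA utility over coin minus a linear labor cost.

    Only defined here for coin >= 0 and 0 <= isoelastic_eta < 1.
    """
    if coin < 0 or not 0.0 <= isoelastic_eta < 1.0:
        raise ValueError("requires coin >= 0 and 0 <= isoelastic_eta < 1")
    util_coin = (coin ** (1.0 - isoelastic_eta) - 1.0) / (1.0 - isoelastic_eta)
    return util_coin - labor_cost * labor


linear_utility = coin_minus_labor_cost(coin=20.0, labor=3.0)
concave_utility = isoelastic_coin_minus_labor(coin=4.0, labor=1.0, isoelastic_eta=0.5)
```

Under the isoelastic form, each additional coin adds less utility than the previous one, so the choice of `isoelastic_eta` controls how strongly agents value income at low versus high endowments.
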
The planner receives social welfare according to planner_reward_type:

| Type | Description |
| --- | --- |
| `"inv_income_weighted_utility"` | Weighted average of agent utilities, with higher weight on lower-income agents |
| `"coin_eq_times_productivity"` | Product of equality (1 − Gini) and total productivity |
| `"inv_income_weighted_coin_endowment"` | Inverse-income-weighted average coin endowment |
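
The sketch below shows one way two of these welfare functions could be computed. It is an illustration under assumptions rather than the library's code: the inverse-income weights clip coin at 1 to avoid division by zero, and `mixing_weight_gini_vs_coin` is assumed to interpolate between equality × productivity (at 0) and productivity alone (at 1). The equality and productivity values are passed in as plain numbers.

```python
import numpy as np


def inv_income_weighted_utility(coin_endowments, utilities) -> float:
    """Average of utilities, weighted by normalized inverse coin endowment."""
    coins = np.asarray(coin_endowments, dtype=float)
    weights = 1.0 / np.maximum(coins, 1.0)  # assumption: clip coin at 1
    weights = weights / weights.sum()
    return float(np.sum(weights * np.asarray(utilities, dtype=float)))


def coin_eq_times_productivity(
    equality: float, productivity: float, mixing_weight_gini_vs_coin: float = 0.0
) -> float:
    """Assumed blend: weight 0 gives equality * productivity, weight 1 gives productivity."""
    equality_weight = 1.0 - mixing_weight_gini_vs_coin
    return (equality_weight * equality + (1.0 - equality_weight)) * productivity


# The poorer agent (1 coin) receives three times the weight of the richer one (3 coin).
welfare = inv_income_weighted_utility(coin_endowments=[1.0, 3.0], utilities=[10.0, 20.0])

eq_times_prod = coin_eq_times_productivity(equality=0.5, productivity=100.0)
prod_only = coin_eq_times_productivity(
    equality=0.5, productivity=100.0, mixing_weight_gini_vs_coin=1.0
)
```

Here `welfare` lands closer to the poorer agent's utility than a plain mean would, which is the intended effect of inverse-income weighting.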

Use in research

This scenario was introduced in:

The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning (arXiv:2108.02755)
Stephan Zheng, Alexander Trott, Sunil Srinivasa, David C. Parkes, Richard Socher.

It enables tractable two-level RL experiments in which the planner's optimal tax policy can be derived analytically and compared with learned policies.

For training with RLlib, see the two-level curriculum training guide, which uses this scenario to demonstrate curriculum learning.
