reset() returns a dictionary of observations keyed by agent index:
```python
obs = env.reset()

# Observation keys
print(list(obs.keys()))
# ["0", "1", "2", "3", "p"]  # mobile agents + planner

# Each agent's observation is a dict of named arrays
print(list(obs["0"].keys()))
# ["loc", "inventory", "flat", ...]
```
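Because the planner shares the observation dict with the mobile agents, it can be handy to separate the two before feeding them to different policies. The sketch below is one way to do that; `split_observations` is a hypothetical helper, not part of the environment, and `mock_obs` is a made-up stand-in for the dict returned by `env.reset()`.

```python
def split_observations(obs, planner_key="p"):
    """Separate mobile-agent observations from the planner's observation."""
    mobile = {key: value for key, value in obs.items() if key != planner_key}
    planner = obs.get(planner_key)  # None if no planner entry is present
    return mobile, planner


# Hypothetical stand-in for the dict returned by env.reset(); values are made up.
mock_obs = {
    "0": {"loc": [0, 1], "inventory": [0, 0, 0]},
    "1": {"loc": [4, 2], "inventory": [0, 0, 0]},
    "p": {"flat": [0.0, 0.0]},
}

mobile_obs, planner_obs = split_observations(mock_obs)
print(sorted(mobile_obs))  # ['0', '1']
print(list(planner_obs))   # ['flat']
```

The planner key defaults to `"p"` because that is the key shown in the output above; pass a different `planner_key` if your configuration differs.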
2
Inspect agents
Access agents directly from the environment:
```python
# All agents (mobile + planner)
for agent in env.all_agents:
    print(agent.idx, type(agent).__name__)
# 0 BasicMobileAgent
# 1 BasicMobileAgent
# 2 BasicMobileAgent
# 3 BasicMobileAgent
# p BasicPlanner

# Access by index
agent_0 = env.get_agent("0")
planner = env.get_agent("p")
print(agent_0.state["inventory"])  # {"Wood": 0, "Stone": 0, "Coin": 0}
print(agent_0.state["loc"])        # [row, col]
```
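Agent state is convenient for quick diagnostics, such as totalling resources across agents. The following is a minimal sketch that relies only on the `state["inventory"]` layout shown above. `total_inventory` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real agents so that the example runs without an environment.

```python
from types import SimpleNamespace


def total_inventory(agents):
    """Sum each resource across the given agents' inventories."""
    totals = {}
    for agent in agents:
        for resource, amount in agent.state["inventory"].items():
            totals[resource] = totals.get(resource, 0) + amount
    return totals


# Hypothetical stand-ins exposing only `idx` and `state`; the amounts are made up.
mock_agents = [
    SimpleNamespace(idx="0", state={"inventory": {"Wood": 2, "Stone": 0, "Coin": 5}}),
    SimpleNamespace(idx="1", state={"inventory": {"Wood": 1, "Stone": 3, "Coin": 0}}),
]

print(total_inventory(mock_agents))  # {'Wood': 3, 'Stone': 3, 'Coin': 5}
```

With a real environment, pass only the agents whose state contains an `"inventory"` entry.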
3
Step the simulation
Pass an action dict keyed by agent index. Each value is the action index (or a dict in multi_action_mode):
```python
import numpy as np

# Sample random actions for all agents.
# In single-action mode, agent.action_spaces is an int: the total number of
# available actions. Pass a random integer from 0 to action_spaces - 1.
actions = {
    agent.idx: np.random.randint(0, agent.action_spaces)
    for agent in env.all_agents
}

obs, rew, done, info = env.step(actions)

# rew is a dict of rewards keyed by agent index
print(rew)
# {"0": 0.12, "1": 0.07, "2": 0.15, "3": 0.09, "p": 0.0}

# done is True when the episode ends
print(done)  # False
```
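For reproducible experiments it helps to wrap action sampling in a function that takes a seeded random generator. The sketch below covers single-action mode only, where `action_spaces` is an int. It swaps `np.random.randint` for the standard library's `random.Random` to stay dependency-free. `sample_actions` and the mock agents are hypothetical, and the action counts are made up.

```python
import random
from types import SimpleNamespace


def sample_actions(agents, rng):
    """Draw one uniformly random action index per agent (single-action mode)."""
    actions = {}
    for agent in agents:
        n_actions = int(agent.action_spaces)
        if n_actions <= 0:
            raise ValueError(f"agent {agent.idx} has no available actions")
        actions[agent.idx] = rng.randrange(n_actions)  # integer in [0, n_actions)
    return actions


# Hypothetical stand-ins; the action counts are made up.
mock_agents = [
    SimpleNamespace(idx="0", action_spaces=50),
    SimpleNamespace(idx="p", action_spaces=22),
]

rng = random.Random(0)  # fixed seed so repeated runs draw the same actions
actions = sample_actions(mock_agents, rng)
print(actions)
```

The returned dict has the same shape as the one passed to `env.step()` above, so `sample_actions(env.all_agents, rng)` could be used in its place.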
4
Run a full episode
```python
import numpy as np

obs = env.reset()
total_rewards = {agent.idx: 0.0 for agent in env.all_agents}

for t in range(env.episode_length):
    # Sample a random action for every agent (single-action mode)
    actions = {
        agent.idx: np.random.randint(0, agent.action_spaces)
        for agent in env.all_agents
    }
    obs, rew, done, info = env.step(actions)

    # Accumulate each agent's reward
    for agent_idx, r in rew.items():
        total_rewards[agent_idx] += r

    if done:
        break

print("Episode rewards:", total_rewards)
```
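The same loop can be factored into a reusable function that accepts any policy. The sketch below is an assumption-laden illustration, not the library's API: `run_episode`, `episode_finished`, and `StubEnv` are hypothetical names. Some multi-agent environments report `done` as a dict with an `"__all__"` key rather than a plain bool. This page does not say which form applies, so the helper accepts both.

```python
def episode_finished(done):
    """Accept either a plain bool or a dict with an "__all__" flag."""
    if isinstance(done, dict):
        return bool(done.get("__all__", False))
    return bool(done)


def run_episode(env, policy):
    """Roll out one episode and return the cumulative reward per agent."""
    obs = env.reset()
    totals = {}
    for _ in range(env.episode_length):
        obs, rew, done, info = env.step(policy(obs))
        for agent_idx, r in rew.items():
            totals[agent_idx] = totals.get(agent_idx, 0.0) + r
        if episode_finished(done):
            break
    return totals


class StubEnv:
    """Hypothetical three-step environment used only to exercise the loop."""

    episode_length = 3

    def reset(self):
        self.t = 0
        return {"0": {}, "p": {}}

    def step(self, actions):
        self.t += 1
        rew = {key: 1.0 for key in actions}  # constant reward per agent
        done = {"__all__": self.t >= self.episode_length}
        return {"0": {}, "p": {}}, rew, done, {}


totals = run_episode(StubEnv(), policy=lambda obs: {key: 0 for key in obs})
print(totals)  # {'0': 3.0, 'p': 3.0}
```

Passing the policy as a callable from observations to an action dict keeps the loop unchanged when random sampling is later replaced by a trained model.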