A scenario defines the world that agents live in: the physical layout, agent starting conditions, passive dynamics (e.g. resource regeneration), and reward functions. It is the outermost container that stitches together world state, components, and agents into an environment that follows the Gym-style reset / step interface.
Create a custom scenario when the existing scenarios do not match your research question, for example when you need a new geography, a different reward structure, or passive dynamics that no existing scenario implements.
Every BaseEnvironment subclass must implement five abstract methods. The signatures below come directly from base_env.py:
reset_starting_layout()
    @abstractmethod
    def reset_starting_layout(self):
        """
        Part 1/2 of scenario reset. This method handles resetting the state of the
        environment managed by the scenario (i.e. resource & landmark layout).
        """
Called first in every reset cycle. Responsible for placing resources and landmarks on the world map. Use self.world.maps to write to the spatial state.
reset_agent_states()
    @abstractmethod
    def reset_agent_states(self):
        """
        Part 2/2 of scenario reset. This method handles resetting the state of the
        agents themselves (i.e. inventory, locations, etc.).
        """
Called immediately after reset_starting_layout(). Clear inventories, escrow, and endogenous quantities, then place agents at their starting locations.
scenario_step()
    @abstractmethod
    def scenario_step(self):
        """
        Update the state of the world according to whatever rules this scenario
        implements.

        This gets called in the 'step' method (of base_env) after going through each
        component step and before generating observations, rewards, etc.

        This is where things like resource regeneration, income redistribution, etc.,
        can be implemented.
        """
Called at every timestep after all component steps have run. Implement passive world dynamics here. May be a no-op if all dynamics are handled by components.
generate_observations()
    @abstractmethod
    def generate_observations(self):
        """
        Generate observations associated with this scenario.

        A scenario does not need to produce observations and can provide
        observations for only some agent types; however, for a given agent type,
        it should either always or never yield an observation. If it does yield an
        observation, that observation should always have the same structure/sizes!

        Returns:
            obs (dict): A dictionary of {agent.idx: agent_obs_dict}.
        """
Return a dict of {agent_idx: obs_dict}. Observations from the scenario are prefixed with "world-" when merged into the full observation bundle. The structure must remain the same for every call within an episode.
compute_reward()
    @abstractmethod
    def compute_reward(self):
        """
        Apply the reward function(s) associated with this scenario to get the
        rewards from this step.

        Returns:
            rew (dict): A dictionary of {agent.idx: scalar_reward} with an entry
                for each agent in the environment (including the planner).
        """
Called once per timestep to compute scalar rewards for every agent, including the planner. Return 0.0 for agents whose objectives are not modelled.
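The call order described for these five methods can be illustrated with a small stand-in. This sketch does not import ai_economist: `OrderProbe` and its `reset` / `step` bodies are hypothetical simplifications that only mirror the ordering stated in this section (component resets and component steps are omitted), and each method simply records its own name.

```python
class OrderProbe:
    """Hypothetical stand-in that records the order in which hooks are called."""

    def __init__(self):
        self.calls = []

    # The five methods a scenario must implement (each just logs its name here).
    def reset_starting_layout(self):
        self.calls.append("reset_starting_layout")

    def reset_agent_states(self):
        self.calls.append("reset_agent_states")

    def scenario_step(self):
        self.calls.append("scenario_step")

    def generate_observations(self):
        self.calls.append("generate_observations")
        return {}

    def compute_reward(self):
        self.calls.append("compute_reward")
        return {}

    # Simplified drivers; the real base class also resets and steps components.
    def reset(self):
        self.reset_starting_layout()
        self.reset_agent_states()
        return self.generate_observations()

    def step(self):
        self.scenario_step()
        obs = self.generate_observations()
        rew = self.compute_reward()
        return obs, rew


probe = OrderProbe()
probe.reset()
probe.step()
print(probe.calls)
```

Printing `probe.calls` shows layout reset first, then agent reset, and, within a step, `scenario_step` before observations and rewards.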
    import numpy as np

    from ai_economist.foundation.base.base_env import BaseEnvironment, scenario_registry


    @scenario_registry.add
    class CoinHarvest(BaseEnvironment):
        """
        Agents roam a map, harvest Coin, and are rewarded for their total holdings.
        """

        name = "coin-harvest"
        agent_subclasses = ["BasicMobileAgent", "BasicPlanner"]
        required_entities = ["Coin"]  # resources, landmarks, or endogenous names
required_entities lists the world entities the scenario depends on. Coin and Labor are always present by default. Additional resources, landmarks, and endogenous variables are registered from this list and from each included component’s required_entities.
2. Define __init__ with env_config kwargs
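Accept scenario-specific parameters as keyword arguments and forward the remaining arguments to super().__init__(). Once the base constructor has run, the following attributes are available: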
self.world — World object wrapping the map, agents, and planner
self.world.agents — list of mobile agent objects
self.world.planner — the planner agent object
self.n_agents — number of mobile agents
self.world_size — [height, width] of the map
self.episode_length — timesteps per episode
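A minimal sketch of this constructor pattern follows. To stay runnable without the library, it subclasses a hypothetical `StubBaseEnvironment` instead of `BaseEnvironment`; the parameter names `coin_density` and `starting_coin` match the attributes used in the later steps, but their default values and the validation checks are assumptions.

```python
class StubBaseEnvironment:
    """Hypothetical stand-in for BaseEnvironment; only stores a few settings."""

    def __init__(self, n_agents=2, world_size=(5, 5), episode_length=10):
        self.n_agents = n_agents
        self.world_size = list(world_size)
        self.episode_length = episode_length


class CoinHarvestSketch(StubBaseEnvironment):
    def __init__(self, *base_env_args, coin_density=0.1, starting_coin=10,
                 **base_env_kwargs):
        # Forward everything the scenario does not consume to the base class.
        super().__init__(*base_env_args, **base_env_kwargs)

        # Validate and store the scenario-specific parameters.
        self.coin_density = float(coin_density)
        assert 0.0 <= self.coin_density <= 1.0
        self.starting_coin = float(starting_coin)
        assert self.starting_coin >= 0.0


env = CoinHarvestSketch(n_agents=4, coin_density=0.25)
print(env.n_agents, env.coin_density, env.starting_coin)
```

Declaring the scenario parameters as keyword-only arguments keeps them out of the arguments passed on to the base class, so the base constructor never sees names it does not recognise.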
3. Implement reset_starting_layout
Populate the world map with resources and landmarks.
    def reset_starting_layout(self):
        # world.maps.get(resource) returns an H x W array
        # world.maps.set(resource, array) writes back to the map
        h, w = self.world_size
        coin_map = (np.random.rand(h, w) < self.coin_density).astype(np.float32)
        self.world.maps.set("Coin", coin_map)
For a scenario with no spatial resources (like OneStepEconomy), leave this as a no-op:
    def reset_starting_layout(self):
        pass
4. Implement reset_agent_states
Clear agent state and set initial conditions.
    def reset_agent_states(self):
        self.world.clear_agent_locs()  # remove agents from the map

        for agent in self.world.agents:
            agent.state["inventory"] = {k: 0 for k in agent.state["inventory"]}
            agent.state["escrow"] = {k: 0 for k in agent.state["escrow"]}
            agent.state["endogenous"] = {k: 0 for k in agent.state["endogenous"]}
            # Give each agent their starting coin
            agent.state["inventory"]["Coin"] = self.starting_coin

        # Place agents at random accessible locations
        for agent in self.world.agents:
            r, c = self.world.get_random_unoccupied_loc()
            self.world.set_agent_loc(agent, r, c)
5. Implement scenario_step
Define passive world dynamics that run every timestep regardless of agent actions.
Leave as a no-op if all dynamics come from components:
    def scenario_step(self):
        pass
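When the scenario does own a passive dynamic, resource regeneration is a typical case. The sketch below is a standalone illustration rather than library code: `regenerate` and `regen_prob` are hypothetical names, and the map is a plain list of lists so the example runs without dependencies. Inside a real scenario_step, the grid would be read with self.world.maps.get(...) and written back with self.world.maps.set(...).

```python
import random


def regenerate(grid, regen_prob, rng):
    """Return a copy of grid where each empty cell refills with probability regen_prob."""
    new_grid = []
    for row in grid:
        new_row = []
        for cell in row:
            if cell == 0 and rng.random() < regen_prob:
                new_row.append(1)  # an empty cell regrows one unit
            else:
                new_row.append(cell)  # occupied cells are left unchanged
        new_grid.append(new_row)
    return new_grid


rng = random.Random(0)  # seeded so repeated runs give the same map
coin_map = [[1, 0, 0], [0, 0, 1]]
coin_map = regenerate(coin_map, regen_prob=0.5, rng=rng)
print(coin_map)
```

Returning a new grid instead of mutating in place makes the update independent of iteration order, which matters if a later rule depends on neighbouring cells.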
6. Implement generate_observations
Produce scenario-level observations. These are prefixed with "world-" when merged.
    def generate_observations(self):
        obs = {}
        for agent in self.world.agents:
            obs[str(agent.idx)] = {
                "coin_inventory": agent.state["inventory"]["Coin"] * self.inv_scale,
            }
        obs[self.world.planner.idx] = {
            "total_coin": sum(
                a.total_endowment("Coin") for a in self.world.agents
            ) * self.inv_scale,
        }
        return obs
self.inv_scale is 0.01 when allow_observation_scaling=True and 1 otherwise. Apply it to inventory quantities to keep observation values in an RL-friendly range.
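To see what the "world-" prefix and the fixed-structure requirement mean in practice, consider the following sketch. Both helpers are hypothetical and are not the library's merging code; the planner key "p" and the pre-existing "time" field are assumptions made for the example.

```python
def merge_scenario_obs(full_obs, scenario_obs, prefix="world-"):
    """Copy scenario observations into full_obs, prefixing each key."""
    for agent_idx, agent_obs in scenario_obs.items():
        target = full_obs.setdefault(agent_idx, {})
        for key, value in agent_obs.items():
            target[prefix + key] = value
    return full_obs


def same_structure(obs_a, obs_b):
    """True if both bundles have identical agent keys and per-agent field names."""
    if obs_a.keys() != obs_b.keys():
        return False
    return all(obs_a[k].keys() == obs_b[k].keys() for k in obs_a)


scenario_obs = {"0": {"coin_inventory": 0.1}, "p": {"total_coin": 0.3}}
full_obs = merge_scenario_obs({"0": {"time": 0.0}}, scenario_obs)
print(sorted(full_obs["0"]))
```

A check like `same_structure` can be run in a unit test on the observations from two different timesteps to catch keys that appear or disappear mid-episode.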
7. Implement compute_reward
Return a scalar reward for every agent (including the planner) at every timestep.
    def compute_reward(self):
        rew = {}
        for agent in self.world.agents:
            rew[str(agent.idx)] = float(agent.state["inventory"]["Coin"])
        rew[self.world.planner.idx] = float(
            sum(a.total_endowment("Coin") for a in self.world.agents)
        )
        return rew
To implement episodic (terminal) rewards, accumulate a running metric during scenario_step() and emit a non-zero reward only on the final timestep (self.world.timestep == self.episode_length).
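The terminal-reward pattern can be sketched as follows. `TerminalRewardSketch` is a hypothetical stand-in with a single agent "0"; its `timestep` attribute plays the role of self.world.timestep, and the sketch assumes the counter is incremented before scenario_step runs.

```python
class TerminalRewardSketch:
    """Hypothetical stand-in showing a reward paid only on the last timestep."""

    def __init__(self, episode_length):
        self.episode_length = episode_length
        self.timestep = 0  # stands in for self.world.timestep
        self.running_total = 0.0  # metric accumulated during the episode

    def scenario_step(self, coin_gained):
        # Accumulate the metric every step; no reward is paid yet.
        self.running_total += coin_gained

    def compute_reward(self):
        # Emit the accumulated value only when the final timestep is reached.
        if self.timestep == self.episode_length:
            return {"0": self.running_total}
        return {"0": 0.0}

    def step(self, coin_gained):
        self.timestep += 1
        self.scenario_step(coin_gained)
        return self.compute_reward()


env = TerminalRewardSketch(episode_length=3)
rewards = [env.step(coin_gained=2.0)["0"] for _ in range(3)]
print(rewards)
```

Only the last entry of `rewards` is non-zero, and it equals the sum accumulated over the whole episode.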
8. Add optional customization hooks
Two optional methods let you extend reset and metric reporting:
    def additional_reset_steps(self):
        """
        Called last in the reset cycle, after reset_starting_layout,
        reset_agent_states, and all component resets.
        """
        # e.g. pre-compute derived quantities used later
        self._prev_coin = {
            str(a.idx): a.total_endowment("Coin") for a in self.world.agents
        }

    def scenario_metrics(self):
        """
        Return {"key": scalar} metrics merged into env.metrics at episode end.
        """
        coin_endowments = [a.total_endowment("Coin") for a in self.world.agents]
        return {
            "social/mean_coin": float(np.mean(coin_endowments)),
            "social/max_coin": float(np.max(coin_endowments)),
        }
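One possible use for a snapshot such as `_prev_coin` is a marginal reward, where each agent is rewarded for the change in its holdings since the previous step. The sketch below is an assumption about how the snapshot might be consumed, not something the scenario above does; `MarginalRewardSketch` and its `endowments` dict are hypothetical stand-ins for the world state.

```python
class MarginalRewardSketch:
    """Hypothetical stand-in: reward is the change in coin since the last step."""

    def __init__(self, endowments):
        self.endowments = dict(endowments)  # {agent_idx: current coin}
        self._prev_coin = {}

    def additional_reset_steps(self):
        # Snapshot holdings at the end of reset so the first reward is a true delta.
        self._prev_coin = dict(self.endowments)

    def compute_reward(self):
        rew = {}
        for idx, coin in self.endowments.items():
            rew[idx] = float(coin - self._prev_coin[idx])
        # Refresh the snapshot for the next timestep.
        self._prev_coin = dict(self.endowments)
        return rew


env = MarginalRewardSketch({"0": 10, "1": 10})
env.additional_reset_steps()
env.endowments["0"] += 3  # agent 0 gains coin during the step
first = env.compute_reward()
second = env.compute_reward()  # nothing changed since the previous call
print(first, second)
```

Taking the snapshot in additional_reset_steps matters because it runs after agent states are reset, so the starting coin is not counted as a gain on the first step.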
    from ai_economist.foundation.base.base_env import BaseEnvironment, scenario_registry


    @scenario_registry.add
    class CoinHarvest(BaseEnvironment):
        name = "coin-harvest"
        ...
The registry key is scenario.name (case-insensitive). The scenario is exposed as:
    import coin_harvest  # trigger registration
    import ai_economist.foundation as foundation

    ScenarioClass = foundation.scenarios.get("coin-harvest")
Add an import of your scenario module to ai_economist/foundation/scenarios/__init__.py (or import it manually) so that the class is registered before foundation.scenarios.get() is called.
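Once registered, the scenario is normally selected by name through an environment configuration. The fragment below is a hedged sketch that has not been run against the library: the numeric values are illustrative, the "Gather" component is included only as an example, and `coin_density` / `starting_coin` assume the scenario's __init__ accepts keyword arguments with those names.

```python
# Hypothetical configuration; values are illustrative, not recommended settings.
env_config = {
    # Selects the scenario by its registered name.
    "scenario_name": "coin-harvest",
    # Components as (name, kwargs) pairs; "Gather" is used only as an example.
    "components": [("Gather", {})],
    # Standard environment settings.
    "n_agents": 4,
    "world_size": [15, 15],
    "episode_length": 100,
    # Scenario-specific kwargs consumed by CoinHarvest.__init__ (assumed names).
    "coin_density": 0.1,
    "starting_coin": 10,
}
```

With ai_economist installed and the scenario module imported, a dictionary like this would typically be passed as foundation.make_env_instance(**env_config), which looks up the class by scenario_name and forwards the remaining keys to its constructor.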
The built-in OneStepEconomy scenario (ai_economist/foundation/scenarios/one_step_economy/one_step_economy.py) is an ideal minimal reference. It has a no-op reset_starting_layout, a clean reset_agent_states, a no-op scenario_step, and well-documented generate_observations and compute_reward implementations.