Creating custom scenarios - AI Economist / Foundation

A scenario defines the world that agents live in: the physical layout, agent starting conditions, passive dynamics (e.g. resource regeneration), and reward functions. It is the outermost container that stitches together world state, components, and agents into an environment that follows the Gym-style reset / step interface. Create a custom scenario when the existing scenarios do not match your research question—for example, a new geography, a different reward structure, or passive dynamics that no existing scenario implements.

How scenarios relate to components

Scenarios and components have complementary roles:

Responsibility	Handled by
World layout (map, resources, landmarks)	Scenario
Agent starting state (inventory, location)	Scenario
Passive world dynamics (resource regrowth, weather)	Scenario
Observations and rewards	Scenario (components can add their own observations)
Agent action spaces	Components
Action-driven dynamics	Components

Components are not hard-coded into the scenario class. You choose them per environment and pass them as the components argument when instantiating the scenario (see Instantiating the environment).

Required abstract methods

Every BaseEnvironment subclass must implement five abstract methods. The signatures below come directly from base_env.py:

reset_starting_layout()

@abstractmethod
def reset_starting_layout(self):
    """
    Part 1/2 of scenario reset. This method handles resetting the state of the
    environment managed by the scenario (i.e. resource & landmark layout).
    """

Called first in every reset cycle. Responsible for placing resources and landmarks on the world map. Use self.world.maps to write to the spatial state.

reset_agent_states()

@abstractmethod
def reset_agent_states(self):
    """
    Part 2/2 of scenario reset. This method handles resetting the state of the
    agents themselves (i.e. inventory, locations, etc.).
    """

Called immediately after reset_starting_layout(). Clear inventories, escrow, and endogenous quantities, then place agents at their starting locations.

scenario_step()

@abstractmethod
def scenario_step(self):
    """
    Update the state of the world according to whatever rules this scenario
    implements.

    This gets called in the 'step' method (of base_env) after going through each
    component step and before generating observations, rewards, etc.

    This is where things like resource regeneration, income redistribution, etc.,
    can be implemented.
    """

Called at every timestep after all component steps have run. Implement passive world dynamics here. May be a no-op if all dynamics are handled by components.

generate_observations()

@abstractmethod
def generate_observations(self):
    """
    Generate observations associated with this scenario.

    A scenario does not need to produce observations and can provide observations
    for only some agent types; however, for a given agent type, it should either
    always or never yield an observation. If it does yield an observation,
    that observation should always have the same structure/sizes!

    Returns:
        obs (dict): A dictionary of {agent.idx: agent_obs_dict}.
    """

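Return a dict that maps str(agent.idx) to an observation dict for each mobile agent, plus self.world.planner.idx for the planner if it receives scenario observations. Observations from the scenario are prefixed with "world-" when merged into the full observation bundle. For a given agent type, the keys and array shapes must be the same on every call.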

compute_reward()

@abstractmethod
def compute_reward(self):
    """
    Apply the reward function(s) associated with this scenario to get the rewards
    from this step.

    Returns:
        rew (dict): A dictionary of {agent.idx: scalar_reward} with an entry
            for each agent in the environment (including the planner).
    """

Called once per timestep to compute scalar rewards for every agent, including the planner. Return 0.0 for agents whose objectives are not modelled.
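To make the ordering concrete, here is a toy stand-in, not BaseEnvironment itself. ToyEnv, ToyComponent, and call_log are hypothetical names; the order they record follows the descriptions on this page: the two-part scenario reset, then component resets, then additional_reset_steps; on each step, component steps, then scenario_step, then observations and rewards. Advancing the timestep at the start of step() is an assumption, chosen so that timestep == episode_length holds on the final call to compute_reward.

```python
class ToyComponent:
    """Hypothetical component that only records when it is called."""

    def __init__(self, log):
        self.log = log

    def reset(self):
        self.log.append("component.reset")

    def component_step(self):
        self.log.append("component.component_step")


class ToyEnv:
    """Hypothetical stand-in that mimics the documented reset/step order."""

    def __init__(self, episode_length=2):
        self.episode_length = episode_length
        self.timestep = 0
        self.call_log = []
        self.components = [ToyComponent(self.call_log)]

    # Scenario hooks: each one only records its own name.
    def reset_starting_layout(self):
        self.call_log.append("reset_starting_layout")

    def reset_agent_states(self):
        self.call_log.append("reset_agent_states")

    def additional_reset_steps(self):
        self.call_log.append("additional_reset_steps")

    def scenario_step(self):
        self.call_log.append("scenario_step")

    def generate_observations(self):
        self.call_log.append("generate_observations")
        return {"0": {}, "p": {}}

    def compute_reward(self):
        self.call_log.append("compute_reward")
        return {"0": 0.0, "p": 0.0}

    def reset(self):
        self.timestep = 0
        self.reset_starting_layout()       # part 1/2 of the scenario reset
        self.reset_agent_states()          # part 2/2 of the scenario reset
        for component in self.components:  # then every component resets
            component.reset()
        self.additional_reset_steps()      # optional hook, runs last
        return self.generate_observations()

    def step(self):
        self.timestep += 1                 # assumed: advances before rewards
        for component in self.components:  # action-driven dynamics first
            component.component_step()
        self.scenario_step()               # then passive world dynamics
        obs = self.generate_observations()
        rew = self.compute_reward()
        done = {"__all__": self.timestep >= self.episode_length}
        return obs, rew, done, {}


env = ToyEnv(episode_length=1)
env.reset()
_, _, done, _ = env.step()
```

After one reset and one step, call_log lists every hook in the order it fired, which is a quick way to check where a piece of scenario logic belongs.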

Step-by-step: implementing a custom scenario

Set class attributes and declare entities

import numpy as np
from ai_economist.foundation.base.base_env import BaseEnvironment, scenario_registry

@scenario_registry.add
class CoinHarvest(BaseEnvironment):
    """
    Agents roam a map, harvest Coin, and are rewarded for their total holdings.
    """
    name = "coin-harvest"
    agent_subclasses = ["BasicMobileAgent", "BasicPlanner"]
    required_entities = ["Coin"]  # resources, landmarks, or endogenous names

required_entities lists the world entities the scenario depends on. Coin and Labor are always present by default. Additional resources, landmarks, and endogenous variables are registered from this list and from each included component’s required_entities.
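The merge can be pictured as a de-duplicated union. merge_required_entities is a hypothetical helper, not part of Foundation; it only illustrates that an entity listed by both the scenario and a component is registered once, with Coin and Labor always included. The component list in the usage line is illustrative, not any real component's declaration.

```python
def merge_required_entities(scenario_entities, component_entity_lists):
    """Hypothetical helper: ordered, de-duplicated union of entity names."""
    # Coin and Labor come first because they are always present by default.
    candidates = ["Coin", "Labor", *scenario_entities]
    for entities in component_entity_lists:
        candidates.extend(entities)

    merged = []
    for name in candidates:
        if name not in merged:  # keep only the first occurrence
            merged.append(name)
    return merged


entities = merge_required_entities(
    ["Coin"],                     # CoinHarvest.required_entities
    [["Wood", "Stone", "Labor"]],  # illustrative component declaration
)
```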

Define __init__ with env_config kwargs

Accept scenario-specific parameters before forwarding to super().__init__():

def __init__(
    self,
    *base_env_args,
    starting_coin=0,
    coin_density=0.1,
    **base_env_kwargs
):
    super().__init__(*base_env_args, **base_env_kwargs)
    self.starting_coin = float(starting_coin)
    self.coin_density = float(coin_density)
    assert 0.0 < self.coin_density <= 1.0

After super().__init__() you can access:

self.world — World object wrapping the map, agents, and planner
self.world.agents — list of mobile agent objects
self.world.planner — the planner agent object
self.n_agents — number of mobile agents
self.world_size — [height, width] of the map
self.episode_length — timesteps per episode
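The keyword-only pattern in __init__ lets scenario settings and base-environment settings share one constructor call. In this sketch, ToyBase and ToyCoinHarvest are hypothetical stand-ins (ToyBase accepts only a few of the real base-class arguments). The range check raises ValueError instead of using assert. This is a design choice: assert statements are skipped under python -O.

```python
class ToyBase:
    """Hypothetical stand-in for the base constructor (a few arguments only)."""

    def __init__(self, n_agents=2, world_size=(5, 5), episode_length=10):
        self.n_agents = n_agents
        self.world_size = list(world_size)
        self.episode_length = episode_length


class ToyCoinHarvest(ToyBase):
    def __init__(
        self,
        *base_env_args,
        starting_coin=0,
        coin_density=0.1,
        **base_env_kwargs
    ):
        # Scenario kwargs are consumed here; everything else is forwarded.
        super().__init__(*base_env_args, **base_env_kwargs)
        self.starting_coin = float(starting_coin)
        self.coin_density = float(coin_density)
        if not 0.0 < self.coin_density <= 1.0:
            raise ValueError("coin_density must be in (0, 1]")


env = ToyCoinHarvest(n_agents=4, world_size=[15, 15], coin_density=0.05)
```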

Implement reset_starting_layout

Populate the world map with resources and landmarks.

def reset_starting_layout(self):
    # world.maps.get(resource) returns an H x W array
    # world.maps.set(resource, array) writes back to the map
    h, w = self.world_size
    coin_map = (np.random.rand(h, w) < self.coin_density).astype(np.float32)
    self.world.maps.set("Coin", coin_map)

For a scenario with no spatial resources (like OneStepEconomy), leave this as a no-op:

def reset_starting_layout(self):
    pass
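The random Coin layout from reset_starting_layout can be factored into a pure function. That makes it easier to test, and it can take an explicit random generator so that layouts are reproducible. make_coin_map is a hypothetical helper; the scenario method above draws from NumPy's global random state instead.

```python
import numpy as np


def make_coin_map(height, width, density, rng):
    """Return an H x W float32 map with 1.0 on roughly `density` of tiles."""
    # Each tile independently gets a coin with probability `density`.
    return (rng.random((height, width)) < density).astype(np.float32)


rng = np.random.default_rng(0)
coin_map = make_coin_map(15, 15, 0.1, rng)
```

Inside the scenario, the same call would become make_coin_map(h, w, self.coin_density, rng) followed by self.world.maps.set("Coin", ...).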

Implement reset_agent_states

Clear agent state and set initial conditions.

def reset_agent_states(self):
    self.world.clear_agent_locs()  # remove agents from the map

    for agent in self.world.agents:
        agent.state["inventory"] = {
            k: 0 for k in agent.state["inventory"]
        }
        agent.state["escrow"] = {
            k: 0 for k in agent.state["escrow"]
        }
        agent.state["endogenous"] = {
            k: 0 for k in agent.state["endogenous"]
        }
        # Give each agent their starting coin
        agent.state["inventory"]["Coin"] = self.starting_coin

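    # Place agents at random accessible locations (rejection sampling)
    for agent in self.world.agents:
        r = np.random.randint(0, self.world_size[0])
        c = np.random.randint(0, self.world_size[1])
        n_tries = 0
        while not self.world.can_agent_occupy(r, c, agent):
            r = np.random.randint(0, self.world_size[0])
            c = np.random.randint(0, self.world_size[1])
            n_tries += 1
            if n_tries > 200:
                raise TimeoutError("No accessible starting location found")
        self.world.set_agent_loc(agent, r, c)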

Implement scenario_step

Define passive world dynamics that run every timestep regardless of agent actions.

def scenario_step(self):
    # Example: slowly replenish Coin on empty tiles
    h, w = self.world_size
    regen_mask = (
        (self.world.maps.get("Coin") == 0)
        & (np.random.rand(h, w) < 0.01)
    )
    coin_map = self.world.maps.get("Coin")
    coin_map[regen_mask] = 1.0
    self.world.maps.set("Coin", coin_map)

Leave as a no-op if all dynamics come from components:

def scenario_step(self):
    pass
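The regrowth rule can also be written as a pure function. regrow is a hypothetical helper, and regen_prob plays the role of the hard-coded 0.01 in the scenario_step example. Returning a copy means the caller does not depend on whether world.maps.get returns a view or a copy (this page does not say), because the result is written back with world.maps.set either way.

```python
import numpy as np


def regrow(coin_map, regen_prob, rng):
    """Return a copy of coin_map in which each empty tile becomes 1.0
    with probability regen_prob; tiles that already hold coin are untouched."""
    regen_mask = (coin_map == 0) & (rng.random(coin_map.shape) < regen_prob)
    new_map = coin_map.copy()
    new_map[regen_mask] = 1.0
    return new_map


rng = np.random.default_rng(0)
coin_map = np.zeros((4, 4), dtype=np.float32)
coin_map[0, 0] = 1.0
next_map = regrow(coin_map, 0.01, rng)
```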

Implement generate_observations

Produce scenario-level observations. These are prefixed with "world-" when merged.

def generate_observations(self):
    obs = {}
    for agent in self.world.agents:
        obs[str(agent.idx)] = {
            "coin_inventory": agent.state["inventory"]["Coin"] * self.inv_scale,
        }
    obs[self.world.planner.idx] = {
        "total_coin": sum(
            a.total_endowment("Coin") for a in self.world.agents
        ) * self.inv_scale,
    }
    return obs

self.inv_scale is 0.01 when allow_observation_scaling=True and 1 otherwise. Apply it to inventory quantities to keep observation values in an RL-friendly range.
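A rough sketch of the "world-" merge: merge_world_obs is hypothetical (the base class performs this merge internally), and "0" and "p" stand in for a mobile agent's and the planner's indices. As in generate_observations, raw inventory values are multiplied by inv_scale before they are stored.

```python
def merge_world_obs(obs_bundle, world_obs):
    """Hypothetical helper: add scenario observations to each agent's dict
    under a "world-" prefix."""
    for idx, agent_obs in world_obs.items():
        if idx not in obs_bundle:
            raise KeyError(f"unknown agent index {idx!r}")
        obs_bundle[idx].update({"world-" + k: v for k, v in agent_obs.items()})
    return obs_bundle


inv_scale = 0.01  # the value used when allow_observation_scaling=True
world_obs = {
    "0": {"coin_inventory": 250 * inv_scale},
    "p": {"total_coin": 400 * inv_scale},
}
bundle = merge_world_obs({"0": {}, "p": {}}, world_obs)
```

The prefix keeps scenario keys from colliding with keys contributed by components that happen to use the same name.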

Implement compute_reward

Return a scalar reward for every agent (including the planner) at every timestep.

def compute_reward(self):
    rew = {}
    for agent in self.world.agents:
        rew[str(agent.idx)] = float(
            agent.state["inventory"]["Coin"]
        )
    rew[self.world.planner.idx] = float(
        sum(a.total_endowment("Coin") for a in self.world.agents)
    )
    return rew

To implement episodic (terminal) rewards, accumulate a running metric during scenario_step() and emit a non-zero reward only on the final timestep (self.world.timestep == self.episode_length).
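One way to package the terminal-reward idea is a small tracker that scenario_step feeds and compute_reward reads. TerminalCoinReward, update, and rewards are hypothetical names, and summing per-step gains is only one possible choice of running metric.

```python
class TerminalCoinReward:
    """Hypothetical helper: accumulate per-agent gains, pay out at the end."""

    def __init__(self, agent_ids, episode_length):
        self.episode_length = episode_length
        self.running = {idx: 0.0 for idx in agent_ids}

    def update(self, gains):
        # Call once per step (e.g. from scenario_step) with this step's gains.
        for idx, gain in gains.items():
            self.running[idx] += gain

    def rewards(self, timestep):
        # Call from compute_reward: zero everywhere except the final step.
        if timestep == self.episode_length:
            return dict(self.running)
        return {idx: 0.0 for idx in self.running}


tracker = TerminalCoinReward(["0", "1"], episode_length=3)
per_step = []
for t in range(1, 4):
    tracker.update({"0": 1.0, "1": 2.0})
    per_step.append(tracker.rewards(t))
```

Sparse terminal rewards like this are simple to specify, but they give the learner no signal until the last step of the episode.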

Add optional customization hooks

Two optional methods let you extend reset and metric reporting:

def additional_reset_steps(self):
    """
    Called last in the reset cycle, after reset_starting_layout,
    reset_agent_states, and all component resets.
    """
    # e.g. pre-compute derived quantities used later
    self._prev_coin = {
        str(a.idx): a.total_endowment("Coin") for a in self.world.agents
    }

def scenario_metrics(self):
    """
    Return {"key": scalar} metrics merged into env.metrics at episode end.
    """
    coin_endowments = [
        a.total_endowment("Coin") for a in self.world.agents
    ]
    return {
        "social/mean_coin": float(np.mean(coin_endowments)),
        "social/max_coin": float(np.max(coin_endowments)),
    }
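The _prev_coin snapshot stored in additional_reset_steps is the kind of quantity a change-based reward consumes. coin_delta_rewards is a hypothetical helper showing that pattern: the reward is the coin gained since the previous step, and the returned snapshot would replace _prev_coin for the next step. This differs from the compute_reward example above, which pays out current holdings on every step.

```python
def coin_delta_rewards(prev_coin, current_coin):
    """Hypothetical helper: per-agent reward = coin gained since last step.

    Returns (rewards, new_prev); store new_prev for the next call.
    """
    rewards = {
        idx: float(current_coin[idx] - prev_coin[idx]) for idx in current_coin
    }
    return rewards, dict(current_coin)


prev = {"0": 5, "1": 2}
current = {"0": 7, "1": 2}
rewards, new_prev = coin_delta_rewards(prev, current)
```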

Registering the scenario

from ai_economist.foundation.base.base_env import BaseEnvironment, scenario_registry

@scenario_registry.add
class CoinHarvest(BaseEnvironment):
    name = "coin-harvest"
    ...

The registry key is scenario.name (case-insensitive). The scenario is exposed as:

import coin_harvest  # trigger registration
import ai_economist.foundation as foundation

ScenarioClass = foundation.scenarios.get("coin-harvest")

Add an import of your scenario module to ai_economist/foundation/scenarios/__init__.py (or import it manually) so that the class is registered before foundation.scenarios.get() is called.
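The lookup behaviour described above can be mimicked in a few lines. Registry here is an illustrative sketch, not Foundation's own class. Its add method works as a decorator and stores the class under its lowercased name attribute. Its get method raises KeyError when the module that defines the class was never imported, which is the usual cause of a failed foundation.scenarios.get() call.

```python
class Registry:
    """Illustrative case-insensitive class registry (not the library's own)."""

    def __init__(self):
        self._lookup = {}

    def add(self, cls):
        # Used as a decorator: register under the lowercased `name` attribute.
        self._lookup[cls.name.lower()] = cls
        return cls

    def get(self, name):
        try:
            return self._lookup[name.lower()]
        except KeyError:
            raise KeyError(
                f"{name!r} is not registered; was its module imported?"
            ) from None

    def has(self, name):
        return name.lower() in self._lookup


scenario_registry = Registry()


@scenario_registry.add
class CoinHarvest:
    name = "coin-harvest"
```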

Instantiating the environment

Pass component specs as a list of ("ComponentName", {kwargs}) tuples:

import numpy as np
import coin_harvest  # registers CoinHarvest
import ai_economist.foundation as foundation

env = foundation.scenarios.get("coin-harvest")(
    components=[
        ("Gather", {"move_labor": 1.0, "collect_labor": 1.0}),
    ],
    n_agents=4,
    world_size=[15, 15],
    episode_length=500,
    multi_action_mode_agents=False,
    multi_action_mode_planner=True,
    flatten_observations=True,
    flatten_masks=True,
    allow_observation_scaling=True,
    starting_coin=0,       # scenario-specific kwarg
    coin_density=0.05,     # scenario-specific kwarg
)

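obs = env.reset()
for _ in range(env.episode_length):
    # Placeholder random policy: one unmasked action per mobile agent.
    # Agents are in single-action mode, so with flatten_masks=True each mask
    # is a flat 0/1 vector of length agent.action_spaces.
    actions = {}
    for agent in env.world.agents:
        mask = obs[str(agent.idx)]["action_mask"]
        actions[str(agent.idx)] = np.random.choice(
            agent.action_spaces, p=mask / mask.sum()
        )
    obs, rew, done, info = env.step(actions)
    if done["__all__"]:
        break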
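For a smoke test before any policy exists, random actions that respect the action mask are a common choice. sample_unmasked_action is a hypothetical helper. It assumes the mask is a flat 0/1 vector (flatten_masks=True) whose length equals the total number of actions. In multi-action mode, as configured for the planner above, it splits that vector by subspace size and samples each subspace independently.

```python
import numpy as np


def sample_unmasked_action(action_dims, mask, multi_action_mode, rng):
    """Hypothetical helper: sample uniformly among actions whose mask is 1.

    action_dims: int (single-action mode) or a sequence of subspace sizes
    (multi-action mode). mask: flat 0/1 array of matching total length.
    """
    mask = np.asarray(mask, dtype=np.float64)
    if multi_action_mode:
        sizes = np.asarray(action_dims, dtype=int)
        # One sub-mask per action subspace, each sampled independently.
        sub_masks = np.split(mask, np.cumsum(sizes)[:-1])
        return [int(rng.choice(len(m), p=m / m.sum())) for m in sub_masks]
    return int(rng.choice(int(action_dims), p=mask / mask.sum()))
```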

OneStepEconomy as a reference

The built-in OneStepEconomy scenario (ai_economist/foundation/scenarios/one_step_economy/one_step_economy.py) is an ideal minimal reference. It has a no-op reset_starting_layout, a clean reset_agent_states, a no-op scenario_step, and well-documented generate_observations and compute_reward implementations.
