EnvironmentWrapper - AI Economist / Foundation

FoundationEnvWrapper wraps a Foundation BaseEnvironment instance and decides whether environment reset and step operations run on the CPU or the GPU.

CPU mode (use_cuda=False): every reset() and step() call runs on the host using the standard Python environment.
GPU mode (use_cuda=True, requires WarpDrive): the first reset() runs on the CPU and copies all relevant data to the GPU; all subsequent resets and steps execute as CUDA kernels.

The wrapper also attaches observation_space and action_space attributes to the wrapped environment, making it compatible with RLlib and other Gym-style training frameworks.

from ai_economist.foundation.env_wrapper import FoundationEnvWrapper
from ai_economist import foundation

ScenarioClass = foundation.scenarios.get("uniform/simple_wood_and_stone")
env_obj = ScenarioClass(
    components=[("Build", {"payment": 20}), ("Gather", {})],
    n_agents=4,
    world_size=[25, 25],
)

wrapper = FoundationEnvWrapper(env_obj=env_obj)
obs = wrapper.reset()
# `actions` is a {agent_idx: action} dict that the caller must build beforehand
obs, rew, done, info = wrapper.step(actions)

`recursive_obs_dict_to_spaces_dict()`

recursive_obs_dict_to_spaces_dict(obs: dict) -> gym.spaces.Dict

Module-level utility function that converts an observation dictionary (keyed by agent index) into a corresponding gym.spaces.Dict of observation spaces. Used internally by FoundationEnvWrapper.__init__ to populate env.observation_space. Each leaf value is converted according to these rules:

Python type	Gym space
`np.ndarray`	`Box(low=-BIG_NUMBER, high=BIG_NUMBER, shape=v.shape, dtype=v.dtype)`
`list`	Converted to `np.ndarray`, then treated as above
`int` / `float` / `np.integer` / `np.floating`	Wrapped in a 1-D `np.array`, then treated as above
`dict`	Recurses into a nested `spaces.Dict`

The BIG_NUMBER bound is 1e20, halved iteratively until the Box low/high values are numerically valid.
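The halving rule can be modelled without gym. The sketch below is a pure-Python approximation, not the module's code: it assumes "numerically valid" means the bound fits within the leaf dtype's representable range, and the names `largest_valid_bound` and `INT32_MAX` are illustrative only.

```python
BIG_NUMBER = 1e20  # starting bound, as stated above
INT32_MAX = 2**31 - 1  # illustrative limit for an int32 leaf (assumption)


def largest_valid_bound(dtype_max, start=BIG_NUMBER):
    """Halve `start` until it fits within `dtype_max` (a model of the validity check)."""
    bound = float(start)
    while bound > dtype_max:
        bound = bound // 2  # floor-halve, keeping the bound a whole number
    return bound


# An int32 leaf needs many halvings; a float64 leaf needs none.
int32_bound = largest_valid_bound(INT32_MAX)
float64_bound = largest_valid_bound(1.7976931348623157e308)
```

Under this model, floating-point leaves keep the full 1e20 bound while narrow integer leaves end up with a smaller one.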

Parameters

obs (dict, required): A dictionary of observations keyed by agent index, as returned by a multi-agent Foundation environment’s reset() call.

Returns

spaces (gym.spaces.Dict): A gym.spaces.Dict whose structure mirrors the input observation dictionary, with every leaf replaced by an appropriate Box space.
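To make the "mirrors the input" behaviour concrete, here is a minimal stand-in that follows the conversion rules above but emits `("Box", shape)` placeholders instead of real gym spaces. The helper names `shape_of` and `describe_spaces` and the observation keys are hypothetical; only lists, scalars, and dicts are handled, since the sketch avoids a NumPy dependency.

```python
def shape_of(value):
    """Return the shape of a scalar or a regularly nested list, NumPy-style."""
    if isinstance(value, (int, float)):
        return (1,)  # scalars are wrapped in a 1-D array
    if isinstance(value, list):
        if not value or not isinstance(value[0], list):
            return (len(value),)
        return (len(value),) + shape_of(value[0])
    raise TypeError(f"unsupported leaf type: {type(value).__name__}")


def describe_spaces(obs):
    """Mirror `obs`, replacing each leaf with a ('Box', shape) placeholder."""
    described = {}
    for key, value in obs.items():
        if isinstance(value, dict):
            described[key] = describe_spaces(value)  # recurse into nested dicts
        else:
            described[key] = ("Box", shape_of(value))
    return described


# Hypothetical observation with two agent keys and made-up fields.
obs = {"0": {"inventory": [3, 0, 1], "time": 0.5}, "p": {"map": [[0, 1], [1, 0]]}}
layout = describe_spaces(obs)
```

The returned `layout` has the same keys and nesting as `obs`; only the leaves differ.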

`FoundationEnvWrapper`

Constructor

FoundationEnvWrapper(
    env_obj=None,
    env_name=None,
    env_config=None,
    num_envs=1,
    use_cuda=False,
    env_registrar=None,
    event_messenger=None,
    process_id=0,
)

You must supply either env_obj or the triple (env_name, env_config, env_registrar).
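The either/or rule can be sketched as a small resolver. This is an illustration under stated assumptions, not the wrapper's code: `resolve_env` is a hypothetical helper, a plain dict stands in for WarpDrive's EnvironmentRegistrar (whose lookup API is not shown here), and the ValueError is a choice made for the sketch.

```python
def resolve_env(env_obj=None, env_name=None, env_config=None, env_registrar=None):
    """Return an environment from either a ready instance or a registry lookup."""
    if env_obj is not None:
        return env_obj  # the other three arguments are ignored
    missing = [
        name
        for name, value in (
            ("env_name", env_name),
            ("env_config", env_config),
            ("env_registrar", env_registrar),
        )
        if value is None
    ]
    if missing:
        raise ValueError(f"env_obj not given; also missing: {', '.join(missing)}")
    env_class = env_registrar[env_name]  # stand-in: a dict plays the registrar
    return env_class(**env_config)


class ToyEnv:
    """Hypothetical environment class used only for this example."""

    def __init__(self, n_agents):
        self.n_agents = n_agents


env = resolve_env(env_name="toy", env_config={"n_agents": 4}, env_registrar={"toy": ToyEnv})
```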

Parameters

env_obj (BaseEnvironment, default: None): A fully constructed Foundation environment instance. When provided, env_name, env_config, and env_registrar are ignored.

env_name (str, default: None): Name of an environment registered on the WarpDrive environment registrar. Required when env_obj is not provided.

env_config (dict, default: None): Keyword arguments passed to the environment class retrieved from env_registrar. Required when env_obj is not provided.

num_envs (int, default: 1): Number of parallel environments to run simultaneously. Only relevant when use_cuda=True.

use_cuda (bool, default: False): When True, runs environment steps on the GPU via WarpDrive CUDA kernels. Requires a CUDA-capable GPU and the rl-warp-drive package.

env_registrar (EnvironmentRegistrar, default: None): WarpDrive EnvironmentRegistrar object that provides customized environment info (such as CUDA source paths) needed for the GPU build. Required when env_obj is not provided or when use_cuda=True.

event_messenger (multiprocessing.Event, default: None): Optional multiprocessing Event used to synchronise the CUDA build when multiple worker processes are launched simultaneously.

process_id (int, default: 0): Integer identifier of the process running WarpDrive. Used to avoid build collisions in multi-process setups.

When use_cuda=True, the wrapped environment must already expose use_cuda, cuda_data_manager, and cuda_function_manager attributes, and so must env.world. A GPU must be available or the constructor raises an AssertionError.

What the constructor does

Attach the environment

Stores the environment as self.env. If env_obj is provided it is used directly; otherwise the environment is instantiated from the registrar.

Build observation space

Calls obs_at_reset() (which calls env.reset() once on the CPU) and passes the result to recursive_obs_dict_to_spaces_dict(). The resulting gym.spaces.Dict is stored on env.observation_space.

Build action space

Iterates over mobile agents and the planner. Each agent whose multi_action_mode is True receives a MultiDiscrete space; otherwise a Discrete space. All spaces are stored in env.action_space keyed by agent index string (e.g. "0", "1", …, "p").

GPU initialisation (use_cuda=True only)

Initialises CUDADataManager and CUDAFunctionManager, compiles and loads the CUDA kernels for the scenario step and each component step, and registers a CUDAEnvironmentReset resetter. All managers are attached to both self.env and self.env.world.

Properties

`observation_space`

wrapper.env.observation_space  # -> gym.spaces.Dict

A gym.spaces.Dict keyed by agent index strings (e.g. "0", "1", …, "p"). Built from the initial observation returned by the first reset() call. Attached to wrapper.env during construction.

`action_space`

wrapper.env.action_space  # -> dict[str, gym.spaces.Space]

Dictionary mapping agent index strings to their action spaces:

Discrete(n) when agent.multi_action_mode is False
MultiDiscrete([n1, n2, ...]) when agent.multi_action_mode is True

The planner is always keyed as "p". All spaces have dtype = np.int32.
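The mapping rule can be illustrated with stand-in agent objects. The sketch below returns `(kind, dims)` tuples rather than gym spaces and does not model the np.int32 dtype; the `idx` and `action_dims` attributes and the numbers are assumptions made for the example.

```python
from types import SimpleNamespace


def build_action_space(agents, planner):
    """Map agent index strings to ('Discrete', n) or ('MultiDiscrete', [n1, ...])."""
    action_space = {}
    for agent in list(agents) + [planner]:
        kind = "MultiDiscrete" if agent.multi_action_mode else "Discrete"
        action_space[str(agent.idx)] = (kind, agent.action_dims)
    return action_space


# Two mobile agents with a single action head, plus a multi-action planner.
agents = [SimpleNamespace(idx=i, multi_action_mode=False, action_dims=50) for i in range(2)]
planner = SimpleNamespace(idx="p", multi_action_mode=True, action_dims=[22, 22, 22])
space = build_action_space(agents, planner)
```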

Methods

`reset()`

wrapper.reset()

Gym-style alias for reset_all_envs(). Use this in CPU mode or when writing code that is agnostic to the execution backend.

Returns

obs (dict): Initial observation dictionary, identical in structure to what env.reset() returns (after reformatting collated agent observations).

`step()`

wrapper.step(actions=None)

Gym-style alias for step_all_envs(). In CPU mode, requires actions. In GPU mode, steps happen on the CUDA device and None is returned.

Parameters

actions (dict, default: None): Dictionary {agent_idx: action} of agent actions. Required in CPU mode. Ignored in GPU mode (actions are managed on-device).

Returns

result (tuple | None): In CPU mode: (obs, rew, done, info) tuple with the same structure as BaseEnvironment.step(). In GPU mode: None; arrays are updated in place on the device.
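A CPU-mode rollout loop built on this contract might look like the sketch below. `ToyWrapper` is a stand-in with the same reset()/step() shape, not the real wrapper, and the `"__all__"` key in `done` is assumed to follow the RLlib-style convention; check your environment's done dictionary before relying on it.

```python
class ToyWrapper:
    """Stand-in with the CPU-mode reset()/step() contract described above."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {"0": {"t": self.t}}

    def step(self, actions=None):
        assert actions is not None, "CPU mode requires actions"
        self.t += 1
        obs = {"0": {"t": self.t}}
        rew = {"0": 1.0}
        done = {"__all__": self.t >= self.horizon}
        return obs, rew, done, {}


def run_episode(wrapper, policy):
    """Roll out one CPU-mode episode and return the summed reward per agent."""
    obs = wrapper.reset()
    totals = {}
    while True:
        result = wrapper.step(policy(obs))
        if result is None:
            raise RuntimeError("GPU mode: read results from device memory instead")
        obs, rew, done, _info = result
        for agent_idx, r in rew.items():
            totals[agent_idx] = totals.get(agent_idx, 0.0) + r
        if done["__all__"]:
            return totals


# A trivial policy that always picks action 0 for every observed agent.
totals = run_episode(ToyWrapper(), policy=lambda obs: {idx: 0 for idx in obs})
```

The explicit `None` check makes the loop fail loudly if it is accidentally pointed at a GPU-mode wrapper.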

`reset_all_envs()`

wrapper.reset_all_envs()

Full reset implementation.

If reset_on_host is True: calls obs_at_reset() on the CPU. In GPU mode this also copies all data tensors to the device (expanding along the environment dimension for num_envs > 1) and then sets reset_on_host = False so all future resets run on the GPU.
If reset_on_host is False (GPU mode only): invokes CUDAEnvironmentReset with mode="force_reset" and returns an empty dict.
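The two branches amount to a small state machine around the reset_on_host flag. The following model is a sketch only: `ResetModel` is hypothetical, device work is replaced by log entries, and the observation is a placeholder.

```python
class ResetModel:
    """Tracks the reset_on_host flag; device work is replaced by log entries."""

    def __init__(self, use_cuda):
        self.use_cuda = use_cuda
        self.reset_on_host = True
        self.log = []

    def reset_all_envs(self):
        if self.reset_on_host:
            obs = {"0": {}}  # placeholder for the result of obs_at_reset()
            self.log.append("host reset")
            if self.use_cuda:
                self.log.append("copy data to device")
                self.reset_on_host = False  # later resets stay on the GPU
            return obs
        self.log.append("device force_reset")
        return {}


gpu = ResetModel(use_cuda=True)
first, second = gpu.reset_all_envs(), gpu.reset_all_envs()

cpu = ResetModel(use_cuda=False)
cpu.reset_all_envs()
cpu.reset_all_envs()
```

In the GPU model only the first call returns observations; in the CPU model the flag never flips, so every reset runs on the host.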

Returns

obs (dict): Initial observation dictionary on the first (CPU) reset. An empty dictionary {} for subsequent GPU resets.

`reset_only_done_envs()`

wrapper.reset_only_done_envs()

GPU-only method. Selectively resets only those parallel environments whose done flag is True, leaving others running. Requires use_cuda=True and a completed first reset (reset_on_host=False).

Calling this method in CPU mode or before the first GPU reset raises an AssertionError.

Returns

obs (dict): Always returns an empty dictionary {}; done-env resets modify device arrays in place.

`step_all_envs()`

wrapper.step_all_envs(actions=None)

Full step implementation.

CPU mode: requires actions. Calls env.step(actions), reformats collated observations and rewards, and returns (obs, rew, done, info).
GPU mode: steps each component via component.component_step(), calls env.scenario_step(), then calls env.generate_rewards(). Returns None.
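The call order in each mode can be checked with a recording stand-in. This sketch simplifies the description above: `Recorder` is hypothetical, component steps are routed through a single `component_step(name)` method rather than per-component objects, and the reformatting of observations and rewards is omitted.

```python
class Recorder:
    """Stand-in environment that records the order of calls."""

    def __init__(self, component_names):
        self.calls = []
        self.components = list(component_names)

    def component_step(self, name):
        self.calls.append(f"component_step:{name}")

    def scenario_step(self):
        self.calls.append("scenario_step")

    def generate_rewards(self):
        self.calls.append("generate_rewards")

    def step(self, actions):
        self.calls.append("env.step")
        return {"0": {}}, {"0": 0.0}, {"__all__": False}, {}


def step_all_envs(env, use_cuda, actions=None):
    """Dispatch one step to the CPU path or the GPU path."""
    if not use_cuda:
        assert actions is not None, "actions are required in CPU mode"
        return env.step(actions)  # obs/rew reformatting omitted in this sketch
    for name in env.components:
        env.component_step(name)
    env.scenario_step()
    env.generate_rewards()
    return None


gpu_env = Recorder(["Build", "Gather"])
gpu_result = step_all_envs(gpu_env, use_cuda=True)
```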

Parameters

actions (dict, default: None): Action dictionary. Must be provided in CPU mode; ignored in GPU mode.

Returns

result (tuple | None): (obs, rew, done, info) in CPU mode; None in GPU mode.

`obs_at_reset()`

wrapper.obs_at_reset()

Calls env.reset() on the CPU and applies _reformat_obs(). Used internally during construction to build the observation space and during the first GPU reset to populate device data.

Returns

obs (dict): Initial observation dictionary after reformatting.

GPU mode

GPU-accelerated simulation requires the rl-warp-drive package and a CUDA-capable GPU.

pip install rl-warp-drive

The wrapper checks for available GPUs at import time using GPUtil. If none are found, simulation runs on the CPU. Note that passing use_cuda=True without a GPU still raises the AssertionError described under the constructor.

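# CPU mode, shown for comparison: step() returns Python objects
wrapper = FoundationEnvWrapper(env_obj=env_obj, use_cuda=False)
obs = wrapper.reset()
obs, rew, done, info = wrapper.step(actions)  # `actions`: {agent_idx: action} dict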

In GPU mode, step() returns None. Observations and rewards are written directly to device memory and must be read via the CUDADataManager — they are not returned as Python dicts.

Internal helper methods

`_reformat_obs()`

wrapper._reformat_obs(obs: dict) -> dict

If the environment uses collated observations (key "a" present), expands each agent’s slice into a separate str(agent_id) key and removes the "a" key. Returns the dictionary unchanged otherwise.

`_reformat_rew()`

wrapper._reformat_rew(rew: dict) -> dict

If the environment uses collated rewards (key "a" present), expands each agent’s scalar into a separate str(agent_id) key and removes the "a" key. Returns the dictionary unchanged otherwise.
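Both helpers follow the same expand-and-delete pattern, sketched below with plain lists. The function names, the explicit `n_agents` argument, and the field names are assumptions for illustration. The sketch indexes each collated value by agent on its first axis; the real helper may slice array-valued observations along a different axis.

```python
def reformat_obs(obs, n_agents):
    """Expand collated observations under "a" into one entry per agent."""
    if "a" in obs:
        collated = obs.pop("a")  # remove the collated key
        for agent_id in range(n_agents):
            obs[str(agent_id)] = {key: values[agent_id] for key, values in collated.items()}
    return obs


def reformat_rew(rew, n_agents):
    """Expand collated rewards under "a" into one scalar per agent."""
    if "a" in rew:
        collated = rew.pop("a")
        for agent_id in range(n_agents):
            rew[str(agent_id)] = collated[agent_id]
    return rew


# Hypothetical collated inputs for two mobile agents plus the planner.
obs = reformat_obs({"a": {"coins": [5, 7]}, "p": {"time": [0.0]}}, n_agents=2)
rew = reformat_rew({"a": [0.5, -0.5], "p": 1.0}, n_agents=2)
```

Dictionaries without an "a" key pass through untouched, so the helpers are safe to call on non-collated environments.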
