Real Business Cycle - AI Economist / Foundation

The Real Business Cycle (RBC) simulation implements a many-agent macroeconomic environment with heterogeneous consumers, firms, and a government. The simulation core is written in CUDA C for high-throughput GPU execution, and training is powered by PyTorch policy networks.

A GPU is required to run these experiments. The CUDA simulation code does not fall back to CPU.

What the simulation models

The RBC simulation extends classical real business cycle theory to a multi-agent setting with strategic interactions:

Consumers choose how much to work and how much of each firm’s good to consume, subject to a budget constraint. Each consumer has a private theta parameter (disutility of work).
Firms set prices, wages, and capital investment. Production follows a Cobb-Douglas function parameterized by a firm-specific alpha.
Government sets income and corporate tax rates to influence economic outcomes.
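
As a rough illustration of the firm side, the sketch below uses the textbook constant-returns Cobb-Douglas form, output = tfp * capital^alpha * labor^(1 - alpha). The exact functional form and scaling used by the CUDA kernels are not stated here, so the function name, the tfp parameter, and the constant-returns assumption are all hypothetical.

```python
def cobb_douglas_output(capital: float, labor: float, alpha: float, tfp: float = 1.0) -> float:
    """Textbook Cobb-Douglas output (an assumed form, not taken from the kernels)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if capital < 0 or labor < 0:
        raise ValueError("capital and labor must be non-negative")
    # alpha is the output elasticity of capital; (1 - alpha) that of labor.
    return tfp * capital ** alpha * labor ** (1.0 - alpha)


# Two hypothetical firms with the same inputs but different alpha values.
capital_heavy = cobb_douglas_output(capital=400.0, labor=100.0, alpha=0.7)
labor_heavy = cobb_douglas_output(capital=400.0, labor=100.0, alpha=0.3)
print(capital_heavy > labor_heavy)  # capital is the larger input, so higher alpha yields more output
```

Under this assumed form, a firm with a higher alpha benefits more from capital investment than from hiring, which is one way firm-specific alpha values can produce heterogeneous firm strategies.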

Agents observe a shared global state plus agent-type-specific private state variables.

Global state layout

The global state has dimension 4 * num_firms + 2 * num_governments + 1:

Dimension	Content
`num_firms`	Current prices (one per firm)
`num_firms`	Current wages (one per firm)
`num_firms`	Inventory / stock levels (one per firm)
`num_firms`	Over-demanded flag (one per firm)
`2 * num_governments`	Income and corporate tax rates
`1`	Time index

Per-agent private state extensions:

Agent type	Additional dimensions
Consumer	`+ 2`: budget, theta (disutility of work)
Firm	`+ 3 + num_firms`: budget, capital, production alpha, one-hot firm identity
Government	No additional dimensions beyond global state
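
The two tables above can be turned into a small dimension calculator, which is handy for checking the input size of a policy network. This is a minimal sketch; the function names are illustrative and do not come from the codebase.

```python
def global_state_dim(num_firms: int, num_governments: int) -> int:
    # prices + wages + inventories + over-demanded flags (one each per firm),
    # two tax rates per government, plus one time index
    return 4 * num_firms + 2 * num_governments + 1


def agent_state_dim(agent_type: str, num_firms: int, num_governments: int) -> int:
    base = global_state_dim(num_firms, num_governments)
    if agent_type == "consumer":
        return base + 2  # budget, theta
    if agent_type == "firm":
        return base + 3 + num_firms  # budget, capital, alpha, one-hot identity
    if agent_type == "government":
        return base  # global state only
    raise ValueError(f"unknown agent type: {agent_type}")


dims = {t: agent_state_dim(t, num_firms=3, num_governments=1)
        for t in ("consumer", "firm", "government")}
print(dims)  # {'consumer': 17, 'firm': 21, 'government': 15}
```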

Directory structure

ai_economist/real_business_cycle/
├── train_single_exp.py      # Train a single experiment configuration
├── train_multi_exps.py      # Hyperparameter sweep over multiple experiments
├── train_bestresponse.py    # Approximate best-response training against checkpointed policies
├── experiment_utils.py      # Shared training utilities
└── rbc/
    ├── constants.py         # Configuration dictionaries and action discretizations
    ├── cuda_manager.py      # CUDA environment wrapper and data structure management
    ├── networks.py          # PyTorch policy network definitions
    ├── util.py              # Miscellaneous utilities
    └── cuda/                # CUDA C simulation kernels

Dependencies

pip install "torch>=1.9.0" pycuda==2021.1 matplotlib==3.2.1

Running experiments

Single experiment

Train one configuration defined in constants.py:

python train_single_exp.py

Hyperparameter sweep

Launch a Cartesian-product sweep over the parameter grids defined in train_multi_exps.py:

python train_multi_exps.py

Parameter sweeps are defined as *_param_sweeps dictionaries in the file. Each hyperparameter takes a list of one or more values; all combinations are run.
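
The expansion of such a grid into individual runs can be sketched with itertools.product. The dictionary name train_param_sweeps and its keys below are made up for illustration; the real grids and their keys are defined in train_multi_exps.py.

```python
import itertools

# Hypothetical grid: every hyperparameter maps to a list of candidate values.
train_param_sweeps = {
    "lr": [0.001, 0.0005],
    "entropy": [0.1, 0.5],
    "batch_size": [128],
}


def expand_sweep(sweep: dict) -> list:
    """Return one settings dict per combination of values in the grid."""
    keys = list(sweep)
    combos = itertools.product(*(sweep[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]


configs = expand_sweep(train_param_sweeps)
print(len(configs))  # 4 (2 * 2 * 1 combinations)
print(configs[0])    # {'lr': 0.001, 'entropy': 0.1, 'batch_size': 128}
```

Because the number of runs is the product of the list lengths, adding a second value to one more hyperparameter doubles the sweep size.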

Approximate best-response training

Train one agent type against fixed checkpointed policies from a previous run:

python train_bestresponse.py ROLLOUT_DIR NUM_EPISODES_TO_TRAIN \
    --ep-strs ep1 ep2 \
    --agent-type all

--ep-strs specifies which saved episode checkpoints to load (e.g. policies saved at episodes 0, 10000, and 200000). --agent-type can target a single agent type or all.

Configuration dictionaries

Experiment configuration is held in a Python dictionary with three top-level keys: agents, world, and train. A template function in constants.py builds it:

from ai_economist.real_business_cycle.rbc.constants import all_agents_export_experiment_template

# Build a config dict for a simulation with 3 firms, 100 consumers, 1 government
cfg = all_agents_export_experiment_template(
    NUMFIRMS=3,
    NUMCONSUMERS=100,
    NUMGOVERNMENTS=1,
    episodes_const=30000,
)

# cfg contains:
# cfg["agents"]  -- num_consumers, num_firms, action discretizations, state dims
# cfg["world"]   -- episode length, batch size, simulation parameters
# cfg["train"]   -- learning rate, save frequency, annealing schedules

The configuration is written to hparams.yaml in the job output directory at the start of training.

Action discretizations

Actions are discrete indices mapped to real-valued choices at runtime:

Agent	Action heads	Example choices
Consumer	Consumption per firm, work hours	Work: `{0, 260, 520, 780, 1040}` hours
Firm	Price, wage, capital investment	Prices: `{0, 500, 1000, 1500, 2000, 2500}`
Government	Income tax rate, corporate tax rate	Each: `{0.0, 0.2, 0.4, 0.6, 0.8, 1.0}`

Action arrays are saved to action_arrays.pickle in each run directory so that index-to-value mappings are preserved alongside saved policies.
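
The index-to-value mapping amounts to a table lookup. The sketch below uses the example choices from the table above; the variable names and the decode helper are illustrative, and the actual layout of action_arrays.pickle is not specified here.

```python
# Example discretizations copied from the table above.
WORK_HOURS = [0, 260, 520, 780, 1040]
PRICES = [0, 500, 1000, 1500, 2000, 2500]
TAX_RATES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def decode(indices, choices):
    """Map discrete action indices to their real-valued choices."""
    return [choices[i] for i in indices]


# Hypothetical sampled indices: three firms' price heads, one government's two tax heads.
firm_prices = decode([1, 5, 0], PRICES)
income_tax, corporate_tax = decode([2, 3], TAX_RATES)

print(firm_prices)                # [500, 2500, 0]
print(income_tax, corporate_tax)  # 0.4 0.6
```

Keeping the lookup tables next to the saved policies matters because a policy only outputs indices; without the matching arrays, a stored action of 3 has no economic meaning.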

Output files

Each training run produces a rollout-XXXXXX-XXXXX/ directory:

rollout-999999-99999/
├── hparams.yaml                      # Hyperparameters for this run
├── action_arrays.pickle              # Action index → value mapping
├── episode_XXXX_consumer.npz         # Dense rollout arrays for consumers
├── episode_XXXX_firm.npz             # Dense rollout arrays for firms
├── episode_XXXX_government.npz       # Dense rollout arrays for government
├── saved_models/
│   ├── consumer_policy_XXX.pt        # PyTorch state dict
│   ├── firm_policy_XXX.pt
│   └── government_policy_XXX.pt
├── brconsumer/                       # Best-response results for consumers
├── brfirm/                           # Best-response results for firms
└── brgovernment/                     # Best-response results for government

Each .npz rollout file has keys states, actions, rewards, action_array, and aux_array. The first three have the following shapes:

Array	Shape
`states`	`(batch_size, ep_length, num_agents, state_dim)`
`actions`	`(batch_size, ep_length, num_agents)` or `(batch_size, ep_length, num_agents, num_action_heads)` for consumers
`rewards`	`(batch_size, ep_length, num_agents)`
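
Rollout files can be inspected with numpy.load. The sketch below writes a small stand-in file with the documented keys and shapes, then reads it back; the file name, sizes, and zero-filled contents are placeholders rather than real simulation output.

```python
import os
import tempfile

import numpy as np

# Placeholder sizes for a firm rollout (state_dim 21 corresponds to 3 firms, 1 government).
batch_size, ep_length, num_firms, state_dim = 2, 5, 3, 21

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "episode_0_firm.npz")  # stand-in file name
    np.savez(
        path,
        states=np.zeros((batch_size, ep_length, num_firms, state_dim), dtype=np.float32),
        actions=np.zeros((batch_size, ep_length, num_firms), dtype=np.int64),
        rewards=np.zeros((batch_size, ep_length, num_firms), dtype=np.float32),
    )

    # np.load returns a lazy archive; arrays are read when indexed by key.
    with np.load(path) as rollout:
        shapes = {key: rollout[key].shape for key in rollout.files}
        # Average reward at each timestep, across the batch and across agents.
        mean_reward_per_step = rollout["rewards"].mean(axis=(0, 2))

print(shapes["states"])            # (2, 5, 3, 21)
print(mean_reward_per_step.shape)  # (5,)
```

For a real run, replace the stand-in path with one of the episode_XXXX_*.npz files in the rollout directory and skip the np.savez step.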

Saving frequency

The default configuration saves dense rollouts frequently. To reduce disk usage, set save_dense_every under the train key of the configuration dictionary to a larger integer before starting the run.
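
A minimal sketch of that override follows, using a stand-in dictionary in place of the one returned by the configuration template. The helper name and the values 100 and 5000 are placeholders, not defaults taken from the codebase.

```python
import copy

# Stand-in for the real configuration dict; only the nesting matters here.
cfg = {"agents": {}, "world": {}, "train": {"save_dense_every": 100}}


def with_save_interval(config: dict, every: int) -> dict:
    """Return a copy of config that saves dense rollouts every `every` episodes."""
    if every <= 0:
        raise ValueError("save interval must be a positive integer")
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated["train"]["save_dense_every"] = int(every)
    return updated


updated = with_save_interval(cfg, 5000)
print(updated["train"]["save_dense_every"])  # 5000
```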

Reference

For details on the model and results, see:

Finding General Equilibria in Many-Agent Economic Simulations using Deep Reinforcement Learning
(arXiv link forthcoming as of this writing)
