The Real Business Cycle (RBC) simulation implements a many-agent macroeconomic environment with heterogeneous consumers, firms, and a government. The simulation core is written in CUDA C for high-throughput GPU execution, and training is powered by PyTorch policy networks.
A GPU is required to run these experiments. The CUDA simulation code does not fall back to CPU.

What the simulation models

The RBC simulation extends classical real business cycle theory to a multi-agent setting with strategic interactions:
  • Consumers choose how much to work and how much of each firm’s good to consume, subject to a budget constraint. Each consumer has a private theta parameter (disutility of work).
  • Firms set prices, wages, and capital investment. Production follows a Cobb-Douglas function parameterized by a firm-specific alpha.
  • Government sets income and corporate tax rates to influence economic outcomes.
Agents observe a shared global state plus agent-type-specific private state variables.
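
To make the firm description above concrete, here is a minimal sketch of a textbook Cobb-Douglas production function, Y = A * K^alpha * L^(1 - alpha). The function name, the `technology` scale factor, and the choice to put alpha on capital rather than labor are all assumptions made for illustration; the CUDA kernels may define production differently.

```python
def cobb_douglas_output(capital: float, labor: float, alpha: float,
                        technology: float = 1.0) -> float:
    """Textbook Cobb-Douglas output: A * K**alpha * L**(1 - alpha).

    Assumption: alpha is the capital share. The simulation's own
    convention is not documented here and may differ.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if capital < 0 or labor < 0:
        raise ValueError("capital and labor must be non-negative")
    return technology * capital ** alpha * labor ** (1.0 - alpha)


# With alpha = 0.5, output is the geometric mean of capital and labor.
print(cobb_douglas_output(capital=4.0, labor=9.0, alpha=0.5))
```

In this form, scaling capital and labor by the same factor scales output by that factor (constant returns to scale).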

Global state layout

The global state has dimension 4 * num_firms + 2 * num_governments + 1:
Dimension             | Content
num_firms             | Current prices (one per firm)
num_firms             | Current wages (one per firm)
num_firms             | Inventory / stock levels (one per firm)
num_firms             | Over-demanded flag (one per firm)
2 * num_governments   | Income and corporate tax rates
1                     | Time index
Per-agent private state extensions:
Agent type   | Additional dimensions
Consumer     | + 2: budget, theta (disutility of work)
Firm         | + 3 + num_firms: budget, capital, production alpha, one-hot firm identity
Government   | No additional dimensions beyond global state
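
The dimension arithmetic above can be checked with a short helper. This is a sketch that follows the formulas in this section; the function names are hypothetical and are not part of the package.

```python
def global_state_dim(num_firms: int, num_governments: int) -> int:
    """Prices, wages, stocks, and over-demanded flags per firm,
    two tax rates per government, plus one time index."""
    return 4 * num_firms + 2 * num_governments + 1


def agent_state_dims(num_firms: int, num_governments: int) -> dict:
    """Total observation size per agent type (global plus private state)."""
    g = global_state_dim(num_firms, num_governments)
    return {
        "consumer": g + 2,                # budget, theta
        "firm": g + 3 + num_firms,        # budget, capital, alpha, one-hot identity
        "government": g,                  # global state only
    }


# Example: 3 firms and 1 government.
print(agent_state_dims(num_firms=3, num_governments=1))
```

For 3 firms and 1 government the global state has 15 entries, so consumers observe 17 values, firms 21, and the government 15.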

Directory structure

ai_economist/real_business_cycle/
├── train_single_exp.py      # Train a single experiment configuration
├── train_multi_exps.py      # Hyperparameter sweep over multiple experiments
├── train_bestresponse.py    # Approximate best-response training against checkpointed policies
├── experiment_utils.py      # Shared training utilities
└── rbc/
    ├── constants.py         # Configuration dictionaries and action discretizations
    ├── cuda_manager.py      # CUDA environment wrapper and data structure management
    ├── networks.py          # PyTorch policy network definitions
    ├── util.py              # Miscellaneous utilities
    └── cuda/                # CUDA C simulation kernels

Dependencies

pip install "torch>=1.9.0" pycuda==2021.1 matplotlib==3.2.1

Running experiments

1. Single experiment

Train one configuration defined in constants.py:
python train_single_exp.py

2. Hyperparameter sweep

Launch a Cartesian-product sweep over the parameter grids defined in train_multi_exps.py:
python train_multi_exps.py
Parameter sweeps are defined as *_param_sweeps dictionaries in the file. Each hyperparameter takes a list of one or more values; all combinations are run.

3. Approximate best-response training

Train one agent type against fixed checkpointed policies from a previous run:
python train_bestresponse.py ROLLOUT_DIR NUM_EPISODES_TO_TRAIN \
    --ep-strs ep1 ep2 \
    --agent-type all
--ep-strs specifies which saved episode checkpoints to load (e.g. policies saved at episodes 0, 10000, and 200000). --agent-type can target a single agent type or all.
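
The Cartesian-product expansion described in the hyperparameter sweep step can be sketched as follows. The helper name `expand_sweep` and the example keys `lr` and `entropy` are hypothetical; the actual keys and the expansion logic live in train_multi_exps.py and may differ.

```python
import itertools


def expand_sweep(param_sweeps: dict) -> list:
    """Turn {name: [values, ...]} into one dict per combination of values."""
    keys = list(param_sweeps)
    value_lists = [param_sweeps[k] for k in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


# Hypothetical sweep: 2 learning rates x 3 entropy settings = 6 runs.
example_param_sweeps = {"lr": [1e-3, 1e-4], "entropy": [0.1, 0.5, 1.0]}
configs = expand_sweep(example_param_sweeps)
print(len(configs))
```

Because every combination is run, the number of experiments is the product of the list lengths, so adding one more value to any list multiplies the total run count.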

Configuration dictionaries

Experiment configuration is managed through Python dictionaries with three top-level keys:
from ai_economist.real_business_cycle.rbc.constants import all_agents_export_experiment_template

# Build a config dict for a simulation with 3 firms, 100 consumers, 1 government
cfg = all_agents_export_experiment_template(
    NUMFIRMS=3,
    NUMCONSUMERS=100,
    NUMGOVERNMENTS=1,
    episodes_const=30000,
)

# cfg contains:
# cfg["agents"]  -- num_consumers, num_firms, action discretizations, state dims
# cfg["world"]   -- episode length, batch size, simulation parameters
# cfg["train"]   -- learning rate, save frequency, annealing schedules
The configuration is written to hparams.yaml in the job output directory at the start of training.

Action discretizations

Actions are discrete indices mapped to real-valued choices at runtime:
Agent        | Action heads                          | Example choices
Consumer     | Consumption per firm, work hours      | Work: {0, 260, 520, 780, 1040} hours
Firm         | Price, wage, capital investment       | Prices: {0, 500, 1000, 1500, 2000, 2500}
Government   | Income tax rate, corporate tax rate   | Each: {0.0, 0.2, 0.4, 0.6, 0.8, 1.0}
Action arrays are saved to action_arrays.pickle in each run directory so that index-to-value mappings are preserved alongside saved policies.
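
As an illustration of the index-to-value mapping, the sketch below decodes a discrete action index using the example choices from the table above. The constant and function names are hypothetical, and the layout of the arrays stored in action_arrays.pickle is not documented here, so this should be read as a simplified picture with one flat list per action head.

```python
# Example choices copied from the table above.
WORK_HOURS = [0, 260, 520, 780, 1040]
PRICES = [0, 500, 1000, 1500, 2000, 2500]
TAX_RATES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def decode_action(index: int, choices: list):
    """Map a discrete action index to its real-valued choice."""
    if not 0 <= index < len(choices):
        raise IndexError(f"action index {index} outside 0..{len(choices) - 1}")
    return choices[index]


print(decode_action(2, WORK_HOURS))   # third work-hours option
print(decode_action(1, TAX_RATES))    # second tax-rate option
```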

Output files

Each training run produces a rollout-XXXXXX-XXXXX/ directory:
rollout-999999-99999/
├── hparams.yaml                      # Hyperparameters for this run
├── action_arrays.pickle              # Action index → value mapping
├── episode_XXXX_consumer.npz         # Dense rollout arrays for consumers
├── episode_XXXX_firm.npz             # Dense rollout arrays for firms
├── episode_XXXX_government.npz       # Dense rollout arrays for government
├── saved_models/
│   ├── consumer_policy_XXX.pt        # PyTorch state dict
│   ├── firm_policy_XXX.pt
│   └── government_policy_XXX.pt
├── brconsumer/                       # Best-response results for consumers
├── brfirm/                           # Best-response results for firms
└── brgovernment/                     # Best-response results for government
Each .npz rollout file has keys states, actions, rewards, action_array, and aux_array. Array shapes:
Array     | Shape
states    | (batch_size, ep_length, num_agents, state_dim)
actions   | (batch_size, ep_length, num_agents) or (batch_size, ep_length, num_agents, num_action_heads) for consumers
rewards   | (batch_size, ep_length, num_agents)
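
A quick way to inspect a rollout file is to list the shape of every stored array with NumPy. This is a minimal sketch; the helper name `summarize_rollout` is hypothetical, and it assumes the arrays are plain numeric arrays that load with NumPy's default settings.

```python
import numpy as np


def summarize_rollout(npz_path: str) -> dict:
    """Return {key: shape} for every array stored in a dense rollout file."""
    with np.load(npz_path) as data:
        return {key: data[key].shape for key in data.files}
```

Calling `summarize_rollout` on one of the episode_XXXX_consumer.npz files should report a four-dimensional `states` array, with the last axis equal to the consumer state dimension.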

Saving frequency

The default configuration saves dense rollouts frequently. To reduce disk usage, set train.save_dense_every in the configuration dictionary to a larger integer before starting the run.
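
A minimal sketch of that override, using a stand-in dictionary with the three top-level keys rather than the real template output. The helper name and the numeric values are hypothetical; only the `train` key and the `save_dense_every` setting come from this page.

```python
import copy


def with_save_frequency(cfg: dict, save_dense_every: int) -> dict:
    """Return a copy of cfg with train.save_dense_every overridden."""
    if save_dense_every < 1:
        raise ValueError("save_dense_every must be a positive integer")
    new_cfg = copy.deepcopy(cfg)  # leave the caller's config untouched
    new_cfg["train"]["save_dense_every"] = save_dense_every
    return new_cfg


# Stand-in config; the real one comes from the experiment template.
example_cfg = {"agents": {}, "world": {}, "train": {"save_dense_every": 100}}
sparser_cfg = with_save_frequency(example_cfg, 1000)
print(sparser_cfg["train"]["save_dense_every"])
```

Copying the dictionary before editing keeps the original configuration available for other runs in the same process.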

Reference

For details on the model and results, see:
Finding General Equilibria in Many-Agent Economic Simulations using Deep Reinforcement Learning
(arXiv link forthcoming as of this writing)