The Real Business Cycle (RBC) simulation implements a many-agent macroeconomic environment with heterogeneous consumers, firms, and a government. The simulation core is written in CUDA C for high-throughput GPU execution, and training is powered by PyTorch policy networks.
What the simulation models
The RBC simulation extends classical real business cycle theory to a multi-agent setting with strategic interactions:
- Consumers choose how much to work and how much of each firm's good to consume, subject to a budget constraint. Each consumer has a private `theta` parameter (disutility of work).
- Firms set prices, wages, and capital investment. Production follows a Cobb-Douglas function parameterized by a firm-specific `alpha`.
- Government sets income and corporate tax rates to influence economic outcomes.
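For readers unfamiliar with Cobb-Douglas production, the sketch below shows the textbook form, `output = tfp * capital**alpha * labor**(1 - alpha)`. This is an illustration only: the function name, the `tfp` factor, and the choice of `alpha` as the capital exponent are assumptions, and the simulation's CUDA kernel may assign the exponents differently.

```python
def cobb_douglas(capital: float, labor: float, alpha: float, tfp: float = 1.0) -> float:
    """Textbook Cobb-Douglas output.

    Assumption: alpha is the capital share and (1 - alpha) the labor share.
    The simulation may use a different convention.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if capital < 0 or labor < 0:
        raise ValueError("capital and labor must be non-negative")
    return tfp * capital ** alpha * labor ** (1.0 - alpha)


# Two hypothetical firms with identical inputs but different alpha values.
for alpha in (0.2, 0.8):
    print(alpha, cobb_douglas(capital=16.0, labor=81.0, alpha=alpha))
```

With exponents that sum to one, doubling both inputs doubles output, which is why a single firm-specific `alpha` is enough to describe how a firm trades off capital against labor.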
Global state layout
The global state has dimension `4 * num_firms + 2 * num_governments + 1`:
| Dimension | Content |
|---|---|
| `num_firms` | Current prices (one per firm) |
| `num_firms` | Current wages (one per firm) |
| `num_firms` | Inventory / stock levels (one per firm) |
| `num_firms` | Over-demanded flag (one per firm) |
| `2 * num_governments` | Income and corporate tax rates |
| `1` | Time index |
Each agent type observes the global state plus its own additional dimensions:
| Agent type | Additional dimensions |
|---|---|
| Consumer | + 2: budget, theta (disutility of work) |
| Firm | + 3 + num_firms: budget, capital, production alpha, one-hot firm identity |
| Government | No additional dimensions beyond global state |
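The dimension counts above can be checked with a few lines of arithmetic. The helper names below are hypothetical and not part of the simulation code; only the formulas come from the tables.

```python
def global_state_dim(num_firms: int, num_governments: int) -> int:
    # Prices, wages, stock levels, and over-demanded flags (one each per firm),
    # then two tax rates per government, then the time index.
    return 4 * num_firms + 2 * num_governments + 1


def agent_state_dim(agent_type: str, num_firms: int, num_governments: int) -> int:
    # Per-agent extras, as listed in the agent-type table.
    extra = {
        "consumer": 2,               # budget, theta
        "firm": 3 + num_firms,       # budget, capital, alpha, one-hot firm identity
        "government": 0,             # global state only
    }
    return global_state_dim(num_firms, num_governments) + extra[agent_type]


# Example with 10 firms and 1 government.
print(global_state_dim(10, 1))               # 43
print(agent_state_dim("consumer", 10, 1))    # 45
print(agent_state_dim("firm", 10, 1))        # 56
print(agent_state_dim("government", 10, 1))  # 43
```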
Directory structure
Dependencies
Running experiments
Hyperparameter sweep
Launch a Cartesian-product sweep over the parameter grids defined in `train_multi_exps.py`. Parameter sweeps are defined as `*_param_sweeps` dictionaries in that file. Each hyperparameter takes a list of one or more values; all combinations are run.
Configuration dictionaries
Experiment configuration is managed through Python dictionaries with three top-level keys. The configuration is saved to `hparams.yaml` in the job output directory at the start of training.
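As a rough illustration of the sweep semantics (each hyperparameter takes a list of values, and all combinations are run), the sketch below expands one sweep dictionary into one flat dictionary per run. The dictionary name, its keys, and the `expand_sweep` helper are hypothetical; they are not copied from `train_multi_exps.py`.

```python
from itertools import product

# Hypothetical sweep grid: every value is a list of one or more candidates.
example_param_sweeps = {
    "lr": [1e-3, 1e-4],
    "entropy": [0.1, 0.5],
    "batch_size": [128],
}


def expand_sweep(param_sweeps: dict) -> list:
    """Return one dict per point in the Cartesian product of the value lists."""
    names = list(param_sweeps)
    value_lists = [param_sweeps[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


runs = expand_sweep(example_param_sweeps)
print(len(runs))  # 4, i.e. 2 * 2 * 1 combinations
for run in runs:
    print(run)
```

Because the number of runs is the product of the list lengths, adding a second value to a single-valued hyperparameter doubles the size of the sweep.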
Action discretizations
Actions are discrete indices mapped to real-valued choices at runtime:
| Agent | Action heads | Example choices |
|---|---|---|
| Consumer | Consumption per firm, work hours | Work: {0, 260, 520, 780, 1040} hours |
| Firm | Price, wage, capital investment | Prices: {0, 500, 1000, 1500, 2000, 2500} |
| Government | Income tax rate, corporate tax rate | Each: {0.0, 0.2, 0.4, 0.6, 0.8, 1.0} |
The action arrays are saved to `action_arrays.pickle` in each run directory so that index-to-value mappings are preserved alongside saved policies.
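The index-to-value mapping can be pictured as a plain lookup into a list of choices. The sketch below uses the example choices from the table; the dictionary layout and the `decode_action` helper are assumptions for illustration, and the real layout of `action_arrays.pickle` may differ.

```python
import pickle

# Example choice grids copied from the table above.
# The key names are hypothetical.
action_arrays = {
    "consumer_work_hours": [0, 260, 520, 780, 1040],
    "firm_prices": [0, 500, 1000, 1500, 2000, 2500],
    "government_tax_rates": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
}


def decode_action(index: int, choices: list):
    """Map a discrete action index to its real-valued choice."""
    if not 0 <= index < len(choices):
        raise IndexError(f"action index {index} is outside 0..{len(choices) - 1}")
    return choices[index]


print(decode_action(2, action_arrays["consumer_work_hours"]))   # 520
print(decode_action(5, action_arrays["firm_prices"]))           # 2500
print(decode_action(1, action_arrays["government_tax_rates"]))  # 0.2

# Round-tripping through pickle keeps the mapping intact, which is why
# storing it next to the saved policies makes old action indices decodable.
restored = pickle.loads(pickle.dumps(action_arrays))
print(restored == action_arrays)  # True
```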
Output files
Each training run produces a `rollout-XXXXXX-XXXXX/` directory.
Each `.npz` rollout file has keys `states`, `actions`, `rewards`, `action_array`, and `aux_array`. Array shapes:
| Array | Shape |
|---|---|
| `states` | `(batch_size, ep_length, num_agents, state_dim)` |
| `actions` | `(batch_size, ep_length, num_agents)` or `(batch_size, ep_length, num_agents, num_action_heads)` for consumers |
| `rewards` | `(batch_size, ep_length, num_agents)` |
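When post-processing rollouts, it can help to check array shapes before doing any analysis. The sketch below builds the expected shapes from the table and compares them with the shapes actually found. Both helper names are hypothetical, and the example sizes are made up.

```python
def expected_rollout_shapes(batch_size, ep_length, num_agents, state_dim,
                            num_action_heads=None):
    """Expected shapes for the states, actions, and rewards arrays.

    Pass num_action_heads for consumer rollouts, whose actions carry
    one extra trailing dimension.
    """
    actions = (batch_size, ep_length, num_agents)
    if num_action_heads is not None:
        actions = actions + (num_action_heads,)
    return {
        "states": (batch_size, ep_length, num_agents, state_dim),
        "actions": actions,
        "rewards": (batch_size, ep_length, num_agents),
    }


def check_shapes(actual: dict, expected: dict) -> list:
    """Return a list of human-readable mismatches (empty when all match)."""
    problems = []
    for key, shape in expected.items():
        if key not in actual:
            problems.append(f"missing key: {key}")
        elif tuple(actual[key]) != shape:
            problems.append(f"{key}: expected {shape}, found {tuple(actual[key])}")
    return problems


expected = expected_rollout_shapes(4, 10, 100, 45, num_action_heads=11)
found = {
    "states": (4, 10, 100, 45),
    "actions": (4, 10, 100, 11),
    "rewards": (4, 10, 100),
}
print(check_shapes(found, expected))  # []
```

With NumPy installed, `found` could be built from a real file as `{k: data[k].shape for k in data.files}` after `data = numpy.load(path)`.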
Saving frequency
The default configuration saves dense rollouts frequently. To reduce disk usage, set `train.save_dense_every` in the configuration dictionary to a larger integer before starting the run.
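A minimal sketch of that adjustment follows. It assumes that `train.save_dense_every` refers to a `save_dense_every` entry nested under a `train` key, and that a dense rollout is written whenever the epoch number is a multiple of that value; neither detail is confirmed here. The starting value and epoch count are invented for the example.

```python
# Hypothetical configuration fragment; only the key path comes from the text above.
config = {"train": {"save_dense_every": 100}}


def num_dense_saves(num_epochs: int, save_every: int) -> int:
    """Count epochs in [0, num_epochs) that are multiples of save_every."""
    if save_every <= 0:
        raise ValueError("save_every must be a positive integer")
    return len(range(0, num_epochs, save_every))


before = num_dense_saves(10_000, config["train"]["save_dense_every"])

# Save less often to reduce disk usage.
config["train"]["save_dense_every"] = 1000
after = num_dense_saves(10_000, config["train"]["save_dense_every"])

print(before, after)  # 100 10
```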
Reference
For details on the model and results, see: Finding General Equilibria in Many-Agent Economic Simulations using Deep Reinforcement Learning
(ArXiv link forthcoming as of this writing)