Foundation environments support two training frameworks: RLlib for distributed CPU-based training and WarpDrive for massively parallel GPU-accelerated training. Both frameworks work with the same Foundation environment APIs and the same hierarchical agent setup.
Hierarchical agent setup
Foundation uses a two-level multi-agent structure:

- Workers (agents "0" through "n-1"): mobile economic actors that gather resources, trade, and build. They optimize post-tax utility.
- Social planner (agent "p"): a government-like agent that sets tax rates or policy interventions. It optimizes a social welfare objective.
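The agent naming above lends itself to a simple policy mapping, where all workers share one policy and the planner has its own. The following is a minimal sketch rather than the project's own code: the helper names `policy_for_agent` and `group_by_policy`, the policy labels "a" and "p", and the placeholder observations are assumptions made for illustration.

```python
def policy_for_agent(agent_id):
    """Map an agent ID to a policy label.

    Worker IDs are digit strings ("0" ... "n-1"); the planner ID is "p".
    The labels "a" (shared worker policy) and "p" (planner policy) are
    hypothetical names chosen for this sketch.
    """
    return "a" if str(agent_id).isdigit() else "p"


def group_by_policy(observations):
    """Split a per-agent observation dict into per-policy groups."""
    grouped = {"a": {}, "p": {}}
    for agent_id, obs in observations.items():
        grouped[policy_for_agent(agent_id)][agent_id] = obs
    return grouped


# Toy observations for n = 3 workers plus the planner (placeholder values).
example_obs = {"0": [0.1], "1": [0.2], "2": [0.3], "p": [0.0]}
grouped = group_by_policy(example_obs)
print(sorted(grouped["a"]))  # worker IDs handled by the shared worker policy
print(sorted(grouped["p"]))  # the planner ID
```

Sharing one policy across workers is a design choice in this sketch; it keeps the number of trained networks at two regardless of how many workers the environment contains.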
Action modes
Each agent type can operate in one of two action modes, controlled by the environment configuration:

| Parameter | Type | Description |
|---|---|---|
| multi_action_mode_agents | bool | Whether mobile agents use multi-action mode. When True, each action subspace is sampled independently (MultiDiscrete). When False, a single flattened action is used (Discrete). |
| multi_action_mode_planner | bool | Same as above for the planner agent. |
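To illustrate the difference between the two modes, the sketch below computes the size of the action space under each setting and decodes a flattened action back into its subspace. It rests on an assumption that is not stated above: in the flattened case the subspaces are laid end to end with index 0 reserved as a shared no-op, while in multi-action mode each subspace carries its own no-op at index 0. The function names and the example subspaces ("move" with 4 actions, "build" with 1) are hypothetical.

```python
def action_space_shape(subspace_sizes, multi_action_mode):
    """Return the action-space shape for a list of subspace sizes.

    multi_action_mode=True  -> list of per-subspace sizes (MultiDiscrete-like),
                               each with one extra slot for its own no-op.
    multi_action_mode=False -> single integer (Discrete-like): all subspaces
                               concatenated, plus one shared no-op at index 0.
    The no-op layout is an assumption of this sketch.
    """
    if multi_action_mode:
        return [n + 1 for n in subspace_sizes]
    return 1 + sum(subspace_sizes)


def decode_flat_action(action, subspace_names, subspace_sizes):
    """Translate a flattened action index into (subspace name, local action).

    Returns None for the shared no-op (index 0). Local actions start at 1.
    """
    if action == 0:
        return None
    offset = 1  # index 0 is the no-op, so real actions start at 1
    for name, size in zip(subspace_names, subspace_sizes):
        if action < offset + size:
            return name, action - offset + 1
        offset += size
    raise ValueError(f"action {action} is outside the flattened space")


names, sizes = ["move", "build"], [4, 1]
print(action_space_shape(sizes, multi_action_mode=True))
print(action_space_shape(sizes, multi_action_mode=False))
print(decode_flat_action(5, names, sizes))
```

Under these assumptions, the flattened mode lets an agent act in only one subspace per step, whereas multi-action mode lets it pick one action from every subspace at once.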
Curriculum learning
Training is stabilized using a two-phase curriculum approach, as described in The AI Economist paper:

Phase one: agents only, no taxes
Train only the worker agents in a free market (taxes disabled via disable_taxes: true on the PeriodicBracketTax component). Labor costs are annealed from zero using the energy_warmup_constant and energy_warmup_method parameters so that agents learn to explore before facing full costs.

Phase two: agents and planner, with taxes
Resume from the phase-one agent checkpoint and begin training the planner. Tax rates are annealed via tax_annealing_schedule. High planner entropy regularization at the start (via entropy_coeff_schedule) exposes agents to a wide range of tax levels before the planner begins to optimize.

Training configurations
Configuration files drive all aspects of training: environment setup, trainer hyperparameters, and policy network architecture. Both backends use YAML configs.

Choose a training framework
RLlib
Distributed multi-agent RL on CPU clusters using Ray. Supports the Gather-Trade-Build scenario and two-level curriculum learning. Recommended when GPU hardware is unavailable or when running large distributed rollouts.
WarpDrive (GPU)
Massively parallel GPU-accelerated training using CUDA. Runs many environment copies simultaneously on a single GPU. Used to run the COVID-19 and economic simulation at scale.
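Returning to the two-phase curriculum described earlier, the sketch below shows how the two annealing ideas might behave numerically: a labor-cost weight that warms up from zero, and an entropy coefficient that starts high and decays. It is illustrative only. The exponential warm-up form, the representation of the entropy schedule as (timestep, value) pairs with linear interpolation, the function names, and every numeric value are assumptions of this sketch; only the parameter names energy_warmup_constant and entropy_coeff_schedule come from the text above.

```python
import math


def energy_weight(progress, warmup_constant):
    """Fraction of the full labor cost applied after `progress` units of training.

    Assumed form: 1 - exp(-progress / warmup_constant), so the weight starts
    at 0 and approaches 1. A non-positive constant disables the warm-up.
    """
    if warmup_constant <= 0:
        return 1.0
    return 1.0 - math.exp(-progress / warmup_constant)


def piecewise_linear(schedule, t):
    """Linearly interpolate a sorted list of (timestep, value) pairs at time t.

    Values are held constant before the first and after the last breakpoint.
    """
    if t <= schedule[0][0]:
        return schedule[0][1]
    for (t0, v0), (t1, v1) in zip(schedule, schedule[1:]):
        if t <= t1:
            frac = (t - t0) / (t1 - t0)
            return v0 + frac * (v1 - v0)
    return schedule[-1][1]


# Phase one: labor cost ramps up while taxes stay disabled.
warmup_constant = 1000  # placeholder for energy_warmup_constant
for step in (0, 1000, 5000):
    print(step, round(energy_weight(step, warmup_constant), 3))

# Phase two: planner entropy starts high, then decays (placeholder schedule).
entropy_schedule = [(0, 2.0), (100, 0.0)]
for step in (0, 50, 200):
    print(step, piecewise_linear(entropy_schedule, step))
```

The point of both schedules is the same: early in each phase the learners face a gentler or noisier version of the problem, and the full objective is phased in only once their behavior has had time to settle.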