GPU Training with WarpDrive - AI Economist / Foundation

WarpDrive is a GPU-accelerated multi-agent reinforcement learning framework that runs the entire training pipeline (environment simulation, observation collection, and policy updates) directly on the GPU. For Foundation, this means running tens or hundreds of environment copies in parallel on a single device, which cuts wall-clock training time relative to CPU-based approaches.

Why WarpDrive

CPU-based frameworks like RLlib spend significant time moving data between rollout workers and the trainer, and between CPU and GPU memory. WarpDrive eliminates this bottleneck by:

Executing environment step logic in CUDA C kernels, one environment per CUDA block
Keeping all state (observations, actions, rewards) in GPU memory across rollouts
Performing only the first episode reset on the CPU and running all subsequent resets on-device

For the COVID-19 and economy simulation with 51 agents and 540-step episodes, this enables running 60 parallel environments efficiently.

Requirements

A CUDA-capable GPU
Python 3.7+
PyTorch

Installation

pip install rl-warp-drive
pip install torch
pip install ai-economist

The training script will raise an error at startup if no GPU is detected. WarpDrive training cannot fall back to CPU.

assert num_gpus_available > 0, "This training script needs a GPU to run!"
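The excerpt above does not show where `num_gpus_available` comes from. The following is a minimal sketch of one way to obtain it, assuming PyTorch's `torch.cuda.device_count()` as the detection method; the actual training script may detect GPUs differently. `count_gpus` and `require_gpu` are hypothetical helper names used only for illustration.

```python
def count_gpus() -> int:
    """Return the number of CUDA devices PyTorch can see (0 if PyTorch is missing)."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def require_gpu(num_gpus_available: int) -> None:
    """Fail fast when no GPU is present, mirroring the startup assertion."""
    if num_gpus_available <= 0:
        raise RuntimeError("This training script needs a GPU to run!")


# Typical use at the top of a training script (hypothetical):
#     require_gpu(count_gpus())
```

Raising an exception instead of using `assert` keeps the check active even when Python runs with the `-O` flag, which strips assertions.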

How `FoundationEnvWrapper` supports GPU mode

The FoundationEnvWrapper class in ai_economist/foundation/env_wrapper.py handles both CPU and GPU execution paths. Pass use_cuda=True to activate GPU mode.

Initialization

When use_cuda=True, the wrapper verifies that the underlying environment exposes the required CUDA manager attributes and then sets up two core WarpDrive objects:

from warp_drive.managers.data_manager import CUDADataManager
from warp_drive.managers.function_manager import (
    CUDAEnvironmentReset,
    CUDAFunctionManager,
)

# CUDADataManager holds all per-environment state tensors on the GPU
self.cuda_data_manager = CUDADataManager(
    num_agents=self.n_agents,
    episode_length=self.episode_length,
    num_envs=self.n_envs,
)

# CUDAFunctionManager compiles and loads CUDA kernels
self.cuda_function_manager = CUDAFunctionManager(
    num_agents=int(self.cuda_data_manager.meta_info("n_agents")),
    num_envs=int(self.cuda_data_manager.meta_info("n_envs")),
    process_id=process_id,
)
self.cuda_function_manager.compile_and_load_cuda(
    env_name=self.name,
    template_header_file="template_env_config.h",
    template_runner_file="template_env_runner.cu",
    customized_env_registrar=env_registrar,
    event_messenger=event_messenger,
)

CUDA function registration

After compilation, the wrapper registers named CUDA kernel functions for the scenario step, each component step, and reward computation:

# Scenario step
step_function = f"Cuda{self.name}Step"
self.cuda_function_manager.initialize_functions([step_function])
self.env.cuda_step = self.cuda_function_manager.get_function(step_function)

# Per-component steps
for component in self.env.components:
    component_step = f"Cuda{component.name}Step"
    self.cuda_function_manager.initialize_functions([component_step])
    self.env.world.cuda_component_step[component.name] = (
        self.cuda_function_manager.get_function(component_step)
    )

# Reward computation
self.cuda_function_manager.initialize_functions(["CudaComputeReward"])
self.env.cuda_compute_reward = self.cuda_function_manager.get_function(
    "CudaComputeReward"
)
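The naming convention used in the registration code can be sketched as a small helper. `cuda_step_function_name` is a hypothetical name for illustration and is not part of WarpDrive or Foundation; the component names are the ones this page lists for the COVID-19 scenario.

```python
def cuda_step_function_name(name: str) -> str:
    """Build the kernel name the wrapper looks up for a scenario or component."""
    return f"Cuda{name}Step"


component_names = [
    "ControlUSStateOpenCloseStatus",
    "FederalGovernmentSubsidy",
    "VaccinationCampaign",
]

# Each generated name must match an extern "C" kernel in the compiled CUDA source.
kernel_names = [cuda_step_function_name(n) for n in component_names]
kernel_names.append("CudaComputeReward")  # the reward kernel uses a fixed name

for kernel in kernel_names:
    print(kernel)
```

Because the lookup is purely name-based, a mismatch between a component's `name` and its kernel's name would surface when the function is initialized rather than at compile time.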

Reset strategy

The first reset() always runs on the CPU, copies state to GPU, and sets reset_on_host = False. All subsequent resets run on the GPU via CUDAEnvironmentReset.

# Flag to determine where the reset happens (host or device)
# First reset is always on the host (CPU), and subsequent resets are on
# the device (GPU)
self.reset_on_host = True
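A minimal sketch of this host-then-device pattern, with the GPU work replaced by plain Python stand-ins. `ResetDispatcher`, `_reset_on_host`, and `_reset_on_device` are hypothetical names for illustration; the real wrapper delegates device resets to WarpDrive's `CUDAEnvironmentReset`.

```python
class ResetDispatcher:
    """Toy model of the wrapper's reset logic; no GPU calls are made."""

    def __init__(self):
        # First reset is on the host (CPU); later resets are on the device (GPU).
        self.reset_on_host = True
        self.log = []

    def _reset_on_host(self):
        # Stand-in for: reset the Python environment, then copy its state to the GPU.
        self.log.append("host")

    def _reset_on_device(self):
        # Stand-in for the device-side reset performed by CUDAEnvironmentReset.
        self.log.append("device")

    def reset(self):
        if self.reset_on_host:
            self._reset_on_host()
            self.reset_on_host = False  # every later reset stays on the device
        else:
            self._reset_on_device()


dispatcher = ResetDispatcher()
for _ in range(3):
    dispatcher.reset()
print(dispatcher.log)  # ['host', 'device', 'device']
```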

CUDA C implementations

Foundation provides CUDA C kernels that mirror the Python simulation logic. Each CUDA block corresponds to one parallel environment; each thread corresponds to one agent.

| File | Description |
| --- | --- |
| `ai_economist/foundation/components/covid19_components_step.cu` | CUDA kernels for the `ControlUSStateOpenCloseStatus`, `FederalGovernmentSubsidy`, and `VaccinationCampaign` component steps |
| `ai_economist/foundation/scenarios/covid19/covid19_env_step.cu` | CUDA kernel for the full COVID-19 environment step, including the SIR epidemic model |
| `ai_economist/foundation/scenarios/covid19/covid19_build.cu` | Build entry point: includes the component and env step files for compilation |

Example: component step kernel signature

Each component kernel receives all relevant state arrays and constants. Environment ID is read from blockIdx.x; agent ID from threadIdx.x:

extern "C" {
    __global__ void CudaControlUSStateOpenCloseStatusStep(
        int * stringency_level,
        const int kActionCooldownPeriod,
        int * action_in_cooldown_until,
        const int * kDefaultAgentActionMask,
        const int * kNoOpAgentActionMask,
        const int kNumStringencyLevels,
        int * actions,
        float * obs_a_stringency_policy_indicators,
        float * obs_a_action_mask,
        float * obs_p_stringency_policy_indicators,
        int * env_timestep_arr,
        const int kNumAgents,
        const int kEpisodeLength
    ) {
        const int kEnvId = blockIdx.x;   // one block = one environment
        const int kAgentId = threadIdx.x; // one thread = one agent
        ...
    }
}
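To make the block/thread mapping concrete, the sketch below emulates it in plain Python. It assumes per-agent arrays are stored flat in row-major order with shape (num_envs, num_agents), so the element for a given environment and agent sits at `env_id * num_agents + agent_id`. That layout is an assumption for illustration, and the actual kernels may index their arrays differently. `flat_index` and `emulate_launch` are hypothetical names.

```python
def flat_index(env_id: int, agent_id: int, num_agents: int) -> int:
    """Offset into a flat, row-major (num_envs, num_agents) array."""
    return env_id * num_agents + agent_id


def emulate_launch(num_envs: int, num_agents: int):
    """Visit every (block, thread) pair that a kernel launch would cover.

    On the GPU these pairs run in parallel; the loops here are sequential.
    """
    visited = []
    for block_idx in range(num_envs):         # blockIdx.x -> environment
        for thread_idx in range(num_agents):  # threadIdx.x -> agent
            visited.append(flat_index(block_idx, thread_idx, num_agents))
    return visited


# 60 environments x 51 agents, as in the COVID-19 configuration
offsets = emulate_launch(num_envs=60, num_agents=51)
print(len(offsets))  # 3060
print(offsets[:3])   # [0, 1, 2]
print(flat_index(env_id=2, agent_id=5, num_agents=51))  # 107
```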

Example: SIR epidemic model device function

The env step kernel calls device functions such as cuda_sir_step to simulate disease dynamics per agent per environment:

__device__ void cuda_sir_step(
    float* susceptible,
    float* infected,
    float* recovered,
    float* vaccinated,
    float* deaths,
    int* num_vaccines_available_t,
    ...
    int* stringency_level,
    float* beta,
    const float kGamma,
    const float kDeathRate,
    const int kEnvId,
    const int kAgentId,
    int timestep,
    const int kEpisodeLength,
    ...
) { ... }
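For orientation, the sketch below shows one step of a textbook discrete-time SIR model in Python. It is not a port of `cuda_sir_step`: the update rule, the `sir_step` name, and the omission of vaccination and stringency effects are simplifying assumptions. Only the roles of `beta`, `gamma`, and the death rate correspond to parameters in the signature above.

```python
def sir_step(susceptible, infected, recovered, deaths,
             beta, gamma, death_rate):
    """Advance a textbook SIR model by one time step (counts held as floats)."""
    population = susceptible + infected + recovered  # people still alive
    if population <= 0:
        return susceptible, infected, recovered, deaths

    # New infections scale with contact between susceptible and infected people.
    new_infections = min(beta * susceptible * infected / population, susceptible)
    # A fraction gamma of the infected leave the infected pool each step,
    resolved = gamma * infected
    # and a fraction death_rate of those resolved cases are deaths.
    new_deaths = death_rate * resolved
    new_recoveries = resolved - new_deaths

    return (
        susceptible - new_infections,
        infected + new_infections - resolved,
        recovered + new_recoveries,
        deaths + new_deaths,
    )


state = (990.0, 10.0, 0.0, 0.0)  # susceptible, infected, recovered, deaths
for _ in range(5):
    state = sir_step(*state, beta=0.3, gamma=0.1, death_rate=0.01)
print([round(x, 2) for x in state])
```

In the GPU version, each thread would apply an update of this kind to its own agent's slice of the state arrays, selected by `kEnvId` and `kAgentId`.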

Running GPU training

Register the CUDA source path for your environment, then construct FoundationEnvWrapper with use_cuda=True and pass it to WarpDrive’s Trainer.

import yaml
from warp_drive.training.trainer import Trainer
from warp_drive.utils.env_registrar import EnvironmentRegistrar
from ai_economist.foundation.env_wrapper import FoundationEnvWrapper
from ai_economist.foundation.scenarios.covid19.covid19_env import (
    CovidAndEconomyEnvironment,
)

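# The config lives in ai_economist/training/run_configs/;
# adjust this path to match your working directory
with open("run_configs/covid_and_economy_environment.yaml", "r") as f: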
    run_config = yaml.safe_load(f)

num_envs = run_config["trainer"]["num_envs"]

# Register the CUDA source so WarpDrive can compile it
env_registrar = EnvironmentRegistrar()
env_registrar.add_cuda_env_src_path(
    CovidAndEconomyEnvironment.name,
    "ai_economist/foundation/scenarios/covid19/covid19_build.cu",
)

# Create the GPU-enabled wrapper
env_wrapper = FoundationEnvWrapper(
    CovidAndEconomyEnvironment(**run_config["env"]),
    num_envs=num_envs,
    use_cuda=True,
    env_registrar=env_registrar,
)

# Map policy names to agent IDs
policy_tag_to_agent_id_map = {
    "a": [str(agent_id) for agent_id in range(env_wrapper.env.n_agents)],
    "p": ["p"],
}

# Agents and planner have different obs/action spaces,
# so create separate placeholders per policy
trainer = Trainer(
    env_wrapper=env_wrapper,
    config=run_config,
    policy_tag_to_agent_id_map=policy_tag_to_agent_id_map,
    create_separate_placeholders_for_each_policy=True,
    obs_dim_corresponding_to_num_agents="last",
)

trainer.train()
trainer.graceful_close()

To run the same pipeline with the bundled training script:

python ai_economist/training/training_script.py --env covid_and_economy_environment

The obs_dim_corresponding_to_num_agents="last" argument tells WarpDrive that the agent dimension is the last axis of the observation tensor. WarpDrive’s default assumption is that it is the first axis.
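The difference between the two layouts can be illustrated with a small pure-Python sketch. It assumes a two-dimensional per-environment observation of shape (features, agents); the helper names `agent_axis` and `to_agent_first` are hypothetical and are not part of WarpDrive.

```python
def agent_axis(obs_shape, obs_dim_corresponding_to_num_agents="first"):
    """Return the index of the agent dimension in a per-environment observation."""
    if obs_dim_corresponding_to_num_agents == "first":
        return 0
    if obs_dim_corresponding_to_num_agents == "last":
        return len(obs_shape) - 1
    raise ValueError("expected 'first' or 'last'")


def to_agent_first(obs_features_by_agent):
    """Transpose a 2D (features, agents) observation into (agents, features)."""
    return [list(per_agent) for per_agent in zip(*obs_features_by_agent)]


obs = [
    [0.1, 0.2, 0.3],  # feature 0 for agents 0..2
    [1.0, 2.0, 3.0],  # feature 1 for agents 0..2
]
print(agent_axis((2, 3), "last"))  # 1
print(to_agent_first(obs))         # [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]]
```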

Run configuration reference

The WarpDrive training config at ai_economist/training/run_configs/covid_and_economy_environment.yaml controls the key training parameters:

trainer:
    num_envs: 60          # number of parallel environments on the GPU
    num_episodes: 1000    # total episodes to train for
    train_batch_size: 5400

saving:
    metrics_log_freq: 100       # print metrics every N iterations
    model_params_save_freq: 500 # save model weights every N iterations
    basedir: "/tmp"
    name: "covid19_and_economy"
    tag: "experiments"
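The relationship between these values can be checked with a few lines of arithmetic. The sketch assumes that `train_batch_size` counts environment steps summed over all parallel environments and is split evenly among them, and that the number of training iterations is the total step budget divided by the batch size. Treat both as assumptions to verify against the WarpDrive trainer documentation. The 540-step episode length comes from the scenario described earlier.

```python
num_envs = 60
num_episodes = 1000
train_batch_size = 5400
episode_length = 540  # steps per episode in the COVID-19 scenario

# Assumed: the batch is divided evenly across the parallel environments.
steps_per_env_per_iteration = train_batch_size // num_envs

# Assumed: training ends once num_episodes * episode_length steps are consumed.
total_steps = num_episodes * episode_length
num_iterations = total_steps // train_batch_size

print(steps_per_env_per_iteration)  # 90
print(num_iterations)               # 100
```

Under these assumptions, each iteration advances every environment by 90 steps, or one sixth of an episode.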

The full interactive tutorial is available on Colab: multi_agent_gpu_training_with_warp_drive.ipynb
