This code is modified from MADDPG and M3DDPG.
This code implements the Robust Multi-Agent Actor-Critic (RMAAC) algorithm presented in the paper "Robust Multi-Agent Reinforcement Learning with State Uncertainty". It is configured to run with environments from the Multi-Agent Particle Environments (MPE).
- Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), tensorflow (1.8.0), numpy (1.14.5)
You can use the following commands to configure the environment.
conda create -n rmaac_env python=3.5.4
conda activate rmaac_env
conda install numpy=1.14.5
# conda install -c anaconda tensorflow-gpu
conda install tensorflow
# conda install gym=0.10.5
pip install gym==0.10.5
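After running the commands above, it can help to confirm that the activated environment matches the known dependencies. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes each package exposes a `__version__` attribute. It avoids f-strings so that it stays compatible with Python 3.5.

```python
import importlib

# Versions taken from the "Known dependencies" line above.
EXPECTED = {"gym": "0.10.5", "tensorflow": "1.8.0", "numpy": "1.14.5"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(EXPECTED)` inside `rmaac_env` returns an empty dictionary when all three packages match the listed versions; a package that is missing shows up with `None` as its found version.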
We demonstrate here how the code can be used in conjunction with the Multi-Agent Particle Environments (MPE).
- Download and install the MPE code by following its README.
- Ensure that `multiagent-particle-envs` has been added to your `PYTHONPATH` (e.g. in `~/.bashrc` or `~/.bash_profile`).
- To run the code, `cd` into the `experiments` directory and run `train.py`:
python train.py --scenario simple
- You can replace `simple` with any environment in the MPE you'd like to run.
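If `train.py` fails with an import error, the `PYTHONPATH` step is a likely cause. The sketch below shows one way to check it from Python. It assumes the MPE package is importable under the name `multiagent`; the helper names `is_importable` and `pythonpath_entries` are hypothetical and do not come from this repository.

```python
import importlib.util
import os


def is_importable(module_name):
    """Return True if Python can locate the top-level module on sys.path."""
    return importlib.util.find_spec(module_name) is not None


def pythonpath_entries(environ=os.environ):
    """Split PYTHONPATH into its non-empty entries."""
    raw = environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


# Assumption: the MPE package is named "multiagent".
if not is_importable("multiagent"):
    print("MPE not found; current PYTHONPATH entries:", pythonpath_entries())
```

If the message is printed, check that the directory containing the MPE checkout appears among the listed entries.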
- `--scenario`: defines which environment in the MPE is to be used (default: `"simple"`)
- `--max-episode-len`: maximum length of each episode for the environment (default: `25`)
- `--num-episodes`: total number of training episodes (default: `60000`)
- `--num-adversaries`: number of adversaries in the game (default: `0`)
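To illustrate how options like these are typically declared, here is a minimal `argparse` sketch using the four flags and defaults listed above. It is an illustration only; the function name `build_env_parser` is hypothetical, and the actual parser in `train.py` may be organized differently.

```python
import argparse


def build_env_parser():
    """Declare the four environment options with the defaults listed above."""
    parser = argparse.ArgumentParser(description="Environment options (illustrative)")
    parser.add_argument("--scenario", type=str, default="simple",
                        help="name of the MPE scenario to load")
    parser.add_argument("--max-episode-len", type=int, default=25,
                        help="maximum number of steps per episode")
    parser.add_argument("--num-episodes", type=int, default=60000,
                        help="total number of training episodes")
    parser.add_argument("--num-adversaries", type=int, default=0,
                        help="number of adversaries in the game")
    return parser


# argparse maps dashes to underscores: --max-episode-len becomes args.max_episode_len.
args = build_env_parser().parse_args(["--scenario", "simple_spread"])
print(args.scenario, args.max_episode_len, args.num_episodes, args.num_adversaries)
```

Flags that are not passed on the command line keep their defaults, so only the values that differ from the list above need to be given.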
- `--lr`: learning rate for agents (default: `1e-2`)
- `--lr-adv`: learning rate for state perturbation adversaries (default: `1e-2`)
- `--gamma`: discount factor (default: `0.95`)
- `--batch-size`: batch size (default: `1024`)
- `--num-units`: number of units in the MLP (default: `64`)
- `--noise-type`: noise format (default: `Linear`)
- `--noise-variance`: variance of Gaussian noise (default: `1`)
- `--constraint-epsilon`: the constraint parameter (default: `0.5`)
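As a small worked example of what the discount factor does, the sketch below computes a discounted return for one episode. It is the standard textbook definition, not code from this repository; the function name `discounted_return` and the constant reward of 1 per step are assumptions made for illustration.

```python
def discounted_return(rewards, gamma=0.95):
    """Sum of gamma**t * r_t over the episode, accumulated from the last step back."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# With a constant reward of 1 per step and the default episode length of 25,
# the return equals the geometric sum (1 - gamma**25) / (1 - gamma).
episode = [1.0] * 25
print(discounted_return(episode))
```

A smaller `--gamma` shrinks the weight of later steps, so rewards near the end of a 25-step episode contribute less to the return.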
- `--exp-name`: name of the experiment, used as the file name to save all results (default: `None`)
- `--save-dir`: directory where intermediate training results and model will be saved (default: `"/tmp/policy/"`)
- `--save-rate`: model is saved every time this number of episodes has been completed (default: `1000`)
- `--load-dir`: directory where training state and model are loaded from (default: `""`)
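The effect of `--save-rate` can be sketched as a simple modulo check on the number of completed episodes. This is an assumption about how such a schedule is usually implemented, not a copy of the repository's logic; `should_save` is a hypothetical helper.

```python
def should_save(completed_episodes, save_rate=1000):
    """True when the number of completed episodes is a positive multiple of save_rate."""
    return completed_episodes > 0 and completed_episodes % save_rate == 0


# With the defaults (60000 episodes, save every 1000), list the save points.
checkpoints = [n for n in range(1, 60000 + 1) if should_save(n)]
print(len(checkpoints), checkpoints[0], checkpoints[-1])
```

Under this assumption, lowering `--save-rate` trades more disk writes for finer-grained recovery points.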
- `--restore`: restores previous training state stored in `load-dir` (or in `save-dir` if no `load-dir` has been provided), and continues training (default: `False`)
- `--display`: displays to the screen the trained policy stored in `load-dir` (or in `save-dir` if no `load-dir` has been provided), but does not continue training (default: `False`)
- `--benchmark`: runs benchmarking evaluations on saved policy, saves results to the `benchmark-dir` folder (default: `False`)
- `--benchmark-iters`: number of iterations to run benchmarking for (default: `100000`)
- `--benchmark-dir`: directory where benchmarking data is saved (default: `"./benchmark_files/"`)
- `--plots-dir`: directory where training curves are saved (default: `"./learning_curves/"`)
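The fallback from `load-dir` to `save-dir` described for `--restore` and `--display` can be sketched as follows. The helper names are hypothetical, and treating `--benchmark` as also requiring a saved policy is an assumption based on its description above, not confirmed behavior of `train.py`.

```python
def resolve_load_dir(load_dir="", save_dir="/tmp/policy/"):
    """Return load_dir when it is set, otherwise fall back to save_dir."""
    return load_dir if load_dir else save_dir


def needs_saved_model(restore=False, display=False, benchmark=False):
    """True if any flag that reads a previously saved policy is set."""
    return restore or display or benchmark


if needs_saved_model(display=True):
    print("loading policy from", resolve_load_dir())
```

With the defaults, a run started with `--display` and no `--load-dir` would therefore look for the policy in `/tmp/policy/`.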
If you used this code for your experiments or found it helpful, consider citing the following paper:
@article{
he2023robust,
title={Robust Multi-Agent Reinforcement Learning with State Uncertainty},
author={Sihong He and Songyang Han and Sanbao Su and Shuo Han and Shaofeng Zou and Fei Miao},
journal={Transactions on Machine Learning Research},
year={2023},
url={https://openreview.net/forum?id=CqTkapZ6H9}
}