This repository contains the code for our AAAI 2022 paper, Efficient Continuous Control with Double Actors and Regularized Critics.
Our DDPG baseline is a fine-tuned version of vanilla DDPG that achieves much better performance than the original in various environments. TD3 uses fine-tuned DDPG as its baseline, and we follow the same practice in our work. Our DARC implementation is based on the open-source TD3 codebase.
We use `main.py` to run experiments, where `DDPG.py` and `TD3.py` serve as baselines and `DARC.py` is the core file of our work. We use seeds 1-5 for all algorithms during training and different seeds (the training seed + 100) during evaluation (see `run.sh` for more details).
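The seed convention above can be sketched as follows; this is an illustrative snippet (the helper names are not from the repo), and it only seeds the Python RNG, while the actual code would also seed torch, numpy, and the environment:

```python
import random

TRAIN_SEEDS = range(1, 6)  # seeds 1-5, used for all algorithms during training
EVAL_OFFSET = 100          # evaluation uses the training seed + 100

def eval_seed(train_seed):
    # Derive the evaluation seed from the training seed.
    return train_seed + EVAL_OFFSET

def set_seed(seed):
    # Seed the Python RNG (torch/numpy/env seeding omitted in this sketch).
    random.seed(seed)

for seed in TRAIN_SEEDS:
    set_seed(seed)                 # RNG state for training
    print(seed, eval_seed(seed))   # training seed and its evaluation seed
```

This keeps evaluation episodes statistically independent of the episodes seen during training while remaining reproducible.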
One needs to set the sampled state as the initial state in MuJoCo to evaluate the true value. Please refer to openai/gym#1617 for details.
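Once the simulator is reset to a sampled state, the true value is typically estimated via Monte Carlo rollouts from that state. A minimal sketch of the discounted-return computation for one rollout (the function name and the discount factor are illustrative, not taken from the repo):

```python
def discounted_return(rewards, gamma=0.99):
    # Accumulate the discounted Monte Carlo return backwards over one rollout,
    # so each reward is weighted by gamma raised to its time step.
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret

# Example: a three-step rollout with constant reward 1 and gamma = 0.5.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

Averaging such returns over several rollouts from the same sampled state gives the Monte Carlo estimate of the true value that the learned critic is compared against.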
- python: 3.7.9
- mujoco_py: 2.0.2.13
- torch: 1.8.0
- gym: 0.18.0
- box2d-py
- pybulletgym
Please refer to the open-source implementation of pybulletgym here.
Before installing pybulletgym, make sure that you have gym installed. Then run the following commands to install pybulletgym:

```shell
git clone https://github.com/benelot/pybullet-gym.git
cd pybullet-gym
pip install -e .
```
After installation, import the package to register its environments:

```python
import pybulletgym
```

For detailed environments in pybulletgym, please refer here.
Utilize GPUs to accelerate training if available:

```shell
export CUDA_VISIBLE_DEVICES=1
```

Run the following commands to reproduce the results in the submission:
```shell
./run.sh
```

To train DARC on a single environment:

```shell
python main.py --env <environment_name> --save-model --policy DARC --dir ./logs/DARC/r1 --seed 1 --qweight 0.1 --reg 0.005
```

To train a baseline algorithm:

```shell
python main.py --env <environment_name> --seed 1 --policy <algorithm_name> --dir './logs/' --save-model
```

If you find our work helpful, please consider citing our work.
```
@inproceedings{Efficient2022Lyu,
  title={Efficient Continuous Control with Double Actors and Regularized Critics},
  author={Jiafei Lyu and Xiaoteng Ma and Jiangpeng Yan and Xiu Li},
  booktitle={Thirty-sixth AAAI Conference on Artificial Intelligence},
  year={2022},
}
```