This repo contains code accompanying the paper *Dynamics Adapted Imitation Learning*. DYNAIL is an adversarial imitation learning method that handles dynamics shift between the expert demonstrations and the target environment.
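For readers new to adversarial imitation learning, here is a minimal, self-contained sketch of the core idea: a discriminator is trained to tell expert transitions from policy transitions, and its output is turned into a reward for the imitating policy. This is an illustrative toy (pure Python, linear classifier, hypothetical feature vectors), not DYNAIL's actual discriminator or its dynamics-shift correction, which live in the `algorithms/adversarial` folder of this repo.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

class LinearDiscriminator:
    """Logistic classifier D(x) over transition features x:
    expert transitions (label 1) vs. policy transitions (label 0).
    The imitation reward is -log(1 - D(x)), as in GAIL-style methods."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_expert(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, expert_batch, policy_batch):
        # One full-batch gradient step on the binary cross-entropy loss.
        labeled = [(x, 1.0) for x in expert_batch] + [(x, 0.0) for x in policy_batch]
        for x, label in labeled:
            err = label - self.prob_expert(x)
            for i, xi in enumerate(x):
                self.w[i] += self.lr * err * xi
            self.b += self.lr * err

    def reward(self, x):
        """Reward the policy for transitions the discriminator mistakes for expert data."""
        d = self.prob_expert(x)
        return -math.log(max(1.0 - d, 1e-8))
```

After a few updates on separable data, transitions resembling the expert's receive a higher reward than the policy's own, which is the learning signal the generator (policy) maximizes.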
The code is based on Imitation v0.3.0, which is a library with clean implementations of imitation and reward learning algorithms.
```shell
git clone --depth 1 --branch v0.3.0 https://github.com/HumanCompatibleAI/imitation.git
cd imitation
pip install -e .
```
Then replace `imitation/examples`, `imitation/src/imitation/algorithms/adversarial`, and `imitation/src/imitation/util` with the folders provided in this repo. In addition, we use mujoco-py v2.1.2.14 instead of mujoco-py v1.5 for all experiments in the paper.
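The folder replacement can be scripted. The helper below is a hypothetical sketch (not part of the repo); it assumes the provided folders are named after the last path component of each target folder in the `imitation` checkout.

```python
import shutil
from pathlib import Path

def replace_folders(repo_root, provided_root, folders):
    """Copy each provided folder over the corresponding folder in the
    imitation repo, removing the original first."""
    for rel in folders:
        src = Path(provided_root) / Path(rel).name
        dst = Path(repo_root) / rel
        shutil.rmtree(dst, ignore_errors=True)
        shutil.copytree(src, dst)
```

For example, `replace_folders("imitation", "dynail_repo", ["examples", "src/imitation/algorithms/adversarial", "src/imitation/util"])` would overwrite the three folders listed above.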
As an example, we train DYNAIL in BrokenHumanoid-v3 with expert demonstrations from Humanoid-v3.

- Training the expert policy. We train an expert policy with SAC in Humanoid-v3:

  ```shell
  cd examples
  python train_human_sac.py
  ```

  The expert policy is saved to `./output/SAC/Humanoid-v3/final/model.zip`.

- Generating demonstrations. We use the expert policy to generate 60 rollouts in Humanoid-v3 and keep the top 40 as expert demonstrations:

  ```shell
  python expert_policy.py
  ```

  The demonstrations are collected in `./expert/Humanoid-v3/rollout.pkl`, along with reward information in `./expert/Humanoid-v3/rollout_info.txt`.

- Training the imitation method. We train DYNAIL in BrokenHumanoid-v3 with the 40 demonstrations:

  ```shell
  python train_brohuman_dynail_sac.py
  ```

  For comparison, we can also train GAIfO as a baseline:

  ```shell
  python train_brohuman_gaifo_sac.py
  ```

  The corresponding results are saved to `./output/DYNAIL/BrokenHumanoid-v3` and `./output/GAIL/BrokenHumanoid-v3`, respectively.
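If you want to inspect the saved demonstrations or reproduce the top-40 selection yourself, a minimal sketch follows. It assumes the pickle holds a sequence of trajectories that each expose a per-step reward array under the key `"rews"`; the actual objects written by the repo's scripts may differ (e.g. trajectory objects with a `rews` attribute), so adapt the accessor accordingly.

```python
import pickle

def load_rollouts(path):
    """Load pickled rollouts (e.g. ./expert/Humanoid-v3/rollout.pkl)."""
    with open(path, "rb") as f:
        return pickle.load(f)

def top_k_by_return(rollouts, k):
    """Rank trajectories by summed reward and keep the best k,
    mirroring the top-40-of-60 selection described above."""
    return sorted(rollouts, key=lambda r: sum(r["rews"]), reverse=True)[:k]
```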
Other tasks can be run by following the same steps.
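The paths above follow a consistent `./output/<METHOD>/<env-id>` layout. A tiny hypothetical helper (not part of the repo) makes the pattern explicit when scripting over several tasks:

```python
def output_dir(method, env_id):
    """Result directory following the ./output/<METHOD>/<env-id> layout
    used by the training scripts above."""
    return f"./output/{method}/{env_id}"
```

For instance, `output_dir("DYNAIL", "BrokenHumanoid-v3")` gives `./output/DYNAIL/BrokenHumanoid-v3`.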
Citation

If you find this work useful, please cite:

```
@article{liu2023dynamics,
  title={Dynamics Adapted Imitation Learning},
  author={Zixuan Liu and Liu Liu and Bingzhe Wu and Lanqing Li and Xueqian Wang and Bo Yuan and Peilin Zhao},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2023},
  url={https://openreview.net/forum?id=w36pqfaJ4t},
}
```
Experiments with realworldrl-suite
Both baseline quadruped robots overturn on the ground under low friction, while our method succeeds in the quadruped task with low friction.

Both of the episodes above terminate because of unhealthy conditions; our method succeeds in the humanoid task with a broken abdomen (shown in red). Both baselines fail to find the way to the goal, while our method succeeds in the maze task with a moving wall block.