Dynamics Adapted Imitation Learning [TMLR]

This repo contains the code accompanying the paper Dynamics Adapted Imitation Learning. DYNAIL is an adversarial imitation learning method that handles dynamics shift between the expert demonstrations and the environment.

Dependencies

The code is based on Imitation v0.3.0, which is a library with clean implementations of imitation and reward learning algorithms.

Install Imitation v0.3.0

git clone --depth 1 --branch v0.3.0 https://github.com/HumanCompatibleAI/imitation.git
cd imitation
pip install -e .

Modifications

Then replace imitation/examples, imitation/src/imitation/algorithms/adversarial and imitation/src/imitation/util with the folders provided in this repo. In addition, all experiments in the paper use mujoco-py v2.1.2.14 rather than mujoco-py v1.5.
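The folder replacement can also be scripted. The following is a minimal sketch, not part of this repo: the function name `replace_folders`, the `.orig` backup suffix and the two root paths are assumptions made for illustration, while the folder mapping follows the paths listed above.

```python
import shutil
from pathlib import Path

# Folder in this repo -> destination inside the imitation checkout
# (mapping taken from the paths listed in this README).
FOLDER_MAP = {
    "examples": "examples",
    "adversarial": "src/imitation/algorithms/adversarial",
    "util": "src/imitation/util",
}


def replace_folders(dynail_root, imitation_root, backup_suffix=".orig"):
    """Copy the DYNAIL folders over the matching imitation folders.

    Hypothetical helper: an existing destination folder is renamed with
    `backup_suffix` instead of being deleted, so the change can be undone.
    """
    dynail_root, imitation_root = Path(dynail_root), Path(imitation_root)
    replaced = []
    for src_name, dst_rel in FOLDER_MAP.items():
        src = dynail_root / src_name
        dst = imitation_root / dst_rel
        if not src.is_dir():
            raise FileNotFoundError(f"missing source folder: {src}")
        if dst.exists():
            backup = dst.with_name(dst.name + backup_suffix)
            if backup.exists():
                shutil.rmtree(backup)  # drop a stale backup from an earlier run
            dst.rename(backup)
        shutil.copytree(src, dst)
        replaced.append(dst)
    return replaced
```

For example, `replace_folders("DYNAIL", "imitation")` would be called from the directory that holds both checkouts, assuming those are the checkout folder names.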

Usage

The steps below train DYNAIL in BrokenHumanoid-v3 using expert demonstrations from Humanoid-v3.

Training expert policy.

We train an expert policy with SAC in Humanoid-v3.
```
cd examples
python train_human_sac.py
```
Then the expert policy is saved in ./output/SAC/Humanoid-v3/final/model.zip.

Generating demonstrations.

We use the expert policy to generate 60 rollouts in Humanoid-v3 and keep the top 40 as expert demonstrations.
```
python expert_policy.py
```
Then the demonstrations are saved in ./expert/Humanoid-v3/rollout.pkl, along with reward information in ./expert/Humanoid-v3/rollout_info.txt.
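The "top 40 of 60" selection can be illustrated with a short sketch. It assumes that rollouts are ranked by total episode return, which this README does not state explicitly; the function name `select_top_rollouts` is hypothetical and is not taken from expert_policy.py.

```python
from typing import List, Sequence


def select_top_rollouts(returns: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-return rollouts, best first.

    Assumption: `returns[i]` is the total reward of rollout i.
    Ties keep their original order because Python's sort is stable.
    """
    if not 0 < k <= len(returns):
        raise ValueError(f"k must be in 1..{len(returns)}, got {k}")
    order = sorted(range(len(returns)), key=lambda i: returns[i], reverse=True)
    return order[:k]


# Toy example with four rollouts: indices 1 and 3 have the highest returns.
top = select_top_rollouts([1.0, 5.0, 3.0, 4.0], k=2)
print(top)  # [1, 3]
```

With 60 rollouts, `select_top_rollouts(returns, k=40)` would give the indices of the 40 rollouts to keep as demonstrations.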

Training imitation method.

We train DYNAIL in BrokenHumanoid-v3 with the 40 demonstrations.
```
python train_brohuman_dynail_sac.py
```
For comparison, we can also train GAIFO as a baseline.
```
python train_brohuman_gaifo_sac.py
```
Then, the corresponding results are saved in ./output/DYNAIL/BrokenHumanoid-v3 and ./output/GAIL/BrokenHumanoid-v3, respectively.
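The commands above can be chained in one driver script. This is a minimal sketch under stated assumptions: the script names are the ones used in this README, while `run_pipeline`, its `dry_run` flag and the `examples_dir` argument are hypothetical and not part of this repo.

```python
import subprocess
import sys
from pathlib import Path

# Scripts in the order described above: expert training, demonstration
# generation, DYNAIL training, then the GAIFO baseline.
PIPELINE = [
    "train_human_sac.py",
    "expert_policy.py",
    "train_brohuman_dynail_sac.py",
    "train_brohuman_gaifo_sac.py",
]


def run_pipeline(examples_dir, scripts=PIPELINE, dry_run=False):
    """Run each script in order from `examples_dir`; stop at the first failure.

    With dry_run=True nothing is executed and only the command list is returned.
    """
    commands = [[sys.executable, name] for name in scripts]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(cmd, cwd=Path(examples_dir), check=True)
    return commands
```

Calling `run_pipeline(".", dry_run=True)` from the folder that holds the scripts lists the commands without starting any training.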

Other tasks can also be implemented following the instructions above.

Citation

@article{
liu2023dynamics,
title={Dynamics Adapted Imitation Learning},
author={Zixuan Liu and Liu Liu and Bingzhe Wu and Lanqing Li and Xueqian Wang and Bo Yuan and Peilin Zhao},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2023},
url={https://openreview.net/forum?id=w36pqfaJ4t},
note={}
}

Results

Experiments with realworldrl-suite

Source Domain: Quadruped

source expert in source domain

Target Domain: Quadruped with Low Friction

Direct Transfer

source expert and behavior cloning in target domain

Both quadruped robots overturn on the ground with low friction.

Our Method: DYNAIL

DYNAIL in target domain

Our method succeeds in the quadruped task with low friction.

Experiments with High-Dimensional Environment Humanoid

Source Domain: Humanoid-v3

source expert in source domain

Target Domain: BrokenHumanoid-v3 (Humanoid-v3 with red broken abdomen joint)

Direct Transfer

source expert and behavior cloning in target domain

Both of the episodes above terminate because of unhealthy conditions.

Our Method: DYNAIL

DYNAIL in target domain

Our method succeeds in the humanoid task with the red broken abdomen joint.

Experiments with Maze (breaking assumptions)

Source Domain: UMaze-v0

source expert in source domain

Target Domain: IMaze-v0 (Moving the middle wall block to the right)

Direct Transfer

behavior cloning and GWIL in target domain

Both of the baselines fail to find the way to the goal.

Our Method: DYNAIL

DYNAIL in target domain

Our method succeeds in the maze task with the moved wall block.
