Panda-Shawn/DYNAIL

Dynamics Adapted Imitation Learning [TMLR]

This repo contains the code accompanying the paper Dynamics Adapted Imitation Learning. DYNAIL is an adversarial imitation learning method that handles a dynamics shift between the expert demonstrations and the target environment.

Dependencies

The code is based on Imitation v0.3.0, which is a library with clean implementations of imitation and reward learning algorithms.

Install Imitation v0.3.0

git clone --depth 1 --branch v0.3.0 https://github.com/HumanCompatibleAI/imitation.git
cd imitation
pip install -e .

Modifications

Then replace imitation/examples, imitation/src/imitation/algorithms/adversarial and imitation/src/imitation/util with the folders provided in this repo. In addition, all experiments in the paper use mujoco-py v2.1.2.14 instead of mujoco-py v1.5.
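To confirm that the environment matches the versions named above, a small check like the following can help. This is only a sketch and not part of the repo: the helper names `installed_version` and `matches` are hypothetical, and the distribution names `imitation` and `mujoco-py` are assumed to be what pip registers. An editable install from a git tag may also report a slightly different version string, so treat a mismatch as a hint rather than an error.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches(package: str, expected_prefix: str) -> bool:
    """True if `package` is installed and its version starts with `expected_prefix`."""
    version = installed_version(package)
    return version is not None and version.startswith(expected_prefix)


if __name__ == "__main__":
    # Versions taken from this README; distribution names are assumptions.
    for name, prefix in [("imitation", "0.3.0"), ("mujoco-py", "2.1.2.14")]:
        status = "ok" if matches(name, prefix) else "missing or different version"
        print(f"{name}: {installed_version(name)} ({status})")
```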

Usage

As an example, we train DYNAIL in BrokenHumanoid-v3 using expert demonstrations from Humanoid-v3.

  1. Training expert policy.

    We train an expert policy with SAC in Humanoid-v3.

    cd example
    python train_human_sac.py
    

    Then the expert policy is saved in ./output/SAC/Humanoid-v3/final/model.zip.

  2. Generating demonstrations.

    We use the expert policy to generate 60 rollouts in Humanoid-v3 and keep the top 40 as expert demonstrations.

    python expert_policy.py
    

    Then the demonstrations are saved in ./expert/Humanoid-v3/rollout.pkl, along with reward information in ./expert/Humanoid-v3/rollout_info.txt.

  3. Training imitation method.

    We train DYNAIL in BrokenHumanoid-v3 with the 40 demonstrations.

    python train_brohuman_dynail_sac.py
    

    For comparison, we can also train GAIFO as a baseline.

    python train_brohuman_gaifo_sac.py
    

    Then, the corresponding results are saved in ./output/DYNAIL/BrokenHumanoid-v3 and ./output/GAIL/BrokenHumanoid-v3, respectively.
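Step 2 above keeps the top 40 of 60 generated rollouts. The sketch below illustrates one way such a selection could work, assuming rollouts are ranked by their undiscounted episode return. That ranking criterion is an assumption, as are the `Rollout` class and the `select_top_k` helper; the actual logic lives in expert_policy.py and may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """Hypothetical stand-in for one recorded episode."""
    observations: list
    rewards: List[float]


def episode_return(rollout: Rollout) -> float:
    """Undiscounted sum of rewards over the episode."""
    return float(sum(rollout.rewards))


def select_top_k(rollouts: List[Rollout], k: int) -> List[Rollout]:
    """Return the k rollouts with the highest episode return, best first."""
    if k > len(rollouts):
        raise ValueError(f"asked for {k} rollouts but only {len(rollouts)} available")
    return sorted(rollouts, key=episode_return, reverse=True)[:k]


if __name__ == "__main__":
    # 60 toy rollouts whose returns are 0, 1, ..., 59.
    generated = [Rollout(observations=[], rewards=[float(i)]) for i in range(60)]
    demos = select_top_k(generated, k=40)
    print(len(demos), episode_return(demos[0]), episode_return(demos[-1]))
```

Filtering by return in this way removes episodes where the expert failed early, so the imitation method only sees the stronger demonstrations.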

Other tasks can be run by following the same steps.
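After running the three steps, it can be useful to check that each one produced its output before moving on. The following is a minimal sketch, not a script from this repo: the paths are the ones listed in the steps above, while the `EXPECTED_ARTIFACTS` table and the `check_artifacts` helper are hypothetical names. It assumes it is run from the same directory as the training scripts.

```python
from pathlib import Path
from typing import Dict

# Output locations as listed in this README, relative to the working directory.
EXPECTED_ARTIFACTS = {
    "expert policy (step 1)": "output/SAC/Humanoid-v3/final/model.zip",
    "demonstrations (step 2)": "expert/Humanoid-v3/rollout.pkl",
    "reward info (step 2)": "expert/Humanoid-v3/rollout_info.txt",
    "DYNAIL results (step 3)": "output/DYNAIL/BrokenHumanoid-v3",
    # The README lists the GAIL folder for the GAIFO baseline run.
    "GAIFO results (step 3)": "output/GAIL/BrokenHumanoid-v3",
}


def check_artifacts(root: Path) -> Dict[str, bool]:
    """Map each expected artifact label to whether it exists under `root`."""
    return {label: (root / rel).exists() for label, rel in EXPECTED_ARTIFACTS.items()}


if __name__ == "__main__":
    for label, found in check_artifacts(Path(".")).items():
        print(f"{'found  ' if found else 'missing'}  {label}")
```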

Citation

@article{
liu2023dynamics,
title={Dynamics Adapted Imitation Learning},
author={Zixuan Liu and Liu Liu and Bingzhe Wu and Lanqing Li and Xueqian Wang and Bo Yuan and Peilin Zhao},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2023},
url={https://openreview.net/forum?id=w36pqfaJ4t},
note={}
}

Results

Experiments with realworldrl-suite

Source Domain: Quadruped


source expert in source domain

Target Domain: Quadruped with Low Friction

Direct Transfer

source expert and behavior cloning in target domain

Both quadruped robots overturn on the low-friction ground.

Our Method: DYNAIL

DYNAIL in target domain

Our method succeeds in the quadruped task with low friction.

Experiments with High-Dimensional Environment Humanoid

Source Domain: Humanoid-v3


source expert in source domain

Target Domain: BrokenHumanoid-v3 (Humanoid-v3 with a broken abdomen joint, shown in red)

Direct Transfer

source expert and behavior cloning in target domain
Both of the episodes above terminate because of unhealthy conditions.
Our Method: DYNAIL

DYNAIL in target domain
Our method succeeds in the humanoid task with the broken abdomen joint.

Experiments with Maze (breaking assumptions)

Source Domain: UMaze-v0


source expert in source domain

Target Domain: IMaze-v0 (the middle wall block is moved to the right)

Direct Transfer

behavior cloning and GWIL in target domain
Both baselines fail to find a path to the goal.
Our Method: DYNAIL

DYNAIL in target domain
Our method succeeds in the maze task with the moved wall block.
