This repository is the official implementation of DRIBO. Our implementation is based on CURL by Misha Laskin and SAC+AE by Denis Yarats.
-
Install dm_control with MuJoCo Pro 2.0:

    pip install dm_control
    pip install git+https://github.com/denisyarats/dmc2gym.git

-
All of the Python dependencies are listed in the `setup.py` file. They can be installed manually or with the following command:

    pip install -e .

-
Running the natural video setting: you can download the Kinetics dataset to replicate our setup.
- Grab the "arranging_flowers" label from the train dataset to replace backgrounds during training. The videos are stored in the folder `../kinetics-downloader/dataset/train/arranging_flowers`.

      python download.py --classes 'arranging flowers'

- Download the test dataset to replace backgrounds during testing. The videos are stored in the folder `../kinetics-downloader/dataset/test`.

      python download.py --test
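Before training with video backgrounds, it can help to confirm that both downloads actually produced files. The following is a minimal sketch, not part of this repository: the helper name `list_videos` is hypothetical, and it assumes the downloaded videos carry an `.mp4` extension and may sit in subfolders. Only the two folder paths come from the steps above.

```python
from pathlib import Path

# Folders named in the download steps above.
TRAIN_DIR = Path("../kinetics-downloader/dataset/train/arranging_flowers")
TEST_DIR = Path("../kinetics-downloader/dataset/test")


def list_videos(folder, extensions=(".mp4",)):
    """Return the sorted video files found under `folder`, searched recursively.

    The extension filter is an assumption; adjust it if the downloader
    writes a different container format.
    """
    folder = Path(folder)
    if not folder.is_dir():
        # A missing folder simply means nothing was downloaded yet.
        return []
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


if __name__ == "__main__":
    for name, folder in (("train", TRAIN_DIR), ("test", TEST_DIR)):
        videos = list_videos(folder)
        print(f"{name}: {len(videos)} video file(s) in {folder}")
```

An empty result for either folder suggests the corresponding `download.py` step has not completed, or that the files use an extension other than the assumed one.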
-
To train a DRIBO agent on the `cartpole swingup` task under the clean setting, run `./script/run_clean_bg_cartpole_im84_dim1024_no_stacked_frames.sh` from the root of this directory. The `run_clean_bg_cartpole_im84_dim1024_no_stacked_frames.sh` file contains the following command, which you can modify to try different environments or hyperparameters:

    CUDA_VISIBLE_DEVICES=0 python train.py \
        --domain_name cartpole \
        --task_name swingup \
        --encoder_type rssm --work_dir ./clean_log \
        --action_repeat 8 --num_eval_episodes 8 \
        --pre_transform_image_size 100 --image_size 84 --kl_balance \
        --agent DRIBO_sac --frame_stack 1 --encoder_feature_dim 1024 --save_model \
        --seed 0 --critic_lr 1e-5 --actor_lr 1e-5 --eval_freq 10000 --batch_size 8 --num_train_steps 890000

-
To train a DRIBO agent on the `cartpole swingup` task under the natural video setting, run `./script/run_noisy_bg_cartpole_im84_dim1024_no_stacked_frames.sh` from the root of this directory. The `run_noisy_bg_cartpole_im84_dim1024_no_stacked_frames.sh` file contains the following command, which you can modify to try different environments or hyperparameters:

    CUDA_VISIBLE_DEVICES=0 python train.py \
        --domain_name cartpole \
        --task_name swingup \
        --encoder_type rssm --work_dir ./log \
        --action_repeat 8 --num_eval_episodes 8 --kl_balance \
        --pre_transform_image_size 100 --image_size 84 --noisy_bg \
        --agent DRIBO_sac --frame_stack 1 --encoder_feature_dim 1024 --save_model \
        --seed 0 --critic_lr 1e-5 --actor_lr 1e-5 --eval_freq 10000 --batch_size 8 --num_train_steps 890000

The console output has the following form:

    | train | E: 1 | S: 1000 | D: 34.7 s | R: 0.0000 | BR: 0.0000 | A_LOSS: 0.0000 | CR_LOSS: 0.0000 | MIB_LOSS: 0.0000 | skl: 0.0000 | beta: 0.0E+00

A training entry decodes as:

    train    - training episode
    E        - total number of episodes
    S        - total number of environment steps
    D        - duration in seconds to train 1 episode
    R        - episode reward
    BR       - average reward of sampled batch
    A_LOSS   - average loss of actor
    CR_LOSS  - average loss of critic
    MIB_LOSS - average DRIBO loss
    skl      - average value of symmetrized KL divergence
    beta     - value of coefficient beta

An evaluation entry looks like:

    | eval | S: 0 | ER: 22.1371

It reports the expected reward `ER` obtained by evaluating the current policy after `S` steps. Note that `ER` is the average evaluation performance over `num_eval_episodes` episodes (usually 8).
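If you capture the console output to a file, the entries described above are regular enough to parse for plotting or comparison across runs. The sketch below is an illustration rather than a utility shipped with this repository: `parse_log_line` is a hypothetical helper, and it assumes every entry follows the pipe-separated `KEY: value` layout shown in this README, with `D` carrying a trailing ` s` unit.

```python
# Counters that are reported as whole numbers in the console output.
INT_KEYS = {"E", "S"}


def parse_log_line(line):
    """Parse one console entry such as '| eval | S: 0 | ER: 22.1371'.

    Returns a dict with the entry kind ('train' or 'eval') under 'kind'
    and one numeric value per reported key.
    """
    fields = [f.strip() for f in line.strip().strip("|").split("|")]
    if not fields or not fields[0]:
        raise ValueError(f"not a log entry: {line!r}")

    entry = {"kind": fields[0]}
    for field in fields[1:]:
        key, _, raw = field.partition(":")
        key, raw = key.strip(), raw.strip()
        if raw.endswith(" s"):
            # Drop the seconds unit attached to the duration field.
            raw = raw[:-2]
        entry[key] = int(raw) if key in INT_KEYS else float(raw)
    return entry


if __name__ == "__main__":
    sample = "| train | E: 1 | S: 1000 | D: 34.7 s | R: 0.0000 | beta: 0.0E+00"
    print(parse_log_line(sample))
    print(parse_log_line("| eval | S: 0 | ER: 22.1371"))
```

Collecting the `S` and `ER` values from all `eval` entries gives the evaluation curve for a run.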