
MuDreamer: Learning Predictive World Models without Reconstruction

This is the official repository of MuDreamer.

Read the MuDreamer paper on arXiv

Installation

Clone the GitHub repository and set up the environment:

git clone https://github.com/burchim/MuDreamer && cd MuDreamer
./install.sh

Download Kinetics-400 videos:

python3 download_videos.py

Download DAVIS videos (used by the Distracting Control Suite):

wget https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-trainval-480p.zip && unzip DAVIS-2017-trainval-480p.zip

Training

Run an experiment:

env_name=dmc-walker-run run_name=dmc python3 main.py

The training task is selected with the env_name=dmc-domain-task variable. Training logs, the replay buffer, and checkpoints are saved to callbacks/MuDreamer/run_name/env_name.
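The env_name convention above can be sketched with a small helper. This is a hypothetical illustration, not code from the repository, and it assumes names always follow the suite-domain-task pattern with single-word suite and domain:

```python
# Hypothetical helper (not part of the repository) illustrating the
# env_name=suite-domain-task naming convention, e.g. 'dmc-walker-run'.
def parse_env_name(env_name: str):
    """Split a name like 'dmc-walker-run' into (suite, domain, task)."""
    suite, domain, task = env_name.split("-", 2)
    return suite, domain, task

print(parse_env_name("dmc-walker-run"))  # ('dmc', 'walker', 'run')
```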

Replace 'dmc' with 'dis' to experiment with the Distracting Control Suite:

env_name=dis-walker-run run_name=dis python3 main.py

Override hyperparameters

Model config hyperparameters can be overridden by passing a JSON string via the override_config variable. The following example sets apply_random_background to false:

env_name=dmc-walker-run run_name=dmc_bg_off override_config='{"loss_reward_scale": 1.0, "train_env_params": {"apply_random_background": false}, "eval_env_params": {"apply_random_background": false}}' python3 main.py
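A minimal sketch of how such a JSON override might be applied to the default config, assuming a recursive dictionary merge; the actual merging logic in configs/mu_dreamer.py may differ, and the default values below are illustrative only:

```python
import json

def deep_update(base: dict, override: dict) -> dict:
    """Recursively merge override into base, replacing leaf values."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value
    return base

# Illustrative defaults (not the repository's actual values).
config = {
    "loss_reward_scale": 0.5,
    "train_env_params": {"apply_random_background": True, "img_size": 64},
}
override = json.loads(
    '{"loss_reward_scale": 1.0,'
    ' "train_env_params": {"apply_random_background": false}}'
)
deep_update(config, override)
# Nested keys not mentioned in the override (e.g. img_size) are preserved.
```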

Visualize experiments

tensorboard --logdir ./callbacks

Evaluation

Use '--mode evaluation' to evaluate agents. The '--load_last' flag scans the log directory and loads the most recent checkpoint; alternatively, '--checkpoint' loads a specific '.ckpt' checkpoint file.

env_name=dmc-walker-run run_name=dmc python3 main.py --load_last --mode evaluation
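A sketch of what "scan the log directory for the last checkpoint" amounts to, assuming checkpoints are '.ckpt' files under the run's log directory and "last" means most recently modified; the repository's own loader may select differently (e.g. by step number in the filename):

```python
import pathlib

def find_last_checkpoint(log_dir: str):
    """Return the most recently modified .ckpt file under log_dir, or None."""
    ckpts = sorted(
        pathlib.Path(log_dir).rglob("*.ckpt"),
        key=lambda p: p.stat().st_mtime,
    )
    return ckpts[-1] if ckpts else None
```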

Script options

# Args
-c / --config_file              type=str   default="configs/mu_dreamer.py"      help="Python configuration file containing model hyperparameters"
-m / --mode                     type=str   default="training"                   help="Mode: training, evaluation, pass"
-i / --checkpoint               type=str   default=None                         help="Load model from checkpoint name"
--cpu                           action="store_true"                             help="Load model on cpu"
--load_last                     action="store_true"                             help="Load last model checkpoint"
--wandb                         action="store_true"                             help="Initialize wandb logging"
--verbose_progress_bar          type=int   default=1                            help="Verbose level of progress bar display"

# Training
--saving_period_epoch           type=int   default=1                            help="Model saving every 'n' epochs"
--log_figure_period_step        type=int   default=None                         help="Log figure every 'n' steps"
--log_figure_period_epoch       type=int   default=1                            help="Log figure every 'n' epochs"
--step_log_period               type=int   default=100                          help="Training step log period"
--keep_last_k                   type=int   default=3                            help="Keep last k checkpoints"

# Eval
--eval_period_epoch             type=int   default=5                            help="Model evaluation every 'n' epochs"
--eval_period_step              type=int   default=None                         help="Model evaluation every 'n' steps"

# Info
--show_dict                     action="store_true"                             help="Show model dict summary"
--show_modules                  action="store_true"                             help="Show model named modules"
    
# Debug
--detect_anomaly                action="store_true"                             help="Enable or disable the autograd anomaly detection"

Citation

If this code or paper is helpful in your research, please use the following citation:

@article{burchi2024mudreamer,
  title={{MuDreamer}: Learning Predictive World Models without Reconstruction},
  author={Burchi, Maxime and Timofte, Radu},
  journal={arXiv preprint arXiv:2405.15083},
  year={2024}
}

Acknowledgments

Official DreamerV3 Implementation: https://github.com/danijar/dreamerv3
Official DreamerPro Implementation: https://github.com/fdeng18/dreamer-pro
Distracting Control Suite: https://github.com/google-research/google-research/tree/master/distracting_control
