Official code to reproduce the experiments from "Adapting Video Diffusion Models to World Models" (AVID), which proposes adapting pretrained video diffusion models into action-conditioned world models.
AVID is implemented with both pixel-space and latent-space diffusion. For instructions on using each codebase, see pixel_diffusion/README.md and latent_diffusion/README.md respectively. Results are logged to Weights & Biases.
This project utilises code from video-diffusion-pytorch, DynamiCrafter, and Octo.