Vid2World: Crafting Video Diffusion Models to Interactive World Models

arXiv Paper | License: MIT

This is the official code base for the paper Vid2World: Crafting Video Diffusion Models to Interactive World Models.

Give it a star 🌟 if you find our work useful!

Banner for Vid2World

🔥 News & Updates

  • 🚩 2025-12: We release all model checkpoints on 🤗 Huggingface.

  • 🚩 2025-12: We release code for training, inference, and evaluation.

📋 TL;DR

We repurpose internet-scale pretrained video diffusion models into interactive world models:

  • โš™๏ธ Converts non-causal video diffusion backbones into autoregressive, temporally causal architectures with frame-level action conditioning.
  • ๐Ÿฆธ Enables high-fidelity, action-conditioned video simulation and scalable world model learning across robot manipulation, 3D game simulation, and open-world navigation.

🚀 QuickStart

โš™๏ธ Environment Setup

Note

The code has been tested on Ubuntu 20.04, Ubuntu 22.04, and AlmaLinux 9.5.

First create your conda environment:

conda create -n v2w python=3.8 -y
conda activate v2w

Then, install dependencies:

pip install -r requirements.txt

For training and evaluation:

  • Download the base video model (DynamiCrafter, 320 $\times$ 512) and save it as checkpoints/dynamicrafter_512_v1/model.ckpt.
  • Download the pretrained I3D model and save it as checkpoints/i3d/i3d_torchscript.pt.

At this point, your checkpoints folder should look like this:

checkpoints
├── dynamicrafter_512_v1
│   └── model.ckpt
└── i3d
    └── i3d_torchscript.pt
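If you prefer to script this step, here is a minimal sketch (download the two files from the links above first; the source paths below are placeholders):

mkdir -p checkpoints/dynamicrafter_512_v1 checkpoints/i3d
# move the downloaded DynamiCrafter (320x512) checkpoint and I3D model to the expected paths
mv /path/to/downloaded/model.ckpt checkpoints/dynamicrafter_512_v1/model.ckpt
mv /path/to/downloaded/i3d_torchscript.pt checkpoints/i3d/i3d_torchscript.pt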

🤗 Models

At the moment, we provide the following models:

File | Domain | Weight Transfer Method | Action Guidance | Training Steps
Vid2World-RT1 | RT-1 | Extrapolative | ✔️ | 100k
Vid2World-CSGO | CSGO | Extrapolative | ✔️ | 100k
Vid2World-RECON | RECON | Extrapolative | ✔️ | 100k
Vid2World-RT1-NAG | RT-1 | Extrapolative | ❌ | 30k
Vid2World-RT1-Masked-NAG | RT-1 | Masked | ❌ | 30k
Vid2World-RT1-30k | RT-1 | Extrapolative | ✔️ | 30k
Vid2World-RT1-Masked | RT-1 | Masked | ✔️ | 30k
Vid2World-RT1-Shift | RT-1 | Shift | ✔️ | 30k

Before inference, make sure you replace |<your_pretrained_checkpoint>| in the config file with the path to your local checkpoint.
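As a rough illustration, assuming the config stores this path under a pretrained_checkpoint key (the key name used in the action-control config below; exact nesting may vary between config files), the edited line would look like:

pretrained_checkpoint: /local/path/to/Vid2World-RT1/model.ckpt  # replaces the |<your_pretrained_checkpoint>| placeholder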

📸 Showcases

🤖 Robot Manipulation 🦾
(video: all_combined.mp4)
🎮 Game Simulation 🕹️
(video: all_combined.1.mp4)
🗺️ Open-World Navigation 🧭
(video: all_combined.3.mp4)

For more showcases, check out our Project Page.

🤖 Vid2World for Robot Manipulation

1. Prepare Data & Model

Data

To download and preprocess the dataset:

  • Download the RT-1 Robot Action Dataset from OXE.
  • Run the following command from the repository root to save the processed dataset to your desired local folder.
python lvdm/data/oxe_data_converter.py --dataset_name fractal20220817_data --input_path {path to downloaded OXE} --output_path {path to stored npz}
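To sanity-check the conversion, a sketch along the following lines can help; the file naming pattern and array keys printed here are assumptions, so adapt them to whatever oxe_data_converter.py actually writes:

import glob
import numpy as np

# list the converted episode files (the *.npz naming pattern is an assumption)
episodes = sorted(glob.glob("/path/to/stored_npz/*.npz"))
print(f"found {len(episodes)} converted episodes")

# print the arrays stored in the first episode to confirm frames and actions are present
with np.load(episodes[0]) as ep:
    for key in ep.files:
        print(key, ep[key].shape, ep[key].dtype)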

Model

For inference, download our corresponding pretrained model from 🤗 Huggingface (see QuickStart).

2. Training

To launch training with the RT-1 dataset, go to configs/manipulation/config_rt1_train.yaml and change |<your_data_dir>| to the path of your local data directory. To launch training on a single node with 4 GPUs, use the following command:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/manipulation/config_rt1_train.yaml --train  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1
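Note that torch.distributed.launch is deprecated in recent PyTorch releases. If your installation no longer provides it, an equivalent torchrun invocation would look roughly like the following, assuming ./main/trainer.py reads the local rank from the LOCAL_RANK environment variable rather than a --local_rank argument:

torchrun --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/manipulation/config_rt1_train.yaml --train --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1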

For ablation experiments, we provide the corresponding configurations in configs/ablation.

File | Weight Transfer Method | Action Guidance | Model Checkpoint
config_rt1_*_masked_nag.yaml | Masked | ❌ | 🤗 Vid2World-RT1-Masked-NAG
config_rt1_*_extrp_nag.yaml | Extrapolative | ❌ | 🤗 Vid2World-RT1-NAG
config_rt1_*_shift.yaml | Shift | ✔️ | 🤗 Vid2World-RT1-Shift
config_rt1_*_masked.yaml | Masked | ✔️ | 🤗 Vid2World-RT1-Masked
config_rt1_*_all.yaml | Extrapolative | ✔️ | 🤗 Vid2World-RT1-30k

3. Inference

We provide two setups: Auto-Regressive Generation, which generates the sequence frame by frame, and Non-Auto-Regressive Generation, which generates the full sequence in a single pass.

Before running the experiments, make sure you have downloaded or trained the corresponding checkpoints and updated the data paths in the config file you use.

Auto-Regressive Generation

For auto-regressive generation, run:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base code_release_configs/manipulation/config_rt1_test_ar.yaml --val  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

When running ablations, switch to the corresponding configuration file.

Non-Auto-Regressive Generation

For non-auto-regressive generation, run:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base code_release_configs/manipulation/config_rt1_test_nar.yaml --val  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

RT-1 Action Control Test

Test the model's ability to respond to different world_vector actions (X+, X-, Y+, Y-, Z+, Z-).

First, update the config file configs/manipulation/config_rt1_action_control_test.yaml:

  • Set pretrained_checkpoint to your checkpoint path
  • Set data_dir to your RT-1 data directory

Then run:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/manipulation/config_rt1_action_control_test.yaml --val --name rt1_action_control_test --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

Results will be saved to the directory specified in the config file's save_dir parameter. Each batch visualizes 8 action variants side-by-side for comparison.

๐Ÿ•น๏ธ Vid2World for Game Simulation

1. Prepare Data & Model

Data

To download and preprocess data, please follow the steps from DIAMOND, specifically:

  • Download the .tar files in dataset_dm_scraped_dust2_tars from this dataset repo.
  • Use the provided script to process the dataset into full- and low-resolution versions. For our purposes, only the full_res folder is needed.

Model

For inference, download our corresponding pretrained model from 🤗 Huggingface (see QuickStart).

2. Training

To launch training with the CSGO dataset, go to configs/game/config_csgo_train.yaml and change |<your_data_dir>| to the path of your local data directory. To launch training on a single node with 4 GPUs, use the following command:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_train.yaml --train  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

3. Inference

Standard Inference

For inference, run:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_test.yaml --val  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

Long Rollout Inference on CSGO

For long rollout inference on CSGO, run:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_test_long_rollout.yaml --val  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

Long Rollout Inference on OOD Games

For long rollout inference on previously unseen games (Valorant, Delta Force), run:

Valorant:

python3 -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_test_long_rollout_valorant.yaml --val  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 2 lightning.trainer.num_nodes=1

Delta Force:

python3 -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 --master_addr=127.0.0.1 --master_port=12879 --node_rank=0 ./main/trainer.py --base configs/game/config_csgo_test_long_rollout_delta_force.yaml --val  --name training_512_v1.0 --logdir |<your_log_dir>| --devices 2 lightning.trainer.num_nodes=1

๐Ÿ—บ๏ธ Vid2World for Open-World Navigation

1. Prepare Data & Model

Data

To download and preprocess data, please follow the steps from NoMaD, specifically:

  • Download the RECON dataset.
  • Change the preprocessing resolution to (640,480).
  • Run process_recon.py to save the processed dataset to your desired local folder.

Model

For inference, download our corresponding pretrained model from 🤗 Huggingface (see QuickStart).

2. Training

To launch training with the RECON dataset, go to configs/navigation/config_recon_train.yaml and change |<your_data_dir>| to the path of your local data directory. To launch training on a single node with 4 GPUs, use the following command:

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/navigation/config_recon_train.yaml --train --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

3. Inference

Following NWM, we evaluate performance under two setups: single-step generation and auto-regressive generation. Our model performs auto-regressive generation in both cases; the setups differ only in the data split used, and we support both.

Single-Step Generation

Change the |<data_dir>| and |<path_to_pretrained_checkpoint>| in configs/navigation/config_recon_test_single_step.yaml.

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/navigation/config_recon_test_single_step.yaml --val --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

Auto-Regressive Generation

Change the |<data_dir>| and |<path_to_pretrained_checkpoint>| in configs/navigation/config_recon_test_rollout.yaml.

python3 -m torch.distributed.launch --nproc_per_node=4 --nnodes=1 --master_addr=127.0.0.1 --master_port=12869 --node_rank=0 ./main/trainer.py --base configs/navigation/config_recon_test_rollout.yaml --val --name training_512_v1.0 --logdir |<your_log_dir>| --devices 4 lightning.trainer.num_nodes=1

🧪 Evaluation

Note

Check out this issue if you encounter the following error message: ImportError: cannot import name 'trunc_normal_' from 'utils' (unknown location)

For evaluation, after running the inference code, calculate the metrics by running:

python eval.py --exp_folder |<your_log_image_dir>| --env  |<rt1/csgo/recon_time/recon_rollout>|

📜 Citation

If you find our code useful, please consider citing our paper:

@article{huang2025vid2world0,
  title={Vid2World: Crafting Video Diffusion Models to Interactive World Models},
  author={Siqiao Huang and Jialong Wu and Qixing Zhou and Shangchen Miao and Mingsheng Long},
  year={2025},
  journal={arXiv preprint arXiv:2505.14357}
}

📬 Contact

If you have any questions, please contact [email protected].

💡 Acknowledgement

We sincerely appreciate the following GitHub repos for the valuable codebases we build upon: