Walk Through Paintings : Ego-centric World Models from Internet Priors

Anurag Bagchi, Zhipeng Bao, Homanga Bharadhwaj, Yu-Xiong Wang, Pavel Tokmakov, Martial Hebert

TODO

[Soon] SCS metric scripts
[Soon] Wan2.1-14B training and inference scripts
[Very Soon] Cosmos-2B training and inference scripts
[Very Soon] SVD training and 25-DoF manipulation inference scripts
[Done] SVD navigation inference scripts (3-DoF and 25-DoF)

Data

Download and Process 3-DoF Navigation Datasets

1. RECON

Download the RECON dataset from the RECON dataset page and extract it into data/recon/, replacing <download_dir> with the directory that holds the downloaded archive:

mkdir -p data/recon
tar -xzf <download_dir>/recon_dataset.tar.gz -C data/recon/

2. SCAND/Tartan Drive

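Follow the NWM (Navigation World Models) instructions to set up SCAND or Tartan Drive. The inference commands below expect them under data/scand/ and data/tartan/.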

3. Get the dataset splits from NWM and save them in `data/splits/`

Download and Process 25-DoF 1x Dataset

Download the meta zip from anuragba/egowm and extract it into data/EVE1x/:

mkdir -p data/EVE1x
huggingface-cli download anuragba/egowm --repo-type model --local-dir data/EVE1x
unzip data/EVE1x/<meta_zip>.zip -d data/EVE1x/

Download the raw videos from 1x-technologies/world_model_raw_data and place them under data/EVE1x/:

huggingface-cli download 1x-technologies/world_model_raw_data --repo-type dataset --local-dir data/EVE1x
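Before running inference, it can help to confirm that the data directories referenced by the commands in this README are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_data_dirs` is hypothetical, and the directory names are simply the ones that appear in the commands here (you only need the datasets you intend to use).

```python
from pathlib import Path

# Directory names taken from the commands in this README (relative to data/).
EXPECTED_DIRS = ("recon", "scand", "tartan", "splits", "EVE1x")


def missing_data_dirs(data_root="data", expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print(f"missing under data/: {missing}")
    else:
        print("all data directories found")
```

Run it from the repository root so that the relative `data/` path resolves correctly.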

Stable Video Diffusion

Install Requirements

Create and activate the conda environment for SVD:

conda env create -f SVD.yaml
conda activate SVD

Download Weights

Pretrained

Download the base model weights, stabilityai/stable-video-diffusion-img2vid-xt, from Hugging Face to <pretrained_pth>.

Finetuned with Actions

Download the finetuned checkpoints from anuragba/egowm into checkpoints/:

mkdir -p checkpoints
huggingface-cli download anuragba/egowm --repo-type model --local-dir checkpoints

3-DoF Navigation using $[\Delta x, \Delta y, \Delta \phi]$
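The action $[\Delta x, \Delta y, \Delta \phi]$ is a planar displacement plus a heading change between two frames. As an illustration only, the sketch below derives such an action from two consecutive poses. It assumes poses are given as `(x, y, yaw)` in a common world frame with yaw in radians, and that the displacement is expressed in the egocentric frame of the first pose. The function name `relative_action` is hypothetical, and the convention the repository actually uses (frame, units, normalization) may differ.

```python
import math


def relative_action(pose_a, pose_b):
    """Return (dx, dy, dphi) that takes pose_a to pose_b, in pose_a's frame.

    Each pose is an assumed (x, y, yaw) tuple in a shared world frame,
    with yaw in radians.
    """
    xa, ya, yaw_a = pose_a
    xb, yb, yaw_b = pose_b

    # World-frame displacement between the two positions.
    wx, wy = xb - xa, yb - ya

    # Rotate the displacement by -yaw_a to express it in pose_a's frame.
    c, s = math.cos(yaw_a), math.sin(yaw_a)
    dx = c * wx + s * wy
    dy = -s * wx + c * wy

    # Heading change, wrapped to (-pi, pi].
    diff = yaw_b - yaw_a
    dphi = math.atan2(math.sin(diff), math.cos(diff))
    return dx, dy, dphi
```

For example, an agent facing along the world y-axis (`yaw = pi/2`) that moves one unit in y without turning gets an action of roughly `(1, 0, 0)`: one unit straight ahead in its own frame.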

In-Distribution Inference

Run inference on the test sets of the three in-domain datasets used to train the 3-DoF SVD model.

RECON

CUDA_VISIBLE_DEVICES=<idx> python3 SVD_3dof_recon_infer.py \
  --pretrained_path <pretrained_pth> \
  --split_root data/splits/ \
  --data_root data/recon/ \
  --num_frames 8 \
  --name_prefix <run_name> \
  --out_dir <output_root> \
  --resume checkpoints/svd_3dof_nav.pth > output.log

SCAND

CUDA_VISIBLE_DEVICES=<idx> python3 SVD_3dof_scand_infer.py \
  --pretrained_path <pretrained_pth> \
  --split_root data/splits/ \
  --data_root data/scand/ \
  --num_frames 8 \
  --name_prefix <run_name> \
  --out_dir <output_root> \
  --resume checkpoints/svd_3dof_nav.pth > output.log

Tartan Drive

CUDA_VISIBLE_DEVICES=<idx> python3 SVD_3dof_tartan_infer.py \
  --pretrained_path <pretrained_pth> \
  --split_root data/splits/ \
  --data_root data/tartan/ \
  --num_frames 8 \
  --name_prefix <run_name> \
  --out_dir <output_root> \
  --resume checkpoints/svd_3dof_nav.pth > output.log
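The three commands above differ only in the script name and the data root, so they can be generated in a loop. The sketch below is one way to do that under stated assumptions: it uses only the flags and paths shown above, while the helper names `build_infer_cmd` and `run_all` and the per-dataset log file names are hypothetical and not part of the repository.

```python
import os
import subprocess

# Dataset keys as they appear in the script names and data roots above.
DATASETS = ("recon", "scand", "tartan")


def build_infer_cmd(dataset, pretrained_path, run_name, out_dir,
                    checkpoint="checkpoints/svd_3dof_nav.pth", num_frames=8):
    """Build the argument list for one 3-DoF in-distribution inference run."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        "python3", f"SVD_3dof_{dataset}_infer.py",
        "--pretrained_path", pretrained_path,
        "--split_root", "data/splits/",
        "--data_root", f"data/{dataset}/",
        "--num_frames", str(num_frames),
        "--name_prefix", run_name,
        "--out_dir", out_dir,
        "--resume", checkpoint,
    ]


def run_all(pretrained_path, run_name, out_dir, gpu="0"):
    """Run the three datasets in sequence, writing one log file per dataset."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for dataset in DATASETS:
        cmd = build_infer_cmd(dataset, pretrained_path, run_name, out_dir)
        with open(f"output_{dataset}.log", "w") as log:
            # Like `> output.log` above, this captures stdout only.
            subprocess.run(cmd, env=env, stdout=log, check=True)
```

Calling `run_all("<pretrained_pth>", "<run_name>", "<output_root>")` from the repository root would launch the runs one after another and stop at the first failure because of `check=True`.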

Painting Inference

Run inference on out-of-distribution (OOD) painting scenes using the 3-DoF navigation SVD model. The test-set trajectories of SCAND or Tartan Drive supply the navigation actions to execute in the painting.

Painting Inference (SCAND/Tartan)

CUDA_VISIBLE_DEVICES=<idx> python3 SVD_3dof_paintings_infer.py \
  --pretrained_path <pretrained_pth> \
  --num_frames 8 \
  --dataset <scand/tartan> \
  --name_prefix <run_name> \
  --out_dir <output_root> \
  --resume checkpoints/svd_3dof_nav.pth \
  > output.log

25-DoF Navigation using EVE 1x Humanoid

In-Distribution Inference

Run inference on the navigation samples in the 1x humanoid 25-DoF validation set.

CUDA_VISIBLE_DEVICES=<idx> python3 SVD_25dof_nav_1xval.py \
  --num_frames 8 \
  --pretrained_path <pretrained_pth> \
  --name_prefix <run_name> \
  --out_dir <output_root> \
  --resume checkpoints/svd_25dof_nav.pth \
  > output.log

Real-World Inference

Run inference on real-world photos of the CMU campus that we captured.

CUDA_VISIBLE_DEVICES=<idx> python3 SVD_25dof_nav_realw.py \
  --num_frames 8 \
  --pretrained_path <pretrained_pth> \
  --name_prefix <run_name> \
  --out_dir <output_root> \
  --resume checkpoints/svd_25dof_nav.pth \
  > output.log

25-DoF Manipulation using EVE 1x Humanoid (Coming Soon)

Cosmos-2B

Coming soon.

Wan-14B

Coming soon.
