miccooper9/egowm

Walk Through Paintings: Ego-centric World Models from Internet Priors

Anurag Bagchi, Zhipeng Bao, Homanga Bharadhwaj, Yu-Xiong Wang, Pavel Tokmakov, Martial Hebert

arXiv Project Page Hugging Face

TODO

  • [Soon] SCS metric scripts
  • [Soon] Wan2.1-14B train and infer scripts
  • [Very Soon] Cosmos-2B train and infer scripts
  • [Very Soon] SVD train and 25-DoF manipulation inference scripts
  • [Done] SVD navigation inference scripts (3-DoF and 25-DoF)

Data

Download and Process 3-DoF Navigation Datasets

1. RECON

cd data
mkdir -p recon
tar -xzf /mnt/fsx/Anurag/Nav_data/recon/recon_dataset.tar.gz -C recon/

2. SCAND/Tartan Drive

  • Follow the NWM instructions to set up SCAND or Tartan Drive.

3. Get the splits from NWM and save them in data/splits/

Download and Process 25-DoF 1x Dataset

  • Download the meta zip from anuragba/egowm and extract it into data/EVE1x/:

    mkdir -p data/EVE1x
    huggingface-cli download anuragba/egowm --repo-type model --local-dir data/EVE1x
    unzip data/EVE1x/<meta_zip>.zip -d data/EVE1x/
  • Download the raw data from 1x-technologies/world_model_raw_data into data/EVE1x/:

    huggingface-cli download 1x-technologies/world_model_raw_data --repo-type dataset --local-dir data/EVE1x

Stable Video Diffusion

Install Requirements

Create and activate the conda environment for SVD:

conda env create -f SVD.yaml
conda activate SVD

Download Weights

Pretrained

Download the base model weights, stabilityai/stable-video-diffusion-img2vid-xt, from Hugging Face to <pretrained_pth>.

Finetuned with Actions

  • Download the finetuned checkpoints from anuragba/egowm into checkpoints/:

    mkdir -p checkpoints
    huggingface-cli download anuragba/egowm --repo-type model --local-dir checkpoints
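
The inference commands in this README resume from checkpoints/svd_3dof_nav.pth and checkpoints/svd_25dof_nav.pth. A minimal sketch for checking that both files landed where those commands expect them, assuming they sit directly under checkpoints/ after the download. `missing_checkpoints` is a hypothetical helper, not part of this repo.

```python
from pathlib import Path

# File names taken from the --resume arguments used in this README.
EXPECTED = ["svd_3dof_nav.pth", "svd_25dof_nav.pth"]


def missing_checkpoints(ckpt_dir="checkpoints", expected=EXPECTED):
    """Return the expected checkpoint names that are not files in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All expected checkpoints found.")
```

An empty list means both checkpoints are in place. If the download put them in a subfolder, pass that folder as `ckpt_dir`.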

3-DoF Navigation using $[\Delta x, \Delta y, \Delta \phi]$
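
Each 3-DoF action is a planar displacement plus a heading change. As a rough illustration only, the sketch below computes such an action from two consecutive poses. It assumes poses are (x, y, yaw) tuples in a world frame with yaw in radians, and that the displacement is expressed in the frame of the earlier pose. The convention and any normalization actually used by the scripts in this repo may differ. `relative_action` is a hypothetical helper, not part of this repo.

```python
import math


def relative_action(pose_a, pose_b):
    """Return (dx, dy, dphi) taking pose_a to pose_b, in pose_a's frame.

    Assumption: poses are (x, y, yaw) in a world frame, yaw in radians.
    """
    xa, ya, yaw_a = pose_a
    xb, yb, yaw_b = pose_b

    # World-frame displacement between the two positions.
    dx_w, dy_w = xb - xa, yb - ya

    # Rotate the displacement by -yaw_a to express it in pose_a's frame.
    c, s = math.cos(yaw_a), math.sin(yaw_a)
    dx = c * dx_w + s * dy_w
    dy = -s * dx_w + c * dy_w

    # Heading change wrapped to [-pi, pi).
    dphi = (yaw_b - yaw_a + math.pi) % (2 * math.pi) - math.pi
    return dx, dy, dphi
```

For example, a robot facing along the world +y axis that moves one unit along +y gets a purely forward action: dx is 1, while dy and dphi are 0 (up to floating-point error).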

In-Distribution Inference

Run inference on the test sets of the three in-domain datasets used to train the 3-DoF SVD model.

  • RECON

    CUDA_VISIBLE_DEVICES=<idx> python3 SVD_3dof_recon_infer.py \
      --pretrained_path <pretrained_pth> \
      --split_root data/splits/ \
      --data_root data/recon/ \
      --num_frames 8 \
      --name_prefix <run_name> \
      --out_dir <output_root> \
      --resume checkpoints/svd_3dof_nav.pth > output.log
  • SCAND

    CUDA_VISIBLE_DEVICES=<idx> python3 SVD_3dof_scand_infer.py \
      --pretrained_path <pretrained_pth> \
      --split_root data/splits/ \
      --data_root data/scand/ \
      --num_frames 8 \
      --name_prefix <run_name> \
      --out_dir <output_root> \
      --resume checkpoints/svd_3dof_nav.pth > output.log
  • Tartan Drive

    CUDA_VISIBLE_DEVICES=<idx> python3 SVD_3dof_tartan_infer.py \
      --pretrained_path <pretrained_pth> \
      --split_root data/splits/ \
      --data_root data/tartan/ \
      --num_frames 8 \
      --name_prefix <run_name> \
      --out_dir <output_root> \
      --resume checkpoints/svd_3dof_nav.pth > output.log
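
The three commands above differ only in the script name and --data_root. A minimal sketch of building them from one place, for example to run all three test sets in a loop. The script names, flags, and default paths are copied from the commands above; `DATASETS` and `build_infer_cmd` are hypothetical names, not part of this repo.

```python
import shlex

# Dataset key -> (inference script, data root), as in the commands above.
DATASETS = {
    "recon": ("SVD_3dof_recon_infer.py", "data/recon/"),
    "scand": ("SVD_3dof_scand_infer.py", "data/scand/"),
    "tartan": ("SVD_3dof_tartan_infer.py", "data/tartan/"),
}


def build_infer_cmd(dataset, pretrained_path, run_name, out_dir,
                    num_frames=8, resume="checkpoints/svd_3dof_nav.pth"):
    """Return the argv list for one 3-DoF in-distribution inference run."""
    script, data_root = DATASETS[dataset]
    return [
        "python3", script,
        "--pretrained_path", pretrained_path,
        "--split_root", "data/splits/",
        "--data_root", data_root,
        "--num_frames", str(num_frames),
        "--name_prefix", run_name,
        "--out_dir", out_dir,
        "--resume", resume,
    ]


if __name__ == "__main__":
    # Placeholder arguments; replace them with real paths before running.
    for name in DATASETS:
        cmd = build_infer_cmd(name, "<pretrained_pth>", "<run_name>", "<output_root>")
        print(shlex.join(cmd))
```

To launch a run rather than print it, pass the list to `subprocess.run` with `CUDA_VISIBLE_DEVICES` set in `env` and `stdout` pointed at a log file.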

Painting Inference

Run inference on out-of-distribution (OOD) painting scenes using the 3-DoF navigation SVD model. The test-set trajectories of SCAND or Tartan Drive supply the navigation actions to execute in the painting.

  • Painting Inference (SCAND/Tartan)

    CUDA_VISIBLE_DEVICES=<idx> python3 SVD_paintings_infer.py \
      --pretrained_path <pretrained_pth> \
      --num_frames 8 \
      --dataset <scand/tartan> \
      --name_prefix <run_name> \
      --out_dir <output_root> \
      --resume checkpoints/svd_3dof_nav.pth \
      > output.log

25-DoF Navigation using EVE 1x Humanoid

In-Distribution Inference

Run inference on the navigation samples in the 1x humanoid 25-DoF validation set.

CUDA_VISIBLE_DEVICES=<idx> python3 SVD_25dof_nav_1xval.py \
  --num_frames 8 \
  --pretrained_path <pretrained_pth> \
  --name_prefix <run_name> \
  --out_dir <output_root> \
  --resume checkpoints/svd_25dof_nav.pth \
  > output.log

Real-World Inference

Run inference on real-world photos of the CMU campus that we took ourselves.

CUDA_VISIBLE_DEVICES=<idx> python3 SVD_25dof_nav_realw.py \
  --num_frames 8 \
  --pretrained_path <pretrained_pth> \
  --name_prefix <run_name> \
  --out_dir <output_root> \
  --resume checkpoints/svd_25dof_nav.pth \
  > output.log

25-DoF Manipulation using EVE 1x Humanoid (Coming Soon)

Cosmos-2B

Coming soon.

Wan2.1-14B

Coming soon.
