Anurag Bagchi, Zhipeng Bao, Homanga Bharadhwaj, Yu-Xiong Wang, Pavel Tokmakov, Martial Hebert
- [Soon] SCS metric scripts
- [Soon] Wan2.1-14B train and infer scripts
- [Very Soon] Cosmos-2B train and infer scripts
- [Very Soon] SVD train and 25-DoF manipulation inference scripts
- [Done] SVD navigation inference scripts (3-DoF and 25-DoF)
- Download the RECON dataset from the RECON dataset page and extract it:
cd data
tar -xzf /mnt/fsx/Anurag/Nav_data/recon/recon_dataset.tar.gz -C recon/
- Follow NWM to set up SCAND or Tartan Drive
- Get the splits from NWM and save them in data/splits/
- Download the meta zip from anuragba/egowm and extract it into data/EVE1x/:
mkdir -p data/EVE1x
huggingface-cli download anuragba/egowm --repo-type model --local-dir data/EVE1x
unzip data/EVE1x/<meta_zip>.zip -d data/EVE1x/
- Download the raw videos from 1x-technologies/world_model_raw_data and place them under data/EVE1x/:
huggingface-cli download 1x-technologies/world_model_raw_data --repo-type dataset --local-dir data/EVE1x
Stable Video Diffusion
Install Requirements
Create and activate the conda environment for SVD:
conda env create -f SVD.yaml
conda activate SVD
Download Weights
Pretrained
Download the base model weights from Hugging Face to <pretrained_pth>:
stabilityai/stable-video-diffusion-img2vid-xt
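The download can also be scripted. The sketch below only assembles a `huggingface-cli download` invocation, reusing the flags that appear elsewhere in this README. The helper name `hf_download` and the directory `pretrained/svd-xt` (standing in for <pretrained_pth>) are illustrative assumptions, not part of the repository. By default it prints the command without running it.

```python
import shlex
import subprocess


def hf_download(repo_id, local_dir, repo_type="model", dry_run=True):
    """Build (and optionally run) a `huggingface-cli download` command."""
    cmd = [
        "huggingface-cli", "download", repo_id,
        "--repo-type", repo_type,
        "--local-dir", local_dir,
    ]
    if dry_run:
        # Only show what would be executed.
        print(shlex.join(cmd))
    else:
        # Requires huggingface-cli to be installed and on PATH.
        subprocess.run(cmd, check=True)
    return cmd


# Hypothetical target directory; use whatever you pass as <pretrained_pth>.
PRETRAINED_PTH = "pretrained/svd-xt"

cmd = hf_download(
    "stabilityai/stable-video-diffusion-img2vid-xt", PRETRAINED_PTH
)
```

Calling `hf_download(..., dry_run=False)` performs the actual download. The same directory is then passed as `--pretrained_path` in the inference commands.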
Finetuned with Actions
- Download the finetuned checkpoints from anuragba/egowm into checkpoints/:
mkdir -p checkpoints
huggingface-cli download anuragba/egowm --repo-type model --local-dir checkpoints
3-DoF Navigation using $[\Delta x, \Delta y, \Delta \phi]$
In-Distribution Inference
Run inference on the test sets of the three in-domain datasets used to train the 3-DoF SVD model.
- RECON
CUDA_VISIBLE_DEVICES=<idx> python3 SVD_3dof_recon_infer.py \
--pretrained_path <pretrained_pth> \
--split_root data/splits/ \
--data_root data/recon/ \
--num_frames 8 \
--name_prefix <run_name> \
--out_dir <output_root> \
--resume checkpoints/svd_3dof_nav.pth > output.log
- SCAND
CUDA_VISIBLE_DEVICES=<idx> python3 SVD_3dof_scand_infer.py \
--pretrained_path <pretrained_pth> \
--split_root data/splits/ \
--data_root data/scand/ \
--num_frames 8 \
--name_prefix <run_name> \
--out_dir <output_root> \
--resume checkpoints/svd_3dof_nav.pth > output.log
- Tartan Drive
CUDA_VISIBLE_DEVICES=<idx> python3 SVD_3dof_tartan_infer.py \
--pretrained_path <pretrained_pth> \
--split_root data/splits/ \
--data_root data/tartan/ \
--num_frames 8 \
--name_prefix <run_name> \
--out_dir <output_root> \
--resume checkpoints/svd_3dof_nav.pth > output.log
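Since the three commands differ only in the script name and data root, they can be driven from one small launcher. The following is a minimal sketch under stated assumptions: the script names, flags, and default paths are copied from the commands above, while the helper names (`build_cmd`, `run_all`), the per-dataset run-name suffix, and the per-dataset log file names are illustrative choices and not part of the repository.

```python
import os
import shlex
import subprocess

# Script names and data roots as they appear in the commands above.
DATASETS = {
    "recon": ("SVD_3dof_recon_infer.py", "data/recon/"),
    "scand": ("SVD_3dof_scand_infer.py", "data/scand/"),
    "tartan": ("SVD_3dof_tartan_infer.py", "data/tartan/"),
}


def build_cmd(dataset, pretrained_path, run_name, out_dir,
              checkpoint="checkpoints/svd_3dof_nav.pth",
              split_root="data/splits/", num_frames=8):
    """Return the argument list for one dataset's inference script."""
    script, data_root = DATASETS[dataset]
    return [
        "python3", script,
        "--pretrained_path", pretrained_path,
        "--split_root", split_root,
        "--data_root", data_root,
        "--num_frames", str(num_frames),
        "--name_prefix", run_name,
        "--out_dir", out_dir,
        "--resume", checkpoint,
    ]


def run_all(pretrained_path, run_name, out_dir, gpu="0", dry_run=True):
    """Run the three evaluations sequentially on a single GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    cmds = []
    for name in DATASETS:
        # Assumption: suffix the run name so outputs do not collide.
        cmd = build_cmd(name, pretrained_path, f"{run_name}_{name}", out_dir)
        cmds.append(cmd)
        if dry_run:
            print(shlex.join(cmd))
            continue
        # Mirror the `> output.log` redirection, one log file per dataset.
        with open(f"output_{name}.log", "w") as log:
            subprocess.run(cmd, env=env, stdout=log, check=True)
    return cmds


if __name__ == "__main__":
    # Placeholder values (assumptions): replace with your own paths.
    run_all("pretrained/svd-xt", "demo", "outputs/")
```

With `dry_run=True` (the default) the launcher only prints the commands, which is a cheap way to check paths before committing GPU time. Pass `dry_run=False` to execute them.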
Painting Inference
Run inference on out-of-distribution (OOD) painting scenes using the 3-DoF navigation SVD model. The test-set trajectories of SCAND or Tartan Drive serve as the navigation actions to execute in the painting.
- Painting Inference (SCAND/Tartan)
CUDA_VISIBLE_DEVICES=<idx> python3 SVD_paintings_infer.py \
--pretrained_path <pretrained_pth> \
--num_frames 8 \
--dataset <scand/tartan> \
--name_prefix <run_name> \
--out_dir <output_root> \
--resume checkpoints/svd_3dof_nav.pth \
> output.log
25-DoF Navigation using EVE 1x Humanoid
In-Distribution Inference
Run inference on the navigation samples in the 1x humanoid 25-DoF validation set.
CUDA_VISIBLE_DEVICES=<idx> python3 SVD_25dof_nav_1xval.py \
--num_frames 8 \
--pretrained_path <pretrained_pth> \
--name_prefix <run_name> \
--out_dir <output_root> \
--resume checkpoints/svd_25dof_nav.pth \
> output.log
Real-World Inference
Run inference on real-world photos of the CMU campus that we took ourselves.
CUDA_VISIBLE_DEVICES=<idx> python3 SVD_25dof_nav_realw.py \
--num_frames 8 \
--pretrained_path <pretrained_pth> \
--name_prefix <run_name> \
--out_dir <output_root> \
--resume checkpoints/svd_25dof_nav.pth \
> output.log
25-DoF Manipulation using EVE 1x Humanoid (Coming Soon)
Cosmos-2B
Coming soon.
Wan2.1-14B
Coming soon.
