Skip to content

yaofeng1998/Vidar

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Vidar: Embodied Video Diffusion Model for Generalist Bimanual Manipulation

Also refer to here for the latest version with the Wan 2.2 model.

Introduction

Here is the codebase for Vidar: Embodied Video Diffusion Model for Generalist Bimanual Manipulation.

Below you will find setup instructions and basic usage guidance for the code within the vidar folder.


Environment Setup

Our code has been tested with CUDA 12.4.
If you encounter errors, please also refer to known issues in HunyuanVideo-I2V.

1. Create a Conda Environment

conda create -n vidar python==3.11.9

2. Activate the Environment

conda activate vidar

3. Install PyTorch and CUDA Dependencies

For CUDA 12.4:

conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

(Optional) Install the full CUDA toolkit:

conda install -c nvidia cuda-toolkit=12.4

4. Install Python Requirements

python -m pip install -r requirements.txt

5. Install Flash Attention v2 for Acceleration

Requires CUDA 11.8 or newer:

python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/[email protected]

6. Install xDiT for Parallel Inference

We recommend using PyTorch 2.4.0 and flash-attn 2.6.3:

python -m pip install xfuser==0.4.0

Troubleshooting: Floating Point Exceptions

If you encounter floating point exceptions (core dump) on certain GPUs, try:

pip install nvidia-cublas-cu12==12.4.5.8
export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/

Ensure you have CUDA 12.4, CUBLAS >= 12.4.5.8, and CUDNN >= 9.00 installed.


Video Diffusion Model

Data Preparation

Prepare your metadata as follows:

{
    "video_path": "{VIDEO_PATH}",
    "raw_caption": {
        "long caption": "{PROMPT}"
    }
}

You also need to encode the videos in your dataset before training:

vm/hyvae_extract/start.sh

For more details, refer to Hunyuan VAE extract.

Training

Edit scripts/vm/train.sh to match your platform settings, then run:

scripts/vm/train.sh

Inference

To test your trained model:

scripts/vm/sample.sh

This generates a video based on the first frame and your instruction.


Masked Inverse Dynamic Model

Data Preparation

  • Training data (default folder): assets/train
  • Testing data (default folder): assets/test
  • Files are organized as task_name/episode_idx.mp4 (a multi-view video) and task_name/episode_idx_qpos.pt (a 2D tensor with corresponding actions).

Training

Edit scripts/idm/train.sh as needed, then run:

scripts/idm/train.sh

Inference

To evaluate your model:

scripts/idm/eval.sh

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors