See also the latest version, which uses the Wan 2.2 model.
Here is the codebase for Vidar: Embodied Video Diffusion Model for Generalist Bimanual Manipulation.
Below you will find setup instructions and basic usage guidance for the code within the vidar folder.
Our code has been tested with CUDA 12.4.
If you encounter errors, please also refer to known issues in HunyuanVideo-I2V.
conda create -n vidar python==3.11.9
conda activate vidar
For CUDA 12.4:
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
(Optional) Install the full CUDA toolkit:
conda install -c nvidia cuda-toolkit=12.4
Install the Python dependencies:
python -m pip install -r requirements.txt
Install flash-attention (requires CUDA 11.8 or newer):
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
Install xfuser (we recommend using PyTorch 2.4.0 and flash-attn 2.6.3):
python -m pip install xfuser==0.4.0
If you encounter floating point exceptions (core dump) on certain GPUs, try:
pip install nvidia-cublas-cu12==12.4.5.8
export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
The path above points to a Python 3.8 installation under /opt/conda; adjust it to the site-packages directory of your own environment.
Ensure you have CUDA 12.4, cuBLAS >= 12.4.5.8, and cuDNN >= 9.00 installed.
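The version requirements above can be checked from inside the activated environment. The following is a minimal sketch, not part of the Vidar codebase: the helper names `parse_version`, `at_least`, and `report_environment` are hypothetical, and it assumes cuBLAS was installed as the `nvidia-cublas-cu12` pip wheel shown above.

```python
from importlib import metadata


def parse_version(text):
    """Turn '12.4.5.8' into (12, 4, 5, 8); non-numeric parts are dropped."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def at_least(found, required):
    """Return True if the version string `found` is >= `required`."""
    return parse_version(found) >= parse_version(required)


def report_environment():
    """Print the versions this README mentions (hypothetical helper)."""
    try:
        import torch  # imported lazily so the helpers above work without it
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    print("PyTorch:", torch.__version__)
    print("CUDA (as built into PyTorch):", torch.version.cuda)
    print("cuDNN:", torch.backends.cudnn.version())
    try:
        # Assumption: cuBLAS comes from the pip wheel installed above.
        cublas = metadata.version("nvidia-cublas-cu12")
        status = "OK" if at_least(cublas, "12.4.5.8") else "older than 12.4.5.8"
        print("cuBLAS wheel:", cublas, status)
    except metadata.PackageNotFoundError:
        print("cuBLAS wheel: nvidia-cublas-cu12 is not installed via pip")
```

Calling `report_environment()` in the `vidar` environment prints the detected versions so they can be compared against the requirements listed above.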
Prepare your metadata as follows:
{
  "video_path": "{VIDEO_PATH}",
  "raw_caption": {
    "long caption": "{PROMPT}"
  }
}
You also need to encode the videos in your dataset before training:
vm/hyvae_extract/start.sh
For more details, refer to Hunyuan VAE extract.
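The metadata layout shown above can be generated with a few lines of Python. This is a minimal sketch under stated assumptions: the function names and the choice to write one JSON file per video, named after the video, are hypothetical, since this README does not specify how the metadata files are named or stored.

```python
import json
from pathlib import Path


def build_metadata(video_path, prompt):
    """Return one metadata record in the layout shown above."""
    return {
        "video_path": str(video_path),
        "raw_caption": {"long caption": prompt},
    }


def write_metadata(video_path, prompt, out_dir):
    """Write the record to <out_dir>/<video stem>.json (hypothetical naming)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / (Path(video_path).stem + ".json")
    with out_file.open("w", encoding="utf-8") as f:
        json.dump(build_metadata(video_path, prompt), f, indent=2, ensure_ascii=False)
    return out_file
```

Adjust the output location and file naming to whatever the training and extraction scripts in your setup expect.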
Edit scripts/vm/train.sh to match your platform settings, then run:
scripts/vm/train.sh
To test your trained model:
scripts/vm/sample.sh
This generates a video based on the first frame and your instruction.
- Training data (default folder): assets/train
- Testing data (default folder): assets/test
- Files are organized as task_name/episode_idx.mp4 (a multi-view video) and task_name/episode_idx_qpos.pt (a 2D tensor with corresponding actions).
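Before training, it can help to confirm that every video has a matching action file. The sketch below is not part of the Vidar codebase: `find_episodes` is a hypothetical helper that relies only on the naming convention described above.

```python
from pathlib import Path


def find_episodes(root):
    """Pair each task_name/episode_idx.mp4 with its episode_idx_qpos.pt file.

    Returns (paired, missing): a list of (video, qpos) path tuples, and a
    list of videos that have no matching _qpos.pt file.
    """
    paired, missing = [], []
    for video in sorted(Path(root).glob("*/*.mp4")):
        qpos = video.with_name(video.stem + "_qpos.pt")
        if qpos.is_file():
            paired.append((video, qpos))
        else:
            missing.append(video)
    return paired, missing
```

For example, `find_episodes("assets/train")` lists the usable episodes in the default training folder. Each paired `_qpos.pt` file can then be loaded with `torch.load` to inspect the action tensor, whose exact shape is not specified here.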
Edit scripts/idm/train.sh as needed, then run:
scripts/idm/train.sh
To evaluate your model:
scripts/idm/eval.sh