VideoVLA is a simple approach that explores directly transforming large video generation models into generalizable robotic VLA manipulators.
This repository contains the official implementation of the paper:
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators (NeurIPS 2025)
🔗 Project Page: Project Website
📄 Paper: Paper Link
First, prepare the runtime environment and install all required dependencies by running:

```shell
bash build.sh
```

Our method relies on pretrained components from CogVideo. You can follow the official CogVideo instructions to obtain the pretrained checkpoints: CogVideo
Specifically, download:
- T5 checkpoint
- VAE checkpoint
After downloading, update the checkpoint paths in the following configuration file:
`config_use/action_config/videovla_config.yaml`
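The relevant entries look roughly like the sketch below. The key names and nesting here are illustrative assumptions, not the file's actual schema; edit whichever keys in `videovla_config.yaml` hold the T5 and VAE paths:

```yaml
# Hypothetical excerpt of config_use/action_config/videovla_config.yaml.
# Key names are assumptions -- match them to the keys actually in the file.
model:
  t5_checkpoint: /path/to/t5/checkpoint    # downloaded T5 weights
  vae_checkpoint: /path/to/vae/checkpoint  # downloaded CogVideo VAE weights
```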
Make sure the paths correctly point to the downloaded T5 and VAE checkpoints before starting training or evaluation.
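Before launching a long run, it can save time to sanity-check that the configured paths actually exist. A minimal sketch, assuming PyYAML is available and that the checkpoint paths live under keys named `t5_checkpoint` and `vae_checkpoint` (both names are assumptions; substitute the real keys from the config):

```python
import os

import yaml  # PyYAML


def missing_checkpoints(config_path, keys=("t5_checkpoint", "vae_checkpoint")):
    """Return the configured checkpoint paths that do not exist on disk.

    The key names are illustrative; pass the actual keys used in
    videovla_config.yaml. Keys absent from the config are skipped.
    """
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    return [cfg[k] for k in keys if k in cfg and not os.path.exists(cfg[k])]
```

An empty return value means every configured path was found; otherwise the returned list names the paths that need fixing before training or evaluation.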
This section describes how to run inference with a trained model checkpoint to generate video and action.
```shell
python sample_video_action.py \
  --base config_use/action_config/videovla_config.yaml config_use/action_config/inference_config/inference.yaml
```

If you find this work useful, please cite:

```bibtex
@inproceedings{
videovla,
title={VideoVLA: Video Generators Can Be Generalizable Robot Manipulators},
author={Yichao Shen and Fangyun Wei and Zhiying Du and Yaobo Liang and Yan Lu and Jiaolong Yang and Nanning Zheng and Baining Guo},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)},
year={2025},
url={https://openreview.net/forum?id=UPHlqbZFZB}
}
```