This repository provides the official implementation of RIFLEx and UltraViCo, two plug-and-play methods for length extrapolation in video diffusion transformers, enabling long video generation.
The two methods are hosted on separate branches, and the code is fully open source.
- RIFLEx:
- UltraViCo:
  - `ultra-wan`: UltraViCo for Wan2.1
  - `ultra-hunyuan`: UltraViCo for HunyuanVideo
This branch supports UltraViCo for HunyuanVideo. For Wan 2.1, please refer to the `ultra-wan` branch.
```bash
conda create -n ultravico_hy python=3.11 -y
conda activate ultravico_hy
pip install -r requirements.txt
export PYTHONPATH=$(pwd)/src
```
```bash
torchrun --nproc_per_node=8 --standalone -m parallel_examples.run_attention_patterns \
  --alpha 0.9 \
  --beta 0.6 \
  --extrapolation_ratio 3 \
  --height 544 \
  --width 960 \
  --num_inference_steps 50 \
  --prompt "Brown bear wading slowly through shallow river, splashes frozen mid-air, forest reflection steady on water surface."
```

- `extrapolation_ratio` $\in (1, 4]$: the generated video length as a multiple of the training length.
- `alpha` < `beta`, both $\in (0, 1)$: larger values → stronger temporal consistency; smaller values → better visual quality.
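As a worked example of what `extrapolation_ratio` means, the sketch below converts it into a generated frame count. This helper is illustrative and not part of the repo; the 129-frame training length is an assumption about HunyuanVideo's default, not something this README states.

```python
def extrapolated_num_frames(training_frames: int, extrapolation_ratio: float) -> int:
    """Generated video length as a multiple of the training length.

    The README documents extrapolation_ratio in the half-open interval (1, 4].
    """
    if not 1.0 < extrapolation_ratio <= 4.0:
        raise ValueError("extrapolation_ratio must lie in (1, 4]")
    return round(training_frames * extrapolation_ratio)


# Assumed 129-frame training length, with the ratio from the command above:
print(extrapolated_num_frames(129, 3))  # 387
```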
- We adopt SageAttention for our UltraViCo attention kernel.
- The parallel diffusers-style HunyuanVideo code is built upon ParaAttention. Many thanks to its developers!
- Thanks to Tencent HunyuanVideo and Wan2.1 for their great open-source models!
If you find the code useful, please cite:

```bibtex
@article{zhao2025ultravico,
  title={UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers},
  author={Zhao, Min and Zhu, Hongzhou and Wang, Yingze and Yan, Bokai and Zhang, Jintao and He, Guande and Yang, Ling and Li, Chongxuan and Zhu, Jun},
  journal={arXiv preprint arXiv:2511.20123},
  year={2025}
}

@article{zhao2025riflex,
  title={RIFLEx: A free lunch for length extrapolation in video diffusion transformers},
  author={Zhao, Min and He, Guande and Chen, Yixiao and Zhu, Hongzhou and Li, Chongxuan and Zhu, Jun},
  journal={arXiv preprint arXiv:2502.15894},
  year={2025}
}
```