DualParal is a distributed inference strategy for Diffusion Transformer (DiT)-based video diffusion models. It achieves high efficiency by parallelizing both temporal frames and model layers with a block-wise denoising scheme. See our paper for more details.
🎥 Demo: more video samples on our project page!
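As a rough illustration of the idea (not the actual implementation), the toy schedule below shows how block-wise denoising lets several latent blocks occupy different layer-partition stages at once, like a pipeline; the stage and block counts here are arbitrary example values.

```python
# Toy dual-parallelism schedule: latent blocks move through layer-partition
# stages (one per GPU) like a pipeline, so several blocks are denoised
# concurrently. Illustrative sketch only; numbers are made up.
NUM_STAGES = 4   # GPUs, each holding a contiguous slice of DiT layers
NUM_BLOCKS = 6   # latent blocks the video is split into

def pipeline_schedule(num_blocks, num_stages):
    """Return, per clock tick, the active (block, stage) pairs."""
    ticks = []
    for t in range(num_blocks + num_stages - 1):
        active = [(b, t - b) for b in range(num_blocks) if 0 <= t - b < num_stages]
        ticks.append(active)
    return ticks

for t, active in enumerate(pipeline_schedule(NUM_BLOCKS, NUM_STAGES)):
    print(f"tick {t}: " + ", ".join(f"block{b}@gpu{s}" for b, s in active))
```

Once the pipeline is full (here, from tick 3), all stages are busy on different blocks simultaneously, which is where the throughput gain comes from.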
A white-suited astronaut with a gold visor spins in dark space, tethered by a drifting cable. Stars twinkle around him as Earth glows blue in the distance. His suit reflects faint starlight against the vastness of the cosmos.
A flock of birds glides through the warm sunset sky, wings outstretched. Their feathers catch golden light as they soar above silhouetted treetops, with the sky glowing in soft hues of amber and pink.
- 8 Nov, 2025: 🎉 Our paper is accepted by AAAI 2026!
- 3 Oct, 2025: 👋 We've combined DualParal with the Wan2.2-T2V-A14B model.
- 27 May, 2025: 👋 We've released the DualParal code, which supports the Wan2.1-T2V-1.3B and Wan2.1-T2V-14B models.
conda create -n DualParal python=3.10
conda activate DualParal
# Ensure torch >= 2.4.0 matches your CUDA version; the following uses CUDA 12.1 as an example
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m examples.DualParal_Wan \
--model_id Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
--sample_steps 50 --num_per_block 8 --latents_num 40 --num_cat 8
- Basic Args
| Parameter | Description |
|---|---|
| `dtype` | Model dtype (`float64`, `float32`, `float16`, `fp32`, `fp16`, `half`, `bf16`). |
| `seed` | The seed to use for generating the video. |
| `save_file` | The file to save the generated video to. |
| `verbose` | Enable verbose mode for debugging. |
| `export_image` | Enable exporting video frames. |
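The `dtype` strings above presumably map onto torch dtypes internally; a hypothetical normalizer (the alias table is an assumption inferred from the option list, not the repo's actual helper) could look like:

```python
# Hypothetical mapping from the CLI dtype strings to canonical torch
# dtype names. Aliases (fp32/fp16/half/bf16) are assumptions based on
# the option list above; strings are returned instead of torch dtypes
# to keep the sketch dependency-free.
DTYPE_ALIASES = {
    "float64": "float64",
    "float32": "float32", "fp32": "float32",
    "float16": "float16", "fp16": "float16", "half": "float16",
    "bf16": "bfloat16",
}

def parse_dtype(name: str) -> str:
    """Normalize a CLI dtype string to a canonical torch dtype name."""
    try:
        return DTYPE_ALIASES[name.lower()]
    except KeyError:
        raise ValueError(f"unsupported dtype: {name}") from None

print(parse_dtype("bf16"))  # -> bfloat16
```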
- Model Args
| Parameter | Description |
|---|---|
| `model_id` | Model ID for Wan2.1 (`Wan-AI/Wan2.1-T2V-1.3B-Diffusers` or `Wan-AI/Wan2.1-T2V-14B-Diffusers`) and Wan2.2 (`Wan-AI/Wan2.2-T2V-A14B-Diffusers`). |
| `height` | Height of the generated video. |
| `width` | Width of the generated video. |
| `sample_steps` | Number of sampling steps. |
| `flow_shift` | Sampling shift factor for flow-matching schedulers. |
| `sample_guide_scale` | Classifier-free guidance scale. |
| `sample_guide_scale2` | Classifier-free guidance scale for the second model in Wan2.2. |
| `boundary_ratio` | Boundary ratio for Wan2.2. |
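Wan2.2-A14B runs two expert transformers switched by noise level; a minimal sketch of how `boundary_ratio` plausibly selects between them (the threshold convention `boundary = boundary_ratio * num_train_timesteps`, with the high-noise expert handling early steps, is an assumption based on Diffusers-style two-expert pipelines):

```python
# Sketch of expert selection in a two-model pipeline such as Wan2.2-A14B.
# Assumption: timesteps at or above the boundary use the high-noise expert
# (guided by sample_guide_scale), and those below use the low-noise expert
# (guided by sample_guide_scale2).
NUM_TRAIN_TIMESTEPS = 1000

def select_expert(timestep: int, boundary_ratio: float) -> str:
    boundary = boundary_ratio * NUM_TRAIN_TIMESTEPS
    if timestep >= boundary:
        return "high_noise_expert"  # early, noisy denoising steps
    return "low_noise_expert"       # late, refinement steps

print(select_expert(950, 0.875))  # -> high_noise_expert
print(select_expert(300, 0.875))  # -> low_noise_expert
```

With the quick-start value `--boundary_ratio 0.875`, the switch would happen at timestep 875 under this convention.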
- Major Args for DualParal
| Parameter | Description |
|---|---|
| `prompt` | The prompt to generate the video from. |
| `num_per_block` | The number of latents per block in DualParal. |
| `latents_num` | The total number of latents sampled for the video; must be divisible by `num_per_block`. The total number of video frames is (latents_num - 1) * 4 + 1, due to the VAE's 4x temporal compression. |
| `num_cat` | The number of latents concatenated from the previous and subsequent blocks, respectively. Increasing it improves global consistency and temporal coherence. Note that `num_cat` must not exceed `num_per_block`. |
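The constraints on these arguments reduce to simple arithmetic; the sketch below checks them and derives the block and frame counts (the frame formula `(latents_num - 1) * 4 + 1` assumes Wan's 4x temporal VAE compression):

```python
# Sketch of DualParal's block/latent bookkeeping. The frame formula
# (latents_num - 1) * 4 + 1 is stated under the assumption of Wan's
# 4x temporal VAE compression.
def check_dualparal_args(latents_num: int, num_per_block: int, num_cat: int):
    assert latents_num % num_per_block == 0, \
        "latents_num must be divisible by num_per_block"
    assert num_cat <= num_per_block, \
        "num_cat must not exceed num_per_block"
    num_blocks = latents_num // num_per_block
    num_frames = (latents_num - 1) * 4 + 1
    return num_blocks, num_frames

# Defaults from the quick-start command: 40 latents, blocks of 8, cat 8.
blocks, frames = check_dualparal_args(latents_num=40, num_per_block=8, num_cat=8)
print(blocks, frames)  # -> 5 157
```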
- Original Wan implementation with a single GPU
python -m examples.Wan-Video
- DualParal
# For Wan2.1-14B
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m examples.DualParal_Wan \
--model_id Wan-AI/Wan2.1-T2V-14B-Diffusers \
--height 720 --width 1280 --sample_steps 50 \
--num_per_block 8 --latents_num 40 --num_cat 8
# For Wan2.2-A14B
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m examples.DualParal_Wan \
--model_id Wan-AI/Wan2.2-T2V-A14B-Diffusers \
--height 720 --width 1280 --sample_steps 50 --sample_guide_scale 4.0 \
--sample_guide_scale2 3.0 --boundary_ratio 0.875 --flow_shift 12.0 \
--num_per_block 8 --latents_num 40 --num_cat 8
Our project is based on the Wan model. We would like to thank the authors for their excellent work! ❤️
@article{Wang_Zheng_Yang_Tan_Xu_Wang_2026,
author={Wang, Zeqing and Zheng, Bowen and Yang, Xingyi and Tan, Zhenxiong and Xu, Yuecong and Wang, Xinchao},
title={Minute-Long Videos with Dual Parallelisms},
journal={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026},
month={Mar.},
pages={10358-10366}
}