Xin Zhou1*, Dingkang Liang1*, Kaijin Chen1, Tianrui Feng1, Xiwu Chen2, Hongkai Lin1, Yikang Ding2, Feiyang Tan2, Hengshuang Zhao3, Xiang Bai1†
1 Huazhong University of Science and Technology, 2 MEGVII Technology, 3 University of Hong Kong
(*) Equal contribution. (†) Corresponding author.
This document provides the implementation for accelerating the Wan2.1 model with EasyCache, which significantly speeds up inference while maintaining high visual fidelity.
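As the paper title suggests, the speedup comes from training-free, runtime-adaptive caching: when the model's input barely changes between consecutive denoising steps, the previously computed output is reused instead of rerunning the expensive transformer. The sketch below is illustrative only; the threshold `tau`, the change criterion, and the toy update step are assumptions, not the authors' exact algorithm.

```python
import numpy as np

def denoise_with_cache(model, x, timesteps, tau=0.05):
    """Toy denoising loop that reuses a cached model output whenever the
    input's relative change since the last full forward pass is below tau."""
    cached_out, last_x = None, None
    calls = 0  # number of actual (expensive) model invocations
    for t in timesteps:
        if cached_out is not None:
            # relative change of the input since the last full computation
            change = np.linalg.norm(x - last_x) / (np.linalg.norm(last_x) + 1e-8)
            if change < tau:
                out = cached_out  # small change: reuse cache, skip the model call
            else:
                out = model(x, t)
                calls += 1
                cached_out, last_x = out, x.copy()
        else:
            out = model(x, t)  # first step: always compute
            calls += 1
            cached_out, last_x = out, x.copy()
        x = x - 0.1 * out  # toy update standing in for the real scheduler step
    return x, calls
```

With a well-chosen threshold, most steps hit the cache, which is where the ~2x wall-clock speedup reported below comes from.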
Prompt: "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about."
| Wan2.1-14B (Baseline, 720p, H20) | EasyCache (Ours, 720p, H20) |
|---|---|
| ![]() | ![]() |
| Inference Time: ~6862s | Inference Time: ~2884s (~2.4x Speedup) |
Prompt: "A cute green alien child with large ears, wearing a brown robe, sits on a chair and eats a blue cookie at a table, with crumbs scattered on the robe, in a cozy indoor setting."
| Wan2.1-14B I2V (Baseline, 720p, H20) | EasyCache (Ours, 720p, H20) |
|---|---|
| ![]() | ![]() |
| Inference Time: ~5302s | Inference Time: ~2397s (~2.2x Speedup) |
a. Prerequisites ⚙️
Before you begin, please follow the instructions in the official Wan2.1 repository to configure the required environment and download the pretrained model weights.
b. Copy Files 📂
Copy easycache_generate.py into the root directory of your local Wan2.1 project.
c. Run Inference
Execute the following command from the root of the Wan2.1 project to generate a video with the text-to-video model. To generate videos in 720p resolution, set the `--size` argument to `1280*720`. You can also specify your own custom prompts.

```shell
python easycache_generate.py \
  --task t2v-14B \
  --size "1280*720" \
  --ckpt_dir ./Wan2.1-T2V-14B \
  --prompt "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about." \
  --base_seed 0
```

For image-to-video, execute the following command from the root of the Wan2.1 project. The example below generates at 720p; to generate videos in 480p resolution instead, set the `--size` argument to `832*480` and `--ckpt_dir` to `./Wan2.1-I2V-14B-480P`. You can also specify your own custom prompts and images.
```shell
python easycache_generate.py \
  --task i2v-14B \
  --size "1280*720" \
  --ckpt_dir ./Wan2.1-I2V-14B-720P \
  --image examples/grogu.png \
  --prompt "A cute green alien child with large ears, wearing a brown robe, sits on a chair and eats a blue cookie at a table, with crumbs scattered on the robe, in a cozy indoor setting." \
  --base_seed 0
```

We provide a simple script to quickly evaluate the similarity between two videos (e.g., the baseline result and your generated result) using common metrics.
Usage
```shell
# install required packages
pip install lpips numpy tqdm torchmetrics

python tools/video_metrics.py --original_video video1.mp4 --generated_video video2.mp4
```

- `--original_video`: Path to the first video (e.g., the baseline).
- `--generated_video`: Path to the second video (e.g., the one generated with EasyCache).
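Internally, a script like this typically decodes both videos into frame arrays and averages a per-frame metric. Below is a minimal PSNR-only sketch over already-decoded `uint8` frame stacks; frame decoding and the script's actual metric set (e.g., LPIPS) are omitted, and `video_psnr` is a hypothetical helper, not the API of `tools/video_metrics.py`.

```python
import numpy as np

def video_psnr(frames_a, frames_b, max_val=255.0):
    """Mean per-frame PSNR (in dB) between two equally shaped uint8
    frame stacks of shape (num_frames, height, width, channels)."""
    a = frames_a.astype(np.float64)
    b = frames_b.astype(np.float64)
    # per-frame mean squared error over all pixels and channels
    mse = ((a - b) ** 2).reshape(len(a), -1).mean(axis=1)
    mse = np.maximum(mse, 1e-12)  # avoid log(0) for identical frames
    psnr = 10.0 * np.log10(max_val ** 2 / mse)
    return float(psnr.mean())
```

Identical videos yield a very high PSNR, while completely different frames drive it toward 0 dB; higher values mean the EasyCache output stays closer to the baseline.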
We would like to thank the contributors to the Wan2.1 repository for their open research and exploration.
If you find this repository useful in your research, please consider giving a star ⭐ and a citation.
```bibtex
@article{zhou2025easycache,
  title={Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching},
  author={Zhou, Xin and Liang, Dingkang and Chen, Kaijin and Feng, Tianrui and Chen, Xiwu and Lin, Hongkai and Ding, Yikang and Tan, Feiyang and Zhao, Hengshuang and Bai, Xiang},
  journal={arXiv preprint arXiv:2507.02860},
  year={2025}
}
```