EchoShot: Multi-Shot Portrait Video Generation

☆ NeurIPS 2025 ☆

Jiahao Wang1 · Hualian Sheng2 · Sijia Cai2,† · Weizhan Zhang1,*
Caixia Yan1 · Yachuang Feng2 · Bing Deng2 · Jieping Ye2

1Xi'an Jiaotong University      2Alibaba Cloud

Paper PDF Project Page Official Code

Please give us a star⭐ on GitHub if you like our work.

📝 Intro

This is the official GitHub page of EchoShot; the code has been released at D2I-ai. You can still refer to the instructions in this repo to fully experience EchoShot. EchoShot lets users generate multiple video shots of the same person, each controlled by a customized prompt. Currently it supports text-to-multi-shot portrait video generation. Hope you have fun with this demo!

🔔 News

  • September 18, 2025: 🎉🎉🎉 EchoShot has been accepted to NeurIPS 2025!
  • July 15, 2025: 🔥 EchoShot-1.3B-preview is now available at HuggingFace!
  • July 15, 2025: 🎉 Release official inference and training codes at D2I-ai.
  • May 25, 2025: We propose EchoShot, a multi-shot portrait video generation model.

⚙️ Installation

Prepare Code

First, clone the code from D2I-ai:

git clone https://github.com/D2I-ai/EchoShot
cd EchoShot

Construct Environment

Then create a conda environment and install the required packages:

conda create -n echoshot python=3.10
conda activate echoshot
pip install -r requirements.txt

Download Model

Since EchoShot is built on Wan2.1, first download Wan2.1-T2V-1.3B:

pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./models/Wan2.1-T2V-1.3B

Then download the EchoShot model:

huggingface-cli download JonneyWang/EchoShot --local-dir ./models/EchoShot

Organize Files

We recommend organizing local directories as follows:

EchoShot
├── ...
├── dataset
│   ├── video
│   │   ├── 1.mp4
│   │   ├── 2.mp4
│   │   └── ...
│   └── train.json
├── models
│   ├── Wan2.1-T2V-1.3B
│   │   └── ...
│   └── EchoShot
│       ├── EchoShot-1.3B-preview.pth
│       └── ...
└── ...
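The directory skeleton above can be created in one step; a minimal sketch (the video files and model checkpoints themselves come from your own data and from the download commands earlier):

```shell
# Create the recommended layout from the tree above.
mkdir -p dataset/video
mkdir -p models/Wan2.1-T2V-1.3B
mkdir -p models/EchoShot
```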

🎬 Usage

Inference

With LLM prompt extension

For optimal results, we highly recommend using an LLM for prompt extension; we provide prompt extension via the Dashscope API.

Specify DASH_API_KEY and other important configs in generate.sh, then run the following to start sampling:

bash generate.sh
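As a hypothetical sketch of what generate.sh configures: only DASH_API_KEY and the --use_prompt_extend flag appear in this README; the script name and the remaining flags below are illustrative assumptions, so check the actual generate.sh in the D2I-ai repo for the real options.

```shell
# Dashscope key used for LLM prompt extension (from this README).
export DASH_API_KEY="sk-..."

# Hypothetical invocation; flag names other than --use_prompt_extend
# are assumptions for illustration only.
python generate.py \
  --ckpt_dir ./models/Wan2.1-T2V-1.3B \
  --echoshot_ckpt ./models/EchoShot/EchoShot-1.3B-preview.pth \
  --use_prompt_extend \
  --prompt "[shot 1] ... [shot 2] ..."
```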

Without LLM prompt extension

If you don't want to use prompt extension, remove the --use_prompt_extend flag in generate.sh and run:

bash generate.sh

Train

If you want to train your own version of the model, prepare a dataset consisting of video files and a corresponding annotation JSON; we provide an example in dataset/train.json for reference. All training configurations are stored in config_train.py, where you can modify them to fit your needs. Once everything is set up, run the following to start training:

bash train.sh
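As a rough illustration of what a multi-shot training annotation might look like: the field names below ("video", "prompts") are assumptions for illustration, not the repo's actual schema; see the shipped dataset/train.json for the real format.

```shell
# Write a hypothetical train.json entry: one video plus one caption per shot.
cat > train_example.json <<'EOF'
[
  {
    "video": "dataset/video/1.mp4",
    "prompts": [
      "A young woman smiles at the camera in a sunlit kitchen.",
      "The same woman walks through a park at dusk, medium shot."
    ]
  }
]
EOF

# Sanity-check that the file is valid JSON.
python3 -m json.tool train_example.json > /dev/null && echo "valid JSON"
```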

✨ Acknowledgements

We would like to express our sincere thanks to the Wan Team for their support.

📖 Citation

If you are inspired by our work, please cite our paper.

@article{wang2025echoshot,
  title={EchoShot: Multi-Shot Portrait Video Generation},
  author={Wang, Jiahao and Sheng, Hualian and Cai, Sijia and Zhang, Weizhan and Yan, Caixia and Feng, Yachuang and Deng, Bing and Ye, Jieping},
  journal={arXiv preprint arXiv:2506.15838},
  year={2025}
}
