Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

Ke Fan1, Shunlin Lu2, Minyue Dai3, Runyi Yu4,
Lixing Xiao5, Zhiyang Dou6, Junting Dong7, Lizhuang Ma1,8†, Jingbo Wang7

1Shanghai Jiao Tong University, 2CUHK, Shenzhen, 3Fudan University, 4HKUST,
5Zhejiang University, 6HKU, 7Shanghai AI Laboratory, 8East China Normal University.
† Corresponding author
ICCV 2025 Highlight

🤩 Abstract

Generating diverse and natural human motion sequences from textual descriptions is a fundamental and challenging research problem in computer vision, graphics, and robotics. Despite significant advancements in this field, current methods often struggle with zero-shot generalization, largely due to the limited size of training datasets. Moreover, the lack of a comprehensive evaluation framework impedes progress on this task by failing to identify directions for improvement. In this work, we aim to push text-to-motion into a new era, that is, to achieve zero-shot generalization. To this end, we first develop an efficient annotation pipeline and introduce MotionMillion—the largest human motion dataset to date, featuring over 2,000 hours and 2 million high-quality motion sequences. Additionally, we propose MotionMillion-Eval, the most comprehensive benchmark for evaluating zero-shot motion generation. Leveraging a scalable architecture, we scale our model to 7B parameters and validate its performance on MotionMillion-Eval. Our results demonstrate strong generalization to out-of-domain and complex compositional motions, marking a significant step toward zero-shot human motion generation.

📢 News

  • [2025/07/26] MotionMillion dataset is released.
  • [2025/07/24] Our paper was selected as a Highlight at ICCV 2025.
  • [2025/07/03] Training code, inference code, and model checkpoints are released.
  • [2025/06/26] MotionMillion is officially accepted to ICCV 2025.

👨‍🏫 Quick Start

This section provides a quick start guide to set up the environment and run the demo. The following steps will guide you through the installation of the required dependencies, downloading the pretrained models, and preparing the datasets.

1. Conda environment
conda create python=3.8.11 --name motionmillion
conda activate motionmillion

Install the packages in requirements.txt.

pip install -r requirements.txt

We test our code on Python 3.8.11 and PyTorch 2.4.1.

2. Dependencies
🥳 Run the following command to install git-lfs:
conda install conda-forge::git-lfs
🤖 Download the SMPL+H and DMPL models
  1. Download SMPL+H (the extended SMPL+H model used in the AMASS project)
  2. Download DMPL (DMPLs compatible with SMPL)
  3. Place all models under ./body_models/
👤 Download human model files
  1. Download files from Google Drive
  2. Place under ./body_models/
⚙️ Run the following scripts to download the dependency materials:
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators_on_motionmillion.sh
bash prepare/download_T5-XL.sh
3. Pretrained models

We provide our 3B and 7B models trained on train.txt and all.txt respectively. Our 7B-all achieves the best zero-shot performance. Run the script to download the pre-trained models:

bash prepare/download_pretrained_models.sh
4. Prepare the datasets
Coming soon! The dataset structure will be as follows:
dataset
├── MotionMillion
│   ├── motion_data
│   │   └── vector_272
│   │       ├── ...
│   │       └── ...
│   ├── texts
│   │   ├── ...
│   │   └── ...
│   ├── mean_std
│   │   └── vector_272
│   │       ├── mean.npy
│   │       └── std.npy
│   └── split
│       └── version1
│           ├── t2m_60_300
│           │   ├── train.txt
│           │   ├── test.txt
│           │   ├── val.txt
│           │   └── all.txt
│           └── tokenizer_96
│               ├── train.txt
│               ├── test.txt
│               └── val.txt
├── ...
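Once the dataset is in place, loading a motion sequence reduces to reading the .npy files above. Below is a minimal sketch (not the official data loader), assuming each motion is stored as a (num_frames, 272) array under vector_272; the sample filename is hypothetical:

# Minimal sketch, not the repo's loader. Assumes (num_frames, 272) .npy motions;
# the sample filename below is hypothetical.
import numpy as np
root = "dataset/MotionMillion"
mean = np.load(f"{root}/mean_std/vector_272/mean.npy")  # per-dimension mean, shape (272,)
std = np.load(f"{root}/mean_std/vector_272/std.npy")    # per-dimension std, shape (272,)
motion = np.load(f"{root}/motion_data/vector_272/000001.npy")  # hypothetical sample
motion_norm = (motion - mean) / (std + 1e-8)  # z-normalize each feature dimension
print(motion_norm.shape)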

🎬 Inference

Please make sure that you have finished the preparations in Quick Start.

If you want to test the text-to-motion inference by yourself, please run the following commands:

bash scripts/inference/single_inference/test_t2m_3B.sh
bash scripts/inference/single_inference/test_t2m_7B.sh

Please remember to replace ${resume-pth} and ${resume-trans} with the actual paths to your tokenizer and text-to-motion model checkpoints.

Following common practice in video/image generation, we use LLaMA-3.1-8B as a rewrite model to rewrite the input prompt. If you do not want to use the rewrite model, simply delete the ${use_rewrite_model} and ${rewrite_model_path} arguments.
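Conceptually, the rewrite step is a single LLM call that expands a short prompt into a more detailed motion description before it is passed to the text-to-motion model. The sketch below illustrates this idea with the Hugging Face transformers pipeline; it is not the repository's actual rewrite code, and the model id and instruction text are placeholders:

# Illustration only; the actual rewrite model invocation in this repo may differ.
from transformers import pipeline
rewriter = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model id
prompt = "a person waves"
instruction = ("Rewrite the following motion description with more detail about the body movement, "
               f"keeping the original meaning: {prompt}")
rewritten = rewriter(instruction, max_new_tokens=64)[0]["generated_text"]
print(rewritten)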

If you want to test our MotionMillion-Eval benchmark, please run the following commands:

bash scripts/inference/batch_inference/test_t2m_3B.sh
bash scripts/inference/batch_inference/test_t2m_7B.sh

The MotionMillion-Eval prompts are saved in assets/infer_batch_prompt.

🚀 Train your own models

We provide training guidance for the motion reconstruction and text-to-motion tasks. The following steps will guide you through the training process.

1. Train Tokenizer

For multi-GPU training, run the following command (we train our tokenizer on 4 x 80GB GPUs):

bash scripts/train/train_tokenizer.sh

For single-GPU training, run the following command:

bash scripts/train/train_tokenizer_single_gpu.sh

If you do not want to use the wavelet transformation, simply delete the ${use_patcher}, ${patch_size}, and ${patch_method} arguments.
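As a rough illustration of what a temporal wavelet transformation does to a motion sequence (this is not the repository's patcher implementation, and the wavelet type and level are assumptions), a one-level decomposition with PyWavelets looks like:

# Conceptual illustration; the actual patcher (controlled by ${use_patcher},
# ${patch_size}, ${patch_method}) may use a different wavelet, level, and packing.
import numpy as np
import pywt
motion = np.random.randn(96, 272)  # dummy (num_frames, feature_dim) sequence
approx, detail = pywt.wavedec(motion, wavelet="haar", level=1, axis=0)
print(approx.shape, detail.shape)  # approximation and detail coefficients, each about half the frames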

2. Train Text-to-Motion Model

First, run the following command to infer all motion codes with the trained FSQ tokenizer. Change the ${resume-pth} argument to the path of your own tokenizer checkpoint.

bash scripts/train/train_t2m_get_codes.sh

Then, to train the 3B model on multiple GPUs with ZeRO-1 parallelism, run the following command:

bash scripts/train/train_t2m_3B.sh

To train the 7B model on multiple GPUs with ZeRO-2 parallelism, run the following command:

bash scripts/train/train_t2m_7B.sh
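ZeRO-1 shards only the optimizer states across GPUs, while ZeRO-2 additionally shards the gradients, which is why the larger 7B model uses the higher stage. The snippet below is an illustrative DeepSpeed-style configuration showing the difference; the repository's launch scripts define their own settings, and the batch sizes here are placeholders:

# Illustrative DeepSpeed ZeRO configs; actual values live in the training scripts above.
zero1_config = {
    "train_micro_batch_size_per_gpu": 4,   # placeholder
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 1},     # shard optimizer states (3B model)
}
zero2_config = {
    "train_micro_batch_size_per_gpu": 2,   # placeholder
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},     # also shard gradients (7B model)
}
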
3. Evaluate the models

3.1. Motion Reconstruction:

bash scripts/eval/eval_tokenizer.sh

3.2. Text-to-Motion:

bash scripts/eval/eval_t2m_3B.sh
bash scripts/eval/eval_t2m_7B.sh

🚨 Motion Postprocess

We provide motion postprocessing scripts to smooth the motion and remove artifacts such as foot sliding. Please execute the following commands. A larger ${window_length} results in smoother motion.

cd postprocess/remove_sliding
bash scripts/run_remove_sliding.sh
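The effect of ${window_length} can be thought of as a sliding window averaged over the joint trajectories: the wider the window, the smoother (and less detailed) the result. A minimal moving-average sketch of this idea follows; the actual postprocessing script may use a different filter and additionally handles foot sliding, which is not shown here:

# Toy illustration of window-based smoothing, not the repo's remove_sliding script.
import numpy as np
def smooth_motion(motion, window_length=5):
    # Moving-average filter along the time axis of a (num_frames, feature_dim) array.
    kernel = np.ones(window_length) / window_length
    return np.stack([np.convolve(motion[:, d], kernel, mode="same")
                     for d in range(motion.shape[1])], axis=1)
motion = np.random.randn(300, 272)  # dummy motion sequence
smoothed = smooth_motion(motion, window_length=9)  # larger window -> smoother motion
print(smoothed.shape)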

🌹 Acknowledgement

We would like to thank the authors of the following repositories for their excellent work: MotionLCM, T2M-GPT, MotionStreamer, Scamo, HumanML3D.

📜 Citation

If you find this work useful, please consider citing our paper:

@misc{fan2025zerozeroshotmotiongeneration,
      title={Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data}, 
      author={Ke Fan and Shunlin Lu and Minyue Dai and Runyi Yu and Lixing Xiao and Zhiyang Dou and Junting Dong and Lizhuang Ma and Jingbo Wang},
      year={2025},
      eprint={2507.07095},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.07095}, 
}

📚 License

This work is licensed under the Apache License.

If you have any questions, please contact Ke Fan and CC Shunlin Lu and Jingbo Wang.
