Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data

Ke Fan1, Shunlin Lu2, Minyue Dai3, Runyi Yu4,
Lixing Xiao5, Zhiyang Dou6, Junting Dong7, Lizhuang Ma1,8†, Jingbo Wang7

1Shanghai Jiao Tong University, 2CUHK, Shenzhen, 3Fudan University, 4HKUST,
5Zhejiang University, 6HKU, 7Shanghai AI Laboratory, 8East China Normal University.
† Corresponding author
ICCV 2025 Highlight

🤩 Abstract

Generating diverse and natural human motion sequences from textual descriptions is a fundamental and challenging research problem in computer vision, graphics, and robotics. Despite significant advancements in this field, current methods often struggle with zero-shot generalization, largely due to the limited size of training datasets. Moreover, the lack of a comprehensive evaluation framework impedes progress on this task by failing to identify directions for improvement. In this work, we aim to push text-to-motion into a new era, that is, to achieve zero-shot generalization. To this end, we first develop an efficient annotation pipeline and introduce MotionMillion—the largest human motion dataset to date, featuring over 2,000 hours and 2 million high-quality motion sequences. Additionally, we propose MotionMillion-Eval, the most comprehensive benchmark for evaluating zero-shot motion generation. Leveraging a scalable architecture, we scale our model to 7B parameters and validate its performance on MotionMillion-Eval. Our results demonstrate strong generalization to out-of-domain and complex compositional motions, marking a significant step toward zero-shot human motion generation.

📢 News

  • [2025/07/26] MotionMillion dataset is released.
  • [2025/07/24] Our paper was selected as a Highlight at ICCV 2025.
  • [2025/07/03] Training code, inference code, and model checkpoints are released.
  • [2025/06/26] MotionMillion is officially accepted to ICCV 2025.

👨‍🏫 Quick Start

This section provides a quick start guide to set up the environment and run the demo. The following steps will guide you through the installation of the required dependencies, downloading the pretrained models, and preparing the datasets.

1. Conda environment
conda create python=3.8.11 --name motionmillion
conda activate motionmillion

Install the packages in requirements.txt.

pip install -r requirements.txt

We test our code on Python 3.8.11 and PyTorch 2.4.1.

2. Dependencies
🥳 Run the following command to install git-lfs:
conda install conda-forge::git-lfs
🤖 Download the SMPL+H and DMPL models
  1. Download SMPL+H (the extended SMPL+H model used in the AMASS project)
  2. Download DMPL (DMPLs compatible with SMPL)
  3. Place all models under ./body_models/
👤 Download human model files
  1. Download files from Google Drive
  2. Place under ./body_models/
⚙️ Run the following scripts to download the dependency materials:
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators_on_motionmillion.sh
bash prepare/download_T5-XL.sh
3. Pretrained models

We provide our 3B and 7B models trained on train.txt and all.txt respectively. Our 7B-all achieves the best zero-shot performance. Run the script to download the pre-trained models:

bash prepare/download_pretrained_models.sh
4. Prepare the datasets
Coming soon! The dataset structure will be as follows:
dataset
├── MotionMillion
│   ├── motion_data
│   │   └── vector_272
│   │       ├── ...
│   │       └── ...
│   ├── texts
│   │   ├── ...
│   │   └── ...
│   ├── mean_std
│   │   └── vector_272
│   │       ├── mean.npy
│   │       └── std.npy
│   └── split
│       └── version1
│           ├── t2m_60_300
│           │   ├── train.txt
│           │   ├── test.txt
│           │   ├── val.txt
│           │   └── all.txt
│           └── tokenizer_96
│               ├── train.txt
│               ├── test.txt
│               └── val.txt
├── ...
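Once the dataset is in place, loading a motion sequence reduces to reading the .npy files above. Below is a minimal sketch (not the official data loader), assuming each motion is stored as a (num_frames, 272) array under vector_272; the sample filename is hypothetical:

# Minimal sketch, not the repo's loader. Assumes (num_frames, 272) .npy motions;
# the sample filename below is hypothetical.
import numpy as np
root = "dataset/MotionMillion"
mean = np.load(f"{root}/mean_std/vector_272/mean.npy")  # per-dimension mean, shape (272,)
std = np.load(f"{root}/mean_std/vector_272/std.npy")    # per-dimension std, shape (272,)
motion = np.load(f"{root}/motion_data/vector_272/000001.npy")  # hypothetical sample
motion_norm = (motion - mean) / (std + 1e-8)  # z-normalize each feature dimension
print(motion_norm.shape)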

🎬 Inference

Please make sure that you have finished the preparations in Quick Start.

If you want to test the text-to-motion inference by yourself, please run the following commands:

bash scripts/inference/single_inference/test_t2m_3B.sh
bash scripts/inference/single_inference/test_t2m_7B.sh

Please remember to replace ${resume-pth} and ${resume-trans} with the actual paths to your tokenizer and text-to-motion model checkpoints.

Following common practice in video/image generation, we use LLaMA-3.1-8B as a rewrite model to rewrite the input prompt. If you do not want to use the rewrite model, simply delete the ${use_rewrite_model} and ${rewrite_model_path} arguments.
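Conceptually, the rewrite step is a single LLM call that expands a short prompt into a more detailed motion description before it is passed to the text-to-motion model. The sketch below illustrates this idea with the Hugging Face transformers pipeline; it is not the repository's actual rewrite code, and the model id and instruction text are placeholders:

# Illustration only; the actual rewrite model invocation in this repo may differ.
from transformers import pipeline
rewriter = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model id
prompt = "a person waves"
instruction = ("Rewrite the following motion description with more detail about the body movement, "
               f"keeping the original meaning: {prompt}")
rewritten = rewriter(instruction, max_new_tokens=64)[0]["generated_text"]
print(rewritten)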

If you want to test our MotionMillion-Eval benchmark, please run the following commands:

bash scripts/inference/batch_inference/test_t2m_3B.sh
bash scripts/inference/batch_inference/test_t2m_7B.sh

The MotionMillion-Eval prompts are saved in assets/infer_batch_prompt.

🚀 Train your own models

We provide training guidance for the motion reconstruction and text-to-motion tasks. The following steps will guide you through the training process.

1. Train Tokenizer

For multi-GPU training, run the following command (we train our tokenizer on 4 x 80GB GPUs):

bash scripts/train/train_tokenizer.sh

For single-GPU training, run the following command:

bash scripts/train/train_tokenizer_single_gpu.sh

If you do not want to use the wavelet transformation, simply delete the ${use_patcher}, ${patch_size}, and ${patch_method} arguments.
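As a rough illustration of what a temporal wavelet transformation does to a motion sequence (this is not the repository's patcher implementation, and the wavelet type and level are assumptions), a one-level decomposition with PyWavelets looks like:

# Conceptual illustration; the actual patcher (controlled by ${use_patcher},
# ${patch_size}, ${patch_method}) may use a different wavelet, level, and packing.
import numpy as np
import pywt
motion = np.random.randn(96, 272)  # dummy (num_frames, feature_dim) sequence
approx, detail = pywt.wavedec(motion, wavelet="haar", level=1, axis=0)
print(approx.shape, detail.shape)  # approximation and detail coefficients, each about half the frames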

2. Train Text-to-Motion Model

First, run the following command to infer all motion codes with the trained FSQ tokenizer. Change the ${resume-pth} argument to the path of your own tokenizer checkpoint.

bash scripts/train/train_t2m_get_codes.sh

Then, to train the 3B model on multiple GPUs with ZeRO-1 parallelism, run the following command:

bash scripts/train/train_t2m_3B.sh

To train the 7B model on multiple GPUs with ZeRO-2 parallelism, run the following command:

bash scripts/train/train_t2m_7B.sh
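ZeRO-1 shards only the optimizer states across GPUs, while ZeRO-2 additionally shards the gradients, which is why the larger 7B model uses the higher stage. The snippet below is an illustrative DeepSpeed-style configuration showing the difference; the repository's launch scripts define their own settings, and the batch sizes here are placeholders:

# Illustrative DeepSpeed ZeRO configs; actual values live in the training scripts above.
zero1_config = {
    "train_micro_batch_size_per_gpu": 4,   # placeholder
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 1},     # shard optimizer states (3B model)
}
zero2_config = {
    "train_micro_batch_size_per_gpu": 2,   # placeholder
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},     # also shard gradients (7B model)
}
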
3. Evaluate the models

3.1. Motion Reconstruction:

bash scripts/eval/eval_tokenizer.sh

3.2. Text-to-Motion:

bash scripts/eval/eval_t2m_3B.sh
bash scripts/eval/eval_t2m_7B.sh

🚨 Motion Postprocess

We provide motion postprocessing scripts to smooth the motion and remove artifacts such as foot sliding. Please execute the following commands. A larger ${window_length} results in smoother motion.

cd postprocess/remove_sliding
bash scripts/run_remove_sliding.sh
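The effect of ${window_length} can be thought of as a sliding window averaged over the joint trajectories: the wider the window, the smoother (and less detailed) the result. A minimal moving-average sketch of this idea follows; the actual postprocessing script may use a different filter and additionally handles foot sliding, which is not shown here:

# Toy illustration of window-based smoothing, not the repo's remove_sliding script.
import numpy as np
def smooth_motion(motion, window_length=5):
    # Moving-average filter along the time axis of a (num_frames, feature_dim) array.
    kernel = np.ones(window_length) / window_length
    return np.stack([np.convolve(motion[:, d], kernel, mode="same")
                     for d in range(motion.shape[1])], axis=1)
motion = np.random.randn(300, 272)  # dummy motion sequence
smoothed = smooth_motion(motion, window_length=9)  # larger window -> smoother motion
print(smoothed.shape)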

🌹 Acknowledgement

We would like to thank the authors of the following repositories for their excellent work: MotionLCM, T2M-GPT, MotionStreamer, Scamo, HumanML3D.

📜 Citation

If you find this work useful, please consider citing our paper:

@misc{fan2025zerozeroshotmotiongeneration,
      title={Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data}, 
      author={Ke Fan and Shunlin Lu and Minyue Dai and Runyi Yu and Lixing Xiao and Zhiyang Dou and Junting Dong and Lizhuang Ma and Jingbo Wang},
      year={2025},
      eprint={2507.07095},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.07095}, 
}

📚 License

This work is licensed under the Apache License.

If you have any questions, please contact Ke Fan and CC Shunlin Lu and Jingbo Wang.
