Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs (ICLR 2025)

arXiv: https://arxiv.org/abs/2405.17013

Overview

While previous approaches to 3D human motion generation have achieved notable success, they often rely on extensive training and are limited to specific tasks. To address these challenges, we introduce Motion-Agent, an efficient conversational framework designed for general human motion generation, editing, and understanding. Motion-Agent employs an open-source pre-trained language model to develop a generative agent, MotionLLM, that bridges the gap between motion and text. This is accomplished by encoding and quantizing motions into discrete tokens that align with the language model's vocabulary. With only 1-3% of the model's parameters fine-tuned using adapters, MotionLLM delivers performance on par with diffusion models and other transformer-based methods trained from scratch. By integrating MotionLLM with GPT-4 without additional training, Motion-Agent is able to generate highly complex motion sequences through multi-turn conversations, a capability that previous models have struggled to achieve. Motion-Agent supports a wide range of motion-language tasks, offering versatile capabilities for generating and customizing human motion through interactive conversational exchanges.
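To make the pipeline concrete: motions are compressed by a VQ-style tokenizer into discrete codes, those codes are added to the language model's vocabulary as new tokens, and only small adapter layers are fine-tuned. The sketch below is purely illustrative and is not the repository's code; the codebook size (512) and LoRA settings are assumptions.

# Illustrative sketch only; not the repository's actual code.
# Assumed: a VQ codebook of 512 motion codes and generic LoRA settings.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "google/gemma-2-2b"  # the backbone named in this README
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# One new token per motion codebook entry, so motion and text share one vocabulary.
motion_tokens = [f"<motion_{i}>" for i in range(512)]
tokenizer.add_tokens(motion_tokens)
model.resize_token_embeddings(len(tokenizer))

# Fine-tune only lightweight LoRA adapters (roughly 1-3% of the parameters).
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()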

Updates

  • [2025/05/15] The training script is released.
  • [2025/02/19] Demo and evaluation code are available.
  • [2025/02/06] Motion-Agent is accepted to ICLR 2025.
  • [2024/10/08] Motion-Agent paper is available.
  • [2024/05/28] The original MotionLLM paper is available.

Citation

If you find our work useful, please cite us. The BibTeX entry is as follows.

@article{wu2024motion,
  title={Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs},
  author={Wu, Qi and Zhao, Yubo and Wang, Yifan and Liu, Xinhang and Tai, Yu-Wing and Tang, Chi-Keung},
  journal={arXiv preprint arXiv:2405.17013},
  year={2024}
}

Getting Started

Environment Setup

conda create -n motionagent python=3.10
conda activate motionagent
pip install -r requirements.txt

Download Motion-Agent checkpoints

Download the pretrained Motion-Agent checkpoints:

bash prepare/download_ckpt.sh

Download GloVe and the evaluation extractor

Download the evaluation models and GloVe word embeddings:

bash prepare/download_glove.sh
bash prepare/download_extractor.sh

Prepare the LLM backbone

We use Google's Gemma-2-2B as MotionLLM's backbone. Please request access to the model on Hugging Face, then log in with huggingface-cli login.
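Once your access request is approved, you can authenticate with the standard Hugging Face CLI (this is generic Hugging Face usage, not specific to this repository):

huggingface-cli login
# alternatively, export an access token that huggingface_hub will pick up
export HF_TOKEN=<your_access_token>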

Demo

We provide an interactive demo for Motion-Agent that runs in your terminal. You will need to set up your own Azure OpenAI API key and endpoint. To start the demo:

python demo.py
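The demo relies on the Azure OpenAI service. The exact way demo.py reads your credentials may differ (check the script), but the standard Azure OpenAI client environment variables look like this, with placeholders for your own key and resource name:

export AZURE_OPENAI_API_KEY=<your_api_key>
export AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/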

Example Prompts

Here are some examples of what you can ask Motion-Agent:

  1. Motion Generation
Generate a motion of a person running forward and then doing a backflip.
  2. Motion Reasoning
Why is the person doing this? ./assets/motion_example.npy

[Figure: preview of the example motion]

Note: For motion reasoning, make sure your motion file is in the correct .npy format (HumanML3D format) and exists in the specified path.
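As a quick sanity check of the input format, you can inspect a motion file with NumPy (illustrative snippet, not part of the repository's scripts); HumanML3D motion features typically have shape (num_frames, 263) for the 22-joint skeleton:

import numpy as np

motion = np.load("./assets/motion_example.npy")
print(motion.shape, motion.dtype)  # expected: (num_frames, 263) float features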

Evaluation

To obtain the full HumanML3D dataset, please follow the instructions in the HumanML3D repository.

python eval_mllm.py

Training

To train your own motion tokenizer, you can refer to T2M-GPT.

Motion generation (t2m) and motion captioning (m2t) are trained separately. You can train MotionLLM by running the following commands:

python train_mllm.py --training_task t2m
python train_mllm.py --training_task m2t

Acknowledgements

We would like to thank the following open-source projects, on which our code builds: T2M-GPT, NExT-GPT, text-to-motion.
