Hongpeng Wang*, Zeyu Zhang*†, Wenhao Li, Hao Tang‡
*Equal contribution. †Project lead. ‡Corresponding author.
MoRL is a unified multimodal motion model designed to advance both human motion understanding and generation. Unlike prior approaches that treat user queries as a whole and lack explicit reasoning or planning, MoRL leverages a hierarchical post-training pipeline combining supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR). Our task-specific reward design is dual-headed: for motion understanding, we introduce semantic alignment and a novel reasoning coherence reward to enforce logically consistent reasoning traces; for motion generation, we combine text–motion consistency with a physical plausibility reward to ensure biomechanical validity and perceptual realism. To further enhance inference, we propose Chain-of-Motion (CoM), a test-time reasoning strategy that enables step-by-step planning and reflection. CoM improves both the robustness of reasoning-based motion understanding and the quality of motion generation through iterative selection and correction. This principle also guides the construction of two large-scale synthetic Chain-of-Thinking (CoT) datasets: MoUnd-CoT-140K and MoGen-CoT-140K, which align motion sequences with reasoning traces and concise action descriptions. Extensive experiments on HumanML3D and KIT-ML demonstrate that MoRL achieves significant gains over state-of-the-art baselines in both logical reasoning and perceptual realism. Our code, data, and models are open-sourced to facilitate further research in unified motion-language modeling.
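To make the dual-headed reward design concrete, here is a minimal sketch of how the two reward terms per task could be combined. The function names and the equal weighting are illustrative assumptions for exposition, not the repo's actual API; the paper specifies the reward terms but not these signatures.

```python
# Illustrative sketch of MoRL's dual-headed reward design.
# Function names and weights are hypothetical, not the repo's API.

def understanding_reward(semantic_alignment: float, reasoning_coherence: float,
                         w_sem: float = 0.5, w_coh: float = 0.5) -> float:
    """Motion understanding head: semantic alignment + reasoning coherence."""
    return w_sem * semantic_alignment + w_coh * reasoning_coherence

def generation_reward(text_motion_consistency: float, physical_plausibility: float,
                      w_con: float = 0.5, w_phy: float = 0.5) -> float:
    """Motion generation head: text-motion consistency + physical plausibility."""
    return w_con * text_motion_consistency + w_phy * physical_plausibility
```

In practice the weights would be tuned per task; the sketch only shows that each head scores a rollout with two verifiable terms.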
Overview of MoRL. MoRL unifies motion understanding and generation under a reinforcement learning paradigm. Motion and text inputs are tokenized into a shared representation space. A hierarchical post-training pipeline first applies SFT on large-scale synthetic CoT datasets to align motion sequences with reasoning traces and concise descriptions, then employs RLVR to refine outputs, enhancing semantic alignment, reasoning coherence, physical plausibility, and text–motion consistency. At inference, the Chain-of-Motion (CoM) decoding strategy enables step-by-step reasoning and reflection, improving both motion understanding and perceptually realistic motion generation.
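The "iterative selection and correction" behavior of Chain-of-Motion decoding can be sketched as a best-of-N loop with reflection steps. This is a hypothetical simplification under stated assumptions: `generate`, `refine`, and `score` stand in for the model's sampling, reflection, and reward scoring, and the accept-if-better rule is our reading of "selection and correction", not the repo's exact logic.

```python
from typing import Callable

def chain_of_motion(generate: Callable[[str], str],
                    refine: Callable[[str, str], str],
                    score: Callable[[str], float],
                    prompt: str,
                    n_candidates: int = 8,
                    refine_steps: int = 2) -> str:
    """Sketch of CoM decoding: sample N candidates, keep the best-scoring one,
    then iteratively refine it, accepting a revision only if it scores higher."""
    candidates = [generate(prompt) for _ in range(n_candidates)]
    best = max(candidates, key=score)           # selection
    for _ in range(refine_steps):               # reflection / correction
        revised = refine(prompt, best)
        if score(revised) > score(best):
            best = revised
    return best
```

The defaults mirror the evaluation flags below (8 candidates, 2 refinement steps).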
Motion CoT data engine. Built on MotionHubV2, one branch (MoUnd-CoT-140K) uses motion sequences and captions with Gemini to construct reasoning chains for understanding, while the other (MoGen-CoT-140K) builds reasoning chains for generation.

- Upload our paper to arXiv and build project pages.
- Upload the code.
- Release curated MoUnd-CoT / MoGen-CoT data. (see MoUnd-MoGen-CoT-140K)
- Release training checkpoints.
Install dependencies:
```bash
pip install -r requirements.txt
```

This repo provides helper scripts under prepare/:

- prepare/download_extractor.sh
- prepare/download_glove.sh
Run with bash:
```bash
bash prepare/download_extractor.sh
bash prepare/download_ckpt.sh
```

If you are on native Windows PowerShell, you can manually download and unzip these assets by following the URLs in the scripts.
You can download the pre-processed motion datasets from our HuggingFace page.
For custom data or the full AMASS/kitml/HumanML3D datasets, please follow the instructions in the dataset/ and prepare/ folders.
SFT for text-to-motion (t2m):

```bash
python train_mllm.py \
    --train-stage sft \
    --training-task t2m \
    --cot-train-jsonl path/to/cot_train.jsonl \
    --use-reasoning \
    --exp-name morl_sft_t2m
```

SFT for motion-to-text (m2t):

```bash
python train_mllm.py \
    --train-stage sft \
    --training-task m2t \
    --cot-train-jsonl path/to/cot_train.jsonl \
    --cot-task-filter m2t \
    --use-reasoning \
    --exp-name morl_sft_m2t
```

RLVR for text-to-motion (t2m):

```bash
python train_mllm.py \
    --train-stage rlvr \
    --training-task t2m \
    --cot-train-jsonl path/to/cot_train.jsonl \
    --rl-reference-ckpt experiments/morl_sft_t2m/motionllm_t2m_best.pth \
    --rl-epochs 3 \
    --rl-group-size 8 \
    --exp-name morl_rlvr_t2m
```

For m2t RLVR, set --training-task m2t and optionally --cot-task-filter m2t.
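As a rough intuition for what --rl-group-size controls, RLVR methods in this family typically sample a group of rollouts per prompt and normalize each rollout's verifiable reward against its own group. The sketch below shows that group normalization in isolation; it is an assumption about the general technique, not code from this repo.

```python
from statistics import mean, pstdev

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one sampled group (size = --rl-group-size):
    advantage_i = (r_i - mean(group)) / (std(group) + eps).
    With identical rewards, every advantage is ~0 (no learning signal)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

A larger group size gives a lower-variance baseline per prompt at the cost of more rollouts per update.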
Evaluate t2m:

```bash
python eval_mllm.py \
    --eval-task t2m \
    --eval-ckpt experiments/morl_rlvr_t2m/motionllm_rlvr_epoch_2.pth
```

Evaluate m2t:

```bash
python eval_mllm.py \
    --eval-task m2t \
    --eval-ckpt experiments/morl_rlvr_m2t/motionllm_rlvr_epoch_2.pth
```

Evaluate t2m with Chain-of-Motion (CoM) decoding:

```bash
python eval_mllm.py \
    --eval-task t2m \
    --eval-ckpt experiments/morl_rlvr_t2m/motionllm_rlvr_epoch_2.pth \
    --use-com \
    --com-candidates 8 \
    --com-refine-steps 2
```

If you find this project useful, please consider citing:
```bibtex
@article{wang2026morl,
  title={MoRL: Reinforced Reasoning for Unified Motion Understanding and Generation},
  author={Wang, Hongpeng and Zhang, Zeyu and Li, Wenhao and Tang, Hao},
  journal={arXiv preprint arXiv:2602.14534},
  year={2026}
}
```

We thank the open-source communities behind Motion-Agent, MotionGPT, Qwen, and related motion-language benchmarks for their foundational contributions.