TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration

TBD

We identify a fundamental mismatch between MoE architectures and dLLMs: a large number of experts is activated at each denoising step, while only a small subset of tokens is ultimately accepted, resulting in substantial inference overhead and limiting deployment in latency-sensitive applications.

We propose TEAM, a plug-and-play framework that accelerates MoE dLLMs by accepting more tokens with fewer activated experts. TEAM employs three complementary expert-activation and decoding strategies: it conservatively selects the necessary experts for decoded and masked tokens, while simultaneously performing aggressive speculative exploration across multiple candidates.
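To make the idea concrete, here is a deliberately simplified sketch (not the actual TEAM implementation; all names and thresholds are hypothetical) of temporal-consistency-guided expert selection: prefer experts that are both high-scoring at the current denoising step and already active at the previous step, falling back to the plain router top-k when the overlap is too small.

```python
# Illustrative sketch only -- not TEAM's real algorithm. `top_k_experts`,
# `select_experts`, and the thresholds are hypothetical names for exposition.

def top_k_experts(router_scores, k):
    """Indices of the k highest-scoring experts."""
    return set(sorted(range(len(router_scores)),
                      key=lambda i: router_scores[i], reverse=True)[:k])

def select_experts(router_scores, prev_active, top_k=8, keep_k=4):
    """Keep experts that are high-scoring now AND were active at the
    previous denoising step; fall back to plain top-k if too few overlap."""
    candidates = top_k_experts(router_scores, top_k)
    consistent = candidates & prev_active
    if len(consistent) >= keep_k:
        return consistent  # fewer experts activated, exploiting consistency
    return candidates
```

When consecutive denoising steps route similarly (the temporal-consistency assumption), the intersection is large and fewer experts need to be activated.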

Overall Performance:

With the SDAR-30B-A3B model, TEAM achieves an average speedup of 1.94× across diverse benchmarks, with a peak speedup of 2.2× on HumanEval.

TBD

Installation

1. Clone the repository:

git clone https://github.com/PKU-SEC-Lab/TEAM-MoE-dLLM.git
cd TEAM-MoE-dLLM

2. Create and activate a conda environment:

conda create --name <your_env_name> python=3.10
conda activate <your_env_name>

3. Install the dependencies:

Follow the Environment Setup in SDAR, or install them with:

conda env create -f evaluation/environment.yml

which mirrors SDAR's configuration.

Usage

Download the SDAR-30B-A3B model from Hugging Face.

You can either run inference directly, or additionally log expert-activation and decoding-order information during inference.

1. Direct inference:

Replace modeling_sdar_moe.py in the downloaded model directory with the modeling_sdar_moe.py provided in this repository.

cd evaluation/opencompass
CUDA_VISIBLE_DEVICES=<GPU_ID> python run.py configs/eval_sdar_hf_<Task_Name>.py

Parameter descriptions:

  • <GPU_ID>: Choose which GPU to run on
  • <Task_Name>: Select the benchmark for evaluation
    • Options: gsm8k, math, humaneval, mbpp

Example:

CUDA_VISIBLE_DEVICES=0 python run.py configs/eval_sdar_hf_gsm8k.py

Before inference, make sure to replace the model path in the model_configs of eval_sdar_hf_<Task_Name>.py with the actual path of your downloaded model.
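The edit looks roughly like the following fragment, assuming the model_configs entry is a dict with a path field as in typical OpenCompass Hugging Face configs (keep the file's other fields unchanged; the path shown is a placeholder):

```python
# Fragment of configs/eval_sdar_hf_<Task_Name>.py (illustrative) --
# edit only the path, leaving the file's other model settings as they are.
model_configs = [
    dict(
        # ... other model settings unchanged ...
        path="/absolute/path/to/SDAR-30B-A3B",  # <- your downloaded model
    ),
]
```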

2. Inference with expert-activation and decoding-order logging:

Replace modeling_sdar_moe.py in the downloaded model directory with the modeling_sdar_moe_mark.py provided in this repository.

cd evaluation/opencompass
CUDA_VISIBLE_DEVICES=<GPU_ID> python run.py configs/eval_sdar_hf_<Task_Name>_mark.py

Example:

CUDA_VISIBLE_DEVICES=0 python run.py configs/eval_sdar_hf_gsm8k_mark.py

Before inference, make sure to replace the model path in the model_configs of eval_sdar_hf_<Task_Name>_mark.py with the actual path of your downloaded model, and update the relevant paths in evaluation/opencompass/opencompass/openicl/icl_inferencer/icl_gen_inferencer.py as needed.

Citation

If our work assists your research, feel free to give us a star ⭐ or cite us using:

@article{wei2026team,
  title={TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration},
  author={Wei, Linye and Luo, Zixiang and Tang, Pingzhi and Li, Meng},
  journal={arXiv preprint arXiv:2602.08404},
  year={2026}
}

Acknowledgements

This repo is largely based on SDAR. We thank its authors for their excellent work and open-source contributions.

Contact

If you have any questions, please contact us via email at [email protected].
