TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration
We identify a fundamental mismatch between MoE architectures and dLLMs (diffusion large language models): a large number of experts is activated at every denoising step, yet only a small subset of tokens is ultimately accepted. This results in substantial inference overhead and limits deployment in latency-sensitive applications.
We propose TEAM, a plug-and-play framework that accelerates MoE dLLMs by enabling more accepted tokens with fewer activated experts. TEAM employs three complementary expert activation and decoding strategies, conservatively selecting necessary experts for decoded and masked tokens and simultaneously performing aggressive speculative exploration across multiple candidates.
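As a purely illustrative toy (not the actual TEAM algorithm, whose details are in the paper), one way to exploit temporal consistency across denoising steps is to reuse the expert set chosen at the previous step for tokens whose router logits barely changed, and recompute the full top-k only for the rest:

```python
import numpy as np

def select_experts(logits, prev_experts, prev_logits, k=2, tol=1e-2):
    """Toy temporal-consistency gate: reuse last step's experts for a token
    when its router logits moved less than `tol`; otherwise recompute top-k.
    Purely illustrative -- not the TEAM algorithm itself."""
    experts = []
    for t, l in enumerate(logits):
        if prev_experts is not None and np.abs(l - prev_logits[t]).max() < tol:
            experts.append(prev_experts[t])          # reuse: no new routing work
        else:
            experts.append(set(np.argsort(l)[-k:]))  # fresh top-k selection
    return experts

# Two denoising steps over 3 tokens and 4 experts.
rng = np.random.default_rng(0)
step1 = rng.normal(size=(3, 4))
e1 = select_experts(step1, None, None)
step2 = step1.copy()
step2[0] += 0.5                        # only token 0 changed noticeably
e2 = select_experts(step2, e1, step1)
print(e2[1] == e1[1])  # True: token 1's experts were reused
```

In this sketch, only the tokens whose routing actually shifted pay for a new expert selection; the rest inherit the previous step's experts for free.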
Overall Performance:
With the SDAR-30B-A3B model, TEAM achieves an average speedup of 1.94× across diverse benchmarks, peaking at 2.2× on HumanEval.
1. Clone the repository:
git clone https://github.com/PKU-SEC-Lab/TEAM-MoE-dLLM.git
cd TEAM-MoE-dLLM
2. Create a conda environment:
conda create --name <your_env_name> python=3.10
conda activate <your_env_name>
3. Install dependencies:
Follow the Environment Setup in SDAR or install dependencies by:
conda env create -f evaluation/environment.yml
which mirrors SDAR's configuration.
Download the SDAR-30B-A3B model from Hugging Face.
You can either run inference directly, or run inference while also logging expert activation and decoding-order information.
1. Direct inference:
Replace the modeling_sdar_moe.py in the downloaded model directory with the modeling_sdar_moe.py we provide.
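The swap can be sketched as below (paths are placeholders: point MODEL_DIR at your downloaded checkpoint and run from wherever our modeling_sdar_moe.py lives). The mock-file lines only exist so the sketch runs end-to-end; they are no-ops when the real files are present.

```shell
# Illustrative sketch, not a fixed script -- adjust paths to your layout.
MODEL_DIR=${MODEL_DIR:-./SDAR-30B-A3B}
mkdir -p "$MODEL_DIR"
[ -f "$MODEL_DIR/modeling_sdar_moe.py" ] || echo "# stock file" > "$MODEL_DIR/modeling_sdar_moe.py"
[ -f modeling_sdar_moe.py ] || echo "# TEAM file" > modeling_sdar_moe.py
# Back up the stock modeling file so you can revert, then drop in the TEAM version.
cp "$MODEL_DIR/modeling_sdar_moe.py" "$MODEL_DIR/modeling_sdar_moe.py.bak"
cp modeling_sdar_moe.py "$MODEL_DIR/modeling_sdar_moe.py"
```

Keeping the `.bak` copy makes it easy to switch back to the stock model later.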
cd evaluation/opencompass
CUDA_VISIBLE_DEVICES=<GPU_ID> python run.py configs/eval_sdar_hf_<Task_Name>.py
Parameter descriptions:
<GPU_ID>: which GPU to run on
<Task_Name>: the benchmark to evaluate
- Options: gsm8k, math, humaneval, mbpp
Example:
CUDA_VISIBLE_DEVICES=0 python run.py configs/eval_sdar_hf_gsm8k.py
Please make sure to replace the model path in the model_configs of eval_sdar_hf_<Task_Name>.py with the actual path of your downloaded model before inference.
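For reference, the model entry you need to edit typically looks like the fragment below (the field names are illustrative of OpenCompass-style HuggingFace model configs; check the actual eval_sdar_hf_<Task_Name>.py for the exact keys). The `path` field is what must point at your local checkpoint:

```python
# Illustrative fragment only -- mirror the structure you find in
# eval_sdar_hf_<Task_Name>.py rather than copying this verbatim.
models = [
    dict(
        # ... other fields from the original config ...
        path='/abs/path/to/SDAR-30B-A3B',  # <- your downloaded model directory
    ),
]
```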
2. Inference with expert activation and decoding-order logging:
Replace the modeling_sdar_moe.py in the downloaded model directory with the modeling_sdar_moe_mark.py we provide.
cd evaluation/opencompass
CUDA_VISIBLE_DEVICES=<GPU_ID> python run.py configs/eval_sdar_hf_<Task_Name>_mark.py
Example:
CUDA_VISIBLE_DEVICES=0 python run.py configs/eval_sdar_hf_gsm8k_mark.py
Please make sure to replace the model path in the model_configs of eval_sdar_hf_<Task_Name>_mark.py with the actual path of your downloaded model before inference.
Please make sure to replace the relevant paths in evaluation/opencompass/opencompass/openicl/icl_inferencer/icl_gen_inferencer.py as needed.
If our work assists your research, feel free to give us a star ⭐ or cite us using:
@article{wei2026team,
title={TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration},
author={Wei, Linye and Luo, Zixiang and Tang, Pingzhi and Li, Meng},
journal={arXiv preprint arXiv:2602.08404},
year={2026}
}
This repo is largely based on SDAR. We thank its authors for their excellent work and open-source contributions.
If you have any questions, please contact us via email at [email protected].

