Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models
We propose ODB-dLLM, an arithmetic intensity inspired framework for accelerating diffusion-based large language models (dLLMs). By analyzing the interleaved compute- and memory-bound phases in existing dLLM inference frameworks, ODB-dLLM introduces an adaptive length prediction strategy and jump-share speculative decoding to match the compute and memory characteristics of the hardware platform, thereby maximizing inference efficiency.
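To make the compute- vs memory-bound distinction concrete, the sketch below applies the standard roofline rule: a kernel whose arithmetic intensity (FLOPs per byte of memory traffic) exceeds the machine balance (peak FLOP/s divided by peak bandwidth) is compute-bound, otherwise memory-bound. The hardware numbers and kernel sizes are illustrative placeholders, not taken from the paper.

```python
# Toy roofline check: classify a kernel as compute- or memory-bound.
# Hardware peaks below are illustrative placeholders, not a specific GPU.

def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved

def bound_kind(flops, bytes_moved, peak_flops, peak_bw):
    """Compare kernel intensity against the machine balance point."""
    machine_balance = peak_flops / peak_bw  # FLOPs/byte at the roofline ridge
    ai = arithmetic_intensity(flops, bytes_moved)
    return "compute-bound" if ai >= machine_balance else "memory-bound"

# Large matmul-like step (many tokens at once): high intensity.
print(bound_kind(flops=2 * 4096 * 4096 * 4096,
                 bytes_moved=3 * 4096 * 4096 * 2,
                 peak_flops=300e12, peak_bw=2e12))  # compute-bound

# Single-token GEMV-like step: each weight byte is used once.
print(bound_kind(flops=2 * 4096 * 4096,
                 bytes_moved=4096 * 4096 * 2,
                 peak_flops=300e12, peak_bw=2e12))  # memory-bound
```

This is the intuition behind orchestrating the two boundaries: phases on opposite sides of the machine balance point benefit from different optimizations.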
Overall Performance:
ODB-dLLM achieves a 46×–162× speedup on LLaDA-Instruct and a 50×–182× speedup on LLaDA-1.5. Compared with Fast-dLLM, ODB-dLLM delivers 2.63×–6.30× and 2.60×–7.22× speedups on the two models, respectively.
Installation:
1. Clone the repository:
```shell
git clone https://github.com/PKU-SEC-Lab/ODB-dLLM.git
cd ODB-dLLM
```
2. Install dependencies:
```shell
conda create --name ODB-dLLM python=3.10
conda activate ODB-dLLM
pip install -r requirements.txt
```
Evaluation:
For the LLaDA-8B-Instruct model:
```shell
cd llada_instruct
./eval_llada_instruct.sh <GPU_ID> <Task_Name> 'GSAI-ML/LLaDA-8B-Instruct' <Settings>
```
For the LLaDA-1.5 model:
```shell
cd llada_1_5
./eval_llada_1_5.sh <GPU_ID> <Task_Name> 'GSAI-ML/LLaDA-1.5' <Settings>
```
Parameter descriptions:
- `<GPU_ID>`: the GPU to run on
- `<Task_Name>`: the benchmark for evaluation
  - Options: gsm8k, minerva_math, bbh, humaneval, mbpp
- `<Settings>`: the configuration for evaluation
  - Options:
    - `llada_baseline`: LLaDA baseline
    - `fast_dllm_baseline`: Fast-dLLM baseline (parallel + dualcache)
    - `alp`: parallel + dualcache + adaptive length prediction
    - `accept-jump`: parallel + dualcache + adaptive length prediction + accept-jump speculative
    - `jump-share`: parallel + dualcache + adaptive length prediction + jump-share speculative
Example:
```shell
cd llada_instruct
./eval_llada_instruct.sh 0 gsm8k 'GSAI-ML/LLaDA-8B-Instruct' jump-share
```
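To evaluate every supported benchmark in one go, a simple sweep over the task list can be scripted. This is a hypothetical helper, not part of the repo; it assumes the `eval_llada_instruct.sh` interface shown above and only prints the commands (a dry run) so it can be inspected before running.

```shell
# Dry-run sweep over all benchmarks with one setting (hypothetical helper).
# Remove the 'echo' to actually launch each evaluation.
GPU_ID=0
SETTING=jump-share
for TASK in gsm8k minerva_math bbh humaneval mbpp; do
  echo "./eval_llada_instruct.sh ${GPU_ID} ${TASK} 'GSAI-ML/LLaDA-8B-Instruct' ${SETTING}"
done
```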
If our work assists your research, feel free to give us a star ⭐ or cite us using:
```bibtex
@article{wei2025orchestrating,
  title={Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models},
  author={Wei, Linye and Chen, Wenjue and Tang, Pingzhi and Guo, Xiaotian and Ye, Le and Wang, Runsheng and Li, Meng},
  journal={arXiv preprint arXiv:2511.21759},
  year={2025}
}
```
This repo is largely based on Fast-dLLM. We would also like to thank the authors of LLaDA and LLaDA-1.5 for their excellent work and open-source contributions.
If you have any questions, please contact us via email: [email protected].


