
Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models

Inference with ODB-dLLM on GSM8K dataset

We propose ODB-dLLM, an arithmetic-intensity-inspired framework for accelerating diffusion-based large language models (dLLMs). By analyzing the interleaved compute- and memory-bound phases in existing dLLM inference frameworks, ODB-dLLM introduces an adaptive length prediction strategy and jump-share speculative decoding to optimize the computation-memory characteristics on hardware platforms, thereby maximizing inference efficiency.

Overall Performance:

ODB-dLLM achieves a 46×–162× speedup on LLaDA-Instruct and a 50×–182× speedup on LLaDA-1.5. Compared with Fast-dLLM, ODB-dLLM also delivers 2.63×–6.30× and 2.60×–7.22× speedups on the two models, respectively.

(Figures: speedup results on LLaDA-Instruct and LLaDA-1.5.)

Installation

1. Clone the repository:

git clone https://github.com/PKU-SEC-Lab/ODB-dLLM.git
cd ODB-dLLM

2. Install dependencies:

conda create --name ODB-dLLM python=3.10
conda activate ODB-dLLM
pip install -r requirements.txt

Usage

For the LLaDA-8B-Instruct (LLaDA) model:

cd llada_instruct
./eval_llada_instruct.sh <GPU_ID> <Task_Name> 'GSAI-ML/LLaDA-8B-Instruct' <Settings>

For the LLaDA-1.5 model:

cd llada_1_5
./eval_llada_1_5.sh <GPU_ID> <Task_Name> 'GSAI-ML/LLaDA-1.5' <Settings>

Parameter descriptions:

  • <GPU_ID>: Choose which GPU to run on
  • <Task_Name>: Select the benchmark for evaluation
    • Options: gsm8k, minerva_math, bbh, humaneval, mbpp
  • <Settings>: Select the configuration for evaluation
    • Options:
      • llada_baseline: LLaDA baseline
      • fast_dllm_baseline: Fast-dLLM baseline (parallel + dualcache)
      • alp: parallel + dualcache + adaptive length prediction
      • accept-jump: parallel + dualcache + adaptive length prediction + accept-jump speculative
      • jump-share: parallel + dualcache + adaptive length prediction + jump-share speculative

Example:

cd llada_instruct
./eval_llada_instruct.sh 0 gsm8k 'GSAI-ML/LLaDA-8B-Instruct' jump-share
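To compare all configurations on one benchmark, the settings above can be swept in a loop. The sketch below is a dry run that only prints the commands it would launch (drop the `echo` to actually run them); the script path and model name follow the example above, and the variable names are our own:

```shell
#!/bin/sh
# Dry-run sweep over the <Settings> options for one GPU and one task.
# Remove the `echo` below to actually launch each evaluation.
GPU_ID=0
TASK=gsm8k
MODEL='GSAI-ML/LLaDA-8B-Instruct'
for SETTING in llada_baseline fast_dllm_baseline alp accept-jump jump-share; do
  echo ./eval_llada_instruct.sh "$GPU_ID" "$TASK" "$MODEL" "$SETTING"
done
```

Each evaluation is a full benchmark run, so in practice one would launch these sequentially on the same GPU or spread them across GPUs by varying `GPU_ID`.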

Citation

If our work assists your research, feel free to give us a star ⭐ or cite us using:

@article{wei2025orchestrating,
  title={Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models},
  author={Wei, Linye and Chen, Wenjue and Tang, Pingzhi and Guo, Xiaotian and Ye, Le and Wang, Runsheng and Li, Meng},
  journal={arXiv preprint arXiv:2511.21759},
  year={2025}
}

Acknowledgements

This repo is largely based on Fast-dLLM. We would also like to thank the authors of LLaDA and LLaDA-1.5 for their excellent work and open-source contributions.

Contact

If you have any questions, please contact us via email at [email protected].
