Dynamic Sliding Block (DSB) is a training-free block scheduling method.
DSB Cache is a training-free KV-cache scheme tailored to DSB for diffusion LLMs, further demonstrating the advantages of DSB.
- A better semi-autoregressive paradigm.
- DSB-tailored KV cache.
- A training-free, plug-and-play method that improves the quality–speed trade-off.
- Fast inference support for the Dream and LLaDA models.
- Full evaluation provided.
Dynamic Sliding Block (DSB) is a training-free decoding schedule for diffusion LLMs. Instead of using fixed blocks, it keeps an active block that slides forward and can change its size during inference. This lets the model decode easy/high-confidence tokens earlier (especially near block boundaries) and wait on low-confidence tokens until more context is available—improving the quality–speed trade-off.
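The scheduling idea above can be illustrated with a small sketch. This is a toy illustration, not the repo's actual implementation: the function name, the fixed confidence list, and the `threshold`/`max_block` parameters are all assumptions standing in for the per-step confidences a real diffusion LLM would produce.

```python
def dsb_schedule(confidences, threshold=0.7, max_block=4):
    """Return the order in which positions get decoded under a toy
    dynamic-sliding-block policy.

    `confidences` is a fixed per-position score list standing in for model
    confidence. Positions inside the active block that clear `threshold`
    are decoded early; the block slides forward once its left edge is done.
    """
    n = len(confidences)
    decoded = [False] * n
    order = []
    start = 0
    while start < n:
        # The active block can grow up to max_block, never past the end.
        end = min(start + max_block, n)
        block = range(start, end)
        # Decode every remaining position in the block whose confidence
        # clears the threshold (easy/high-confidence tokens first).
        picked = [i for i in block
                  if not decoded[i] and confidences[i] >= threshold]
        if not picked:
            # No confident token left: fall back to the single most
            # confident remaining position so decoding always progresses.
            picked = [max((i for i in block if not decoded[i]),
                          key=lambda i: confidences[i])]
        for i in picked:
            decoded[i] = True
            order.append(i)
        # Slide the block forward past the fully decoded prefix.
        while start < n and decoded[start]:
            start += 1
    return order
```

For example, `dsb_schedule([0.9, 0.2, 0.8, 0.95, 0.3], threshold=0.7, max_block=3)` returns `[0, 2, 3, 1, 4]`: the two confident tokens in the first block are decoded immediately, while the low-confidence token at position 1 is deferred until its neighbors provide context.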
DSB Cache is a training-free KV-cache design built for DSB. Sliding blocks can make newly exposed boundary tokens have unstable (transient) KV states, which hurts caching. To fix this, DSB Cache refreshes a small prefix window before the active block together with the block at every step, while caching the rest. It also does periodic global refreshes to keep the cache consistent—boosting throughput with minimal quality drop.
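The refresh policy described above can be sketched as a per-step mask over the sequence. This is a hypothetical illustration of the idea, not the repo's code: the function name and the `prefix_window`/`global_every` parameters are assumptions.

```python
def kv_refresh_mask(seq_len, block_start, block_end, step,
                    prefix_window=2, global_every=8):
    """Return a boolean mask per position: True = recompute KV this step,
    False = reuse the cached KV.

    Each step refreshes the active block [block_start, block_end) plus a
    small prefix window just before it, whose boundary tokens have the
    most transient KV states. Every `global_every` steps, a full global
    refresh keeps the rest of the cache consistent.
    """
    if step % global_every == 0:
        return [True] * seq_len  # periodic global refresh
    lo = max(0, block_start - prefix_window)
    return [lo <= i < block_end for i in range(seq_len)]
```

On non-refresh steps only a handful of positions are recomputed, which is where the throughput gain comes from; the periodic global refresh bounds how stale the cached states can get.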
```shell
pip install -r requirements.txt
pip install -r requirements-lock.txt
```

We provide the eval scripts for the main experiments; you can reproduce the results directly. For example:
```shell
cd llada
bash eval_instruct.sh
```

The main results were obtained on an NVIDIA H200 140G GPU. We evaluate two variants of DSB, DSB (const.) and DSB (greedy), demonstrating the consistent improvement of our method.
If this work helps your research, please consider citing it:
```bibtex
@misc{dsb,
      title={DSB: Dynamic Sliding Block Scheduling for Diffusion LLMs},
      author={Lizhuo Luo and Shenggui Li and Yonggang Wen and Tianwei Zhang},
      year={2026},
      eprint={2602.05992},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.05992},
}
```

We would like to thank the authors of LLaDA, Dream, and Fast-dLLM for their excellent work and open-source contributions.


