DAWN is a training-free, dependency-aware decoding method for fast dLLM inference.
DAWN leverages a dependency graph to select more reliable unmasking positions at each iteration, achieving high parallelism with negligible loss in generation quality.
- Mitigates non-independent predictions by modeling inter-position dependencies.
- Training-free and plug-and-play, improving the quality-speed trade-off.
- Fast inference support for the Dream and LLaDA models.
- Implementations of multiple baseline methods.
- Full evaluation suite provided.
DAWN is composed of three main modules: Dependency Graph Construction, Anchor-Guided Decoding, and Conflict-Based Scheduling.
- **Dependency Graph Construction** extracts a lightweight proxy of token dependencies from the model's attention maps and builds a sparse directed dependency graph. It mitigates attention-sink bias by filtering positions with abnormal incoming attention mass, then retains only salient high-score attention links to capture meaningful couplings between positions for downstream scheduling.
- **Anchor-Guided Decoding** first selects high-confidence masked positions that are likely safe to unmask in parallel, then uses previously committed high-confidence positions as anchors to relax the confidence requirement for their dependent (induced) positions. This expands safe parallelism beyond conservative thresholding by leveraging the reliable context provided by anchors.
- **Conflict-Based Scheduling** prevents error-prone joint updates by explicitly avoiding strongly coupled positions among the remaining candidates under a lower confidence threshold. Using the dependency graph to define conflicts, it greedily constructs a large non-conflicting update set (an independent set), enabling additional parallel unmasking while reducing inconsistencies caused by non-independent position predictions.
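The three stages above can be sketched in a few dozen lines. This is an illustrative NumPy toy, not the repository's actual implementation: the function names, the thresholds (`tau_hi`, `tau_anchor`, `tau_lo`), and the assumption of a single head-averaged `[L, L]` attention matrix are all simplifications made here for clarity.

```python
import numpy as np

def build_dependency_graph(attn, top_k=2, sink_quantile=0.95):
    """Build a sparse directed dependency graph from an attention map (sketch).

    `attn` is an [L, L] matrix of attention weights, assumed already averaged
    over heads/layers. Positions whose incoming attention mass is abnormally
    high are treated as attention sinks and filtered out; for each position,
    only its top-k strongest remaining links survive.
    """
    incoming = attn.sum(axis=0)
    sink = incoming > np.quantile(incoming, sink_quantile)  # attention-sink filter
    edges = set()
    for i in range(attn.shape[0]):
        for j in np.argsort(attn[i])[::-1][:top_k]:  # keep only salient links
            if j != i and not sink[j]:
                edges.add((i, int(j)))  # edge (i, j): position i depends on j
    return edges

def select_update_set(conf, masked, edges, committed,
                      tau_hi=0.9, tau_anchor=0.7, tau_lo=0.5):
    """Choose which masked positions to unmask in parallel at one step."""
    # 1) High-confidence positions are safe to unmask directly.
    selected = {i for i in masked if conf[i] >= tau_hi}
    # 2) Anchor-guided relaxation: a position that depends on a reliable
    #    anchor may be unmasked under a lower confidence threshold.
    anchors = selected | committed
    for i in masked - selected:
        if conf[i] >= tau_anchor and any((i, a) in edges for a in anchors):
            selected.add(i)
    # 3) Conflict-based scheduling: greedily grow a non-conflicting set from
    #    the remaining candidates -- no dependency edge, in either direction,
    #    to anything already selected.
    for i in sorted(masked - selected, key=lambda p: -conf[p]):
        if conf[i] >= tau_lo and all((i, j) not in edges and (j, i) not in edges
                                     for j in selected):
            selected.add(i)
    return selected
```

On a toy attention map where position 2 is coupled to an anchor through the graph, step 3 defers position 2 even if its confidence clears the low threshold, which is exactly the kind of error-prone joint update the scheduler is meant to avoid.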
Install the dependencies:

```bash
pip install -r requirements.txt
pip install -r requirements-lock.txt
```

We provide evaluation scripts for the main experiments, so the results can be reproduced directly. For example:

```bash
cd llada
bash eval_instruct.sh
```

The main experiments are conducted on an NVIDIA H100 80GB GPU. DAWN exhibits strong efficiency across multiple models and benchmarks.
If DAWN helps your research, please consider citing it:
```bibtex
@misc{dawn,
  title={DAWN: Dependency-Aware Fast Inference for Diffusion LLMs},
  author={Lizhuo Luo and Zhuoran Shi and Jiajun Luo and Zhi Wang and Shen Ren and Wenya Wang and Tianwei Zhang},
  year={2026},
  eprint={2602.06953},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.06953},
}
```

We would like to thank the authors of LLaDA, Dream, and Fast-dLLM for their excellent work and open-source contributions.


