Yicun Yang1 *, Cong Wang1, Shaobo Wang1, Zichen Wen1, Biqing Qi2, Hanlin Xu3, Linfeng Zhang1 †
1Shanghai Jiao Tong University, 2Shanghai AI Lab, 3Huawei
- Native Variable-Length Generation: Guided by the [EOS] token, dLLM-Var produces arbitrary-length outputs without a fixed generation-length hyperparameter.
- High Parallelism: Inherits the bidirectional attention of diffusion LLMs (dLLMs), supporting blockwise diffusion inference.
- KV Cache Compatible: Seamlessly reuses the KV cache, avoiding complex designs and improving efficiency.
Figure 1: The evolution of probabilistic modeling paradigms for text generation. From autoregressive (AR) to diffusion-based methods. dLLM-Var achieves variable-length generation while maintaining high parallelism.
- Python 3.12
- PyTorch 2.5+ (FP8 mixed precision is supported on H-series GPUs)
```bash
# Clone the repository
git clone https://github.com/maomaocun/dLLM-Var.git
cd dLLM-Var

# Install dependencies
bash install.sh
```

Run the demo:

```bash
python demo_dLLM-var.py
```

To run evaluation, first adjust the environment variables in the script, then:

```bash
cd ./evaluation
bash run_batch.sh
```
To prepare training data, tokenize your raw JSONL files:

```bash
cd datset
python transfer_text2token.py \
    --input_dir "path/to/your/input/jsonl/folder" \
    --output_file "path/to/your/output/tokenized.jsonl" \
    --tokenizer_model "path/to/your/LLaDA-8B-Base"
```

For the detailed dataset format, see `./sft_training/data/dataset.py`.
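As a hypothetical illustration of what this conversion step does, the sketch below tokenizes one JSONL record. The field names (`"text"`, `"input_ids"`) and the tokenizer interface are assumptions; the authoritative schema is in `./sft_training/data/dataset.py`:

```python
# Hypothetical sketch of the text -> token JSONL conversion step.
# Field names and the tokenizer interface are assumptions.
import json
from typing import Callable, List

def tokenize_jsonl_line(line: str, encode: Callable[[str], List[int]]) -> str:
    """Convert one raw-text JSONL record into a tokenized record."""
    record = json.loads(line)
    return json.dumps({"input_ids": encode(record["text"])})

# Toy "tokenizer": assigns one id per distinct whitespace-separated word.
vocab: dict = {}
def toy_encode(text: str) -> List[int]:
    return [vocab.setdefault(w, len(vocab)) for w in text.split()]

sample = json.dumps({"text": "hello diffusion world"})
result = tokenize_jsonl_line(sample, toy_encode)
print(result)  # {"input_ids": [0, 1, 2]}
```

In the real pipeline the toy tokenizer would be replaced by the LLaDA-8B-Base tokenizer passed via `--tokenizer_model`.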
Training uses DeepSpeed ZeRO-2 and supports multi-GPU setups. Example:

```bash
cd ./sft_training
bash run_gpus_fp8.sh
```

For the detailed training configuration, see `./sft_training/config/sft/default_config.yaml`.
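For orientation, a minimal DeepSpeed ZeRO-2 config fragment might look like the following. The values here are illustrative placeholders; the actual settings live in `./sft_training/config/sft/default_config.yaml`:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": true }
}
```

Stage 2 shards optimizer states and gradients across GPUs while keeping a full copy of the parameters on each rank, which fits multi-GPU fine-tuning of an 8B-scale model.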
If you find this work useful, please cite:
```bibtex
@misc{yang2025diffusionllmnativevariable,
  title={Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way},
  author={Yicun Yang and Cong Wang and Shaobo Wang and Zichen Wen and Biqing Qi and Hanlin Xu and Linfeng Zhang},
  year={2025},
  eprint={2510.24605},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.24605},
}
```
MIT License.
- Project Lead: [email protected]
- Corresponding Author: [email protected]
