dLLM-Var: Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way


Yicun Yang¹*, Cong Wang¹, Shaobo Wang¹, Zichen Wen¹, Biqing Qi², Hanlin Xu³, Linfeng Zhang¹†

¹Shanghai Jiao Tong University, ²Shanghai AI Lab, ³Huawei

Key Features

  • Native Variable-Length Generation: Guided by the [EOS] token, dLLM-Var supports arbitrary-length output without a fixed generation-length hyperparameter.
  • High Parallelism: Inherits the bidirectional attention of dLLMs and supports blockwise diffusion inference.
  • KV Cache Compatible: Seamlessly reuses the KV cache, avoiding complex designs and improving efficiency.
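How these features fit together can be illustrated with a toy sketch (all names and logic here are illustrative stand-ins, not the actual implementation): decoding proceeds block by block, and stops as soon as a denoised block contains [EOS], so the model itself decides the output length.

```python
EOS = "[EOS]"
BLOCK_SIZE = 4  # toy block length; the real block size is a model/config choice

def denoise_block(step):
    """Stand-in for one blockwise diffusion step: returns BLOCK_SIZE tokens.
    Emits [EOS] in the third block to simulate the model deciding to stop."""
    tokens = [f"tok{step}_{i}" for i in range(BLOCK_SIZE)]
    if step == 2:
        tokens[1] = EOS  # model predicts end-of-sequence mid-block
    return tokens

def generate_variable_length(max_blocks=8):
    """Decode blocks until [EOS] appears; truncate the output at [EOS]."""
    output = []
    for step in range(max_blocks):
        block = denoise_block(step)
        if EOS in block:
            output.extend(block[:block.index(EOS)])  # keep tokens before [EOS]
            return output
        output.extend(block)
    return output

print(len(generate_variable_length()))  # 2 full blocks + 1 token = 9
```

The key point is that no target length is fixed up front: the loop's only cap is a safety bound, and [EOS] ends generation wherever it appears.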
Overview

Figure 1: The evolution of probabilistic modeling paradigms for text generation, from autoregressive (AR) models to diffusion-based methods. dLLM-Var achieves variable-length generation while maintaining high parallelism.

Installation

Requirements

  • Python 3.12
  • PyTorch 2.5+ (FP8 mixed precision requires H-series GPUs)

Quick Installation

# Clone the repository
git clone https://github.com/maomaocun/dLLM-Var.git
cd dLLM-Var
bash install.sh

Quick Start

Demo

python demo_dLLM-var.py

Evaluation

cd ./evaluation
bash run_batch.sh

Adjust the environment variables in the script to match your setup before running.

Prepare Dataset

cd datset
python transfer_text2token.py --input_dir "path/to/your/input/jsonl/folder" --output_file "path/to/your/output/tokenized.jsonl" --tokenizer_model "path/to/your/LLaDA-8B-Base"
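The script converts raw JSONL text into token IDs ahead of training. Below is a minimal sketch of that pipeline, substituting a toy word-to-ID vocabulary for the LLaDA-8B-Base tokenizer; the field names "text" and "input_ids" are assumptions about the data format (see ./sft_training/data/dataset.py for the real one).

```python
import json

def toy_tokenize(text, vocab):
    """Stand-in for the LLaDA tokenizer: map whitespace-split words to IDs."""
    return [vocab.setdefault(w, len(vocab)) for w in text.split()]

def transfer_text2token(in_lines, vocab):
    """Turn JSONL lines with a 'text' field into JSONL lines with 'input_ids'."""
    out = []
    for line in in_lines:
        record = json.loads(line)
        out.append(json.dumps({"input_ids": toy_tokenize(record["text"], vocab)}))
    return out

raw = [json.dumps({"text": "hello diffusion world"}),
       json.dumps({"text": "hello again"})]
vocab = {}
tokenized = transfer_text2token(raw, vocab)
print(tokenized[1])  # {"input_ids": [0, 3]}
```

Pre-tokenizing once and streaming token IDs at training time avoids paying tokenizer cost on every epoch.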

For detailed dataset format, see ./sft_training/data/dataset.py.

Training Script

Training uses DeepSpeed ZeRO-2 and supports multi-GPU setups. Example command:

cd ./sft_training
bash run_gpus_fp8.sh
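The launch script runs training under DeepSpeed ZeRO-2, which shards optimizer states and gradients across GPUs. For orientation, a minimal ZeRO-2 DeepSpeed config looks like the following; the values are illustrative, not the repository's actual settings.

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": true }
}
```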

For detailed training configuration, see ./sft_training/config/sft/default_config.yaml.

Citation

If you find this work useful, please cite:

@misc{yang2025diffusionllmnativevariable,
      title={Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way}, 
      author={Yicun Yang and Cong Wang and Shaobo Wang and Zichen Wen and Biqing Qi and Hanlin Xu and Linfeng Zhang},
      year={2025},
      eprint={2510.24605},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.24605}, 
}

License

MIT License.

Contact
