Stephen Bates1,† | Tommi Jaakkola1,†
- [2025/09/18] HDLM is accepted to NeurIPS 2025!
- [2025/10/12] Paper is available on arXiv!
- [2025/10/12] Code is released!
We present the Hierarchical Diffusion Language Model (HDLM), a novel framework for training discrete diffusion models via time-varying next-semantic-scale prediction. HDLM extends the standard Masked Diffusion Model (MDM) by introducing intermediate hierarchies (termed cluster tokens) between clean tokens and masked tokens. In the forward process, each token is independently perturbed to its higher-level ancestor with more abstract semantics according to the scheduler, while in the reverse process the model progressively predicts the next, more detailed semantic scale. Taken together, HDLM provides a general time-varying next-semantic-scale prediction process for language modeling. We derive closed-form expressions for the diffusion Evidence Lower Bound (ELBO) and show that HDLM can be implemented flexibly while including the existing MDM as a special case. This repository contains all training and evaluation code needed to reproduce the results in the paper.
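To build intuition for the forward process described above, here is a minimal sketch of perturbing a single token toward its cluster ancestor and eventually the mask. This is an illustration, not the paper's exact scheduler: the two-stage thresholds, the `cluster_of` mapping, and the `mask_id` value are all made up for the example.

```python
import random

def forward_perturb(token_id, cluster_of, t, gamma=1.0, mask_id=50257, rng=random):
    """Illustrative two-stage forward process: a clean token is first
    coarsened to its cluster ancestor, and eventually to the mask token.
    The thresholds below follow a simple t**gamma schedule, which is an
    assumption, not the paper's exact formulation."""
    u = rng.random()
    if u < t ** gamma:                # token has been perturbed at least once
        if u < (t ** gamma) * t:      # perturbed again: ancestor -> mask
            return mask_id
        return cluster_of[token_id]   # clean token -> cluster ancestor
    return token_id                   # token is still clean
```

At `t=0` the token is always returned unchanged, and at `t=1` it is always fully masked; in between, the token passes through its more abstract cluster ancestor.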
Set up the environment:

```bash
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt && pip install -e .
```

You can download our precalculated files in `hdlm/clusters` for the existing cluster counts in [1, 2, 4, 8, 16, 32, 64, 128, 256] (GPT-2 tokenizer, OpenWebText dataset, GIDD pretrained models), or preprocess your own by running `hdlm/compute_cluster.py` for custom numbers of clusters, tokenizers, datasets, or pretrained models. Make sure the names/paths of these cluster files match `cluster_dict_path`, `cluster_embed_path`, and `pretrained_model_name` in your training configs, as in the examples.
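For intuition on what clustering tokens looks like, the toy pure-Python k-means below groups small 2-D "embeddings" into clusters. This is only a sketch of the general idea under our assumption of a k-means-style grouping; see `hdlm/compute_cluster.py` for the actual preprocessing over full-size pretrained embeddings.

```python
def kmeans(vectors, k, iters=20):
    """Cluster embedding vectors into k groups with plain k-means.
    Toy sketch only; the real pipeline works on pretrained model embeddings."""
    def d2(a, b):  # squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic farthest-point initialization.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(d2(v, c) for c in centroids)))

    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for every vector.
        for i, v in enumerate(vectors):
            assign[i] = min(range(k), key=lambda c: d2(v, centroids[c]))
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assign
```

For example, `kmeans([(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)], 2)` assigns the two near-origin points to one cluster and the two far points to the other.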
To reproduce the training runs from the paper, you can use the following commands.
In this example we train on a single node with 8 GPUs; feel free to adjust the `--nnodes` and `--nproc_per_node` arguments to match your setup.
Whenever needed, change the checkpoint saving directory by adjusting `save_dir` in `hdlm/configs/logging/default.yaml`, and the data storage directory via `cache_dir` in `hdlm/configs/data/defaults.yaml`.
Key hyperparameters include:

- `cluster_size`: number of clusters ($n$ in the paper)
- `gamma`: forward process schedule ($\gamma$ in the paper)
- `p_perturb`: probability of stochastic perturbations ($1-\xi$ in the paper)
You are also welcome to try out other model / training / loss hyperparameters.
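As a rough picture of how these knobs appear in a training config, a fragment might look like the following; the values and key placement are illustrative, so check the files under `hdlm/configs/` for the real structure.

```yaml
# Illustrative fragment only; see hdlm/configs/ for the actual layout.
cluster_size: 64   # n in the paper
gamma: 1.0         # forward process schedule exponent
p_perturb: 0.5     # 1 - xi in the paper
```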
(Optional) Log into W&B with `wandb login` for experiment tracking, or disable it via `wandb disabled`.
```bash
# HDLM-small-64
torchrun --nnodes 1 --nproc_per_node 8 hdlm/train.py --config-name hdlm-small-cluster_64-gamma_1.0-xi_1.0 logging.run_name="'small-hdlm-cluster_64-gamma_1.0-xi_1.0-owt'"

# GIDD+ baseline
torchrun --nnodes 1 --nproc_per_node 8 hdlm/train.py --config-name gidd logging.run_name="'small-gidd+-owt-pu=0.0'"

# MDLM baseline
torchrun --nnodes 1 --nproc_per_node 8 hdlm/train.py --config-name mdlm logging.run_name="'small-mdlm-owt'"

# AR baseline
torchrun --nnodes 1 --nproc_per_node 8 hdlm/train.py --config-name ar logging.run_name="'small-ar-owt'"
```

There are also a couple of scripts to run inference and evaluate the trained models.
The following command will generate `num_samples=256` samples in `num_denoising_steps=512` iterations from the model checkpoint located at `path` and save them to `samples_dir=samples.pt`.

```bash
python hdlm/eval/generate_samples.py path=./outputs/path/to/checkpoint/ samples_dir=samples.pt num_samples=256 num_denoising_steps=512 batch_size=16
```

Given a file containing samples generated with the `generate_samples.py` script, the following command will compute the generative PPL.
Here we assume that the diffusion model used to generate the samples in `samples.pt` uses the `gpt2` tokenizer, and we compute generative PPL with `gpt2-large` as the reference model. The results will be saved to `metrics_path=metrics.json`.
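Conceptually, generative PPL is the exponentiated mean negative log-likelihood that the reference model assigns to the generated tokens; the helper name below is hypothetical and only sketches the formula.

```python
import math

def generative_ppl(token_logprobs):
    """exp of the mean negative log-likelihood over all generated tokens,
    as scored by a reference model such as gpt2-large (sketch only)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# If the reference model assigns probability 0.5 to every token,
# generative_ppl([math.log(0.5)] * 4) evaluates to 2.0.
```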
```bash
python hdlm/eval/generative_ppl.py samples_path=samples.pt model_tokenizer=gpt2 pretrained_model=gpt2-large batch_size=1 metrics_path=metrics.json
```

A simple helper script computes the loss of a trained model on the entire validation split.
```bash
python hdlm/eval/loss.py path=./outputs/path/to/checkpoint/ batch_size=32
```

If you find our work helpful, please consider giving a star ⭐ and citation 📝
```bibtex
@article{zhou2025next,
  title={Next Semantic Scale Prediction via Hierarchical Diffusion Language Models},
  author={Zhou, Cai and Wang, Chenyu and Zhang, Dinghuai and Tong, Shangyuan and Wang, Yifei and Bates, Stephen and Jaakkola, Tommi},
  journal={arXiv preprint arXiv:2510.08632},
  year={2025}
}
```

The code is built upon the repositories below; we thank all the contributors for open-sourcing their work.
