By Subham Sekhar Sahoo, Jean-Marie Lemercier, Justin Deschenaux, Zhihan Yang, Jingyu Liu, John Thickstun, Ante Jukic
In this repo, we release state-of-the-art diffusion language models:
- Masked Diffusion Model: MDLM
Sahoo et al., "Simple and Effective Masked Diffusion Language Model", NeurIPS 2024.
- Uniform-state Diffusion Model: Duo
- Interpolation between AR and MDLM: Eso-LMs
We pre-train on SlimPajama.
- Preprocess it using TinyLlama's codebase.
- Place the data chunks in a directory of your choice and point `auto_resubmit.sh` to that path.
For scaling-law experiments, set:
- Algorithm: `ALGO = ar / mdlm / esolm / duo`
- Model size: `MODEL = 6M / 19M / ... / 2121M` (full list)
- Training FLOPs (x1e18): `FLOPS = 6 / 10 / 30 / 60 / 100`

in the following command:

```shell
./auto_resubmit.sh -n 5 -m <MODEL> -f <FLOPS> -b 32 -N 1 -t chinchilla-mdlm scripts/<ALGO>/train_slim_mdlm.sh
```
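As a rough sanity check on these budgets, the token count implied by each (model size, FLOPs) pair can be estimated with the standard Chinchilla approximation C ≈ 6·N·D. The 6ND rule and the resulting token counts are an assumption of this sketch, not numbers taken from this repo:

```python
# Estimate the token budget implied by a compute budget using the
# common Chinchilla-style approximation C ~= 6 * N * D.
# NOTE: the 6ND rule is an assumption of this sketch; the repo's
# actual token schedules may differ.

def chinchilla_tokens(n_params: float, flops: float) -> float:
    """Tokens D implied by compute C and parameter count N under C = 6*N*D."""
    return flops / (6 * n_params)

# A few (MODEL, FLOPS) pairs from the grid above; MODEL sizes are
# non-embedding parameter counts, FLOPS is in units of 1e18.
for name, n_params in [("6M", 6e6), ("19M", 19e6), ("2121M", 2121e6)]:
    for budget in [6, 10, 30, 60, 100]:
        tokens = chinchilla_tokens(n_params, budget * 1e18)
        print(f"{name:>6} @ {budget:>3}e18 FLOPs -> {tokens / 1e9:10.1f}B tokens")
```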
We pre-train the models on NVIDIA's Nemotron-Pre-Training-Dataset, which is now available on Hugging Face.
To train the 1.7B (non-embedding parameters) model, set:
- Algorithm: `ALGO = ar / mdlm / esolm / duo`
- Phase: `PHASE = 1 / 2`

in the following command:

```shell
./auto_resubmit.sh -n 10 -m 2121M -b 2 -N 16 -D nvidia -p <PHASE> -t ar scripts/<ALGO>/train.sh
```
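As a concrete example, the `<ALGO>` and `<PHASE>` placeholders expand as below; `mdlm` and phase `1` are arbitrary illustrative choices, and the command is only echoed here, since actually submitting it requires `auto_resubmit.sh` and the repo's cluster environment:

```shell
# Example substitution of <ALGO> and <PHASE>; mdlm / phase 1 are
# illustrative choices, not the only valid ones. The command is echoed
# rather than run, since submission needs the repo's cluster setup.
ALGO=mdlm
PHASE=1
echo ./auto_resubmit.sh -n 10 -m 2121M -b 2 -N 16 -D nvidia -p "$PHASE" -t ar "scripts/$ALGO/train.sh"
```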
The 1.7B checkpoints will be released on March 1st, 2026.
This repository was built on top of the MDLM, DUO, and Eso-LMs codebases.
```bibtex
@misc{sahoo2026scalingmaskeddiffusionlanguage,
  title={Scaling Beyond Masked Diffusion Language Models},
  author={Subham Sekhar Sahoo and Jean-Marie Lemercier and Zhihan Yang and Justin Deschenaux and Jingyu Liu and John Thickstun and Ante Jukic},
  year={2026},
  eprint={2602.15014},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2602.15014},
}
```
