By Subham Sekhar Sahoo, Jean-Marie Lemercier, Justin Deschenaux, Zhihan Yang, Jingyu Liu, John Thickstun, Ante Jukic
In this repo, we release state-of-the-art diffusion language models:
- Masked Diffusion Model: MDLM
Sahoo et al., "Simple and Effective Masked Diffusion Language Model", NeurIPS 2024.
- Uniform-state Diffusion Model: Duo
- Interpolation between AR and MDLM: Eso-LMs
We pre-train on SlimPajama.
- Preprocess it using TinyLlama's codebase.
- Place the data chunks in a directory of your choice and point `auto_resubmit.sh` to that path.
For scaling-law experiments, set:
- Algorithm: `ALGO = ar / mdlm / esolm / duo`
- Model size: `MODEL = 6M / 19M / ... / 2121M` (full list)
- Training FLOPs (x1e18): `FLOPS = 6 / 10 / 30 / 60 / 100`

in the following command:

```shell
./auto_resubmit.sh -n 5 -m <MODEL> -f <FLOPS> -b 32 -N 1 -t chinchilla-mdlm scripts/<ALGO>/train_slim_mdlm.sh
```
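As a rough sanity check on these budgets, the token count implied by each (model size, FLOPs) pair can be estimated with the standard Chinchilla approximation C ≈ 6·N·D. The 6ND rule and the resulting token counts are an assumption of this sketch, not numbers taken from this repo:

```python
# Estimate the token budget implied by a compute budget using the
# common Chinchilla-style approximation C ~= 6 * N * D.
# NOTE: the 6ND rule is an assumption of this sketch; the repo's
# actual token schedules may differ.

def chinchilla_tokens(n_params: float, flops: float) -> float:
    """Tokens D implied by compute C and parameter count N under C = 6*N*D."""
    return flops / (6 * n_params)

# A few (MODEL, FLOPS) pairs from the grid above; MODEL sizes are
# non-embedding parameter counts, FLOPS is in units of 1e18.
for name, n_params in [("6M", 6e6), ("19M", 19e6), ("2121M", 2121e6)]:
    for budget in [6, 10, 30, 60, 100]:
        tokens = chinchilla_tokens(n_params, budget * 1e18)
        print(f"{name:>6} @ {budget:>3}e18 FLOPs -> {tokens / 1e9:10.1f}B tokens")
```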
We pre-train the models on NVIDIA's Nemotron-Pre-Training-Dataset, which is now available on Hugging Face.
To train the 1.7B (non-embedding parameters) model, set:
- Algorithm: `ALGO = ar / mdlm / esolm / duo`
- Phase: `PHASE = 1 / 2`

in the following command:

```shell
./auto_resubmit.sh -n 10 -m 2121M -b 2 -N 16 -D nvidia -p <PHASE> -t ar scripts/<ALGO>/train.sh
```
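As a concrete example, the `<ALGO>` and `<PHASE>` placeholders expand as below; `mdlm` and phase `1` are arbitrary illustrative choices, and the command is only echoed here, since actually submitting it requires `auto_resubmit.sh` and the repo's cluster environment:

```shell
# Example substitution of <ALGO> and <PHASE>; mdlm / phase 1 are
# illustrative choices, not the only valid ones. The command is echoed
# rather than run, since submission needs the repo's cluster setup.
ALGO=mdlm
PHASE=1
echo ./auto_resubmit.sh -n 10 -m 2121M -b 2 -N 16 -D nvidia -p "$PHASE" -t ar "scripts/$ALGO/train.sh"
```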
The 1.7B checkpoints will be released on March 1st, 2026.
This repository was built on top of the MDLM, DUO, and Eso-LMs codebases.
```bibtex
@misc{sahoo2026scalingmaskeddiffusionlanguage,
  title={Scaling Beyond Masked Diffusion Language Models},
  author={Subham Sekhar Sahoo and Jean-Marie Lemercier and Zhihan Yang and Justin Deschenaux and Jingyu Liu and John Thickstun and Ante Jukic},
  year={2026},
  eprint={2602.15014},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2602.15014},
}
```
