
s-sahoo/scaling-dllms


By Subham Sekhar Sahoo, Jean-Marie Lemercier, Justin Deschenaux, Zhihan Yang, Jingyu Liu, John Thickstun, Ante Jukic


Update: 1.7B Checkpoints will be released on March 1st, 2026.

(Figure: graphical abstract)

In this repo, we release state-of-the-art diffusion language models:

  1. Masked Diffusion Model: MDLM

    Sahoo et al., "Simple and Effective Masked Diffusion Language Model", NeurIPS 2024.

  2. Uniform-state Diffusion Model: Duo

    Sahoo et al., "The Diffusion Duality", ICML 2025.

  3. AR-MDLM interpolating method: Eso-LMs

    Sahoo et al., "Esoteric Language Models", arXiv 2025.

Scaling Laws

Dataset

We pre-train on SlimPajama.

  1. Preprocess it using TinyLlama's codebase.
  2. Place the data chunks in your chosen directory and point auto_resubmit.sh to that path.
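As a sketch of step 2, the layout below is purely illustrative: the directory name and chunk filenames are placeholders, not taken from the repo or TinyLlama's codebase.

```shell
#!/bin/sh
# Hypothetical data layout; substitute your own path and the chunk files
# produced by TinyLlama's preprocessing scripts.
DATA_DIR=./slimpajama_chunks
mkdir -p "$DATA_DIR"
# Stand-ins for the real preprocessed chunks:
touch "$DATA_DIR/chunk_00.bin" "$DATA_DIR/chunk_01.bin"
ls "$DATA_DIR"
```

Whatever path you choose here is the one auto_resubmit.sh should be pointed at.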

Training

For scaling-law experiments, set:

  • Algorithm: ALGO = ar / mdlm / esolm / duo
  • Model size: MODEL = 6M / 19M / ... / 2121M (Full list)
  • Training FLOPs (×1e18): FLOPS = 6 / 10 / 30 / 60 / 100

in the following command:

./auto_resubmit.sh -n 5 -m <MODEL> -f <FLOPS> -b 32 -N 1 -t chinchilla-mdlm scripts/<ALGO>/train_slim_mdlm.sh 
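For example, the full algorithm × FLOPs grid for a single model size could be swept like this. This is a sketch: it only echoes each invocation rather than launching it, and it assumes auto_resubmit.sh lives in the repo root.

```shell
#!/bin/sh
# Sweep the scaling-law grid for one model size, printing each command.
# Replace `echo` with the real invocation to actually submit the runs.
MODEL=19M   # pick one of 6M / 19M / ... / 2121M
for ALGO in ar mdlm esolm duo; do
  for FLOPS in 6 10 30 60 100; do
    echo "./auto_resubmit.sh -n 5 -m $MODEL -f $FLOPS -b 32 -N 1 -t chinchilla-mdlm scripts/$ALGO/train_slim_mdlm.sh"
  done
done
```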

1.7B Models

Dataset

We pre-train the models on NVIDIA's Nemotron-Pre-Training-Dataset, which is now available on HuggingFace.

Training

To train the 1.7B (non-embedding parameters) model, set:

  1. Algorithm: ALGO = ar / mdlm / esolm / duo
  2. Phase: PHASE = 1 / 2

in the following command:

./auto_resubmit.sh -n 10 -m 2121M -b 2 -N 16 -D nvidia -p <PHASE> -t ar scripts/<ALGO>/train.sh 
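Expanding both settings gives the eight algorithm/phase combinations below. As before, this is only a sketch that echoes each invocation instead of launching it (auto_resubmit.sh is assumed to be in the repo root).

```shell
#!/bin/sh
# Print the 1.7B training command for every algorithm/phase pair.
for ALGO in ar mdlm esolm duo; do
  for PHASE in 1 2; do
    echo "./auto_resubmit.sh -n 10 -m 2121M -b 2 -N 16 -D nvidia -p $PHASE -t ar scripts/$ALGO/train.sh"
  done
done
```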

Evaluation

1.7B Checkpoints will be released on March 1st, 2026.

Acknowledgements

This repository builds on MDLM, DUO, and Eso-LMs.

Citation

@misc{sahoo2026scalingmaskeddiffusionlanguage,
      title={Scaling Beyond Masked Diffusion Language Models}, 
      author={Subham Sekhar Sahoo and Jean-Marie Lemercier and Zhihan Yang and Justin Deschenaux and Jingyu Liu and John Thickstun and Ante Jukic},
      year={2026},
      eprint={2602.15014},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.15014}, 
}
