Jinjie Ni and the team
[2025-10-27] We release the codebase, all training checkpoints, and logs. The codebase is highly optimized, with industry-level scalability and efficiency.
[2025-10-03] The blog is out! Check it out here!
The codebase is released here. It is a highly optimized backend for training DLMs at any scale, built on Megatron-LM.
The full MoE implementation is not yet released. We plan to release it after the main training is done.
We open-source all model checkpoints and training logs mentioned in the paper. All of them can be downloaded at https://huggingface.co/collections/jinjieni/mdga.
The easiest way to download a folder is with this script (set the variables in it appropriately first):

```shell
python utils/hf_download_folder.py
```
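The core of such a folder download can be sketched as a filtered snapshot download via `huggingface_hub.snapshot_download`. This is a minimal illustration, not the script's actual implementation; the repo id, folder, and output directory below are placeholders:

```python
# Minimal sketch (assumptions, not the repo script): download a single folder
# from a Hugging Face dataset repo by filtering the snapshot with a glob.

def folder_patterns(folder: str) -> list[str]:
    """Build an allow_patterns glob matching every file under `folder`."""
    # Strip a trailing slash so we don't produce "folder//**".
    return [folder.rstrip("/") + "/**"]

if __name__ == "__main__":
    # Third-party dependency; imported lazily so the helper above stays stdlib-only.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="MDGA-1/openmoe2_logs",              # placeholder repo id
        repo_type="dataset",
        allow_patterns=folder_patterns("dense_vs_moe"),  # placeholder folder
        local_dir="./openmoe2_logs",                 # placeholder output dir
    )
```

`allow_patterns` makes `snapshot_download` skip everything outside the requested folder, which matters for repos of this size.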
Alternatively, you can use wget to directly download individual files from a folder, e.g.:

```shell
wget https://huggingface.co/datasets/MDGA-1/openmoe2_logs/blob/main/dense_vs_moe/dense_100b_1e_1b7_difflm/tensorboard/events.out.tfevents.1755443508.0648415733
```

We link the related resources below:
- Parameter–Compute Trade-off
- Diffusion + MoE is a Double Win
- Token-Choice vs. Expert-Choice
- Token-Wise Load-Balancing
- With and Without Shared Experts
- Scaling the Expert Granularity
- Scratch vs. Upcycling
- Skip the First 2 MoE Layers
- With and Without Scaling Factors
- Batch vs. Sequence Level Expert Choice
- Softmax vs. Sigmoid
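One note on the wget approach above: huggingface.co `/blob/` URLs point at the web viewer page, while the raw file bytes are served from the same path with `/resolve/` in its place. A small helper to rewrite one into the other (the function name is hypothetical, not part of this repo):

```python
# Hugging Face "blob" URLs render a file in the web viewer; the raw content
# sits behind the same path with "resolve" instead of "blob".

def hf_blob_to_resolve(url: str) -> str:
    """Rewrite a huggingface.co .../blob/<rev>/... URL to .../resolve/<rev>/..."""
    # Replace only the first "/blob/" so any later path components survive intact.
    return url.replace("/blob/", "/resolve/", 1)

if __name__ == "__main__":
    print(hf_blob_to_resolve(
        "https://huggingface.co/datasets/MDGA-1/openmoe2_logs/blob/main/"
        "dense_vs_moe/dense_100b_1e_1b7_difflm/tensorboard/"
        "events.out.tfevents.1755443508.0648415733"
    ))
```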
You can refer to this script to run inference with the Hugging Face checkpoints. Due to the large number of checkpoints, most of the smaller ones above are still in Megatron format; you may refer to this script to convert them (the conversion scripts need some tweaking).
This is an ongoing project! We will tick off the to-do items below one by one.
- Architectural Design Choices
- Scaled-up Pre-training
- Post-training
- Routing & Other Analysis
- Full Paper
- Code & Checkpoint Open-sourcing
@misc{ni2025openmoe2,
title={OpenMoE 2: Sparse Diffusion Language Models},
author={Ni, Jinjie and team},
year={2025},
howpublished={\url{https://github.com/JinjieNi/OpenMoE2}},
}
