OpenMoE 2: Sparse Diffusion Language Models

Jinjie Ni and the team


The first-ever sparse diffusion large language model trained from scratch, focusing on architectural insights.


News

[2025-10-27] We release the codebase, all training checkpoints, and logs. The codebase is highly optimized and industry-level in terms of scalability and efficiency.

[2025-10-03] The blog is out! Check it out here!


Code

The codebase is released here. It is a highly optimized training backend for DLMs at any scale, built on Megatron-LM.

The full MoE implementation is not yet released. We plan to release it after the main training is done.


Resources

We open-source all model checkpoints and training logs mentioned in the paper. All of them can be downloaded at https://huggingface.co/collections/jinjieni/mdga.

The easiest way to download a folder is with this script (set up the variables appropriately):

python utils/hf_download_folder.py
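If you prefer not to use the repo's script, the same folder download can be sketched directly with `huggingface_hub`. This is a minimal standalone alternative, not the official script: the repo id and folder name below are taken from the `wget` example further down and may need to be adjusted to the checkpoint or log folder you actually want.

```python
"""Minimal sketch: download one folder from a Hugging Face repo.

Assumes `pip install huggingface_hub`. The repo id and folder name are
examples borrowed from this README's wget command, not a prescription.
"""

def folder_patterns(folder: str) -> list[str]:
    """Turn a folder path into the glob patterns `allow_patterns` expects."""
    return [folder.strip("/") + "/**"]

if __name__ == "__main__":
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="MDGA-1/openmoe2_logs",            # dataset repo from the example below
        repo_type="dataset",
        allow_patterns=folder_patterns("dense_vs_moe"),  # folder to fetch
    )
    print(f"Downloaded to {local_dir}")
```

`allow_patterns` restricts the snapshot to the matching paths, so only that folder is fetched rather than the whole repo.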

Alternatively, you can use wget to download individual files from the folder directly, e.g.:

wget https://huggingface.co/datasets/MDGA-1/openmoe2_logs/blob/main/dense_vs_moe/dense_100b_1e_1b7_difflm/tensorboard/events.out.tfevents.1755443508.0648415733

We link the related resources below:

You can refer to this script to run inference with the Hugging Face checkpoints. Because of their number, most of the small checkpoints above are still in Megatron format. You may refer to this script to convert them (the conversion scripts may need some tweaking).
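The linked inference script is the authoritative reference; as a rough orientation, loading a converted Hugging Face checkpoint follows the generic `transformers` pattern below. The repo id here is a placeholder, and a diffusion LM needs custom sampling code rather than left-to-right decoding, so treat this as a sketch to adapt, not a working recipe.

```python
"""Hedged sketch: loading a converted Hugging Face checkpoint.

Assumes `pip install transformers torch`. The repo id is a placeholder;
`trust_remote_code=True` is typically required for custom architectures.
"""

def format_prompt(text: str) -> str:
    """Toy prompt wrapper (an assumption, not an official template)."""
    return text.strip() + "\n"

if __name__ == "__main__":
    from transformers import AutoModel, AutoTokenizer

    repo = "jinjieni/openmoe2-checkpoint"  # placeholder, not a real repo id
    tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    model = AutoModel.from_pretrained(repo, trust_remote_code=True)
    inputs = tokenizer(format_prompt("Hello"), return_tensors="pt")
    # A diffusion LM denoises masked tokens instead of decoding token by
    # token; plug in the sampling loop from the linked inference script here.
```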


Todo List

This is an ongoing project! We will tick off the items below one by one.

  • Architectural Design Choices
  • Scaled-up Pre-training
  • Post-training
  • Routing & Other Analysis
  • Full Paper
  • Code & Checkpoint Open-sourcing

Citation

@misc{ni2025openmoe2,
  title={OpenMoE 2: Sparse Diffusion Language Models},
  author={Ni, Jinjie and team},
  year={2025},
  howpublished={\url{https://github.com/JinjieNi/OpenMoE2}},
}

About

The official repo for "OpenMoE 2: Sparse Diffusion Language Models".
