This directory contains example scripts for Ling 2.0 MoE language models by inclusionAI.
Ling 2.0 uses a high-sparsity Mixture of Experts (MoE) architecture with sigmoid routing, QK-Norm, and Half RoPE.
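Sigmoid routing scores each expert independently instead of normalizing across all experts with a softmax. The sketch below shows that idea for a single token: it takes the top-k experts by sigmoid score and renormalizes their weights to sum to 1. This is a simplified sketch, not the Ling 2.0 router. It assumes plain top-k over per-expert sigmoid scores, and it omits any bias terms, routing scale, or group-limited selection the real model may use. The function name `sigmoid_topk_route` is hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_topk_route(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Hypothetical sketch: pick top-k experts by sigmoid score for one token.

    Returns (expert_id, weight) pairs, highest score first. Weights are the
    selected sigmoid scores renormalized to sum to 1, because independent
    sigmoid scores do not sum to 1 on their own (unlike softmax outputs).
    """
    scores = [sigmoid(z) for z in logits]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    return [(i, scores[i] / total) for i in top]


# Four toy experts, top-2: experts 1 and 3 have the largest logits.
print(sigmoid_topk_route([0.1, 2.0, -1.0, 1.5], k=2))
```

Ling 2.0 uses 256 experts with top-8 selection. The toy call uses four experts and k=2 only to keep the output readable.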
| Model | HF ID | Architecture | Total Params | Active Params |
|---|---|---|---|---|
| Ling-flash-2.0 | inclusionAI/Ling-flash-2.0 | MoE (256 experts, top-8) | 100B | 6.1B |
| Ling-flash-base-2.0 | inclusionAI/Ling-flash-base-2.0 | MoE (256 experts, top-8) | 100B | 6.1B |
| Ling-mini-2.0 | inclusionAI/Ling-mini-2.0 | MoE (256 experts, top-8) | 16B | 1.5B |
| Ling-mini-base-2.0 | inclusionAI/Ling-mini-base-2.0 | MoE (256 experts, top-8) | 16B | 1.5B |
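The "high-sparsity" claim can be checked against the table with simple arithmetic. The sketch below uses only the figures listed above. It compares the share of all parameters active per token with the share of routed experts selected per token (8 of 256). The active share is higher than the routed-expert share because components outside the routed experts, such as attention and embeddings (and any shared experts), run for every token. The names `MODELS` and `active_fraction` are illustrative only.

```python
# Figures copied from the table above (total params, active params per token).
MODELS = {
    "Ling-flash-2.0": (100e9, 6.1e9),
    "Ling-mini-2.0": (16e9, 1.5e9),
}
NUM_EXPERTS = 256  # routed experts per MoE layer
TOP_K = 8          # experts selected per token


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of all parameters used to process a single token."""
    return active_params / total_params


# Share of routed experts selected per token: 8 / 256.
routed_fraction = TOP_K / NUM_EXPERTS

for name, (total, active) in MODELS.items():
    print(f"{name}: {active_fraction(total, active):.1%} of params active per token")
print(f"routed experts selected per token: {routed_fraction:.3%}")
```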
All scripts use a `WORKSPACE` environment variable for the base directory. Default: `/workspace`.

```bash
export WORKSPACE=/your/custom/path
```

Directory structure:

- `${WORKSPACE}/models/` - Converted checkpoints
- `${WORKSPACE}/results/` - Training outputs
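If you drive these scripts from your own Python tooling, the same fallback behavior can be reproduced as shown below. This is a minimal sketch assuming the shell scripts resolve the base directory like `${WORKSPACE:-/workspace}` (unset or empty means `/workspace`). The helper `workspace_dirs` is hypothetical and not part of the example scripts.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

DEFAULT_WORKSPACE = "/workspace"


def workspace_dirs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Path]:
    """Hypothetical helper: resolve the base directory and its subdirectories."""
    env = os.environ if env is None else env
    # Fall back to /workspace when WORKSPACE is unset or empty,
    # mirroring the shell idiom ${WORKSPACE:-/workspace} (an assumption).
    base = Path(env.get("WORKSPACE") or DEFAULT_WORKSPACE)
    return {
        "workspace": base,
        "models": base / "models",    # converted checkpoints
        "results": base / "results",  # training outputs
    }


print(workspace_dirs({"WORKSPACE": "/data"})["models"])
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the helper easy to test.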
See `conversion.sh` for checkpoint conversion examples.

Import a Hugging Face checkpoint into Megatron format:

```bash
python examples/conversion/convert_checkpoints.py import \
  --hf-model inclusionAI/Ling-flash-2.0 \
  --megatron-path ${WORKSPACE}/models/Ling-flash-2.0 \
  --trust-remote-code
```

Export a Megatron checkpoint back to Hugging Face format:

```bash
python examples/conversion/convert_checkpoints.py export \
  --hf-model inclusionAI/Ling-flash-2.0 \
  --megatron-path ${WORKSPACE}/models/Ling-flash-2.0/iter_0000000 \
  --hf-path ${WORKSPACE}/models/Ling-flash-2.0-hf-export \
  --trust-remote-code
```

Run the multi-GPU Hugging Face/Megatron round-trip script:

```bash
python -m torch.distributed.run --nproc_per_node=8 \
  examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
  --hf-model-id inclusionAI/Ling-flash-2.0 \
  --megatron-load-path ${WORKSPACE}/models/Ling-flash-2.0/iter_0000000 \
  --tp 1 --ep 8 \
  --trust-remote-code
```

See `inference.sh` for text generation with:
- Hugging Face checkpoint (`inclusionAI/Ling-flash-2.0`)
- Imported Megatron checkpoint (after the `conversion.sh` import)
- Exported HF checkpoint (after the `conversion.sh` export)
The default parallelism for 8 GPUs is `--tp 2 --ep 4`.
The product TP × PP × EP must equal `--nproc_per_node`.
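A quick way to catch a bad layout before launching is to check the constraint up front. The sketch below encodes the rule stated above (TP × PP × EP equals `--nproc_per_node`). It also adds a divisibility check, assuming the usual Megatron-style requirement that the 256 routed experts split evenly across EP ranks. The function `check_parallelism` is hypothetical and is not an option of the example scripts.

```python
NUM_EXPERTS = 256  # routed experts per MoE layer for the Ling 2.0 models above


def check_parallelism(nproc_per_node: int, tp: int = 1, pp: int = 1, ep: int = 1) -> int:
    """Hypothetical pre-launch check; returns the number of experts per EP rank."""
    if tp * pp * ep != nproc_per_node:
        raise ValueError(
            f"TP*PP*EP = {tp}*{pp}*{ep} = {tp * pp * ep}, "
            f"but --nproc_per_node is {nproc_per_node}"
        )
    # Assumption: each EP rank must own an equal share of the routed experts.
    if NUM_EXPERTS % ep != 0:
        raise ValueError(f"{NUM_EXPERTS} experts cannot be split evenly across EP={ep}")
    return NUM_EXPERTS // ep


print(check_parallelism(8, tp=2, ep=4))  # inference default: 64 experts per EP rank
print(check_parallelism(8, tp=1, ep=8))  # round-trip layout: 32 experts per EP rank
```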
Note: `--tp 1 --ep 8` works for the conversion round-trip but may cause issues during autoregressive inference with single-token batches (empty token dispatch to some EP ranks). Use `--tp 2 --ep 4` for inference.
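To see how an EP rank can end up with nothing to process, consider a single decode step. With one token and top-8 routing there are only 8 (token, expert) assignments to spread over the EP ranks. The sketch counts how many assignments each rank would receive. It assumes experts are partitioned into contiguous blocks per rank, and it uses made-up expert IDs. It only illustrates empty dispatch. It does not reproduce the inference failure or explain why `--tp 2 --ep 4` avoids it. The function `tokens_per_ep_rank` is hypothetical.

```python
from typing import List, Sequence


def tokens_per_ep_rank(
    routed_experts: Sequence[Sequence[int]], num_experts: int, ep: int
) -> List[int]:
    """Count (token, expert) assignments that each EP rank receives.

    Assumes rank r owns the contiguous expert block
    [r * per_rank, (r + 1) * per_rank), with per_rank = num_experts // ep.
    """
    per_rank = num_experts // ep
    counts = [0] * ep
    for experts_for_token in routed_experts:
        for expert in experts_for_token:
            counts[expert // per_rank] += 1
    return counts


# One decode step: a batch of a single token routed to 8 hypothetical experts.
one_token = [[0, 5, 40, 41, 100, 101, 200, 255]]
counts = tokens_per_ep_rank(one_token, num_experts=256, ep=8)
empty_ranks = [rank for rank, n in enumerate(counts) if n == 0]
print(counts, "empty ranks:", empty_ranks)
```

Here ranks 2, 4, and 5 receive no tokens for this step. Larger batches make such gaps less likely but do not rule them out.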
Note: All Ling 2.0 models use custom Hugging Face modeling code, so `--trust-remote-code` is required for conversion and inference.