GPT dynamic-graph hybrid-parallel case: loss becomes NaN after 20,000+ steps

### Describe the Bug

Reproduction environment: CUDA 11.7, Python 3.10, V100-32G, single node with 8 GPUs
Paddle commit: 3bcdeef55611b66f49fca4b68bd99daf7e44b40b
git clone http://github.com/PaddlePaddle/PaddleNLP.git -b develop && cd PaddleNLP/model_zoo/gpt-3/
Data and environment preparation:
python -m pip install -r requirements.txt
mkdir data
wget -O data/gpt_en_dataset_300m_ids.npy https://bj.bcebos.com/paddlenlp/models/transformers/gpt/data/gpt_en_dataset_300m_ids.npy
wget -O data/gpt_en_dataset_300m_idx.npz https://bj.bcebos.com/paddlenlp/models/transformers/gpt/data/gpt_en_dataset_300m_idx.npz
Commands to run:
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
#### The gpt_recompute_bs16_fp16_DP2-MP2-PP2 configuration starts producing NaN after 25,000+ steps
python -m paddle.distributed.launch --log_dir=./mylog --devices=0,1,2,3,4,5,6,7 tools/train.py -c ppfleetx/configs/nlp/gpt/pretrain_gpt_1.3B_dp8.yaml -o Global.seed=1234 -o Global.local_batch_size=8 -o Global.micro_batch_size=2 -o Engine.max_steps=50000 -o Engine.eval_freq=1000 -o Engine.mix_precision.enable=True -o Engine.save_load.save_steps=100000 -o Model.hidden_size=1024 -o Model.num_layers=4 -o Model.num_attention_heads=4 -o Model.type_vocab_size=1 -o Model.use_recompute=True -o Distributed.dp_degree=2 -o Distributed.mp_degree=2 -o Distributed.pp_degree=2 -o Distributed.sharding.sharding_degree=1 -o Distributed.sharding.sharding_stage=1 -o Distributed.sharding.sharding_offload=False -o Profiler_pretrain.memory_stats=True -o Optimizer.lr.max_lr=1e-4 -o Optimizer.lr.min_lr=1e-5

#### The gpt_bs64_fp16_DP8-MP1-PP1 configuration starts producing NaN after 17,000+ steps
python -m paddle.distributed.launch --log_dir=./mylog --devices=0,1,2,3,4,5,6,7 tools/train.py -c ppfleetx/configs/nlp/gpt/pretrain_gpt_1.3B_dp8.yaml -o Global.seed=1234 -o Global.local_batch_size=8 -o Global.micro_batch_size=8 -o Engine.max_steps=50000 -o Engine.eval_freq=1000 -o Engine.mix_precision.enable=True -o Engine.save_load.save_steps=100000 -o Model.hidden_size=1024 -o Model.num_layers=4 -o Model.num_attention_heads=4 -o Model.type_vocab_size=1 -o Model.use_recompute=True -o Distributed.dp_degree=8 -o Distributed.mp_degree=1 -o Distributed.pp_degree=1 -o Distributed.sharding.sharding_degree=1 -o Distributed.sharding.sharding_stage=1 -o Distributed.sharding.sharding_offload=False -o Profiler_pretrain.memory_stats=True -o Optimizer.lr.max_lr=1e-4 -o Optimizer.lr.min_lr=1e-5
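
For readers comparing the two configurations, the following is a minimal Python sketch of how their names appear to relate to the `-o` overrides. It is not taken from the PaddleNLP code base: the `ParallelConfig` class is hypothetical, and the formulas (global batch size as `local_batch_size * dp_degree * sharding_degree`, accumulation steps as `local_batch_size // micro_batch_size`) are assumptions that happen to match the `bs16` and `bs64` labels in the configuration names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    """Hypothetical helper mirroring the -o overrides of one launch command."""

    name: str
    local_batch_size: int
    micro_batch_size: int
    dp_degree: int
    mp_degree: int
    pp_degree: int
    sharding_degree: int = 1

    @property
    def devices_needed(self) -> int:
        # Every parallel dimension multiplies the number of GPUs required.
        return (self.dp_degree * self.mp_degree
                * self.pp_degree * self.sharding_degree)

    @property
    def global_batch_size(self) -> int:
        # Assumption: only data-parallel-like dimensions replicate the batch.
        return self.local_batch_size * self.dp_degree * self.sharding_degree

    @property
    def accumulate_steps(self) -> int:
        # Assumption: each local batch is split into equal micro-batches.
        return self.local_batch_size // self.micro_batch_size


# Values copied from the two launch commands in this report.
CONFIGS = [
    ParallelConfig("gpt_recompute_bs16_fp16_DP2-MP2-PP2", 8, 2, 2, 2, 2),
    ParallelConfig("gpt_bs64_fp16_DP8-MP1-PP1", 8, 8, 8, 1, 1),
]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(cfg.name, cfg.devices_needed,
              cfg.global_batch_size, cfg.accumulate_steps)
```

Under these assumptions both runs occupy all eight GPUs, and they differ in global batch size (16 versus 64) and in the number of micro-batches per step (4 versus 1).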

Observed behavior:
During training the loss becomes NaN, as shown in the screenshot:
![image](https://github.com/PaddlePaddle/Paddle/assets/44688141/eeca3cd6-8c64-480d-9b2d-4a510a8ff860)
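
To narrow down where the divergence begins, it can help to record the first step whose loss is not finite. The snippet below is a minimal, framework-independent sketch: `first_non_finite_step` is a hypothetical helper, and the `(step, loss)` pairs are synthetic placeholders, not values from the runs in this report. How the pairs are collected from the training logs is left open, since the log format is not shown here.

```python
import math
from typing import Iterable, Optional, Tuple


def first_non_finite_step(
    losses: Iterable[Tuple[int, float]],
) -> Optional[int]:
    """Return the first step whose loss is NaN or +/-inf, or None if all are finite."""
    for step, loss in losses:
        if not math.isfinite(loss):
            return step
    return None


if __name__ == "__main__":
    # Synthetic example data, for illustration only.
    history = [(1, 2.0), (2, 1.5), (3, float("nan")), (4, float("nan"))]
    print(first_non_finite_step(history))
```

Saving a checkpoint shortly before the reported step would allow the failure to be reproduced without rerunning the full 17,000 to 25,000 steps.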


### Additional Supplementary Information

_No response_
