Description
Describe the Bug
Reproduction environment: CUDA 11.7, Python 3.10, V100-32G, single machine with eight GPUs
Paddle commit: 3bcdeef55611b66f49fca4b68bd99daf7e44b40b
git clone http://github.com/PaddlePaddle/PaddleNLP.git -b develop && cd PaddleNLP/model_zoo/gpt-3/
Data and environment preparation
python -m pip install -r requirements.txt
mkdir data
wget -O data/gpt_en_dataset_300m_ids.npy https://bj.bcebos.com/paddlenlp/models/transformers/gpt/data/gpt_en_dataset_300m_ids.npy
wget -O data/gpt_en_dataset_300m_idx.npz https://bj.bcebos.com/paddlenlp/models/transformers/gpt/data/gpt_en_dataset_300m_idx.npz
Commands to run
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7;
With the gpt_recompute_bs16_fp16_DP2-MP2-PP2 configuration, the loss becomes NaN after 25,000+ steps:
python -m paddle.distributed.launch --log_dir=./mylog --devices=0,1,2,3,4,5,6,7 tools/train.py -c ppfleetx/configs/nlp/gpt/pretrain_gpt_1.3B_dp8.yaml -o Global.seed=1234 -o Global.local_batch_size=8 -o Global.micro_batch_size=2 -o Engine.max_steps=50000 -o Engine.eval_freq=1000 -o Engine.mix_precision.enable=True -o Engine.save_load.save_steps=100000 -o Model.hidden_size=1024 -o Model.num_layers=4 -o Model.num_attention_heads=4 -o Model.type_vocab_size=1 -o Model.use_recompute=True -o Distributed.dp_degree=2 -o Distributed.mp_degree=2 -o Distributed.pp_degree=2 -o Distributed.sharding.sharding_degree=1 -o Distributed.sharding.sharding_stage=1 -o Distributed.sharding.sharding_offload=False -o Profiler_pretrain.memory_stats=True -o Optimizer.lr.max_lr=1e-4 -o Optimizer.lr.min_lr=1e-5
With the gpt_bs64_fp16_DP8-MP1-PP1 configuration, the loss becomes NaN after 17,000+ steps:
python -m paddle.distributed.launch --log_dir=./mylog --devices=0,1,2,3,4,5,6,7 tools/train.py -c ppfleetx/configs/nlp/gpt/pretrain_gpt_1.3B_dp8.yaml -o Global.seed=1234 -o Global.local_batch_size=8 -o Global.micro_batch_size=8 -o Engine.max_steps=50000 -o Engine.eval_freq=1000 -o Engine.mix_precision.enable=True -o Engine.save_load.save_steps=100000 -o Model.hidden_size=1024 -o Model.num_layers=4 -o Model.num_attention_heads=4 -o Model.type_vocab_size=1 -o Model.use_recompute=True -o Distributed.dp_degree=8 -o Distributed.mp_degree=1 -o Distributed.pp_degree=1 -o Distributed.sharding.sharding_degree=1 -o Distributed.sharding.sharding_stage=1 -o Distributed.sharding.sharding_offload=False -o Profiler_pretrain.memory_stats=True -o Optimizer.lr.max_lr=1e-4 -o Optimizer.lr.min_lr=1e-5
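For readers comparing the two runs, the following is a minimal sketch of how the batch sizes in the configuration names (bs16, bs64) appear to follow from the command-line overrides. It assumes that the global batch size is local_batch_size × dp_degree × sharding_degree, that the number of gradient accumulation steps is local_batch_size / micro_batch_size, and that the GPU count is the product of the parallel degrees. The helper name `parallel_layout` is hypothetical and is not part of PaddleNLP or ppfleetx.

```python
def parallel_layout(local_batch_size, micro_batch_size, dp, mp, pp, sharding=1):
    """Derive GPU count, global batch size, and accumulation steps.

    Hypothetical helper. The formulas are assumptions about how the
    overrides combine, not code taken from ppfleetx.
    """
    if local_batch_size % micro_batch_size != 0:
        raise ValueError("local_batch_size must be a multiple of micro_batch_size")
    return {
        # Assumed: one GPU per (dp, mp, pp, sharding) coordinate.
        "gpus": dp * mp * pp * sharding,
        # Assumed: each data-parallel replica consumes one local batch per step.
        "global_batch_size": local_batch_size * dp * sharding,
        # Assumed: a local batch is split into micro-batches that are accumulated.
        "accumulate_steps": local_batch_size // micro_batch_size,
    }


# Values copied from the two commands in this report.
configs = {
    "gpt_recompute_bs16_fp16_DP2-MP2-PP2": parallel_layout(8, 2, dp=2, mp=2, pp=2),
    "gpt_bs64_fp16_DP8-MP1-PP1": parallel_layout(8, 8, dp=8, mp=1, pp=1),
}

for name, layout in configs.items():
    print(name, layout)
```

Under these assumptions both runs use all 8 GPUs. The first has a global batch size of 16 with 4 accumulation steps, and the second has a global batch size of 64 with no accumulation, which matches the bs16 and bs64 labels.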
Additional Supplementary Information
No response
