An error occurred in the ASR task using the streaming conformer config on ESPnet2 #3803
Description
- python=3.8.5
- torch=1.10.0+cu102
- torch cuda=10.2
- espnet=0.10.5a1
I want to use the streaming Conformer config for the ASR task on my own dataset, but this error occurs after the first epoch of training.
The log is:
[user] 2021-11-17 21:07:07,236 (trainer:668) INFO: 1epoch:train:26767-28253batch: iter_time=7.281e-05, forward_time=0.133, loss=150.549, loss_att=116.794, loss_ctc=229.310, acc=0.325, backward_time=0.169, optim_step_time=0.031, optim0_lr0=4.585e-04, train_time=0.425
[user] 2021-11-17 21:17:35,754 (trainer:668) INFO: 1epoch:train:28254-29740batch: iter_time=2.378e-04, forward_time=0.133, loss=142.284, loss_att=110.287, loss_ctc=216.944, acc=0.327, backward_time=0.166, optim_step_time=0.031, optim0_lr0=4.833e-04, train_time=0.422
Traceback (most recent call last):
File "/home/user/espnet/tools/anaconda/envs/espnet/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/user/espnet/tools/anaconda/envs/espnet/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user/espnet/espnet2/bin/asr_train.py", line 23, in <module>
main()
File "/home/user/espnet/espnet2/bin/asr_train.py", line 19, in main
ASRTask.main(cmd=cmd)
File "/home/user/espnet/espnet2/tasks/abs_task.py", line 1013, in main
cls.main_worker(args)
File "/home/user/espnet/espnet2/tasks/abs_task.py", line 1305, in main_worker
cls.trainer.run(
File "/home/user/espnet/espnet2/train/trainer.py", line 293, in run
cls.validate_one_epoch(
File "/home/user/espnet/tools/anaconda/envs/espnet/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/user/espnet/espnet2/train/trainer.py", line 711, in validate_one_epoch
retval = model(**batch)
File "/home/user/espnet/tools/anaconda/envs/espnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/espnet/espnet2/asr/espnet_model.py", line 142, in forward
encoder_out, encoder_out_lens = self.encode(speech, speech_lengths)
File "/home/user/espnet/espnet2/asr/espnet_model.py", line 244, in encode
assert encoder_out.size(1) <= encoder_out_lens.max(), (
AttributeError: 'NoneType' object has no attribute 'max'
[user] 2021-11-17 21:17:44,911 (internal:139) INFO: Internal process exited.
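As far as I can tell from the traceback, the assertion in `encode` calls `.max()` on `encoder_out_lens`, and that value is `None` during validation. The snippet below is only a minimal sketch in plain Python, not ESPnet code: `check_lengths`, `frames`, and `out_lens` are hypothetical names I made up to stand in for the shape check. It reproduces the same kind of `AttributeError` and shows a guard that would report the missing lengths more explicitly.

```python
from typing import Optional, Sequence


def check_lengths(frames: int, out_lens: Optional[Sequence[int]]) -> int:
    """Hypothetical stand-in for the shape check seen in the traceback.

    `frames` plays the role of encoder_out.size(1) and `out_lens` plays the
    role of encoder_out_lens. Neither name is taken from ESPnet.
    """
    if out_lens is None:
        # Report the missing lengths explicitly instead of failing on `.max()`.
        raise ValueError("encoder returned None for the output lengths")
    longest = max(out_lens)
    assert frames <= longest, (frames, longest)
    return longest


# Without a guard, calling a method on None fails the same way as in the log.
try:
    None.max()
except AttributeError as err:
    message = str(err)

print(message)
```

This does not explain why the encoder returns `None` for the lengths. It only narrows the failure down to the value passed into the assertion.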
Training on a smaller dataset runs without this error, so I would like to ask how I can solve this problem on the full dataset.
The log from the smaller dataset is:
[user] 2021-11-18 09:07:55,870 (trainer:668) INFO: 1epoch:train:257-272batch: iter_time=7.188e-05, forward_time=0.144, loss=188.445, loss_att=146.988, loss_ctc=285.178, acc=0.257, backward_time=0.172, optim_step_time=0.032, optim0_lr0=4.425e-06, train_time=0.449
[user] 2021-11-18 09:08:02,991 (trainer:668) INFO: 1epoch:train:273-288batch: iter_time=6.664e-05, forward_time=0.142, loss=237.047, loss_att=180.976, loss_ctc=367.880, acc=0.277, backward_time=0.170, optim_step_time=0.032, optim0_lr0=4.692e-06, train_time=0.445
[user] 2021-11-18 09:08:10,122 (trainer:668) INFO: 1epoch:train:289-304batch: iter_time=7.703e-05, forward_time=0.140, loss=278.296, loss_att=215.102, loss_ctc=425.748, acc=0.278, backward_time=0.185, optim_step_time=0.032, optim0_lr0=4.958e-06, train_time=0.445
[user] 2021-11-18 09:08:17,111 (trainer:668) INFO: 1epoch:train:305-320batch: iter_time=7.410e-05, forward_time=0.139, loss=205.749, loss_att=159.653, loss_ctc=313.307, acc=0.271, backward_time=0.170, optim_step_time=0.033, optim0_lr0=5.225e-06, train_time=0.437
[user] 2021-11-18 09:08:24,423 (trainer:668) INFO: 1epoch:train:321-336batch: iter_time=7.448e-05, forward_time=0.146, loss=235.820, loss_att=181.995, loss_ctc=361.411, acc=0.310, backward_time=0.178, optim_step_time=0.032, optim0_lr0=5.492e-06, train_time=0.457
[user] 2021-11-18 09:08:27,867 (trainer:328) INFO: 1epoch results: [train] iter_time=2.738e-04, forward_time=0.141, loss=319.074, loss_att=237.090, loss_ctc=510.369, acc=0.192, backward_time=0.172, optim_step_time=0.032, optim0_lr0=2.842e-06, train_time=0.442, time=2 minutes and 29.56 seconds, total_count=338, gpu_max_cached_mem_GB=9.566, [valid] loss=225.508, loss_att=171.193, loss_ctc=352.243, acc=0.288, cer=0.720, wer=1.284, cer_ctc=1.000, time=2.41 seconds, total_count=12, gpu_max_cached_mem_GB=9.566
[user] 2021-11-18 09:08:28,769 (trainer:375) INFO: The best model has been updated: valid.acc
[user] 2021-11-18 09:08:28,770 (trainer:262) INFO: 2/50epoch started. Estimated time to finish: 2 hours, 4 minutes and 55.87 seconds
[user] 2021-11-18 09:08:36,278 (trainer:668) INFO: 2epoch:train:1-16batch: iter_time=0.004, forward_time=0.148, loss=235.967, loss_att=180.346, loss_ctc=365.749, acc=0.316, backward_time=0.184, optim_step_time=0.033, optim0_lr0=5.792e-06, train_time=0.469
[user] 2021-11-18 09:08:43,533 (trainer:668) INFO: 2epoch:train:17-32batch: iter_time=7.638e-05, forward_time=0.143, loss=189.732, loss_att=145.995, loss_ctc=291.784, acc=0.321, backward_time=0.173, optim_step_time=0.034, optim0_lr0=6.058e-06, train_time=0.453
[user] 2021-11-18 09:08:50,837 (trainer:668) INFO: 2epoch:train:33-48batch: iter_time=7.702e-05, forward_time=0.146, loss=182.177, loss_att=138.910, loss_ctc=283.135, acc=0.321, backward_time=0.179, optim_step_time=0.033, optim0_lr0=6.325e-06, train_time=0.456