An error occurred in the ASR task using the streaming conformer config on ESPnet2 #3803

@lin-nerd

Description

  • python=3.8.5
  • torch=1.10.0+cu102
  • torch cuda=10.2
  • espnet=0.10.5a1

I want to use the streaming Conformer config for the ASR task on my own dataset, but the following error occurs during validation, right after the first training epoch.
The log is:
[user] 2021-11-17 21:07:07,236 (trainer:668) INFO: 1epoch:train:26767-28253batch: iter_time=7.281e-05, forward_time=0.133, loss=150.549, loss_att=116.794, loss_ctc=229.310, acc=0.325, backward_time=0.169, optim_step_time=0.031, optim0_lr0=4.585e-04, train_time=0.425
[user] 2021-11-17 21:17:35,754 (trainer:668) INFO: 1epoch:train:28254-29740batch: iter_time=2.378e-04, forward_time=0.133, loss=142.284, loss_att=110.287, loss_ctc=216.944, acc=0.327, backward_time=0.166, optim_step_time=0.031, optim0_lr0=4.833e-04, train_time=0.422
Traceback (most recent call last):
  File "/home/user/espnet/tools/anaconda/envs/espnet/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/espnet/tools/anaconda/envs/espnet/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user/espnet/espnet2/bin/asr_train.py", line 23, in <module>
    main()
  File "/home/user/espnet/espnet2/bin/asr_train.py", line 19, in main
    ASRTask.main(cmd=cmd)
  File "/home/user/espnet/espnet2/tasks/abs_task.py", line 1013, in main
    cls.main_worker(args)
  File "/home/user/espnet/espnet2/tasks/abs_task.py", line 1305, in main_worker
    cls.trainer.run(
  File "/home/user/espnet/espnet2/train/trainer.py", line 293, in run
    cls.validate_one_epoch(
  File "/home/user/espnet/tools/anaconda/envs/espnet/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/espnet/espnet2/train/trainer.py", line 711, in validate_one_epoch
    retval = model(**batch)
  File "/home/user/espnet/tools/anaconda/envs/espnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/espnet/espnet2/asr/espnet_model.py", line 142, in forward
    encoder_out, encoder_out_lens = self.encode(speech, speech_lengths)
  File "/home/user/espnet/espnet2/asr/espnet_model.py", line 244, in encode
    assert encoder_out.size(1) <= encoder_out_lens.max(), (
AttributeError: 'NoneType' object has no attribute 'max'
[user] 2021-11-17 21:17:44,911 (internal:139) INFO: Internal process exited.
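
The last frame of the traceback indicates that `encoder_out_lens` was `None` when `.max()` was called during validation. Below is a minimal, ESPnet-free sketch of a guard that turns this into a more descriptive error while debugging. The function name `check_encoder_output` and its plain-list arguments are hypothetical and are not part of ESPnet. Which code path actually returns `None` here is not established by the log.

```python
from typing import List, Optional


def check_encoder_output(num_frames: int, out_lens: Optional[List[int]]) -> int:
    """Hypothetical debugging guard, not ESPnet code.

    num_frames stands in for encoder_out.size(1) and out_lens for
    encoder_out_lens converted to a list of ints. Returns the maximum
    output length if the check passes.
    """
    if out_lens is None:
        # This is the situation the traceback points at: the length
        # value is None, so calling .max() on it cannot work.
        raise ValueError("encoder returned no output lengths (got None)")
    max_len = max(out_lens)
    if num_frames > max_len:
        raise ValueError(
            f"encoder output has {num_frames} frames, "
            f"but the longest reported length is {max_len}"
        )
    return max_len
```

Calling such a check just before the failing assertion would at least show which validation batch triggers the `None` value.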

However, training works when I use a smaller dataset: validation completes and the second epoch starts. How can I solve this problem?
The log for the smaller dataset is:
[user] 2021-11-18 09:07:55,870 (trainer:668) INFO: 1epoch:train:257-272batch: iter_time=7.188e-05, forward_time=0.144, loss=188.445, loss_att=146.988, loss_ctc=285.178, acc=0.257, backward_time=0.172, optim_step_time=0.032, optim0_lr0=4.425e-06, train_time=0.449
[user] 2021-11-18 09:08:02,991 (trainer:668) INFO: 1epoch:train:273-288batch: iter_time=6.664e-05, forward_time=0.142, loss=237.047, loss_att=180.976, loss_ctc=367.880, acc=0.277, backward_time=0.170, optim_step_time=0.032, optim0_lr0=4.692e-06, train_time=0.445
[user] 2021-11-18 09:08:10,122 (trainer:668) INFO: 1epoch:train:289-304batch: iter_time=7.703e-05, forward_time=0.140, loss=278.296, loss_att=215.102, loss_ctc=425.748, acc=0.278, backward_time=0.185, optim_step_time=0.032, optim0_lr0=4.958e-06, train_time=0.445
[user] 2021-11-18 09:08:17,111 (trainer:668) INFO: 1epoch:train:305-320batch: iter_time=7.410e-05, forward_time=0.139, loss=205.749, loss_att=159.653, loss_ctc=313.307, acc=0.271, backward_time=0.170, optim_step_time=0.033, optim0_lr0=5.225e-06, train_time=0.437
[user] 2021-11-18 09:08:24,423 (trainer:668) INFO: 1epoch:train:321-336batch: iter_time=7.448e-05, forward_time=0.146, loss=235.820, loss_att=181.995, loss_ctc=361.411, acc=0.310, backward_time=0.178, optim_step_time=0.032, optim0_lr0=5.492e-06, train_time=0.457
[user] 2021-11-18 09:08:27,867 (trainer:328) INFO: 1epoch results: [train] iter_time=2.738e-04, forward_time=0.141, loss=319.074, loss_att=237.090, loss_ctc=510.369, acc=0.192, backward_time=0.172, optim_step_time=0.032, optim0_lr0=2.842e-06, train_time=0.442, time=2 minutes and 29.56 seconds, total_count=338, gpu_max_cached_mem_GB=9.566, [valid] loss=225.508, loss_att=171.193, loss_ctc=352.243, acc=0.288, cer=0.720, wer=1.284, cer_ctc=1.000, time=2.41 seconds, total_count=12, gpu_max_cached_mem_GB=9.566
[user] 2021-11-18 09:08:28,769 (trainer:375) INFO: The best model has been updated: valid.acc
[user] 2021-11-18 09:08:28,770 (trainer:262) INFO: 2/50epoch started. Estimated time to finish: 2 hours, 4 minutes and 55.87 seconds
[user] 2021-11-18 09:08:36,278 (trainer:668) INFO: 2epoch:train:1-16batch: iter_time=0.004, forward_time=0.148, loss=235.967, loss_att=180.346, loss_ctc=365.749, acc=0.316, backward_time=0.184, optim_step_time=0.033, optim0_lr0=5.792e-06, train_time=0.469
[user] 2021-11-18 09:08:43,533 (trainer:668) INFO: 2epoch:train:17-32batch: iter_time=7.638e-05, forward_time=0.143, loss=189.732, loss_att=145.995, loss_ctc=291.784, acc=0.321, backward_time=0.173, optim_step_time=0.034, optim0_lr0=6.058e-06, train_time=0.453
[user] 2021-11-18 09:08:50,837 (trainer:668) INFO: 2epoch:train:33-48batch: iter_time=7.702e-05, forward_time=0.146, loss=182.177, loss_att=138.910, loss_ctc=283.135, acc=0.321, backward_time=0.179, optim_step_time=0.033, optim0_lr0=6.325e-06, train_time=0.456
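
To compare the two runs more easily, the per-batch `key=value` fields of the trainer log lines can be pulled into a dictionary. This is a small sketch using only the standard library. The helper name `parse_trainer_line` is hypothetical, and it is meant for the per-batch lines (`trainer:668`) only, since the epoch summary line repeats keys for `[train]` and `[valid]` and contains free-text durations.

```python
import re
from typing import Dict

# Matches fields such as "loss=150.549" or "iter_time=7.281e-05".
_FIELD = re.compile(r"(\w+)=([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_trainer_line(line: str) -> Dict[str, float]:
    """Extract numeric key=value pairs from one per-batch log line."""
    return {key: float(value) for key, value in _FIELD.findall(line)}


sample = (
    "[user] 2021-11-18 09:08:36,278 (trainer:668) INFO: 2epoch:train:1-16batch: "
    "iter_time=0.004, forward_time=0.148, loss=235.967, loss_att=180.346, "
    "loss_ctc=365.749, acc=0.316, backward_time=0.184, optim_step_time=0.033, "
    "optim0_lr0=5.792e-06, train_time=0.469"
)
fields = parse_trainer_line(sample)
print(fields["loss"], fields["acc"], fields["optim0_lr0"])
```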

Labels: Bug (bug should be fixed)
