nn.CTCLoss RuntimeError on GPU #20522

@ypw-rich

Description

🐛 Bug

Running the official documentation sample code for nn.CTCLoss works fine on the CPU but raises a RuntimeError in the backward pass on the GPU.

To Reproduce

import torch
import torch.nn as nn

T = 255     # Input sequence length
C = 20      # Number of classes (excluding blank)
N = 16      # Batch size
S = 30      # Target sequence length of longest target in batch
S_min = 10  # Minimum target length, for demonstration purposes

# Initialize random batch of input vectors, for *size = (T, N, C)
input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

# Initialize random batch of targets (0 = blank, 1:C+1 = classes)
target = torch.randint(low=1, high=C+1, size=(N, S), dtype=torch.long)

input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)

# Move everything to the GPU; the identical code runs fine on the CPU
input = input.cuda()
target = target.cuda()
input_lengths = input_lengths.cuda()
target_lengths = target_lengths.cuda()

ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-10-24c5c245c677> in <module>
      1 ctc_loss = nn.CTCLoss()
      2 loss = ctc_loss(input, target, input_lengths, target_lengths)
----> 3 loss.backward()

~/anaconda3/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    105                 products. Defaults to ``False``.
    106         """
--> 107         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    108 
    109     def register_hook(self, hook):

~/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag
     94 
     95 

RuntimeError: setStorage: sizes [16, 255, 31], strides [15045, 59, 2], and storage offset 0 requiring a storage size of 240722 are out of bounds for storage with numel 240720

See https://gist.github.com/ypw-rich/faafe22d108c4fe24f128ecb4aaabda3
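The numbers in the error message are internally consistent with a strided view that overruns its buffer. Under the usual strided-tensor addressing rule, the largest offset a view can touch is storage_offset + sum over dims of (size − 1) · stride, which here is 15·15045 + 254·59 + 30·2 = 240721, so 240722 elements of storage are required, while the buffer holds only 240720 = 16 · 255 · 59 elements. A minimal sketch of that arithmetic (a check added here for illustration, not part of the original report):

# Check the arithmetic in the error message (added for illustration,
# not part of the original report).
sizes   = [16, 255, 31]
strides = [15045, 59, 2]
offset  = 0

# The largest reachable offset, plus one, gives the required storage size.
required = offset + sum((sz - 1) * st for sz, st in zip(sizes, strides)) + 1
print(required)       # 240722, as reported

# The actual buffer has 240720 elements, consistent with a contiguous
# (16, 255, 59) tensor, so the view reads 2 elements past its storage.
print(16 * 255 * 59)  # 240720

Notably, for the S = 30 used above, 59 = 2·S − 1 and 31 = S + 1, which suggests (though this is only a guess from the numbers) that the CUDA backward kernel sizes its intermediate buffer inconsistently with the view it takes over it.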

Expected behavior

Expected the backward pass to succeed on the GPU and produce the same gradients as on the CPU.
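Until this is fixed, one possible workaround (a suggestion added here, not from the original report) is to evaluate just the CTC loss on the CPU, reusing the tensors from the repro above; autograd propagates the gradient back through the .cpu() copies, so the rest of the computation can stay on the GPU:

# Hypothetical workaround, not from the original report: run only the
# CTC loss on the CPU. The .cpu() copies are differentiable, so
# loss.backward() still populates input.grad on the GPU.
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input.cpu(), target.cpu(),
                input_lengths.cpu(), target_lengths.cpu())
loss.backward()
print(input.grad.device)  # cuda:0

This trades GPU speed on the loss for a working backward pass, so it is a stopgap rather than a fix.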

Environment

PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 18.04.2 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 418.56
cuDNN version: /data/ocr/bin/libcudnn.so.7

Versions of relevant libraries:
[pip3] numpy==1.16.2
[pip3] numpydoc==0.8.0
[pip3] torch==1.1.0
[pip3] torchvision==0.2.2.post3
[conda] blas 1.0 mkl
[conda] mkl 2019.3 199
[conda] mkl-service 1.1.2 py37he904b0f_5
[conda] mkl_fft 1.0.10 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0
[conda] torch 1.1.0 pypi_0 pypi
[conda] torchvision 0.2.2.post3 pypi_0 pypi

    Labels

    module: cuda, module: loss, module: nn, triaged