SegFault and other errors on instantiating subclass of torch.FloatTensor and torch.Tensor #20052

@bnehoran

🐛 Bug

Instantiating a subclass of torch.FloatTensor, torch.ByteTensor, or torch.BoolTensor nondeterministically causes a segmentation fault, a c10::Error (Unrecognized Scalartype UNKNOWN_SCALAR), or other errors such as RuntimeError: Unknown backend.

Instantiating a subclass of torch.Tensor that overrides __init__ fails with a TypeError whenever arguments are passed.

To Reproduce

Run the following four lines in a fresh Python REPL (or some variant of them):

import torch
class MyFloatTensor(torch.FloatTensor):
    pass
MyFloatTensor()

With Python 2.7 and PyTorch 1.1.0, I get
Segmentation fault (core dumped)

With Python 3.5.2 and PyTorch 1.1.0, the behavior is less predictable.

Sometimes I get a RuntimeError:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-57316e947464> in <module>
----> 1 MyFloatTensor()

RuntimeError: Unknown backend

Sometimes, I get a segmentation fault:
Segmentation fault (core dumped)

Sometimes I get a c10::Error:

terminate called after throwing an instance of 'c10::Error'
  what():  Unrecognized Scalartype UNKNOWN_SCALAR (please report this error) (scalarTypeToTypeMeta at /pytorch/c10/core/ScalarType.h:136)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7fb12efc9441 in /usr/people/myusername/python3env/lib/python3.5/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7fb12efc8d7a in /usr/people/myusername/python3env/lib/python3.5/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x2fb757 (0x7fb16e493757 in /usr/people/myusername/python3env/lib/python3.5/site-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0x3caee6 (0x7fb16e562ee6 in /usr/people/myusername/python3env/lib/python3.5/site-packages/torch/lib/libtorch_python.so)
frame #4: torch::utils::legacy_tensor_ctor(at::Type const&, c10::ScalarType, _object*, _object*) + 0x210 (0x7fb16e6c0000 in /usr/people/myusername/python3env/lib/python3.5/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x510d8b (0x7fb16e6a8d8b in /usr/people/myusername/python3env/lib/python3.5/site-packages/torch/lib/libtorch_python.so)
frame #6: /usr/people/myusername/python3env/bin/python3.5() [0x57efe5]
frame #7: PyObject_Call + 0x47 (0x5c1797 in /usr/people/myusername/python3env/bin/python3.5)
frame #8: PyEval_EvalFrameEx + 0x4ec6 (0x53bba6 in /usr/people/myusername/python3env/bin/python3.5)
frame #9: /usr/people/myusername/python3env/bin/python3.5() [0x540199]
frame #10: PyEval_EvalCode + 0x1f (0x540e4f in /usr/people/myusername/python3env/bin/python3.5)
frame #11: /usr/people/myusername/python3env/bin/python3.5() [0x54a7c5]
frame #12: PyCFunction_Call + 0x4f (0x4e9b7f in /usr/people/myusername/python3env/bin/python3.5)
frame #13: PyEval_EvalFrameEx + 0x614 (0x5372f4 in /usr/people/myusername/python3env/bin/python3.5)
frame #14: _PyGen_Send + 0x133 (0x4ed7d3 in /usr/people/myusername/python3env/bin/python3.5)
frame #15: PyEval_EvalFrameEx + 0x5ce5 (0x53c9c5 in /usr/people/myusername/python3env/bin/python3.5)
frame #16: _PyGen_Send + 0x133 (0x4ed7d3 in /usr/people/myusername/python3env/bin/python3.5)
frame #17: PyEval_EvalFrameEx + 0x5ce5 (0x53c9c5 in /usr/people/myusername/python3env/bin/python3.5)
frame #18: _PyGen_Send + 0x133 (0x4ed7d3 in /usr/people/myusername/python3env/bin/python3.5)
frame #19: PyEval_EvalFrameEx + 0x4ce6 (0x53b9c6 in /usr/people/myusername/python3env/bin/python3.5)
frame #20: PyEval_EvalFrameEx + 0x4b04 (0x53b7e4 in /usr/people/myusername/python3env/bin/python3.5)
frame #21: PyEval_EvalFrameEx + 0x4b04 (0x53b7e4 in /usr/people/myusername/python3env/bin/python3.5)
frame #22: /usr/people/myusername/python3env/bin/python3.5() [0x540199]
frame #23: PyEval_EvalFrameEx + 0x50b2 (0x53bd92 in /usr/people/myusername/python3env/bin/python3.5)
frame #24: /usr/people/myusername/python3env/bin/python3.5() [0x540199]
frame #25: PyEval_EvalFrameEx + 0x50b2 (0x53bd92 in /usr/people/myusername/python3env/bin/python3.5)
frame #26: /usr/people/myusername/python3env/bin/python3.5() [0x540199]
frame #27: PyEval_EvalFrameEx + 0x50b2 (0x53bd92 in /usr/people/myusername/python3env/bin/python3.5)
frame #28: PyEval_EvalFrameEx + 0x4b04 (0x53b7e4 in /usr/people/myusername/python3env/bin/python3.5)
frame #29: PyEval_EvalCodeEx + 0x13b (0x540f9b in /usr/people/myusername/python3env/bin/python3.5)
frame #30: /usr/people/myusername/python3env/bin/python3.5() [0x4ebe37]
frame #31: PyObject_Call + 0x47 (0x5c1797 in /usr/people/myusername/python3env/bin/python3.5)
frame #32: PyEval_EvalFrameEx + 0x252b (0x53920b in /usr/people/myusername/python3env/bin/python3.5)
frame #33: /usr/people/myusername/python3env/bin/python3.5() [0x540199]
frame #34: PyEval_EvalFrameEx + 0x50b2 (0x53bd92 in /usr/people/myusername/python3env/bin/python3.5)
frame #35: /usr/people/myusername/python3env/bin/python3.5() [0x540199]
frame #36: PyEval_EvalCode + 0x1f (0x540e4f in /usr/people/myusername/python3env/bin/python3.5)
frame #37: /usr/people/myusername/python3env/bin/python3.5() [0x60c272]
frame #38: PyRun_FileExFlags + 0x9a (0x60e71a in /usr/people/myusername/python3env/bin/python3.5)
frame #39: PyRun_SimpleFileExFlags + 0x1bc (0x60ef0c in /usr/people/myusername/python3env/bin/python3.5)
frame #40: Py_Main + 0x456 (0x63fb26 in /usr/people/myusername/python3env/bin/python3.5)
frame #41: main + 0xe1 (0x4cfeb1 in /usr/people/myusername/python3env/bin/python3.5)
frame #42: __libc_start_main + 0xf0 (0x7fb1e860b830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #43: _start + 0x29 (0x5d6049 in /usr/people/myusername/python3env/bin/python3.5)

Aborted (core dumped)

And occasionally, it just runs without crashing and outputs

tensor([], dtype=torch.uint8)

which is almost what I would expect, except that the result has dtype torch.uint8 (a ByteTensor) rather than being a FloatTensor.

Each run produces a different one of these outcomes, with no discernible pattern. Two consecutive runs sometimes give the same result and sometimes do not.

The same behavior occurs with other variants including

  • replacing torch.FloatTensor with another tensor type such as torch.ByteTensor or torch.BoolTensor
  • passing parameters to the instantiation
  • adding a body to the class definition

Although I produced all of these errors on a single machine (environment specification below), I see similarly erratic behavior on other machines as well, including Google Colaboratory.
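Because the outcome changes from run to run, it may help to launch the reproduction many times in fresh interpreters and count how each run ends. The following is only a sketch under stated assumptions: the helper names run_once and tally are hypothetical, and the reading of a negative return code as a terminating signal holds on POSIX systems only.

```python
import signal
import subprocess
import sys
from collections import Counter

# The reproduction from this report, kept as a string so that each run
# starts from a fresh interpreter.
REPRO = """
import torch
class MyFloatTensor(torch.FloatTensor):
    pass
print(MyFloatTensor())
"""


def run_once(snippet):
    """Run snippet in a new Python process and return a short outcome label."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    code = proc.returncode
    if code == 0:
        return "ok"
    if code < 0:
        # Assumption: POSIX. A negative return code means the child was
        # terminated by that signal (e.g. SIGSEGV or SIGABRT).
        try:
            return "signal " + signal.Signals(-code).name
        except ValueError:
            return "signal %d" % -code
    # An uncaught Python exception normally ends the process with exit code 1.
    return "exit %d" % code


def tally(snippet, runs=20):
    """Run snippet several times and count the distinct outcomes."""
    return Counter(run_once(snippet) for _ in range(runs))
```

Calling tally(REPRO, runs=20) on an affected installation would then give a rough distribution of clean exits, Python exceptions, and signals for the report.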


A different issue occurs when instantiating a subclass of torch.Tensor.

Running the following code:

import torch
class MyTensor(torch.Tensor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
MyTensor(2)

produces a TypeError:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-c115513591af> in <module>
----> 1 MyTensor(2)

<ipython-input-2-0187aa9fddd0> in __init__(self, *args, **kwargs)
      1 class MyTensor(torch.Tensor):
      2     def __init__(self, *args, **kwargs):
----> 3         super().__init__(*args, **kwargs)

TypeError: object.__init__() takes no parameters

This error occurs for any nonzero number of arguments.
Note that a bare MyTensor() works fine, as it does when I do not override __init__. Here, however, __init__ is overridden only to call its superclass's __init__, which should change nothing, and yet instantiation fails. I understand that the torch.Tensor class is defined in the C backend and that it uses only the __new__ method to initialize. However, the __init__ method of torch.Tensor should at least swallow its arguments before passing them on to object, so that a subclass overriding __init__ doesn't have to avoid calling its superclass's __init__ (which is messy to deal with).
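The same TypeError can be reproduced in pure Python, which suggests it comes from how object.__init__ treats extra arguments rather than from anything tensor-specific. In the sketch below, Base is a hypothetical stand-in for a type that does all of its construction in __new__ (it is not torch.Tensor), and Swallowing shows one possible workaround on the subclass side.

```python
class Base:
    """Hypothetical stand-in for a type that builds everything in __new__."""

    def __new__(cls, *args, **kwargs):
        instance = super().__new__(cls)
        instance.args = args  # remember the constructor arguments
        return instance


class Passthrough(Base):
    """No __init__ override: object.__init__ tolerates the extra arguments."""


class Forwarding(Base):
    """Forwards its arguments up the chain, as MyTensor does in this report."""

    def __init__(self, *args, **kwargs):
        # The arguments reach object.__init__, which rejects them because
        # __init__ has been overridden somewhere in the hierarchy.
        super().__init__(*args, **kwargs)


class Swallowing(Base):
    """Possible workaround: consume the arguments instead of forwarding them."""

    def __init__(self, *args, **kwargs):
        super().__init__()


print(Passthrough(2).args)  # construction with arguments works
print(Swallowing(2).args)   # works as well
print(Forwarding().args)    # no arguments, so nothing for object to reject
try:
    Forwarding(2)
except TypeError as exc:
    print("TypeError:", exc)
```

The exact wording of the TypeError message differs between Python versions, so the sketch prints it rather than assuming a particular text.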

Expected behavior

Running

import torch
class MyFloatTensor(torch.FloatTensor):
    pass
MyFloatTensor()

should instantiate a MyFloatTensor as a subclass of torch.FloatTensor and should output the same as calling

>>> torch.FloatTensor()
tensor([])

Running

import torch
class MyTensor(torch.Tensor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
MyTensor(2)

should instantiate a 2-element MyTensor as a subclass of torch.Tensor and should output the same as calling

>>> torch.Tensor(2)
tensor([6.5783e-31, 4.5766e-41])

It's possible that PyTorch tensor types weren't intended to be subclassed, which would be quite unfortunate. I would argue that subclassing them should be possible, and it would be quite useful for me in my research. In any case, attempting to subclass them should never cause a segmentation fault, which usually indicates a deeper problem.

Environment

PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: Could not collect

Python version: 3.5
Is CUDA available: Yes
CUDA runtime version: 9.1.85
GPU models and configuration: 
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
GPU 4: GeForce GTX 1080 Ti
GPU 5: GeForce GTX 1080 Ti
GPU 6: GeForce GTX 1080 Ti
GPU 7: GeForce GTX 1080 Ti

Nvidia driver version: 390.77
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.16.1
[pip3] torch==1.1.0
[pip3] torchfile==0.1.0
[pip3] torchvision==0.2.2.post3
[conda] Could not collect

Related GitHub Issues

#17249 and #17716 are both related to subclassing PyTorch Tensors but don't mention these errors.
