Conversation

@beam2d beam2d commented Aug 21, 2019

When converting a contiguous CuPy ndarray to Tensor via `__cuda_array_interface__`, an error occurs due to incorrect handling of default strides. This PR fixes the problem, making `torch.tensor(cupy_ndarray)` work for contiguous inputs.

@pytorchbot pytorchbot added the module: internals (Related to internal abstractions in c10 and ATen) and module: numpy (Related to numpy support, and also numpy compatibility of our operators) labels Aug 21, 2019
@soumith soumith requested a review from gchanan August 23, 2019 03:59
gchanan commented Aug 23, 2019

can you provide a reproduction?

beam2d commented Aug 28, 2019

OK, here is a simple script that reproduces the error.

import cupy
import torch

a = cupy.ones(3)
torch.tensor(a)

This script gives the following error.

Traceback (most recent call last):
  File "foo.py", line 5, in <module>
    torch.tensor(a)
ValueError: given array strides not a multiple of the element byte size. Make a copy of the array to reallocate the memory.
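For context, the failure comes from how default strides are handled for a contiguous producer array. A consumer that receives no byte strides must derive C-contiguous strides from the shape and item size itself; a minimal sketch of that derivation (the helper name is mine, not PyTorch code):

```python
def default_strides(shape, itemsize):
    """Byte strides of a C-contiguous array, as a consumer of
    __cuda_array_interface__ must derive them when 'strides' is
    absent or None. Illustrative helper, not PyTorch's implementation."""
    strides = []
    stride = itemsize
    for dim in reversed(shape):
        strides.append(stride)
        stride *= dim
    return tuple(reversed(strides))
```

For the 1-D float64 array in the script above, this derivation would yield `(8,)`.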

leofang commented Oct 1, 2019

@beam2d @gchanan @madsbk @asford The updated __cuda_array_interface__ protocol (v2) numba/numba#4609 is relevant to this PR.

madsbk commented Oct 2, 2019

> @beam2d @gchanan @madsbk @asford The updated `__cuda_array_interface__` protocol (v2) numba/numba#4609 is relevant to this PR.

I don't see any problems with the new protocol, but we should throw a more descriptive error message if `data` is zero: https://github.com/pytorch/pytorch/blob/b93fa6b970cc2e12cfeb8cd1a51dce2d95d797de/torch/csrc/utils/tensor_numpy.cpp#L273

However, the Numba integration test might fail because of a version mismatch. I suggest that we remove or change this line:

self.assertEqual(ar_dict["version"], 1)
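One hedged way to relax that assertion (the names here are illustrative; the actual test change may differ) is to accept any protocol version the consumer supports rather than pinning version 1:

```python
def assert_supported_version(interface, supported=(1, 2)):
    """Accept any known __cuda_array_interface__ version instead of
    requiring exactly version 1. Illustrative sketch only."""
    version = interface.get("version")
    if version not in supported:
        raise AssertionError(
            f"unsupported __cuda_array_interface__ version: {version}")
```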

Finally, there is a typo here: https://github.com/pytorch/pytorch/blob/b93fa6b970cc2e12cfeb8cd1a51dce2d95d797de/torch/csrc/utils/tensor_numpy.cpp#L266
It should say:

throw TypeError("attribute `data` must exist"); 

leofang commented Oct 2, 2019

> I don't see any problems with the new protocol

@madsbk Note that in the new protocol, omitting the 'strides' keyword or setting it to None indicates that the array is C-contiguous. So, this line should be changed:

strides = tuple(s * itemsize for s in self.stride())

which currently always spells out the strides regardless of contiguity.
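Sketching what a v2-compliant exporter might look like, with a stand-in class (FakeTensor, its fields, and the fixed '<f8' typestr are illustrative assumptions, not PyTorch's actual code):

```python
class FakeTensor:
    """Minimal stand-in for a GPU tensor, used only to illustrate the
    v2 export rule; not PyTorch's implementation."""

    def __init__(self, shape, itemsize, ptr, contiguous=True):
        self.shape = shape
        self.itemsize = itemsize
        self.ptr = ptr
        self.contiguous = contiguous

    def stride(self):
        # Element (not byte) strides assuming C order, illustrative only.
        strides, step = [], 1
        for dim in reversed(self.shape):
            strides.append(step)
            step *= dim
        return tuple(reversed(strides))

    def is_contiguous(self):
        return self.contiguous

    @property
    def __cuda_array_interface__(self):
        # v2 protocol: strides is None for C-contiguous arrays; only
        # non-contiguous arrays spell out byte strides explicitly.
        if self.is_contiguous():
            strides = None
        else:
            strides = tuple(s * self.itemsize for s in self.stride())
        return {
            "shape": tuple(self.shape),
            "typestr": "<f8",
            "data": (self.ptr, False),
            "strides": strides,
            "version": 2,
        }
```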

madsbk commented Oct 2, 2019

@leofang right!
@beam2d can you include the protocol changes in this PR? or do you want me to make a new PR?

beam2d commented Oct 3, 2019

Let me try to make it follow the new protocol.

@beam2d beam2d force-pushed the fix-cuda_array_interface-without-strides branch from b93fa6b to 73d64b1 Compare October 3, 2019 08:39
@beam2d beam2d force-pushed the fix-cuda_array_interface-without-strides branch from 73d64b1 to a4d747e Compare October 3, 2019 08:40
leofang commented Oct 9, 2019

By the way, it might be necessary for tensor_from_cuda_array_interface() to reject any array that comes with a mask (introduced in the v1 protocol). Numba and CuPy currently do this, but I know nothing about PyTorch, so I could be wrong.
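A consumer-side guard of the kind described here might look like the following sketch (the function name and error text are my assumptions, not PyTorch's actual tensor_from_cuda_array_interface()):

```python
def reject_masked(interface):
    """Refuse masked arrays, as Numba and CuPy do when consuming
    __cuda_array_interface__. Illustrative sketch only."""
    if interface.get("mask") is not None:
        raise ValueError(
            "arrays with a 'mask' in __cuda_array_interface__ are not supported")
    return interface
```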

@zou3519 zou3519 added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Oct 10, 2019
leofang commented Oct 31, 2019

This PR is needed to work with Numba 0.46.0 and CuPy v7, see cupy/cupy#2589 (comment). Can someone take a look?

gchanan commented Oct 31, 2019

is it possible to add a test?

gchanan commented Oct 31, 2019

@leofang I commented over in numba/numba#4175 -- I think the standard is currently ambiguous when it wasn't before.


shape = tuple(self.shape)
strides = tuple(s * itemsize for s in self.stride())
if self.is_contiguous():
it would be nice to comment whether this is an optimization or part of the protocol, but not strictly necessary.

Added a comment based on my current understanding, but it looks like discussions are still ongoing at numba/numba#4175, so I will adapt it once everyone reaches a consensus :).

if self.is_contiguous():
    strides = None
else:
    strides = tuple(s * itemsize for s in self.stride())

according to the protocol, we need to zero out data here if numel is 0.

@leofang leofang Oct 31, 2019

I don't get it. What's wrong here? It looks fine to me.


the line below should be:
data = (self.data_ptr() if self.numel() > 0 else 0, False) # read-only is false
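To illustrate the rule gchanan points out, the 'data' entry can be built with the device pointer zeroed when the tensor is empty; the helper name below is mine, and it simply mirrors the suggested one-liner:

```python
def export_data_field(data_ptr, numel):
    """Build the 'data' entry of __cuda_array_interface__: per the
    protocol, the pointer must be 0 when the array has no elements.
    Illustrative helper mirroring the suggested fix."""
    return (data_ptr if numel > 0 else 0, False)  # read-only is False
```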


Oh, good catch @gchanan! (I knew nothing about numel(); I just looked it up in the docs.) In fact I also caught this myself for CuPy just yesterday...


Thanks, I fixed it.

beam2d commented Nov 1, 2019

> is it possible to add a test?

I fixed the existing CUDA array interface test. Or do you mean adding a test for a case not covered by the existing ones (e.g., the non-contiguous case)?

gchanan commented Nov 5, 2019

I meant a direct test of cupy -- but if that's too difficult (dealing with dependencies and such), don't worry about it.

@leofang leofang left a comment

LGTM, thanks @beam2d!

Hi @gchanan, could you please press the green button to merge this PR? This is necessary for resolving cupy/cupy#2589. It has been a long-standing bug on the PyTorch side even before the protocol was updated to v2, so I suggest we merge this and move on. We can always revisit our joyful discussion later 🙂 Thank you.

leofang commented Dec 5, 2019

@madsbk @gchanan @ezyang Pinging you guys in case this PR is left unattended. It's already approved. Can any of you merge this please?

@ezyang ezyang self-requested a review December 5, 2019 21:33
@ezyang ezyang left a comment

getting this in my queue

@facebook-github-bot facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


leofang commented Dec 6, 2019

Thank you very much, @ezyang for taking care of this, @beam2d for fixing the bug, and everyone for participating in the discussion!

@jakirkham

This is probably premature, but do we have a sense of when the next release will be? I'm guessing not for a while. Even a very rough answer would be useful here. Thanks again for everyone's work on this issue. 🙂

soumith commented Dec 6, 2019

@jakirkham v1.4.0 is aimed for Jan 7th

leofang commented Dec 6, 2019

@soumith Thank you for sharing this info with us. I just requested that this patch go into v1.4.0.


@ezyang merged this pull request in 1d7b40f.

gchanan added a commit to gchanan/pytorch that referenced this pull request Dec 18, 2019
… test.

This is a simpler fix than pytorch#24947, which both fixed the bug and updated the protocol version.
This also adds a test (which the previous PR did not).

So the plan is that master (1.5) will have the new protocol version (and a test), 1.4 will have the old protocol version and the test.
gchanan added a commit that referenced this pull request Dec 19, 2019
The PR that fixed this, #24947, didn't add a test.

Fixes: #31443

[ghstack-poisoned]
gchanan added a commit that referenced this pull request Dec 19, 2019
The PR that fixed this, #24947, didn't add a test.

Fixes: #31443

ghstack-source-id: ad6237c
Pull Request resolved: #31451
mingbowan pushed a commit that referenced this pull request Dec 20, 2019
… test. (#31450)

This is a simpler fix than #24947, which both fixed the bug and updated the protocol version.
This also adds a test (which the previous PR did not).

So the plan is that master (1.5) will have the new protocol version (and a test), 1.4 will have the old protocol version and the test.
gchanan added a commit that referenced this pull request Dec 20, 2019
gchanan added a commit that referenced this pull request Dec 20, 2019
The PR that fixed this, #24947, didn't add a test.

Fixes: #31443

ghstack-source-id: 2d59503
Pull Request resolved: #31451
facebook-github-bot pushed a commit that referenced this pull request Dec 20, 2019
Summary:
Pull Request resolved: #31451

The PR that fixed this, #24947, didn't add a test.

Fixes: #31443

Test Plan: Imported from OSS

Differential Revision: D19170020

Pulled By: gchanan

fbshipit-source-id: bdbf09989ac8a61b1b70bb1ddee103caa8ef435b
wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020
Summary:
When converting a contiguous CuPy ndarray to Tensor via `__cuda_array_interface__`, an error occurs due to incorrect handling of default strides. This PR fixes this problem. It makes `torch.tensor(cupy_ndarray)` works for contiguous inputs.
Pull Request resolved: pytorch#24947

Differential Revision: D18838986

Pulled By: ezyang

fbshipit-source-id: 2d827578f54ea22836037fe9ea8735b99f2efb42
wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020
)

Summary:
Pull Request resolved: pytorch#31451

The PR that fixed this, pytorch#24947, didn't add a test.

Fixes: pytorch#31443

Test Plan: Imported from OSS

Differential Revision: D19170020

Pulled By: gchanan

fbshipit-source-id: bdbf09989ac8a61b1b70bb1ddee103caa8ef435b

Labels

Merged · module: internals · module: numba · module: numpy · open source · triaged