Fix reading __cuda_array_interface__ without strides
#24947
Conversation
can you provide a reproduction?
OK, here is a simple piece of code to reproduce the error:

```python
import cupy
import torch

a = cupy.ones(3)
torch.tensor(a)
```

This script gives the following error.
@beam2d @gchanan @madsbk @asford The updated …
I don't see any problems with the new protocol, but we should throw a more descriptive error message if … However, the Numba integration test might fail because of a version mismatch. I suggest that we remove or change this line: `pytorch/test/test_numba_integration.py`, line 101 in f583f2e.
Finally, there is a typo here: https://github.com/pytorch/pytorch/blob/b93fa6b970cc2e12cfeb8cd1a51dce2d95d797de/torch/csrc/utils/tensor_numpy.cpp#L266

```cpp
throw TypeError("attribute `data` must exist");
```
@madsbk Note that in the new protocol, a non-existing `strides` keyword means the data is C-contiguous. See line 507 in 0b79f77, which currently always spells out the strides regardless of contiguity.
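To make the point above concrete, here is a hedged illustration (not taken from CuPy or PyTorch source) of what a contiguous producer's interface dict can look like under v2; the pointer value is a made-up placeholder:

```python
# Under the v2 protocol, a contiguous producer may set "strides" to None
# (or omit the key entirely), and consumers must treat the data as
# C-contiguous. The pointer below is a fabricated placeholder.
contiguous_iface = {
    "shape": (3,),
    "typestr": "<f8",
    "data": (0x7F00_0000, False),  # (device pointer, read-only flag)
    "version": 2,
    "strides": None,               # None means C-contiguous in v2
}

# A consumer that unconditionally reads "strides" as a tuple
# (the pre-fix behavior) fails on a dict like this.
assert contiguous_iface["strides"] is None
```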
Let me try to make it follow the new protocol.
btw, it might be necessary in …
This PR is needed to work with Numba 0.46.0 and CuPy v7; see cupy/cupy#2589 (comment). Can someone take a look?
is it possible to add a test?
@leofang I commented over in numba/numba#4175 -- I think the standard is currently ambiguous when it wasn't before.
```python
shape = tuple(self.shape)
strides = tuple(s * itemsize for s in self.stride())
if self.is_contiguous():
```
it would be nice to comment whether this is an optimization or part of the protocol, but not strictly necessary.
Added a comment based on my current understanding, but it looks like discussions are still ongoing at numba/numba#4175, so I will adapt it after we all reach a consensus :).
```python
if self.is_contiguous():
    strides = None
else:
    strides = tuple(s * itemsize for s in self.stride())
```
According to the protocol, we need to zero out `data` here if `numel` is 0.
I don't get it. What's wrong here? It looks fine to me.
the line below should be:

```python
data = (self.data_ptr() if self.numel() > 0 else 0, False)  # read-only is False
```
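Putting the two review points together, a minimal sketch of the exporter logic the reviewers converged on might look like the following. The function name and the hard-coded dtype are assumptions for illustration, not the actual PyTorch property:

```python
# Illustrative sketch (assumed names, not the real PyTorch code) of the
# two fixes discussed above: strides is None for contiguous tensors, and
# the data pointer is zeroed when there are no elements.

def make_interface(shape, elem_strides, itemsize, data_ptr, contiguous):
    """Build a v2 __cuda_array_interface__ dict for a hypothetical tensor."""
    numel = 1
    for d in shape:
        numel *= d
    return {
        "typestr": "<f4",  # assumed dtype, for the sketch only
        "shape": tuple(shape),
        # v2 allows contiguous producers to export strides=None.
        "strides": None if contiguous
                   else tuple(s * itemsize for s in elem_strides),
        # Per the protocol, the pointer must be 0 when numel is 0.
        "data": (data_ptr if numel > 0 else 0, False),  # read-only is False
        "version": 2,
    }

# An empty (0, 3) tensor exports a zeroed pointer and no strides.
print(make_interface((0, 3), (3, 1), 4, 0x1234, True))
```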
Oh, good catch @gchanan! (Knew nothing about numel(), just looked it up from the doc.) In fact I also caught this myself for CuPy just yesterday...
Thanks, I fixed it.
I fixed the existing CUDA array interface test. Or do you mean adding a test for a case not covered by the existing ones (e.g., the non-contiguous case)?
I meant a direct test of CuPy -- but if that's too difficult (dealing with dependencies and such), don't worry about it.
leofang
left a comment
LGTM, thanks @beam2d!
Hi @gchanan, could you please press the green button to merge this PR? This is necessary for resolving cupy/cupy#2589. It has been a long-standing bug on the PyTorch side even before the protocol was updated to v2, so I suggest we merge this and move on. We can always revisit our joyful discussion later 🙂 Thank you.
ezyang
left a comment
getting this in my queue
facebook-github-bot
left a comment
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This is probably premature, but do we have a sense of when the next release will be? I'm guessing not for a while. Even a very rough answer would still be useful here. Thanks again for everyone's work on this issue. 🙂
@jakirkham v1.4.0 is aimed for Jan 7th
@soumith Thank you for sharing this info with us. I just requested to make this patch go into v1.4.0.
… test. (#31450) This is a simpler fix than #24947, which both fixed the bug and updated the protocol version. This also adds a test (which the previous PR did not). So the plan is that master (1.5) will have the new protocol version (and a test), 1.4 will have the old protocol version and the test.
The PR that fixed this, #24947, didn't add a test. Fixes: #31443 Differential Revision: [D19170020](https://our.internmc.facebook.com/intern/diff/D19170020) [ghstack-poisoned]
Summary: When converting a contiguous CuPy ndarray to a Tensor via `__cuda_array_interface__`, an error occurs due to incorrect handling of default strides. This PR fixes the problem and makes `torch.tensor(cupy_ndarray)` work for contiguous inputs. Pull Request resolved: pytorch#24947 Differential Revision: D18838986 Pulled By: ezyang fbshipit-source-id: 2d827578f54ea22836037fe9ea8735b99f2efb42
) Summary: Pull Request resolved: pytorch#31451 The PR that fixed this, pytorch#24947, didn't add a test. Fixes: pytorch#31443 Test Plan: Imported from OSS Differential Revision: D19170020 Pulled By: gchanan fbshipit-source-id: bdbf09989ac8a61b1b70bb1ddee103caa8ef435b
When converting a contiguous CuPy ndarray to a Tensor via `__cuda_array_interface__`, an error occurs due to incorrect handling of default strides. This PR fixes the problem and makes `torch.tensor(cupy_ndarray)` work for contiguous inputs.
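On the consumer side, the fix boils down to tolerating a missing or `None` `strides` entry by deriving C-contiguous byte strides from the shape. A minimal sketch, with an assumed helper name (PyTorch's actual fix lives in C++ in `tensor_numpy.cpp`):

```python
# Hypothetical consumer-side helper (not PyTorch's actual code): when the
# exporter omits "strides" or sets it to None, fall back to C-contiguous
# byte strides derived from the shape and item size.

def read_strides(iface, itemsize):
    """Return byte strides for a __cuda_array_interface__ dict."""
    strides = iface.get("strides")
    if strides is None:
        # Build C-contiguous strides: innermost dimension varies fastest.
        out = []
        stride = itemsize
        for dim in reversed(iface["shape"]):
            out.append(stride)
            stride *= dim
        strides = tuple(reversed(out))
    return strides

# A 1-D float64 array exported without strides is treated as contiguous.
print(read_strides({"shape": (3,), "version": 2}, 8))  # (8,)
```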