Conversation

nairbv (Collaborator) commented Nov 5, 2019

Stack from ghstack:

Fixes: #29153

The bug is that atomicAdd doesn't correctly add values for some dtypes due to incorrect casting; it was returning zeros.

Incorrect behavior before this PR:

```
In [23]: sparse=torch.sparse_coo_tensor(indices=torch.tensor([[0,0],[1,1]]), values=torch.tensor([5, 6], dtype=torch.int16), size=(2,2), device='cuda', dtype=torch.int16 )

In [24]: sparse
Out[24]:
tensor(indices=tensor([[0, 0],
                       [1, 1]]),
       values=tensor([5, 6]),
       device='cuda:0', size=(2, 2), nnz=2, dtype=torch.int16,
       layout=torch.sparse_coo)

In [25]: sparse.coalesce()
Out[25]:
tensor(indices=tensor([[0],
                       [1]]),
       values=tensor([11]),
       device='cuda:0', size=(2, 2), nnz=1, dtype=torch.int16,
       layout=torch.sparse_coo)

In [26]: sparse.to_dense()
Out[26]:
tensor([[0, 0],
        [0, 0]], device='cuda:0', dtype=torch.int16)

In [27]: sparse.coalesce().to_dense()
Out[27]:
tensor([[ 0, 11],
        [ 0,  0]], device='cuda:0', dtype=torch.int16)

In [30]: torch.add(torch.zeros([2,2],dtype=torch.int16, device='cuda'), sparse)
Out[30]:
tensor([[0, 0],
        [0, 0]], device='cuda:0', dtype=torch.int16)
```
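For context: CUDA has no native 16-bit atomicAdd, so it is emulated with an atomicCAS loop on the enclosing aligned 32-bit word. A Python sketch of the masking logic such an implementation has to get right (names and structure here are illustrative, not the actual CUDA code):

```python
def add_int16_in_word(word, byte_offset, val):
    """Simulate one iteration of a CAS-based 16-bit atomicAdd.

    `word` is the aligned 32-bit word containing the target int16;
    `byte_offset` (0 or 2) selects the low or high half-word.
    """
    shift = 16 if (byte_offset & 2) else 0
    old = (word >> shift) & 0xFFFF
    if old >= 0x8000:              # reinterpret the stored bits as signed int16
        old -= 0x10000
    new = (old + val) & 0xFFFF     # add, then wrap back to exactly 16 bits
    # splice the new half-word into the containing 32-bit word
    return (word & ~(0xFFFF << shift)) | (new << shift)
```

With the repro's values, adding 5 and then 6 into the high half-word of a zeroed word yields 11 there while the low half-word stays 0; negative values round-trip correctly because the stored bits are reinterpreted as signed before adding.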

Differential Revision: D18575666

nairbv added a commit that referenced this pull request Nov 5, 2019
ghstack-source-id: d6d5a97
Pull Request resolved: #29231
nairbv changed the title from "Fix bug in atomicAdd for int16_t" to "[WIP] Fix bug in atomicAdd for int16_t" on Nov 5, 2019
nairbv added a commit that referenced this pull request Nov 6, 2019
ghstack-source-id: d8fe7ac
Pull Request resolved: #29231
nairbv changed the title from "[WIP] Fix bug in atomicAdd for int16_t" to "Fix bug in atomicAdd for int16_t" on Nov 6, 2019
@nairbv nairbv requested review from gchanan and zou3519 November 6, 2019 15:37
zou3519 (Contributor) commented Nov 7, 2019

What is the bug? Is the original calculation just flat-out wrong? This information would be nice to have as well in the commit message / PR body

nairbv (Collaborator, Author) commented Nov 7, 2019

> What is the bug? Is the original calculation just flat-out wrong? This information would be nice to have as well in the commit message / PR body

It's in the linked issue, but I've now updated the PR to include the repro here too. Yes, the original code was just wrong; it returned zeros.

zou3519 (Contributor) left a review:

(not a full review) I had some comments on the testing. Am trying to understand what the bug was and what the fix is at the moment.

```python
else:
    tensor = torch.randint(0, 10, size, dtype=dtype, device=device)

# index_add calls atomicAdd on cuda.
```
zou3519 (Contributor):

It seems like we're relying on index_add calling atomicAdd. It might be clearer to introduce a C++ API test that only calls atomicAdd so that we're still testing this behavior if someone changes the implementation of index_add

nairbv (Collaborator, Author):

I'll check with you tomorrow. Last I looked, it didn't seem like we had the infrastructure set up to build and run tests on .cu files. If you know an easy way to do that, or where they'd go, maybe point me to it; otherwise I think setting that up could be a separate task.

zou3519 (Contributor) commented Nov 7, 2019

> What is the bug? Is the original calculation just flat-out wrong? This information would be nice to have as well in the commit message / PR body
>
> It's in the linked issue, but I've now updated the PR to include the repro here too. Yes, the original code was just wrong; it returned zeros.

What I am looking for is an English description of what the bug is and how this PR fixes it

```python
if is_signed(dtype):
    tensor = torch.randint(-5, 15, size, dtype=dtype, device=device)
else:
    tensor = torch.randint(0, 10, size, dtype=dtype, device=device)
```
zou3519 (Contributor):

The atomicAdd bug looks like it is triggered in a very specific situation, when the sum being stored back into a memory address is signed. Would it be worth testing that case specifically?

nairbv (Collaborator, Author):

It needs to handle sign properly, but it wasn't just that: the original code always returned zero. In intermediate versions I thought I had fixed the issue, but then hit problems that only occurred at specific alignments of values within the 32-bit words. In practice I didn't get good test signal from narrowly defined tests for particular cases, and the randint range for the signed case should always generate some negatives.

zou3519 (Contributor):

Could you give some sample inputs to index_add that trigger the bug? I want to know roughly the order of magnitude of the probability that the randomly generated input does not catch the bug; we want our tests to be robust to changes in seed.

nairbv (Collaborator, Author):

In the example from the initial issue we actually got all zeros, so almost any random value would catch that. There are lots of ways this can go wrong, though, which were missed when I tried to validate narrowly for particular values; random tensors capture that more thoroughly.

zou3519 (Contributor):

Could you walk through an example input and how it gets to all zeros? There's a lot of bit shifting going on and I'd like to make sure I understand how the implementation works

nairbv (Collaborator, Author) commented Nov 13, 2019:

I mean, nothing gets added at all. I'm not actually sure there's a case that doesn't trigger this bug, besides maybe adding a tensor of zeros. Here's an example in 1.3.0 for a single value, with the expected result for int64 and the zero result for int16; it's basically a no-op:

```python
# expected 0 + 1 == 1
>>> dtype=torch.int64
>>> torch.zeros([1], device='cuda', dtype=dtype).index_add(0, torch.tensor([0], device='cuda'), torch.ones([1], dtype=dtype, device='cuda'))
tensor([1], device='cuda:0')

# unexpected 0 + 1 == 0
>>> dtype=torch.int16
>>> torch.zeros([1], device='cuda', dtype=dtype).index_add(0, torch.tensor([0], device='cuda'), torch.ones([1], dtype=dtype, device='cuda'))
tensor([0], device='cuda:0', dtype=torch.int16)
```

It would be easy to add this example that motivated the change, but it's just so trivial that it doesn't add coverage. The test with a full multi-dimensional tensor and a variety of values can capture potential corner cases that this narrow test wouldn't.

zou3519 (Contributor):

I'm not complaining about the test case coverage; I'm still trying to understand why the previous code is wrong. Because the fix was to change from unsigned math to signed math, there should be a non-trivial set of positive values for which the math is correct, but from your explanation that doesn't seem to be the case.

To help bridge my gap in understanding, it would be really helpful if you could run through an end-to-end example about how almost all inputs give zeros.

nairbv (Collaborator, Author):

I'm not sure it's worth trying to dig too deeply into all the ways the existing code was incorrect. It didn't work in any situation I know of and was never tested. There's more wrong with it than just the sign issue.

One thing that's wrong: in `sum = val + (size_t)address & 2 ? T(old >> 16) : T(old & 0xffff);`, precedence bites us. Since `+` binds tighter than `&`, the ternary condition parses as `(val + (size_t)address) & 2` rather than the intended `(size_t)address & 2`, so `val` leaks into the alignment check and the wrong half-word can be selected. Just fixing that, we still have the sign issue, but I'm not sure that completely covers the problems here.

There are so many ways untested code can be wrong, it's hard to be certain of all of them without a lot of investigation and additional testing.
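The precedence problem can be seen with plain integers (illustrative values; Python's `+` also binds tighter than `&`, so the same parse applies):

```python
val, address = 2, 2   # address has bit 1 set: the high half-word was intended

buggy_cond = val + address & 2        # parses as (2 + 2) & 2, which is 0
intended_cond = address & 2           # 2 & 2, which is 2 (truthy)

# With the buggy parse, a nonzero val flips which half-word is selected:
old = 0x000B0000                      # value 11 stored in the high half-word
picked_buggy = (old >> 16) if buggy_cond else (old & 0xFFFF)
picked_intended = (old >> 16) if intended_cond else (old & 0xFFFF)
```

Here the intended condition picks the high half-word (11), while the buggy parse picks the low half-word (0).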

nairbv (Collaborator, Author) commented Nov 8, 2019

@nairbv nairbv mentioned this pull request Nov 8, 2019
@nairbv nairbv requested a review from zou3519 November 8, 2019 20:16
zou3519 (Contributor) left a review:

Code looks correct. Had some comments about the tests.

```cpp
old = atomicCAS(address_as_ui, assumed, old);
old_bytes = (old >> shift) & 0xff;
T tmp = THCNumerics<T>::add(val, old_bytes);
// maintain same size in initial cast to avoid padding negative values with 1s
```
zou3519 (Contributor):

As the code reader I'd like a little more information here. Something like: "If we did something like `uint32_t newval = tmp` (where tmp is -1), then newval = 0xffffffff, but we really just wanted to store one byte (0x000000ff)."

nairbv (Collaborator, Author):

Updated a bit; also made the cast inline instead of using extra tmp values.
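The sign-extension hazard that comment guards against, sketched with Python integers (illustrative values only):

```python
tmp = -1                      # a negative sum produced by the byte-wise add

# Widening the signed value first sign-extends it across the whole word:
widened = tmp & 0xFFFFFFFF    # every byte is now set, 0xffffffff
# Masking to the storage width first keeps only the one byte we mean to store:
one_byte = tmp & 0xFF         # 0xff
```

Splicing `widened` back into the containing 32-bit word would clobber the neighboring bytes, which is why the cast has to stay at the element's own width before shifting into place.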

zou3519 (Contributor) left a review:

Code looks fine; had some nits with the testing.

```python
# add coverage for issue with atomic add that appeared only for
# specific dtypes on cuda:
# https://github.com/pytorch/pytorch/issues/29153
def test_dtypes_for_index_add(self):
```
zou3519 (Contributor):

I wouldn't call this "test_dtypes_for_index_add". The other test doesn't test dtypes, but it really should (on the other hand, test_index_add shouldn't test CUDA because then it would run for too long).

What you're doing here is checking that an index_add between zeros and a tensor gives back the original tensor. How about naming it something like test_index_add_trivial?

nairbv (Collaborator, Author):

I don't like "trivial" because it doesn't capture what's actually tested. The main purpose of this test is to capture if index_add doesn't work for some dtypes due to special dtype handling code. I could do another PR at some point to merge the two though.

zou3519 (Contributor):

Again, I feel really weird having both tests. In terms of design, it makes sense for all tests in test_torch to cover all dtypes unless otherwise specified; something named "test_index_add" should really test everything, unless there are multiple tests with the prefix "test_index_add" and differentiating suffixes. I don't want to block this fix on that, but it would be great if you could send a follow-up PR to clean up the tests.

nairbv (Collaborator, Author):

I'm fine to eventually merge the two. I agree all of these tests should eventually be running for all dtypes.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 18, 2019
Summary:
Pull Request resolved: pytorch/pytorch#29231

Fixes: pytorch/pytorch#29153

Bug is that atomicAdd doesn't correctly add values for some dtypes due to incorrect casting. Was returning zeros. (Repro omitted; identical to the PR description above.)

Test Plan: Imported from OSS

Differential Revision: D18575666

Pulled By: nairbv

fbshipit-source-id: 9b193b386bf4a9615014aa890d2e9f4f694940ac
facebook-github-bot (Contributor):
@nairbv merged this pull request in adfb8a4.

@facebook-github-bot facebook-github-bot deleted the gh/nairbv/7/head branch November 22, 2019 15:17