Fix bug in atomicAdd for int16_t #29231
Conversation
Fixes: #29153
What is the bug? Is the original calculation just flat-out wrong? This information would be nice to have in the commit message / PR body as well.
It's in the linked issue, but I've now updated the PR to include the repro here too. Yes, the original code was just wrong; it returned zeros.
zou3519
left a comment
(not a full review) I had some comments on the testing. I'm still trying to understand what the bug was and what the fix is at the moment.
test/test_torch.py (Outdated)

```
else:
    tensor = torch.randint(0, 10, size, dtype=dtype, device=device)

# index_add calls atomicAdd on cuda.
```
It seems like we're relying on index_add calling atomicAdd. It might be clearer to introduce a C++ API test that only calls atomicAdd so that we're still testing this behavior if someone changes the implementation of index_add
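For illustration, a minimal sketch of what such a direct test could look like, assuming a standalone `.cu` harness built with nvcc, that `THC/THCAtomics.cuh` is on the include path, and that its `int16_t` overload is spelled `atomicAdd` as in the code under review; the kernel name and pass/fail reporting are hypothetical, not an existing PyTorch test:

```
#include <cstdio>
#include <cstdint>
#include <THC/THCAtomics.cuh>  // assumed available on the include path

// Every thread adds 1 to the same int16_t element; with the buggy overload
// the element stayed at 0.
__global__ void add_ones(int16_t* out) {
  atomicAdd(out, static_cast<int16_t>(1));
}

int main() {
  int16_t* d_out = nullptr;
  cudaMalloc(&d_out, sizeof(int16_t));
  cudaMemset(d_out, 0, sizeof(int16_t));
  add_ones<<<1, 256>>>(d_out);
  int16_t host = 0;
  cudaMemcpy(&host, d_out, sizeof(int16_t), cudaMemcpyDeviceToHost);
  std::printf("got %d, expected 256\n", static_cast<int>(host));
  cudaFree(d_out);
  return host == 256 ? 0 : 1;
}
```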
I'll check with you tomorrow. Last I looked, it didn't seem like we had the infrastructure set up to build and run tests on .cu files. If you know an easy way to do that, or where such tests would go, maybe point me to it; otherwise I think setting that up could be a separate task.
Some of our cpp tests definitely run on CUDA: https://github.com/pytorch/pytorch/blob/master/test/cpp/jit/test_fuser.cpp
What I am looking for is an English description of what the bug is and how this PR fixes it.
Fixes: #29153

The bug is that atomicAdd doesn't correctly add values for some dtypes due to incorrect casting; it was returning zeros. Incorrect behavior before this PR:

```
In [23]: sparse = torch.sparse_coo_tensor(indices=torch.tensor([[0,0],[1,1]]),
    ...:                                  values=torch.tensor([5, 6], dtype=torch.int16),
    ...:                                  size=(2,2), device='cuda', dtype=torch.int16)

In [24]: sparse
Out[24]:
tensor(indices=tensor([[0, 0],
                       [1, 1]]),
       values=tensor([5, 6]),
       device='cuda:0', size=(2, 2), nnz=2, dtype=torch.int16,
       layout=torch.sparse_coo)

In [25]: sparse.coalesce()
Out[25]:
tensor(indices=tensor([[0],
                       [1]]),
       values=tensor([11]),
       device='cuda:0', size=(2, 2), nnz=1, dtype=torch.int16,
       layout=torch.sparse_coo)

In [26]: sparse.to_dense()
Out[26]:
tensor([[0, 0],
        [0, 0]], device='cuda:0', dtype=torch.int16)

In [27]: sparse.coalesce().to_dense()
Out[27]:
tensor([[ 0, 11],
        [ 0,  0]], device='cuda:0', dtype=torch.int16)

In [30]: torch.add(torch.zeros([2,2], dtype=torch.int16, device='cuda'), sparse)
Out[30]:
tensor([[0, 0],
        [0, 0]], device='cuda:0', dtype=torch.int16)
```
test/test_torch.py (Outdated)

```
if is_signed(dtype):
    tensor = torch.randint(-5, 15, size, dtype=dtype, device=device)
else:
    tensor = torch.randint(0, 10, size, dtype=dtype, device=device)
```
The atomicAdd bug looks like it is triggered in a very specific situation, when the sum being stored back into a memory address is signed. Would it be worth testing that case specifically?
It needs to handle sign properly, but it wasn't just that: the original code just always returned zero. In intermediate versions I thought I had fixed the issue, but I ran into problems that only occurred at specific alignments of values within the 32-bit words (see the sketch below). In practice I didn't get good test signal when I wrote narrowly defined tests for specific cases, and the randint for the signed case should always get some negatives.
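For context, here is a sketch of how a CAS-based 16-bit atomicAdd typically handles that alignment: hardware CAS operates on aligned 32-bit words, so a 16-bit element sits in either the low or the high half of a word and has to be spliced in without disturbing its neighbor. This is illustrative of the technique, not the exact PyTorch implementation:

```
#include <cstdint>

__device__ void atomic_add_i16(int16_t* address, int16_t val) {
  // Round the address down to its containing 4-byte word.
  uint32_t* word = (uint32_t*)((char*)address - ((size_t)address & 2));
  // Shift is 0 if the element is the low half of the word, 16 if the high half.
  unsigned shift = ((size_t)address & 2) ? 16 : 0;
  uint32_t old = *word;
  uint32_t assumed;
  do {
    assumed = old;
    // Extract our half-word and add in signed 16-bit arithmetic.
    int16_t cur = (int16_t)((assumed >> shift) & 0xffff);
    int16_t sum = (int16_t)(cur + val);
    // Mask the sum back to 16 bits before splicing, so a negative result
    // doesn't sign-extend into the neighboring element.
    uint32_t newval = (assumed & ~(0xffffu << shift))
                    | ((uint32_t)(uint16_t)sum << shift);
    old = atomicCAS(word, assumed, newval);
  } while (assumed != old);
}
```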
Could you give some sample inputs to index_add that trigger the bug? I want to know roughly the order of magnitude of the probability that the randomly generated input does not catch the bug; we want to make our tests robust to changes in seed.
In the example from the initial issue we actually got all zeros, so almost any random value would catch that. There are lots of other possible ways this code can be wrong, though, which were missed when I tried to validate narrowly for particular values; random tensors capture those more thoroughly.
Could you walk through an example input and how it gets to all zeros? There's a lot of bit shifting going on and I'd like to make sure I understand how the implementation works
I mean, nothing gets added at all. I'm not actually sure there's a case that doesn't trigger this bug, besides maybe adding a tensor of zeros. Here's an example in 1.3.0 for a single value, with the expected result for int64 and the zero result for int16; it's basically a no-op:

```
# expected 0 + 1 == 1
>>> dtype = torch.int64
>>> torch.zeros([1], device='cuda', dtype=dtype).index_add(0, torch.tensor([0], device='cuda'), torch.ones([1], dtype=dtype, device='cuda'))
tensor([1], device='cuda:0')

# unexpected 0 + 1 == 0
>>> dtype = torch.int16
>>> torch.zeros([1], device='cuda', dtype=dtype).index_add(0, torch.tensor([0], device='cuda'), torch.ones([1], dtype=dtype, device='cuda'))
tensor([0], device='cuda:0', dtype=torch.int16)
```
It would be easy to add the example that motivated the change, but it's so trivial that it doesn't add coverage. The test with a full multi-dimensional tensor and a variety of values can capture potential corner cases that this narrow test wouldn't.
I'm not complaining about the test case coverage; I'm still trying to understand why the previous code is wrong. Because the fix was to change from doing unsigned math to signed math, there should be a non-trivial set of positive values for which the math is correct, but from your explanation this doesn't seem to be the case.
To help bridge my gap in understanding, it would be really helpful if you could run through an end-to-end example of how almost all inputs give zeros.
I'm not sure it's worth trying to dig too deeply into all the ways the existing code was incorrect. It didn't work in any situation I know of and was never tested. There's more wrong with it than just the sign issue.
One thing that's wrong is the line `sum = val + (size_t)address & 2 ? T(old >> 16) : T(old & 0xffff);`. Because `+` and `&` both bind tighter than `?:`, the condition parses as `(val + (size_t)address) & 2`, so the half-word selection depends on `val` rather than on the address alone, and `val` is never actually added into `sum`; the whole expression just picks one half of `old`. Just fixing that, we still have the sign issue, but I'm not sure that completely covers the problems here.
There are so many ways untested code can be wrong that it's hard to be certain of all of them without a lot of investigation and additional testing.
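To make the precedence issue concrete, here is the expression in isolation, runnable on the host; the values are made up for illustration:

```
#include <cstdio>
#include <cstddef>
#include <cstdint>

int main() {
  uint32_t old = 0x00050000;  // a 5 stored in the high half-word
  int16_t val = 1;
  size_t address = 0x1000;    // bit 1 clear: the element is the low half-word

  // As written, '+' binds tighter than '&', which binds tighter than '?:',
  // so this parses as ((val + address) & 2) ? ... : ... and val is consumed
  // by the condition instead of being added to the result.
  uint32_t as_written = (val + address & 2) ? (old >> 16) : (old & 0xffff);

  // What was presumably intended: select the half-word by address alignment,
  // then add val to it.
  uint32_t intended = val + ((address & 2) ? (old >> 16) : (old & 0xffff));

  std::printf("as_written=%u intended=%u\n", as_written, intended);  // 0 vs 1
  return 0;
}
```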
For reference, it looks like the original code may have been adapted from: https://github.com/rapidsai/cudf/blob/8ddb744a716aa8fe736284272b065cf5b7085065/cpp/src/reductions/reduction_operators.cuh#L20-L43
zou3519
left a comment
Code looks correct. Had some comments about the tests
aten/src/THC/THCAtomics.cuh (Outdated)

```
old = atomicCAS(address_as_ui, assumed, old);
old_bytes = (old >> shift) & 0xff;
T tmp = THCNumerics<T>::add(val, old_bytes);
// maintain same size in initial cast to avoid padding negative values with 1s
```
As the code reader I'd like a little more information. Something like:
"If we did something like uint32_t newval = tmp (which is -1), then newval = 0xffffffff, but we really just wanted to store one byte (0x000000ff)."
Updated a bit; I also made the cast inline instead of using extra tmp values.
zou3519
left a comment
Code looks fine, had some nits with the testing
test/test_torch.py
Outdated
| # add coverage for issue with atomic add that appeared only for | ||
| # specific dtypes on cuda: | ||
| # https://github.com/pytorch/pytorch/issues/29153 | ||
| def test_dtypes_for_index_add(self): |
I wouldn't call this "test_dtypes_for_index_add". The other test doesn't test dtypes, but it really should (on the other hand, test_index_add shouldn't test CUDA because then it would run for too long).
What you're doing here is checking that an index_add between zeros and a tensor gives back the original tensor. How about naming it something like test_index_add_trivial?
I don't like "trivial" because it doesn't capture what's actually tested. The main purpose of this test is to capture if index_add doesn't work for some dtypes due to special dtype handling code. I could do another PR at some point to merge the two though.
Again, I feel really weird having both tests. In terms of design, it makes sense for all tests in test_torch, unless otherwise specified, to test all dtypes; something named "test_index_add" should really test everything, unless there are multiple tests sharing the "test_index_add" prefix with differentiating suffixes. I don't want to block this fix on that, but it would be great if you could send a follow-up PR to clean the tests up.
I'm fine to eventually merge the two. I agree all of these tests should eventually be running for all dtypes.
Summary:
Pull Request resolved: pytorch/pytorch#29231

Fixes: pytorch/pytorch#29153

The bug is that atomicAdd doesn't correctly add values for some dtypes due to incorrect casting; it was returning zeros. Incorrect behavior before this PR is shown in the repro above.

Test Plan: Imported from OSS

Differential Revision: D18575666

Pulled By: nairbv

fbshipit-source-id: 9b193b386bf4a9615014aa890d2e9f4f694940ac
Stack from ghstack:

Fixes: #29153

The bug is that atomicAdd doesn't correctly add values for some dtypes due to incorrect casting; it was returning zeros. Incorrect behavior before this PR is shown in the repro above.

Differential Revision: D18575666