Fix FP16 fastAtomicAdd for one case where tensor start address is not 32 bit aligned #44642
Conversation
jjsjann123 left a comment:
Good catch!
Codecov Report
```
@@            Coverage Diff             @@
##           master   #44642      +/-   ##
==========================================
- Coverage   67.98%   67.98%   -0.01%
==========================================
  Files         384      384
  Lines       49567    49567
==========================================
- Hits        33697    33696       -1
- Misses      15870    15871       +1
```
Continue to review full report at Codecov.
facebook-github-bot left a comment:
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Fix FP16 fastAtomicAdd for one case where tensor start address is not 32 bit aligned (#44642)

Summary: For #44206 and #42218, I'd like to update trilinear interpolate backward and grid_sample backward to use `fastAtomicAdd`. As a prelude, I spotted a UB risk in `fastAtomicAdd`. I think the existing code incurs a misaligned `__half2` atomicAdd when `index` is odd and `tensor` is not 32-bit aligned (`index % 2 == 1` and `reinterpret_cast<std::uintptr_t>(tensor) % sizeof(__half2) == 1`). In this case we think we're `!low_bit` and go down the `!low_bit` code path, but in fact we are `low_bit`. It appears the discussion on the original fastAtomicAdd PR (#21879) did not consider that case explicitly. I wanted to push my tentative fix for discussion ASAP. cc @jjsjann123 and @mkolod as original authors of `fastAtomicAdd`. (I'm also curious why we need to `reinterpret_cast<std::uintptr_t>(tensor...` for the address modding, but that's minor.)

Pull Request resolved: #44642
Reviewed By: mruberry
Differential Revision: D23699820
Pulled By: ngimel
fbshipit-source-id: 0db57150715ebb45e6a1fb36897e46f00d61defd
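For readers outside the PR thread, below is a minimal CUDA sketch of the alignment logic at issue. It illustrates the idea behind the fix and is not the exact upstream PyTorch code; the helper name `fastSpecializedAtomicAdd` follows the upstream utility, but the surrounding kernel plumbing is omitted and the comments paraphrase the bug report. The failure mode: with a base address of, say, 0x1002 and `index == 3`, the element sits at 0x1008, which is 32-bit aligned even though the index is odd, so any check that combines index parity with base-pointer alignment separately misclassifies the element and issues a misaligned `__half2` atomicAdd. The fix is to test the alignment of the element address `tensor + index` directly.

```cuda
#include <cuda_fp16.h>
#include <cstdint>

// Sketch of a fastAtomicAdd-style FP16 helper (assumes sm_70+ so the
// scalar __half atomicAdd fallback is available). Key point of the fix:
// derive alignment from the address of tensor[index] itself, rather than
// from index parity and the base pointer's alignment separately.
__device__ __forceinline__ void fastSpecializedAtomicAdd(
    __half* tensor, size_t index, size_t numel, __half value) {
  __half* target_addr = tensor + index;

  // True when tensor[index] starts on a 32-bit (__half2) boundary.
  bool low_byte =
      reinterpret_cast<std::uintptr_t>(target_addr) % sizeof(__half2) == 0;

  if (low_byte && index < (numel - 1)) {
    // tensor[index] is the low half of an aligned __half2: add into .x,
    // leaving the neighboring element unchanged by adding zero to .y.
    __half2 value2;
    value2.x = value;
    value2.y = __float2half_rn(0.0f);
    atomicAdd(reinterpret_cast<__half2*>(target_addr), value2);
  } else if (!low_byte && index > 0) {
    // tensor[index] is the high half: step back one element so the
    // __half2 access is aligned, and add into .y.
    __half2 value2;
    value2.x = __float2half_rn(0.0f);
    value2.y = value;
    atomicAdd(reinterpret_cast<__half2*>(target_addr - 1), value2);
  } else {
    // Boundary elements fall back to a plain scalar atomicAdd.
    atomicAdd(target_addr, value);
  }
}
```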