Conversation

@mcarilli mcarilli commented Sep 14, 2020

For #44206 and #42218, I'd like to update trilinear interpolate backward and grid_sample backward to use fastAtomicAdd.

As a prelude, I spotted a UB risk in fastAtomicAdd. I think the existing code incurs a misaligned __half2 atomicAdd when index is odd and tensor is not 32-bit aligned (index % 2 == 1 and reinterpret_cast<std::uintptr_t>(tensor) % sizeof(__half2) == 2). In this case we think we're !low_bit and go down the !low_bit code path, but in fact we are low_bit. It appears the original fastAtomicAdd PR's discussion did not consider that case explicitly.

I wanted to push my tentative fix for discussion ASAP. Tagging @jjsjann123 and @mkolod as the original authors of fastAtomicAdd. (I'm also curious why we need to reinterpret_cast<std::uintptr_t>(tensor... for the address modding, but that's minor.)


@jjsjann123 jjsjann123 left a comment


Good catch!

@mcarilli mcarilli changed the title [WIP] Update cuda trilinear interpolate backward and grid_sample backward to use fast atomics Fix FP16 fastAtomicAdd for one case where tensor start address is not 32 bit aligned Sep 14, 2020

codecov bot commented Sep 14, 2020

Codecov Report

Merging #44642 into master will decrease coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #44642      +/-   ##
==========================================
- Coverage   67.98%   67.98%   -0.01%     
==========================================
  Files         384      384              
  Lines       49567    49567              
==========================================
- Hits        33697    33696       -1     
- Misses      15870    15871       +1     
Impacted Files                          Coverage Δ
torch/testing/_internal/expecttest.py   77.55% <0.00%> (-1.03%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 05c1f1d...3c36b43.


@facebook-github-bot facebook-github-bot left a comment


@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@ngimel merged this pull request in 2435d94.

xuzhao9 pushed a commit that referenced this pull request Sep 18, 2020
… 32 bit aligned (#44642)

Summary:
For #44206 and #42218, I'd like to update trilinear interpolate backward and grid_sample backward to use `fastAtomicAdd`.

As a prelude, I spotted a UB risk in `fastAtomicAdd`.  I think the existing code incurs a misaligned `__half2` atomicAdd when `index` is odd and `tensor` is not 32-bit aligned (`index % 2 == 1` and `reinterpret_cast<std::uintptr_t>(tensor) % sizeof(__half2) == 2`). In this case we think we're `!low_bit` and go down the `!low_bit` code path, but in fact we are `low_bit`.  It appears the original fastAtomicAdd PR (#21879) discussion did not consider that case explicitly.

I wanted to push my tentative fix for discussion ASAP. Tagging jjsjann123 and mkolod as the original authors of `fastAtomicAdd`. (I'm also curious why we need to `reinterpret_cast<std::uintptr_t>(tensor...` for the address modding, but that's minor.)

Pull Request resolved: #44642

Reviewed By: mruberry

Differential Revision: D23699820

Pulled By: ngimel

fbshipit-source-id: 0db57150715ebb45e6a1fb36897e46f00d61defd

6 participants