[fixing reduction kernel launch] #22827
Conversation
1. Fix an out-of-range memory access in reductions over all dimensions of a non-packed tensor. 2. Enable the launch config that maps block width to a reduction along the fastest-striding dimension. This mapping was previously active only when reducing along the fastest-striding dimension of a packed tensor, a restriction that is not necessary.
@jjsjann123 feel free to request review.
Looks like a bunch of tests failed here.
facebook-github-bot
left a comment
@zdevito has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
zdevito
left a comment
Looks good. I'll land when the tests pass.
I don't have a local repro of the CircleCI failure, and the build failure looks strange as well. I don't think I have messed up the history :/
@zdevito Could you land this PR? Thanks.
Summary: 1. Fix an out-of-range memory access in reductions over all dimensions of a non-packed tensor. 2. Enable the launch config that maps block width to a reduction along the fastest-striding dimension. This mapping was previously active only when reducing along the fastest-striding dimension of a packed tensor, a restriction that is not necessary. Pull Request resolved: pytorch/pytorch#22827 Differential Revision: D16271897 Pulled By: zdevito fbshipit-source-id: 20763f6cf9a58e44ffc0e7ec27724dfec8fe2c5d