
Conversation

@jjsjann123
Collaborator

1. Fix an out-of-range memory access for reductions over all dimensions of a non-packed tensor.

2. Enable the launch config that maps block width to a reduction on the fastest-striding dimension. This mapping was previously only active when reducing over the fastest-striding dimension of a packed tensor, a restriction that is not necessary.

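For context on the first point, here is a minimal repro-style sketch of the kind of input it targets, assuming a CUDA build of PyTorch; the shape and the transpose used to obtain a non-packed layout are illustrative, not taken from the PR:

```python
import torch

# Hypothetical example (not from the PR): build a non-packed (non-contiguous)
# CUDA tensor by transposing, then reduce over all of its dimensions.
x = torch.randn(65, 33, 17, device="cuda")
y = x.transpose(0, 2)          # a view whose strides no longer match a packed layout
assert not y.is_contiguous()

total = y.sum()                # full reduction over the non-packed tensor

# Sanity check against the packed layout.
assert torch.allclose(total, y.contiguous().sum())
```

The second point is a launch-config choice rather than a correctness fix: when the fastest-striding dimension is among the reduced dimensions, block width is mapped to it, and with this change that mapping is no longer limited to packed tensors.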
@pytorchbot added the module: cuda and module: operators labels Jul 12, 2019
@jjsjann123
Collaborator Author

cc @zdevito @ngimel

@jerryzh168
Contributor

@jjsjann123 feel free to request review

@jerryzh168 requested review from ngimel and zdevito July 13, 2019 01:32
@jerryzh168 added the triaged label Jul 13, 2019
@jjsjann123
Collaborator Author

Looks like a bunch of tests failed here.
Let me do a clean rebuild and run the tests locally to see if I can get a better hint.

@facebook-github-bot
Contributor

@zdevito has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zdevito
Contributor

Looks good. I'll land when the tests pass.

@jjsjann123
Collaborator Author

I don't have a local repro for the CircleCI failure: RuntimeError: NCCL error in: /var/lib/jenkins/workspace/torch/lib/c10d/ProcessGroupNCCL.cpp:272, unhandled system error

The build failure looks strange as well. I don't think I have messed up the history :/

@zhangguanheng66
Contributor

@zdevito Could you land this PR? Thanks.

@facebook-github-bot
Contributor

@zdevito merged this pull request in a28ffaf.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 19, 2019
Summary:
1. Fix an out-of-range memory access for reductions over all dimensions of a non-packed tensor.

2. Enable the launch config that maps block width to a reduction on the fastest-striding dimension. This mapping was previously only active when reducing over the fastest-striding dimension of a packed tensor, a restriction that is not necessary.
Pull Request resolved: pytorch/pytorch#22827

Differential Revision: D16271897

Pulled By: zdevito

fbshipit-source-id: 20763f6cf9a58e44ffc0e7ec27724dfec8fe2c5d

Labels

Merged · module: cuda · open source · triaged


8 participants