Skip to content

Conversation

@colesbury
Copy link
Member

@colesbury colesbury commented May 28, 2019

This is #20919 without the changes to aten/src/THC/THCIntegerDivider.cuh
that broke the ROCm build.

cc @bddppq

Original summary:

This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The gpu_index_kernel was missing the
can_use_32bit_indexing/with_32bit_indexing check.

This also adds a number of TORCH_INTERNAL_ASSERTS in Loops.cuh,
OffsetCalculator, and IntDivider that sizes are fit in a signed 32-bit
integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e

This pytorch#20919 without the changes to aten/src/THC/THCIntegerDivider.cuh
that broke the ROCm build.

Original summary:

This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTS in Loops.cuh,
OffsetCalculator, and IntDivider that sizes are fit in a signed 32-bit
integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e
@pytorchbot pytorchbot added module: cuda Related to torch.cuda, and CUDA support in general module: operators labels May 28, 2019
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colesbury has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@bddppq bddppq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@facebook-github-bot
Copy link
Contributor

@colesbury merged this pull request in b85c529.

zdevito pushed a commit to zdevito/ATen that referenced this pull request May 28, 2019
Summary:
This #20919 without the changes to aten/src/THC/THCIntegerDivider.cuh
that broke the ROCm build.

cc bddppq

Original summary:

This fixes advanced indexing in cases where there's more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTS in Loops.cuh,
OffsetCalculator, and IntDivider that sizes are fit in a signed 32-bit
integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e
Pull Request resolved: pytorch/pytorch#21019

Differential Revision: D15518477

Pulled By: colesbury

fbshipit-source-id: 4db5626fda76eb58250793e8aa7d4f2832db3a34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: cuda Related to torch.cuda, and CUDA support in general

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants