
Conversation

@colesbury (Member) commented May 24, 2019

This fixes advanced indexing in cases where there are more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.
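
As a hedged illustration (the wrapper name and include path below are mine, not taken from the diff), this is the usual TensorIterator splitting idiom being referred to:

```cpp
#include <ATen/native/TensorIterator.h>

// Sketch only, not the actual PR change: if the iterator spans more bytes
// than a signed 32-bit offset can address, split it into 32-bit-safe
// sub-iterators and launch the kernel on each piece.
void launch_with_32bit_indexing(at::TensorIterator& iter) {
  if (!iter.can_use_32bit_indexing()) {
    for (auto& sub_iter : iter.with_32bit_indexing()) {
      launch_with_32bit_indexing(sub_iter);  // recurse on pieces that fit
    }
    return;
  }
  // ... the existing CUDA launch path goes here; offsets now fit in a
  // signed 32-bit integer, so OffsetCalculator and friends are safe ...
}
```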

This also adds a number of TORCH_INTERNAL_ASSERTs in Loops.cuh,
OffsetCalculator, and IntDivider checking that sizes fit in a signed 32-bit
integer.
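
For flavor, a minimal sketch of the kind of guard described, assuming the c10 TORCH_INTERNAL_ASSERT macro (the helper name and exact call sites are illustrative):

```cpp
#include <c10/util/Exception.h>
#include <cstdint>
#include <limits>

// Sketch only: fail loudly if a size is about to be narrowed to a signed
// 32-bit index that it does not fit in.
inline void assert_fits_in_int32(int64_t size) {
  TORCH_INTERNAL_ASSERT(
      size >= 0 && size <= std::numeric_limits<int32_t>::max(),
      "size does not fit in a signed 32-bit integer: ", size);
}
```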

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e

Fixes #20888

@pytorchbot added the `module: cuda` and `module: operators` labels May 24, 2019
@colesbury requested a review from gchanan May 24, 2019 20:06
@facebook-github-bot (Contributor) left a comment:

@colesbury has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request May 25, 2019
Summary:
This fixes advanced indexing in cases where there are more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTs in Loops.cuh,
OffsetCalculator, and IntDivider checking that sizes fit in a signed 32-bit
integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e

Fixes #20888
Pull Request resolved: pytorch/pytorch#20919

Differential Revision: D15501945

Pulled By: colesbury

fbshipit-source-id: e876e678e866d2efda8ee92c47a1d2d1310671f0
@facebook-github-bot (Contributor) commented:

@colesbury merged this pull request in b93bdf6.

@ezyang (Contributor) commented May 25, 2019

Looks like this durably broke ROCm https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/22251//console (or you were unlucky and an upgrade happened at the same time)

04:17:28 test_multinomial (__main__.TestCuda) ... ### HCC STATUS_CHECK Error: HSA_STATUS_ERROR_EXCEPTION (0x1016) at file:mcwamp_hsa.cpp line:1193
04:17:42 Traceback (most recent call last):
04:17:42   File "test/run_test.py", line 441, in <module>
04:17:42     main()
04:17:42   File "test/run_test.py", line 433, in main
04:17:42     raise RuntimeError(message)
04:17:42 RuntimeError: test_cuda failed! Received signal: SIGIOT

@ezyang (Contributor) commented May 25, 2019

cc @bddppq @iotamudelta

@colesbury (Member, Author) commented:

I think the test failure is unrelated to this change

colesbury added a commit to colesbury/pytorch that referenced this pull request May 25, 2019
@colesbury (Member, Author) commented:

Well... maybe not. I'll see if #20948 fixes the ROCm build

@bddppq (Contributor) commented May 28, 2019

There was no ROCm upgrade recently, although I don't see anything obviously suspicious in this diff either :(.
Since this has broken master, let's revert it. I will work with @colesbury to locally repro and further debug it tomorrow.

colesbury added a commit to colesbury/pytorch that referenced this pull request May 28, 2019
This is pytorch#20919 without the changes to aten/src/THC/THCIntegerDivider.cuh
that broke the ROCm build.

Original summary:

This fixes advanced indexing in cases where there are more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTs in Loops.cuh,
OffsetCalculator, and IntDivider checking that sizes fit in a signed 32-bit
integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e
colesbury added a commit to colesbury/pytorch that referenced this pull request May 28, 2019
This is the change to aten/src/THC/THCIntegerDivider.cuh from pytorch#20919
in order to debug why it breaks the ROCm multinomial test.
facebook-github-bot pushed a commit that referenced this pull request May 28, 2019
Summary:
This is #20919 without the changes to aten/src/THC/THCIntegerDivider.cuh
that broke the ROCm build.

cc bddppq

Original summary:

This fixes advanced indexing in cases where there are more than 2^31-1
bytes in the output. The `gpu_index_kernel` was missing the
`can_use_32bit_indexing`/`with_32bit_indexing` check.

This also adds a number of TORCH_INTERNAL_ASSERTs in Loops.cuh,
OffsetCalculator, and IntDivider checking that sizes fit in a signed 32-bit
integer.

More comprehensive tests that require a 32 GB GPU are here:
https://gist.github.com/colesbury/e29387f5851521256dff562be07b981e
Pull Request resolved: #21019

Differential Revision: D15518477

Pulled By: colesbury

fbshipit-source-id: 4db5626fda76eb58250793e8aa7d4f2832db3a34
Successfully merging this pull request may close: #20888 (Reindexing a huge tensor to shuffle it results in data loss at the end of the tensor).
