Fix the issue when NHWC Tensor has height or width larger than max CUDA grid #28931
Conversation
    maxThreadsDim[0], std::min<int>(lastPow2(nInputPlane), max_threads / block_y / block_z));
const dim3 block(block_x, block_y, block_z);
int grid_x = nbatch;
int grid_y = cuda::ATenCeilDiv(safe_downcast<int, int64_t>(outputWidth), block_y*BLOCK_STRIDE);
We shouldn't need multiple kernel launches, since we are already striding along width/height inside the kernel.
I would discard all these changes and simply change this line (and grid_z as well):
int grid_y = std::min<int>(
at::cuda::getCurrentDeviceProperties()->maxGridSize[1],
cuda::ATenCeilDiv(safe_downcast<int, int64_t>(outputWidth), block_y*BLOCK_STRIDE));
Please do NOT do the same thing for grid_x, as the kernel does not loop over the batch dimension. The fix you have here could be used for that.
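For reference, the reason clamping the grid is sufficient: the kernel already walks width/height with a grid-stride loop, so a grid smaller than the problem just means more iterations per block. A minimal sketch of that pattern, with illustrative names rather than the actual pooling kernel:

__global__ void stride_over_plane(const float* in, float* out, int height, int width) {
  // Each block starts at its grid coordinate and advances by the full grid
  // extent, so any height/width is covered even when grid_y/grid_z were
  // clamped to maxGridSize[1]/maxGridSize[2].
  for (int h = blockIdx.z * blockDim.z + threadIdx.z; h < height; h += gridDim.z * blockDim.z) {
    for (int w = blockIdx.y * blockDim.y + threadIdx.y; w < width; w += gridDim.y * blockDim.y) {
      out[h * width + w] = in[h * width + w];  // placeholder body
    }
  }
}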
I updated the grid and removed the slicing, but I still see a performance degradation in the backward function.
Hmmm, a perf regression doesn't really make sense here. It's basically the same kernel.
Not sure if it's related to that typo I pointed out in the other comment.
Changing the grid size would affect the occupancy and maybe the cache hit rate? I need to think about it further before I can make any decisive call. But let's fix the typo and hope the problem goes away.
    maxThreadsDim[0], std::min<int>(lastPow2(nInputPlane), max_threads / block_y / block_z));
const dim3 block(block_x, block_y, block_z);
int grid_x = nbatch;
int grid_y = cuda::ATenCeilDiv(safe_downcast<int, int64_t>(inputWidth), block_y*BLOCK_STRIDE);
Same thing as I commented on the forward pass.
Updated
    cuda::ATenCeilDiv(safe_downcast<int, int64_t>(outputWidth), block_y*BLOCK_STRIDE));
int grid_z = std::min<int>(
    at::cuda::getCurrentDeviceProperties()->maxGridSize[2],
    cuda::ATenCeilDiv(safe_downcast<int, int64_t>(outputHeight), block_y*BLOCK_STRIDE));
Typo here! block_y should be block_z
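With the typo fixed, the line would read (only the block dimension changes from the quoted code):

int grid_z = std::min<int>(
    at::cuda::getCurrentDeviceProperties()->maxGridSize[2],
    cuda::ATenCeilDiv(safe_downcast<int, int64_t>(outputHeight), block_z*BLOCK_STRIDE));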
changed
    cuda::ATenCeilDiv(safe_downcast<int, int64_t>(inputWidth), block_y*BLOCK_STRIDE));
int grid_z = std::min<int>(
    at::cuda::getCurrentDeviceProperties()->maxGridSize[2],
    cuda::ATenCeilDiv(safe_downcast<int, int64_t>(inputHeight), block_y*BLOCK_STRIDE));
block_y -> block_z
changed
LGTM. Did that few-line change actually take a hit on performance, or is there high variance between runs?
facebook-github-bot
left a comment
@ifedan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…da grid (#28931)
Summary: When NHWC Tensor has height or width larger than max CUDA grid size, max_pool fails with error code 0. The example is: pytorch/pytorch#28714. This change should limit grid size to the CUDA max possible size and chunk the input to be able to process it.
Pull Request resolved: pytorch/pytorch#28931
Differential Revision: D18358892
Pulled By: ifedan
fbshipit-source-id: 2fd65448bd644f1588a0e208edaaea5bcb6a7d52


When an NHWC Tensor has a height or width larger than the max CUDA grid size, max_pool fails with error code 0.
The example is: #28714
This change limits the grid size to the CUDA maximum and chunks the input so it can still be processed.
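Concretely, the launch-side pattern the review converged on clamps the width/height grid dimensions to the device limit and leaves the batch dimension alone; a sketch assembled from the lines quoted in the review (a fragment for illustration, not the exact final diff):

const dim3 block(block_x, block_y, block_z);
int grid_x = nbatch;  // the kernel does not loop over batch, so grid_x is not clamped
int grid_y = std::min<int>(
    at::cuda::getCurrentDeviceProperties()->maxGridSize[1],
    cuda::ATenCeilDiv(safe_downcast<int, int64_t>(outputWidth), block_y*BLOCK_STRIDE));
int grid_z = std::min<int>(
    at::cuda::getCurrentDeviceProperties()->maxGridSize[2],
    cuda::ATenCeilDiv(safe_downcast<int, int64_t>(outputHeight), block_z*BLOCK_STRIDE));
const dim3 grid(grid_x, grid_y, grid_z);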