[fixing cuda launch config failure on UpSampleNearest] #29016
Conversation
Adding a limitation on the launch config for grid sizes as well; test added in test_cuda.
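For context, a minimal CUDA sketch of the general pattern behind this kind of fix: clamp the requested grid size to the device's maxGridSize and let a grid-stride loop cover the remaining elements. The kernel and function names below are made up for illustration; this is not the PR's actual diff.

```cpp
#include <algorithm>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical 1-D nearest-neighbor upsampling by an integer scale factor.
// Each thread strides over the flattened output, so the result stays correct
// even when the grid is clamped below "one block per output chunk".
__global__ void upsample_nearest1d_sketch(const float* in, float* out,
                                          int64_t out_numel, int scale) {
  for (int64_t i = (int64_t)blockIdx.x * blockDim.x + threadIdx.x;
       i < out_numel;
       i += (int64_t)gridDim.x * blockDim.x) {
    out[i] = in[i / scale];  // nearest neighbor: output index i maps to input index i/scale
  }
}

void launch_upsample_sketch(const float* in, float* out,
                            int64_t out_numel, int scale) {
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, /*device=*/0);
  const int threads = 256;
  const int64_t blocks_wanted = (out_numel + threads - 1) / threads;
  // Clamp the grid dimension to the device limit instead of failing the launch;
  // the grid-stride loop in the kernel covers whatever the grid cannot.
  const unsigned int blocks = (unsigned int)std::min<int64_t>(
      blocks_wanted, (int64_t)prop.maxGridSize[0]);
  upsample_nearest1d_sketch<<<blocks, threads>>>(in, out, out_numel, scale);
}
```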
ngimel left a comment:
Looks good, just a couple small fixes.
TORCH_CHECK(
    gdim.x <= at::cuda::getCurrentDeviceProperties()->maxGridSize[0],
    "input tensor has spatial dimension larger than the kernel capacity");
tensors are indexed with ints in the kernel, so just check that the number of elements fits, and don't do this check.
It's fair to assume that maxGridSize is not going to change in the future.
I'll update it for the places where it's the same kernel launch.
TORCH_CHECK(
    gdim.x <= at::cuda::getCurrentDeviceProperties()->maxGridSize[0],
    "input tensor has spatial dimension larger than the kernel capacity");
Same here.
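A minimal sketch of the alternative check suggested above, assuming the kernel indexes tensors with 32-bit ints: verify that the element count fits in an int instead of comparing grid dimensions against maxGridSize. The helper name and error message wording are illustrative, not the PR's final code.

```cpp
#include <cstdint>
#include <limits>
#include <ATen/ATen.h>
#include <c10/util/Exception.h>  // TORCH_CHECK

// Illustrative helper: since the kernel indexes with 32-bit ints, checking that
// the total number of elements is addressable is sufficient, and the explicit
// comparison against maxGridSize can be dropped.
void check_int32_indexable(const at::Tensor& input) {
  TORCH_CHECK(
      input.numel() <= std::numeric_limits<int32_t>::max(),
      "input tensor has too many elements to be indexed with a 32-bit int");
}
```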
with torch.backends.cudnn.flags(enabled=False):
    self._test_rnn_retain_variables(device, dtype)

@onlyCUDA
How much memory do those tests require? Regular CI machines are M60s with 6GB of memory IIRC, so if you need more, use the large-memory decorator.
Looking at the size of input + output, it takes 0.625GB. Given the way input/output are allocated in this code path, it should cache 0.75GB with 0.625GB actually allocated.
Running the test independently, I'm seeing 2.5GB of cached memory in total at the end of the test, so we should be safe here.
It took me a while wondering how that expected failure was passing the test; then I realized it's ROCm...
…nel and change the check to use a numerics check (assuming the launch limitation to be 2**31-1 implicitly)
I can't get anything useful out of the failing CI. Should I be concerned?

Failures are unrelated.
facebook-github-bot left a comment:
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Adding limitation on launch config for grid size. Test added in test_cuda.
Pull Request resolved: pytorch/pytorch#29016
Differential Revision: D18293788
Pulled By: ngimel
fbshipit-source-id: 44de308b05a4fe44bfffc2f3713fd9fa67ef74fa
When is this getting released? For now I've had to implement a DataLoader for inference as a workaround.

It is in the nightly releases.