Conversation

@jjsjann123
Collaborator

Adding limitation on launch config for grid size

Adding a limitation on the launch config for grid sizes as well; test added in test_cuda.
@jjsjann123 requested a review from ngimel on November 1, 2019 03:22
Collaborator

@ngimel left a comment

Looks good, just a couple small fixes.

TORCH_CHECK(
    gdim.x <= at::cuda::getCurrentDeviceProperties()->maxGridSize[0],
    "input tensor has spatial dimension larger than the kernel capacity");

Collaborator

Tensors are indexed with ints in the kernel, so just check that the number of elements fits in an int, and drop this check.
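A minimal sketch of the element-count check being suggested here, assuming an illustrative helper name and `input` tensor (not the actual code in this PR):

#include <limits>

#include <ATen/ATen.h>

// Sketch only: the kernel indexes elements with 32-bit ints, so guard the
// element count rather than the grid dimensions. `input` stands in for the
// tensor the kernel is launched over.
void check_indexable_with_int32(const at::Tensor& input) {
  TORCH_CHECK(
      input.numel() <= std::numeric_limits<int>::max(),
      "input tensor has too many elements for 32-bit indexing in the kernel");
}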

Collaborator Author

It's fair to assume that maxGridSize is not going to change in the future.
I'll update the check in the places that share the same kernel launch.
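For context, a rough sketch (illustrative names and a typical 1-D launch, not this PR's exact code) of why bounding the element count implicitly bounds the grid size when the launch limit is 2**31-1:

// Assumed setup: `input` is the tensor, `threads` an illustrative block size.
// `blocks` never exceeds numel, so enforcing numel <= 2^31 - 1 also keeps
// gdim.x within CUDA's maximum grid dimension of 2^31 - 1.
const int threads = 256;
const int64_t numel = input.numel();                      // already checked to fit in an int
const int64_t blocks = (numel + threads - 1) / threads;   // ceil-divide, at most numel
dim3 gdim(static_cast<unsigned int>(blocks));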

TORCH_CHECK(
    gdim.x <= at::cuda::getCurrentDeviceProperties()->maxGridSize[0],
    "input tensor has spatial dimension larger than the kernel capacity");

Collaborator

same here

with torch.backends.cudnn.flags(enabled=False):
    self._test_rnn_retain_variables(device, dtype)

@onlyCUDA
Collaborator

How much memory do those tests require? Regular CI machines are M60s with 6GB of memory IIRC, so if you need more, use the large memory decorator.

Collaborator Author

Looking at the size of input + output, the test takes 0.625GB. Given the way input/output are allocated in the code path, it should end up with 0.75GB cached and 0.625GB actually allocated.

Running the test independently, I'm seeing 2.5GB of cached memory in total at the end of the test, so we should be safe here.

@jjsjann123
Collaborator Author

It took me a while to figure out how that expected failure was passing the test, and then I realized it's ROCm...
I'll just exclude ROCm from that test, since I have no idea what its limitations are. Feel free to add a ROCm-specific failure test.

…nel and change the check using numerics check (assuming launch limitation to be 2**31-1 implicitly)
@jjsjann123
Collaborator Author

I can't get any useful information out of the failing CI. Should I be concerned?

@ngimel
Collaborator

ngimel commented Nov 4, 2019

Failures are unrelated

Contributor

@facebook-github-bot left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 4, 2019
Summary:
Adding limitation on launch config for grid size
Test added in test_cuda;
Pull Request resolved: pytorch/pytorch#29016

Differential Revision: D18293788

Pulled By: ngimel

fbshipit-source-id: 44de308b05a4fe44bfffc2f3713fd9fa67ef74fa
@facebook-github-bot
Contributor

@ngimel merged this pull request in 70f3f23.

@zhanwenchen

When is this getting released? For now, I've had to implement a DataLoader for inference.

@ngimel
Collaborator

ngimel commented Nov 7, 2019

It is in nightly releases.
