
Conversation


@vishwakftw vishwakftw commented May 29, 2019

In the previous implementation of triu / tril, we passed the batch size as the 2nd dimension of the CUDA grid. CUDA limits that dimension to 65535 blocks, so performing triu / tril on a tensor with batch size > 65535 throws an error. This PR removes the dependence on the 2nd dimension, along with the contiguity constraints that went with it.

Changelog:

  • Compute the batch offset, row and col inside the kernel
  • Use only the 1st dimension of the grid
  • Remove the now-unnecessary contiguity checks on tensors (see the sketch below)
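
A minimal sketch of the indexing scheme these changes describe, assuming a grid-stride kernel over a batch of rows x cols matrices; the names and signature here are hypothetical, not the PR's actual code:

```cuda
#include <cstdint>

// Hypothetical sketch: triu over a batch of rows x cols matrices using only
// the 1st grid dimension. Each thread derives its batch offset, row, and col
// from a flat linear index, so the launch is bounded by the x-dimension
// limit (2^31 - 1 blocks) instead of the 65535-block limit on dimension 2.
template <typename scalar_t, typename IndexType>
__global__ void triu_sketch(scalar_t* out, const scalar_t* in,
                            IndexType total_elems, IndexType rows,
                            IndexType cols, int64_t diagonal) {
  for (IndexType linear = blockIdx.x * blockDim.x + threadIdx.x;
       linear < total_elems;
       linear += gridDim.x * blockDim.x) {        // grid-stride loop
    IndexType within = linear % (rows * cols);    // offset inside one matrix
    IndexType row = within / cols;
    IndexType col = within % cols;
    // Keep elements on or above the requested diagonal; zero the rest.
    bool keep = (int64_t)col - (int64_t)row >= diagonal;
    out[linear] = keep ? in[linear] : scalar_t(0);
  }
}
```

Because each thread strides over the flattened tensor, the host can launch any grid size within the 1-D limit and the same kernel handles arbitrarily large batches.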

Test Plan:

  • Existing triu / tril tests should pass, verifying that the change is correct.

@pytorchbot pytorchbot added the module: cuda (Related to torch.cuda, and CUDA support in general) and module: operators labels May 29, 2019
@vishwakftw vishwakftw changed the title [WIP] Make CUDA triu / tril support batches of size > 65535 Make CUDA triu / tril support batches of size > 65535 May 30, 2019
@vishwakftw vishwakftw requested a review from ngimel May 30, 2019 04:47

@ngimel ngimel left a comment

I'm OK with this PR if we only need functional support; if performance is important, some changes are necessary. The code is nicely simplified.

@vishwakftw (Contributor, Author)

@ngimel This is not necessarily a performance-critical kernel; it's generally used in tandem with many linear algebra operations. However, since you mentioned that int32-based indexing helps a lot for smaller inputs, I've incorporated it into the PR.
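
For illustration, a hypothetical host-side dispatch around the triu_sketch kernel above (not the PR's actual code): 64-bit integer arithmetic is slower than 32-bit on most GPUs, so the narrower index type is chosen whenever every linear index fits in an int32.

```cuda
#include <cstdint>
#include <limits>

// Hypothetical launcher: pick int32 indexing when it is safe, otherwise
// fall back to int64 and let the grid-stride loop cover the extra elements.
void launch_triu_sketch(float* out, const float* in, int64_t total_elems,
                        int64_t rows, int64_t cols, int64_t diagonal) {
  const int threads = 256;
  if (total_elems <= std::numeric_limits<int32_t>::max()) {
    // Fast path: every linear index fits in 32 bits.
    int blocks = (int)((total_elems + threads - 1) / threads);
    triu_sketch<float, int32_t><<<blocks, threads>>>(
        out, in, (int32_t)total_elems, (int32_t)rows, (int32_t)cols, diagonal);
  } else {
    // Slow path: 64-bit indices; cap the grid and rely on the stride loop.
    int blocks = 65535;  // any size within the 1-D grid limit works
    triu_sketch<float, int64_t><<<blocks, threads>>>(
        out, in, total_elems, rows, cols, diagonal);
  }
}
```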

@vishwakftw (Contributor, Author)

@pytorchbot rebase this please

@vishwakftw (Contributor, Author)

@pytorchbot merge this please

@pytorchbot pytorchbot added the merge-this-please (Was marked for merge with "@pytorchbot merge this please") label May 31, 2019

@facebook-github-bot facebook-github-bot left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@ezyang merged this pull request in eb4d43d.

zdevito pushed a commit to zdevito/ATen that referenced this pull request May 31, 2019
Pull Request resolved: pytorch/pytorch#21067

Differential Revision: D15572501

Pulled By: ezyang

fbshipit-source-id: 93851cb661918ce794d43eeb12c8a38762e1358c
@vishwakftw vishwakftw deleted the triu-tril-cuda-big-batches branch May 31, 2019 06:04