[ready for review] Batched upper triangular, lower triangular #15257
Conversation
I've tried to debug this error, but to no avail. Posting it below:
@vishwakftw do you get the same error message on a local build?
Yes, I do.
For additional context about the design: the CUDA implementation is similar to the THC implementation, where a custom op (
@zou3519 is this good to go? Also, should I remove the TH/THC implementations as well?
Force-pushed from 6231624 to a9d844f
I did some benchmarking of the PR versus current master for the triu / tril ops on CPU and CUDA. Below are links to the results:
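The benchmark links are not preserved in this thread. As a rough illustration only, a comparison of this kind could be run with a timeit-based sketch like the one below; the sizes and iteration counts are arbitrary choices, not the ones used in the PR:

```python
import timeit

import torch

x = torch.randn(2048, 2048)  # a single large matrix

# CPU timing
cpu_ms = timeit.timeit(lambda: torch.tril(x), number=100) / 100 * 1e3
print(f"CPU  tril: {cpu_ms:.3f} ms/iter")

# CUDA timing, synchronizing so each kernel launch is fully measured
if torch.cuda.is_available():
    xc = x.cuda()
    torch.cuda.synchronize()  # finish the host-to-device copy first
    cuda_ms = timeit.timeit(
        lambda: (torch.tril(xc), torch.cuda.synchronize()),
        number=100,
    ) / 100 * 1e3
    print(f"CUDA tril: {cuda_ms:.3f} ms/iter")
```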
Force-pushed from a9d844f to 18d611d
…timize CPU implementation
- The thrust implementation seemed to be incredibly slow
- The CPU implementation was bottlenecked by a clone() op
- Add test cases for non-square matrices, and an addition-based test
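The clone() bottleneck mentioned in this commit message can be made concrete with a micro-benchmark; this is an illustrative sketch (sizes arbitrary), not the measurement from the PR:

```python
import timeit

import torch

x = torch.randn(1024, 1024)

# Cost of just materializing a copy of the input...
print("clone():", timeit.timeit(lambda: x.clone(), number=1000))
# ...versus the whole tril op, which must at least write the output.
print("tril() :", timeit.timeit(lambda: torch.tril(x), number=1000))
```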
Force-pushed from 18d611d to 62f31c4
@vishwakftw I'll take a look later today
Thank you, much obliged.
No, thank you for the contribution :)
Distributions tests seem to be failing
zou3519 left a comment
CPU code looks fine, still reading the CUDA kernel
This reverts commit e907c74.
CUDA tests are failing with this error:
…enforce contiguity implicitly.
ngimel left a comment
I did not check the CPU part; I requested small changes, but this is generally good now.
mrshenli left a comment
Thank you for taking care of this. Great work!
Force-pushed from bf9b1b5 to 8e2d591
facebook-github-bot left a comment
@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
Changelog:
- Implement `triu` and `tril` for batches of 2D tensors.
- Remove TH/THC binding for `tril`.
- Fix CUDA implementation.
- Update docstrings for `triu` and `tril`.
- Remove mask-based `triu` and `tril` in cholesky forward and backward.
- Remove batched `tril` in `torch.distributions.utils`.

Pull Request resolved: pytorch/pytorch#15257
Differential Revision: D13613888
Pulled By: mrshenli
fbshipit-source-id: 0949a05b9b8e974c1acfaf02a6284848ec5cc1c4
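For context, a short usage sketch (not taken from the PR itself) of what the change enables, together with the mask-based emulation that the changelog says is being removed:

```python
import torch

# A batch of four 3x3 matrices; before this PR, triu/tril accepted
# only 2D inputs.
x = torch.randn(4, 3, 3)

upper = torch.triu(x)               # zero entries below the main diagonal
lower = torch.tril(x, diagonal=-1)  # keep only the strictly lower triangle

# The mask-based workaround removed from cholesky and
# torch.distributions is equivalent to the native batched op:
mask = torch.ones(3, 3).tril().expand_as(x)
assert torch.equal(torch.tril(x), x * mask)
```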
Hi,
And I think `magma_int_t` uses a 32-bit integer by default. Best
Hi Yongheng, thanks for the message. Are you facing issues in
Yes. I am using it in a batch-wise eigenvalue decomposition. There is an error when the batch size is larger than 65535. This is the CUDA error:
I think I know the reason for this: the batches for triu and tril use the y-dimension of the CUDA grid, whose limit is 65535. I will try to fix this within the next week. For now, you could try mini-batching in a loop, as sketched below. Sorry about the inconvenience.
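A minimal sketch of that loop-based workaround; `tril_chunked` is a hypothetical helper written for illustration, not part of PyTorch:

```python
import torch

def tril_chunked(x, chunk_size=65535):
    """Apply torch.tril over an arbitrarily large batch by splitting it
    into chunks that each fit within one kernel launch's grid limit."""
    flat = x.reshape(-1, *x.shape[-2:])   # flatten leading batch dims
    parts = [torch.tril(c) for c in flat.split(chunk_size)]
    return torch.cat(parts).reshape(x.shape)
```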
Thanks very much. Glad to help.
@yongheng1991 Please take a look at #21067 which adds support for batch sizes > 65535. |
Changelog:
- Implement `triu` and `tril` for batches of 2D tensors.
- Remove TH/THC binding for `tril`.
- Remove mask-based `triu` and `tril` in cholesky forward and backward.

Test plan:
Fixes #15016, fixes #15226 and closes #14071
Acknowledgements: