
Conversation

@emcastillo
Collaborator

@emcastillo emcastillo commented Dec 12, 2019

Fixes an issue with `cdist` backward calculation for large inputs in the Euclidean case.

The grid size used when launching the kernel exceeded the 2^16 limit for the second grid dimension, resulting in `RuntimeError: CUDA error: invalid configuration argument`.

Code to reproduce:

```
h, w, d = 800, 1216, 12
n = 133
A = torch.randn(n, d).cuda()
B = torch.randn(h, w, d).cuda()
A.requires_grad = True
B.requires_grad = True

B = B.reshape(-1, d).contiguous()
dist = torch.cdist(A, B)
loss = dist.sum()
loss.backward()
```
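For intuition, here is a rough back-of-the-envelope check of why this configuration trips the limit (a sketch only: it assumes the backward kernel maps the rows of B onto the second grid dimension, which simplifies the actual launch logic):

```
# Hypothetical sanity check, not the actual kernel-launch code.
h, w = 800, 1216
rows_of_B = h * w              # 972,800 rows after the reshape
cuda_grid_dim_limit = 65535    # maximum size of gridDim.y / gridDim.z
print(rows_of_B > cuda_grid_dim_limit)  # True -> "invalid configuration argument"
```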

Thanks to @tkerola for the bug report, the reproduction, and for suggesting a solution.

@emcastillo emcastillo changed the title from "Change cdist kernel grid to avoid CUDA error" to "Change cdist kernel grid parameter to avoid CUDA invalid configuration error" Dec 12, 2019
@tkerola

tkerola commented Dec 12, 2019

I think this will solve #27209 as well.

@tkerola tkerola left a comment

Just some small comments.

@kostmo
Member

kostmo commented Dec 12, 2019

💊 CircleCI build failures summary and remediations

As of commit 7ab0f56:

  • 2/2 failures introduced in this PR

Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

🕵️ 2 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakage:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/2)

Step: "Test" (full log | pattern match details)

Feb 25 03:09:12 RuntimeError: test_quantization failed!
Feb 25 03:09:12 Ran 36 tests in 57.272s 
Feb 25 03:09:12  
Feb 25 03:09:12 FAILED (errors=1, skipped=1) 
Feb 25 03:09:12  
Feb 25 03:09:12 Generating XML reports... 
Feb 25 03:09:12 Traceback (most recent call last): 
Feb 25 03:09:12   File "test/run_test.py", line 486, in <module> 
Feb 25 03:09:12     main() 
Feb 25 03:09:12   File "test/run_test.py", line 479, in main 
Feb 25 03:09:12     raise RuntimeError(message) 
Feb 25 03:09:12 RuntimeError: test_quantization failed! 
Feb 25 03:09:12 + cleanup 
Feb 25 03:09:12 + retcode=1 
Feb 25 03:09:12 + set +x 
Feb 25 03:09:12 =================== sccache compilation log =================== 
Feb 25 03:09:12 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Feb 25 03:09:12 Compile requests                  7 
Feb 25 03:09:12 Compile requests executed         6 
Feb 25 03:09:12 Cache hits                        0 
Feb 25 03:09:12 Cache misses                      6 
Feb 25 03:09:12 Cache timeouts                    0 

See CircleCI build pytorch_linux_xenial_cuda10_1_cudnn7_py3_NO_AVX_NO_AVX2_test (2/2)

Step: "Test" (full log | pattern match details)

Feb 25 04:56:42 RuntimeError: test_quantization failed!
Feb 25 04:56:42 Ran 36 tests in 59.569s 
Feb 25 04:56:42  
Feb 25 04:56:42 FAILED (errors=1, skipped=1) 
Feb 25 04:56:42  
Feb 25 04:56:42 Generating XML reports... 
Feb 25 04:56:42 Traceback (most recent call last): 
Feb 25 04:56:42   File "test/run_test.py", line 486, in <module> 
Feb 25 04:56:42     main() 
Feb 25 04:56:42   File "test/run_test.py", line 479, in main 
Feb 25 04:56:42     raise RuntimeError(message) 
Feb 25 04:56:42 RuntimeError: test_quantization failed! 
Feb 25 04:56:43 + cleanup 
Feb 25 04:56:43 + retcode=1 
Feb 25 04:56:43 + set +x 
Feb 25 04:56:43 =================== sccache compilation log =================== 
Feb 25 04:56:43 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Feb 25 04:56:43 Compile requests                32 
Feb 25 04:56:43 Compile requests executed       11 
Feb 25 04:56:43 Cache hits                       1 
Feb 25 04:56:43 Cache misses                    10 
Feb 25 04:56:43 Cache timeouts                   0 

This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

This comment has been revised 36 times.

@emcastillo emcastillo force-pushed the fix_cdist_backward branch 3 times, most recently from 108d618 to bc353c4 on December 17, 2019 09:07
@ngimel ngimel self-requested a review December 20, 2019 03:06
@ngimel ngimel added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Dec 20, 2019
Collaborator

@ngimel ngimel left a comment

Please add a test for the case you are fixing. Also, this will still break when m exceeds roughly 32 million, but that's better than before.

@ptrblck
Collaborator

ptrblck commented Dec 24, 2019

Similar issue in pdist, as reported here: #31593 (comment)

@emcastillo Let me know if you want to fix both methods in the same PR, or if I should take care of pdist.

@emcastillo
Collaborator Author

emcastillo commented Dec 25, 2019

I am trying to write the test, but the required matrix sizes for it to fail are quite big, resulting in the test failing with an out-of-memory error when running the functional checks. How should I proceed in this case?

@ptrblck you can take care of pdist :)

@ngimel
Collaborator

ngimel commented Dec 26, 2019

@emcastillo it looks like you are hitting #24345, and it looks like it was never resolved. You can either go back to whatever cdist implementation existed before PyTorch 1.2 (you'd need to add batching support, because it did not exist before PyTorch 1.2), which supposedly did not use as much memory, or you can, at least for the Euclidean distance, let PyTorch figure out the backward pass itself by calling the necessary matrix multiplies (#31599); that would take care of most practical cases. Non-Euclidean distances would still throw an error.
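For illustration, a minimal sketch of the kind of matrix-multiply composition being suggested (this is not the code that eventually landed; the function name is made up, and the clamp is added for numerical safety):

```
import torch

def euclidean_cdist_via_matmul(x1, x2):
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, built only from differentiable
    # primitives (sum, matmul, sqrt), so autograd can derive the backward pass.
    x1_sq = x1.pow(2).sum(dim=-1, keepdim=True)   # (..., n, 1)
    x2_sq = x2.pow(2).sum(dim=-1, keepdim=True)   # (..., m, 1)
    sq_dist = x1_sq - 2 * x1.matmul(x2.transpose(-2, -1)) + x2_sq.transpose(-2, -1)
    return sq_dist.clamp_min(0).sqrt()
```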

@emcastillo
Collaborator Author

Thanks for the advice!
I will try to let PyTorch do the backward pass itself for Euclidean distances.
I am still fairly new to the PyTorch codebase, so it will probably take me a while to figure out; sorry for the time it is likely to take.

Happy new year

@ngimel
Collaborator

ngimel commented Jan 6, 2020

Happy New Year! A similar thing is done for adaptive_avg_pooling (https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/AdaptiveAveragePooling.cpp#L324-L340): in some cases it has a device-independent, differentiable implementation, and in those cases it needs no device-specific dispatch or gradient formula; in the general case, _adaptive_avg_pool2d has device-specific dispatch and device-specific backward formulas (also look in native_functions.yaml for adaptive_avg_pool2d and _adaptive_avg_pool2d).
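A rough, runnable Python-level sketch of that pattern (illustrative only: the real logic is in C++ and registered through native_functions.yaml; the function name is made up, and the even-divisibility check is one plausible trigger for the composite path):

```
import torch
import torch.nn.functional as F

def adaptive_avg_pool2d_sketch(input, output_size):
    ih, iw = input.shape[-2:]
    oh, ow = output_size
    # Device-independent path: when the output evenly divides the input, the op
    # reduces to a plain avg_pool2d, which is differentiable, so autograd
    # derives the backward pass and no device-specific gradient formula is needed.
    if ih % oh == 0 and iw % ow == 0:
        return F.avg_pool2d(input, kernel_size=(ih // oh, iw // ow))
    # General path: fall back to the dedicated op, which has device-specific
    # dispatch and its own backward formula.
    return F.adaptive_avg_pool2d(input, output_size)
```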

@Zhaoyi-Yan

Zhaoyi-Yan commented Jan 6, 2020

I hit this problem in PyTorch v1.3.1. In my case I need to compute the similarity between a matrix of shape (N, 2500, 256) and another matrix of shape (N, 2500, 256); will this fix handle that? I am also not sure whether it will be cherry-picked into v1.4, or whether a workaround exists in the meantime.

Edit: N is a small number, e.g. 8.

@emcastillo
Collaborator Author

@ngimel, sorry for the delay; I can finally start working on this.

What I understood from reading the code and the links you pointed me to is that I need to define a new generic function for cdist. This function should be registered in aten/src/ATen/native/native_functions.yaml without dispatch logic, and it should not be defined in tools/autograd/derivatives.yaml.

This function should be the main entry point when cdist is executed. For non-Euclidean distances it should call the current cdist, which has a backward mapped, so autograd dispatches to the specific implementations; for the Euclidean case it should call the matrix-multiplication-based approach, which has no backward function defined, so autograd can derive the backward pass itself.

Please correct me if I am wrong (which I most likely am 😂)

@ngimel
Collaborator

ngimel commented Feb 24, 2020

@emcastillo please address @ailzhang's comment and rebase. Thanks!

@ailzhang
Contributor

ailzhang commented Feb 25, 2020 via email

@emcastillo
Collaborator Author

emcastillo commented Feb 25, 2020

Check removed and rebased!
Thanks for all the help!!

@emcastillo
Collaborator Author

@ngimel I think the failures are not related to my changes. Can you please confirm?
Thanks

Contributor

@ailzhang ailzhang left a comment

Thanks!

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel
Collaborator

ngimel commented Feb 25, 2020

Thanks, the XLA failures are probably not related to your changes, but @ailzhang would know more.

@ailzhang
Contributor

@ngimel The currently failing tests are quantization tests ;)

@emcastillo emcastillo deleted the fix_cdist_backward branch February 26, 2020 02:42
@facebook-github-bot
Contributor

@ngimel merged this pull request in a836c4c.

hczhu pushed a commit that referenced this pull request Feb 28, 2020
Summary:
Fixes an issue with `cdist` backward calculation for large inputs for the euclidean case.

The grid size when launching the kernel exceeded the 2^16 limit for the second dimension, resulting in `RuntimeError: CUDA error: invalid configuration argument`

Code to reproduce:

```
h, w, d = 800, 1216, 12
n = 133
A = torch.randn(n, d).cuda()
B = torch.randn(h, w, d).cuda()
A.requires_grad = True
B.requires_grad = True

B = B.reshape(-1, d).contiguous()
dist = torch.cdist(A, B)
loss = dist.sum()
loss.backward()
```

Thanks to tkerola for the bug report, reproduction and suggesting a solution.
Pull Request resolved: #31167

Differential Revision: D20035605

Pulled By: ngimel

fbshipit-source-id: ae28ba4b549ee07a8bd937bb1de2438dc24eaa17
ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
Summary:
Fixes an issue with `cdist` backward calculation for large inputs for the euclidean case.

The grid size when launching the kernel exceeded the 2^16 limit for the second dimension, resulting in `RuntimeError: CUDA error: invalid configuration argument`

Code to reproduce:

```
h, w, d = 800, 1216, 12
n = 133
A = torch.randn(n, d).cuda()
B = torch.randn(h, w, d).cuda()
A.requires_grad = True
B.requires_grad = True

B = B.reshape(-1, d).contiguous()
dist = torch.cdist(A, B)
loss = dist.sum()
loss.backward()
```

Thanks to tkerola for the bug report, reproduction and suggesting a solution.
Pull Request resolved: pytorch#31167

Differential Revision: D20035605

Pulled By: ngimel

fbshipit-source-id: ae28ba4b549ee07a8bd937bb1de2438dc24eaa17
@connorlee77

How can I update my version of torch to get this change?

@ngimel
Collaborator

ngimel commented Mar 4, 2020

You can get the nightly packages by following the instructions on pytorch.org.

@RuABraun

RuABraun commented Feb 2, 2021

I'm still getting an error with PyTorch 1.7.1 and the nightly build.

It's hard to reproduce, though (it is definitely caused by cdist, but I can't reproduce it with a 5-line example).


Labels

Merged · open source · triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
