Add a 'dim' argument to nuclear norm #21022
Conversation
@pytorchbot retest this please.

@soumith @vishwakftw I've implemented the 'dim' argument for nuclear norm (#18275). As @vishwakftw has mentioned, this requires a batched SVD. There are some open issues about optimized batch SVD implementations.
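For context, a minimal sketch of how a `dim` argument to the nuclear norm reduces to a batched SVD; the helper name is hypothetical, and `torch.linalg.svdvals` is today's batched entry point, not what this PR used:

```python
import torch

def nuclear_norm_dim(x, dim):
    # Hypothetical helper, not the PR's implementation: move the two
    # reduced dimensions to the end so the remaining dimensions form the
    # batch, then sum singular values (the nuclear norm of a matrix is
    # the sum of its singular values).
    d0, d1 = (d % x.dim() for d in dim)
    batch = [i for i in range(x.dim()) if i not in (d0, d1)]
    x_perm = x.permute(batch + [d0, d1])
    # torch.linalg.svdvals is batched; at the time of this PR a batched
    # SVD was still an open issue, hence the discussion above.
    return torch.linalg.svdvals(x_perm).sum(-1)

# Roughly what the feature added here exposes:
# torch.norm(x, "nuc", dim=(d0, d1))
```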
I'll start working on implementing batch SVD in a week or two. It would be safe to mark the batch_svd function as internal (with an underscore preceding the name), but I don't think it's exposed in Python in this PR anyway.

@vishwakftw If you think this is good to go, I can merge it.

@ezyang I should be able to take a look at this tomorrow (it's a bit late back here). Hope that's fine.
vishwakftw
left a comment
Looks good so far. I noticed that there are no CUDA tests; could you add them too?
@pytorchbot retest this please.
vishwakftw
left a comment
Looks good to me except for a few nits. Once those are addressed, this should be good to land.
```cpp
perm.push_back(i);
}

TORCH_CHECK(perm.size() == ndim,
```
Neat check!
```python
def _test_nuclear_norm_axes(self, device='cpu'):
    def check_single_nuclear_norm(x, axes):
        if x.is_cuda and randrange(100) < 95:
            return  # too many cpu <==> gpu copies
```
What is the purpose of randrange?
If you are concerned about too many copies, you can create a tensor on the CPU and pass it to this function, and then explicitly create a copy of the tensor on the device that you are testing for. Something like this:

```python
x_device = x.to(device=device, copy=True) if device != 'cpu' else x  # x is invariably a CPU tensor.
```
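Applied to the helper, that suggestion might look like this (a hypothetical restructuring, not the PR's code): the helper always receives a CPU tensor, builds the numpy reference from it, and makes exactly one explicit device copy.

```python
import numpy as np
import torch

def check_single_nuclear_norm(x_cpu, axes, device):
    # x_cpu is invariably a CPU tensor, so the numpy reference needs no copy.
    expected = np.linalg.norm(x_cpu.numpy(), "nuc", axis=axes)
    # Exactly one explicit copy to the device under test.
    x = x_cpu.to(device=device, copy=True) if device != 'cpu' else x_cpu
    ans = torch.norm(x, "nuc", dim=axes)
    assert np.allclose(ans.cpu().numpy(), expected, rtol=1e-04, atol=1e-04)
```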
The tests are too slow, so some are skipped (they all pass here on my machine without randrange).
So I think that the CUDA tests spend the majority of the time copying from GPU ==> CPU so that numpy arrays can be created and the results can be compared.
Currently there are three copies when x is already on the GPU:
- one for creating the numpy array,
- one for copying the first result to CPU for allclose(),
- one for copying the second result to CPU for allclose().
One can reduce it to two copies by copying `expected` to the GPU, but it would probably not be much better.
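For illustration, the two-copy variant could look roughly like this (a sketch, not the PR's code; `x` and `axes` are the names from the test above):

```python
import numpy as np
import torch

# One copy to build the numpy array, one copy of the reference to the GPU;
# the comparison then runs on the device and no result is copied back.
expected = np.linalg.norm(x.cpu().numpy(), "nuc", axis=axes)
expected_dev = torch.as_tensor(expected, dtype=x.dtype, device=x.device)
ans = torch.norm(x, "nuc", dim=axes)
assert torch.allclose(ans, expected_dev, rtol=1e-04, atol=1e-04)
```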
test/test_torch.py (outdated)
```python
self.assertTrue(np.allclose(ans.cpu(), expected, rtol=1e-04, atol=1e-04))

for n in range(1, 5):
    for m in range(1, 5):
```
I think these ranges create too many test cases, which is slowing down the tests. In my opinion, three kinds of test cases should suffice:
- square matrices
- tall matrices
- fat matrices
Square matrices could be of dimensions (3, 3), fat matrices could be of dimensions (3, 5), and tall matrices could be of dimensions (5, 3). In addition to these, we could have at most 2 batch dimensions: (5, *, *) and (7, 5, *, *). This comes to 9 test cases (see the sketch below). In your case there are (2 + 4) * (4 * 4) = 96 test cases (just considering the sizes of the tensors).
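Spelled out, the proposed shape set might look like this (a sketch of the suggestion above, with the concrete batch sizes it names):

```python
# Square, tall, and fat base matrices, each tested as a plain matrix
# and with one or two batch dimensions: 3 * 3 = 9 cases in total.
base_shapes = [(3, 3), (5, 3), (3, 5)]  # square, tall, fat
shapes = []
for m, n in base_shapes:
    shapes.append((m, n))        # no batch dimension
    shapes.append((5, m, n))     # one batch dimension
    shapes.append((7, 5, m, n))  # two batch dimensions
```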
With a release build, probably including pytest test collection:

CPU:

```
$ /home/stefan/rel2/bin/python3 -m pytest test_torch.py -k test_sum_dim -v
...
1 passed, 494 deselected in 19.41 seconds
$ /home/stefan/rel2/bin/python3 -m pytest test_torch.py -k test_nuclear_norm -v
...
2 passed, 493 deselected in 2.05 seconds
```

CUDA:

```
$ /home/stefan/rel2/bin/python3 -m pytest test_cuda.py -k test_det_logdet_slogdet -v
...
1 passed, 169 deselected in 6.37 seconds
```

With a 95% random skip (with a 98% random skip it is 4.74s):

```
$ /home/stefan/rel2/bin/python3 -m pytest test_cuda.py -k nuclear_norm -v
...
2 passed, 168 deselected in 7.07 seconds
```
So CPU is still relatively fast. On CUDA it is definitely one of the slower tests, but not a complete outlier. The benefit of brute force is that it catches issues like:

Are random skips not allowed in the tests? In the CPython test suite they are allowed, but practices differ, of course.
I don't think random skips are allowed. However, there is a @slowTest decorator available, introduced in #18231.
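For reference, applying it would look roughly like this (the import path and class name are assumptions based on the test layout at the time):

```python
from common_utils import TestCase, slowTest  # assumed import path

class TestTorchNorms(TestCase):  # hypothetical test class
    @slowTest  # runs only when slow tests are enabled (e.g. PYTORCH_TEST_WITH_SLOW=1)
    def test_nuclear_norm_axes(self):
        self._test_nuclear_norm_axes()  # helper from this PR
```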
@ezyang do you have any suggestions?
With 58c000c the CPU tests take 0.7s and the CUDA tests (with 95% skips) 2.4s.
vishwakftw
left a comment
Everything looks neat, thank you for this. I’ll wait for @ezyang to give his thoughts about randomly skipping tests.
Yes, please use slowTest :)
I'm OK with landing this straight up.
facebook-github-bot
left a comment
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Fine, next time. :) Thanks @vishwakftw for reviewing and @ezyang for landing this!
Summary: Addresses #18275.
Pull Request resolved: pytorch/pytorch#21022
Differential Revision: D15743515
Pulled By: ezyang
fbshipit-source-id: e4aaea0bd7f863a2abad45c4322d6a9fb02a88e3
Fixes #18275.