Conversation

@skrah
Contributor

@skrah skrah commented May 28, 2019

Fixes #18275.

@skrah
Contributor Author

skrah commented May 29, 2019

@pytorchbot retest this please.

@skrah skrah changed the title from "[WIP] Add a 'dim' argument to nuclear norm" to "Add a 'dim' argument to nuclear norm" May 30, 2019
@skrah
Contributor Author

skrah commented May 30, 2019

@soumith @vishwakftw I've implemented the 'dim' argument for nuclear norm (#18275).
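For illustration, a minimal usage sketch of what the new argument enables (the tensor shapes here are made up):

    import torch

    x = torch.randn(4, 5, 3)
    # nuclear norm of each 5x3 slice: dims 1 and 2 are reduced, leaving shape (4,)
    per_batch = torch.norm(x, p='nuc', dim=(1, 2))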

As @vishwakftw has mentioned, this requires a batch_svd() implementation. I've added a simple one that has about the same performance characteristics as np.linalg.svd().

There are some open issues about optimized batch_svd() implementations. Perhaps it would be easier to proceed with those after at::svd() has been ported to ATen. So I left the batch_svd() implementation as a static function exclusively for nuclear_norm().
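For reference, a hypothetical Python-level sketch of what a batched nuclear norm computes; the PR's actual batch_svd() is a static C++ function in ATen, and the name naive_nuclear_norm below is made up:

    import torch

    def naive_nuclear_norm(x, dim=(-2, -1)):
        # move the two reduced dims to the end, flatten the batch dims,
        # then sum the singular values of each matrix
        d0, d1 = (d % x.dim() for d in dim)
        batch_dims = [i for i in range(x.dim()) if i not in (d0, d1)]
        mats = x.permute(*batch_dims, d0, d1).reshape(-1, x.size(d0), x.size(d1))
        sums = torch.stack([torch.svd(m, compute_uv=False)[1].sum() for m in mats])
        return sums.reshape([x.size(i) for i in batch_dims])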

@skrah skrah requested a review from soumith May 30, 2019 11:58
@vishwakftw
Contributor

vishwakftw commented May 30, 2019

I’ll start working on implementing batch SVD in a week or two. It would be safe to mark the batch_svd function as internal (with an underscore preceding the name), but I don’t think it’s exposed in Python in this PR anyway.

@ezyang
Contributor

ezyang commented Jun 6, 2019

@vishwakftw If you think this is good to go, I can merge it.

@ezyang ezyang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jun 6, 2019
@vishwakftw
Contributor

@ezyang I should be able to take a look at this tomorrow (it’s a bit late back here). Hope that’s fine.

Contributor

@vishwakftw vishwakftw left a comment

Looks good so far. I noticed that there are no CUDA tests available; could you add them too?

@skrah
Contributor Author

skrah commented Jun 8, 2019

@pytorchbot retest this please.

1 similar comment
@skrah
Contributor Author

skrah commented Jun 9, 2019

@pytorchbot retest this please.

Contributor

@vishwakftw vishwakftw left a comment

Looks good to me except a few nits. Once those are addressed, this should be good to land.

    perm.push_back(i);
  }

  TORCH_CHECK(perm.size() == ndim,
Contributor

Neat check!

@pytorchbot pytorchbot added the module: cuda label (Related to torch.cuda, and CUDA support in general) Jun 9, 2019
def _test_nuclear_norm_axes(self, device='cpu'):
    def check_single_nuclear_norm(x, axes):
        if x.is_cuda and randrange(100) < 95:
            return  # too many cpu <==> gpu copies
Contributor

@vishwakftw vishwakftw Jun 10, 2019

What is the purpose of randrange?

If you are concerned about too many copies, you can create a tensor on the CPU, pass it to this function, and then explicitly create a copy of the tensor on the device you are testing.

Something like this:

x_device = x.to(device=device, copy=True) if device != 'cpu' else x  # x is a CPU tensor invariably

Contributor Author

The tests are too slow, so some are skipped (they all pass here on my machine without randrange).

Contributor Author

So I think that the CUDA tests spend the majority of the time copying from GPU ==> CPU so that numpy arrays can be created and the results can be compared.

Contributor Author

Currently there are three copies when x is already on the GPU:

  1. For creating the numpy array.

  2. For copying the first result to CPU for allclose().

  3. For copying the second result to CPU for allclose().

One can reduce this to two copies by copying expected to the GPU, but it would probably not be much better.

self.assertTrue(np.allclose(ans.cpu(), expected, rtol=1e-04, atol=1e-04))
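A hypothetical sketch of that two-copy variant (expected_t is a made-up name), comparing on the device instead of calling ans.cpu():

    # copy `expected` across once and compare on-device
    expected_t = torch.from_numpy(expected).to(device=ans.device, dtype=ans.dtype)
    self.assertTrue(torch.allclose(ans, expected_t, rtol=1e-04, atol=1e-04))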

for n in range(1, 5):
    for m in range(1, 5):
Contributor

@vishwakftw vishwakftw Jun 10, 2019

I think these ranges create too many test cases, which is slowing down the tests. In my opinion, three kinds of test cases should suffice:

  • square matrices
  • tall matrices
  • fat matrices

Square matrices could be of dimensions (3, 3), fat matrices could be of dimensions (3, 5) and tall matrices could be of dimensions (5, 3). In addition to these, we could have at most 2 batch dimensions (5, *, *) and (7, 5, *, *). This adds up to 9 test cases. In your case there are (2 + 4) * (4 * 4) = 96 test cases (just considering the sizes of the tensors).
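A short sketch of that parametrization (the variable names are made up):

    # 3 matrix shapes x 3 batch layouts (none, one, two batch dims) = 9 cases
    matrix_shapes = [(3, 3), (5, 3), (3, 5)]   # square, tall, fat
    batch_prefixes = [(), (5,), (7, 5)]
    test_shapes = [b + m for b in batch_prefixes for m in matrix_shapes]
    assert len(test_shapes) == 9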

Contributor Author

With a release build, probably including pytest test collection:

CPU

$ /home/stefan/rel2/bin/python3 -m pytest test_torch.py -k test_sum_dim -v
...
1 passed, 494 deselected in 19.41 seconds
$ /home/stefan/rel2/bin/python3 -m pytest test_torch.py -k test_nuclear_norm -v
...
2 passed, 493 deselected in 2.05 seconds

CUDA

$ /home/stefan/rel2/bin/python3 -m pytest test_cuda.py -k test_det_logdet_slogdet -v
...
1 passed, 169 deselected in 6.37 seconds

95% random skip (with 98% random skip it is 4.74s):

$ /home/stefan/rel2/bin/python3 -m pytest test_cuda.py -k nuclear_norm -v
...
2 passed, 168 deselected in 7.07 seconds 

So CPU is still relatively fast. On CUDA it is definitely one of the slower tests, but not a complete outlier. The benefit of brute force is that it catches issues like #20452.

Are random skips not allowed in the tests? In the CPython test suite they are allowed, but practices differ, of course.

Contributor

I don’t think random skips are allowed. However, there is a @slowTest decorator available, introduced in #18231.
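For reference, a sketch of how that decorator is applied (import path as it was in the 2019 test suite; the class and test names are made up):

    from common_utils import TestCase, slowTest  # test/common_utils.py at the time

    class TestNuclearNorm(TestCase):
        @slowTest  # skipped unless PYTORCH_TEST_WITH_SLOW=1 is set
        def test_nuclear_norm_exhaustive(self):
            ...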

Contributor

@ezyang do you have any suggestions?

Contributor Author

With 58c000c the CPU tests take 0.7s and the CUDA tests (with 95% skips) 2.4s.

Contributor

@vishwakftw vishwakftw left a comment

Everything looks neat, thank you for this. I’ll wait for @ezyang to give his thoughts about randomly skipping tests.

@ezyang
Contributor

ezyang commented Jun 10, 2019 via email

@ezyang
Contributor

ezyang commented Jun 10, 2019

I'm OK with landing this straight up

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@skrah
Contributor Author

skrah commented Jun 10, 2019

> Yes, please use slowTest :)

Fine, next time. :)

Thanks @vishwakftw for reviewing and @ezyang for landing this!

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 10, 2019
Summary:
Addresses #18275.
Pull Request resolved: pytorch/pytorch#21022

Differential Revision: D15743515

Pulled By: ezyang

fbshipit-source-id: e4aaea0bd7f863a2abad45c4322d6a9fb02a88e3
@facebook-github-bot
Contributor

@ezyang merged this pull request in 8b9b215.

Labels

• Merged
• module: cuda (Related to torch.cuda, and CUDA support in general)
• open source
• triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Development

Successfully merging this pull request may close these issues:

torch.norm does not work when p="nuc" (#18275)

6 participants