
Conversation

@vishwakftw
Contributor

@vishwakftw vishwakftw commented Sep 3, 2018

  • Added a note to the doc string for svd.

@vishwakftw
Contributor Author

I have some timings (10 runs w/ 10 loops each):

| dimensions | before this patch | after this patch |
| --- | --- | --- |
| 1536 x 1536 | 4.99 s +/- 12.6 ms per loop | 1.54 s +/- 11 ms per loop |
| 1024 x 1536 | 3.81 s +/- 9.69 ms per loop | 567 ms +/- 21.6 ms per loop |
| 1536 x 1024 | 2.66 s +/- 6.61 ms per loop | 568 ms +/- 27 ms per loop |
| 999 x 999 | 1.55 s +/- 3.83 ms per loop | 347 ms +/- 2.68 ms per loop |
| 999 x 1729 | 1.98 s +/- 11.5 ms per loop | 547 ms +/- 27.5 ms per loop |
| 1729 x 729 | 994 ms +/- 9.27 ms per loop | 210 ms +/- 2.75 ms per loop |

@ssnl @soumith does this look good enough?
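The timings above can be reproduced in spirit with a quick `timeit` sketch. This is a hypothetical stand-in using NumPy (whose `linalg.svd` also dispatches to LAPACK's `?gesdd`), not the benchmark script actually used in this PR:

```python
import timeit
import numpy as np

# Time a full SVD of a random square matrix, averaged over several calls,
# in the same spirit as the "per loop" numbers in the table above.
# A smaller matrix than the PR's is used here so the sketch runs quickly.
rng = np.random.RandomState(0)
a = rng.randn(256, 256)

n_loops = 10
t = timeit.timeit(lambda: np.linalg.svd(a), number=n_loops) / n_loops
print("mean per call: %.4f s" % t)
```

Varying the matrix shape (tall, wide, square) as in the table is just a matter of changing the arguments to `randn`.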

@zou3519
Contributor

zou3519 commented Sep 4, 2018

The performance speedup is great, @vishwakftw! Could you benchmark some smaller matrices? In particular, I'm reading https://groups.google.com/forum/#!topic/julia-dev/mmgO65i6-fA and from their experience it sounds like there is some value in having a second svd call that uses gesvd, but I'm curious what your benchmarking shows about that.

edit: Oh, I found the discussion in #11174 where @soumith linked to the link I sent. Please ignore my comment :)

@vishwakftw
Contributor Author

I believe this reiterates the importance of shifting to `gesdd`:

> Yes, the LAPACK routine dgesdd uses a Divide and Conquer approach at its core, which recurses on smaller and smaller problems; at the smallest level, it uses a QR algorithm which is identical to that of dgesvd.
> The only real reason to use dgesvd instead is for accuracy and workspace reasons. The former should only be important if you care about high relative accuracy for tiny singular values.
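The accuracy distinction quoted above can be probed directly, since `scipy.linalg.svd` exposes both LAPACK drivers through its `lapack_driver` argument (`'gesdd'` is its default). This is an illustrative sketch on a synthetic ill-conditioned matrix, not a claim about where the drivers actually diverge:

```python
import numpy as np
from scipy import linalg

# Build a matrix with singular values spanning 12 orders of magnitude:
# A = U diag(s) V^T with orthonormal U, V.
rng = np.random.RandomState(0)
u, _ = np.linalg.qr(rng.randn(50, 50))
v, _ = np.linalg.qr(rng.randn(50, 50))
s = np.logspace(0, -12, 50)
a = (u * s) @ v.T

# Compare the singular values from the two LAPACK drivers.
s_dd = linalg.svd(a, compute_uv=False, lapack_driver='gesdd')
s_vd = linalg.svd(a, compute_uv=False, lapack_driver='gesvd')

# The large singular values agree to machine precision; any differences
# show up in the tiniest ones -- the "high relative accuracy" case.
print("max abs difference:", np.max(np.abs(s_dd - s_vd)))
```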

Collaborator

@ssnl ssnl left a comment


looking good except for 1 nit

.. note:: The implementation of SVD on CPU uses the LAPACK routine `?gesdd` (a divide-and-conquer
          algorithm) instead of `?gesvd` for speed. However, the SVD on GPU uses the MAGMA routine
          `gesvd`, since it requires less workspace.


@vishwakftw
Contributor Author

I have some more timings (10 runs w/ 1000 loops each):

| dimensions | before this patch | after this patch |
| --- | --- | --- |
| 200 x 100 | 3.3 ms +/- 31.7 us per loop | 2.09 ms +/- 190 us per loop |
| 100 x 50 | 1.01 ms +/- 1.92 us per loop | 634 us +/- 10.9 us per loop |
| 50 x 25 | 212 us +/- 10.6 us per loop | 215 us +/- 1.8 us per loop |
| 40 x 20 | 126 us +/- 1.53 us per loop | 115 us +/- 2.37 us per loop |
| 20 x 10 | 46.8 us +/- 87.1 ns per loop | 53.5 us +/- 7.91 us per loop |
| 10 x 5 | 28.3 us +/- 395 ns per loop | 33.9 us +/- 8.17 us per loop |

cc: @zou3519

From this, the advantage fades away as the matrix size decreases.

@zou3519
Contributor

zou3519 commented Sep 4, 2018

@vishwakftw those times look great

@yaroslavvb
Contributor

yaroslavvb commented Sep 4, 2018

BTW, I would also check if the result has any NaN's. I found Intel's gesdd erroneously produced NaNs when gesvd didn't.

I feel like some of the edge cases are likely to come up on more ill-conditioned matrices. For instance, here's a particular (very ill-conditioned) matrix that breaks TensorFlow's SVD routine:

```python
from scipy import linalg  # for svd
import urllib.request
import numpy as np

url = "https://storage.googleapis.com/tensorflow-community-wheels/svd_in"
response = urllib.request.urlopen(url)
body = response.read()
print("Read %d bytes" % (len(body),))
assert len(body) == 15366400
with open("svd_in", "wb") as f:
    f.write(body)

dtype = np.float32
matrix0 = np.genfromtxt("svd_in", delimiter=",").astype(dtype)
assert matrix0.shape == (784, 784)
u, s, v = linalg.svd(matrix0)
print("matrix0 any NaNs: %s" % (np.isnan(matrix0).any(),))
print("u had NaNs: %s" % (np.isnan(u).any(),))
```

@vishwakftw
Contributor Author

@ssnl is this good to go?

@vishwakftw
Contributor Author

@yaroslavvb The example you have given does not seem to break the current implementation in this PR: no NaNs or infs were detected.
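A minimal version of that finiteness check can be sketched as follows. The matrix here is a synthetic stand-in (the real test used the downloaded `svd_in` matrix above):

```python
import numpy as np

# Run SVD on a poorly scaled matrix and confirm every returned factor
# is finite -- the failure mode described for Intel's gesdd was NaNs
# appearing in the output where gesvd produced none.
rng = np.random.RandomState(0)
a = rng.randn(100, 100)
a[0] *= 1e12  # exaggerate the scaling spread (illustrative only)

u, s, vt = np.linalg.svd(a)
for name, x in (("u", u), ("s", s), ("vt", vt)):
    assert np.isfinite(x).all(), "%s contains NaN/inf" % name
print("no NaNs or infs detected")
```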

@vishwakftw
Contributor Author

@pytorchbot retest this please

@vishwakftw
Contributor Author

@zou3519 @ssnl is this good to go?

Contributor

@facebook-github-bot facebook-github-bot left a comment


SsnL has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@facebook-github-bot facebook-github-bot left a comment


soumith has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@vishwakftw vishwakftw deleted the gesvd-to-gesdd-cpu branch September 7, 2018 15:06
zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 7, 2018
Summary:
- Added a note to the doc string for `svd`.
Pull Request resolved: pytorch/pytorch#11194

Differential Revision: D9683250

Pulled By: soumith

fbshipit-source-id: 2d2c120be346122afa333629c0516a5c9dbb406f
PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018
Summary:
- Added a note to the doc string for `svd`.
Pull Request resolved: pytorch#11194

Differential Revision: D9683250

Pulled By: soumith

fbshipit-source-id: 2d2c120be346122afa333629c0516a5c9dbb406f