Fix random errors in batched SVD #4690
|
Jenkins, test this please |
|
Jenkins, test this please |
|
Jenkins CI test (for commit 8503cc8, target branch master) failed with status FAILURE. |
|
18k+ errors?! wtf... Jenkins, test this please |
|
Jenkins CI test (for commit 8503cc8, target branch master) failed with status FAILURE. |
|
I think I know what went wrong. It has to do with pointer arithmetic. The matrix A is overwritten by gesvd, which was not taken into account when shifting the pointers in the loop. Will try to confirm over the weekend. |
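To illustrate the suspected failure mode, here is a minimal NumPy-only sketch, not the actual CuPy/cuSOLVER code. `destructive_svd` is a hypothetical stand-in that mimics a gesvd-style routine clobbering its input, and `batched_svd` is a hypothetical helper showing one way to keep the caller's batch intact: run the loop over a scratch copy so that each iteration reads untouched data.

```python
import numpy as np


def destructive_svd(a):
    """Hypothetical stand-in for a gesvd-style routine that overwrites `a`."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    a[...] = 0.0  # mimic the routine using the input buffer as scratch space
    return u, s, vt


def batched_svd(batch):
    """Loop over a (batch, m, n) array without destroying the caller's data."""
    batch_size, m, n = batch.shape
    k = min(m, n)
    u = np.empty((batch_size, m, k), dtype=batch.dtype)
    s = np.empty((batch_size, k), dtype=batch.dtype)
    vt = np.empty((batch_size, k, n), dtype=batch.dtype)
    work = batch.copy()  # scratch copy; only this buffer gets overwritten
    for i in range(batch_size):
        # work[i] is a view at offset i * m * n into the scratch buffer
        u[i], s[i], vt[i] = destructive_svd(work[i])
    return u, s, vt


rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 3, 2))
original = batch.copy()
u, s, vt = batched_svd(batch)
```

The extra copy costs memory, which is the trade-off a real implementation would have to weigh.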
|
Jenkins, test this please |
|
Well, still no luck, but at least by allocating a larger workspace I can also generate some errors locally. |
|
Jenkins CI test (for commit 4379d2b, target branch master) failed with status FAILURE. |
|
Jenkins, test this please |
|
Jenkins, test this please |
|
d56ee3a should work... |
|
@emcastillo I will run d56ee3a through the CI twice, remove all unnecessary changes (stream, batched workspace, etc.), and then run the CI a few more times. Hopefully this will be ready by Monday. |
|
Jenkins CI test (for commit d56ee3a, target branch master) succeeded! |
|
Jenkins, test this please |
|
Still waiting for the CI to launch... |
|
Jenkins CI test (for commit d56ee3a, target branch master) succeeded! |
|
Test with all the unnecessary (I hope) code removed. Jenkins, test this please |
|
Jenkins CI test (for commit 36c2bbd, target branch master) succeeded! |
|
Jenkins, test this please |
|
Jenkins CI test (for commit 36c2bbd, target branch master) succeeded! |
|
Jenkins, test this please |
|
@kmaehashi @toslunar @emcastillo This is ready. |
|
Jenkins, test this please |
|
Jenkins CI test (for commit 36c2bbd, target branch master) succeeded! |
|
Thanks, @emcastillo! All CI are green on |
# TODO(leofang): the current approach may be memory hungry, try
# setting either job_u or job_vt to 'O' to overwrite the input?
Let me revisit this after the release...
|
Thanks @emcastillo @kmaehashi @toslunar! Sorry for not identifying the bugs sooner... |
Close #4684. Close #4687.
Fix 3 bugs:
full_matrices
Remove the skip on CUDA 10.0 to force the CI to test it out, as gesvd shouldn't be affected by the known cuSOLVER bug.