Skip to content

Fix random errors in batched SVD#4690

Merged
mergify[bot] merged 9 commits intocupy:masterfrom
leofang:fix_batched_svd
Feb 22, 2021
Merged

Fix random errors in batched SVD#4690
mergify[bot] merged 9 commits intocupy:masterfrom
leofang:fix_batched_svd

Conversation

@leofang
Copy link
Copy Markdown
Member

@leofang leofang commented Feb 19, 2021

Close #4684. Close #4687.

Fix 3 bugs:

  • The workspace size is not large enough
  • The pointer address should be computed differently based on full_matrices
  • Force C++11 compilation to work with older gcc (like gcc 5)

Remove the skip on CUDA 10.0 to force the CI to test it out, as gesvd shouldn't be affected by the known cuSOLVER bug.

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 19, 2021

Jenkins, test this please

1 similar comment
@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 19, 2021

Jenkins, test this please

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit 8503cc8, target branch master) failed with status FAILURE.

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 19, 2021

18k+ errors?! wtf...

Jenkins, test this please

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit 8503cc8, target branch master) failed with status FAILURE.

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 20, 2021

I think I know where went wrong. It has to do with pointer arithmetics. The matrix A is overwritten by gesvd, which was not taken into account when shifting the pointers in the loop. Will try to confirm over the weekend.

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 20, 2021

Jenkins, test this please

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 20, 2021

Well, still no luck, but at least by allocating a larger workspace I can also generate some errors locally.

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit 4379d2b, target branch master) failed with status FAILURE.

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 20, 2021

Jenkins, test this please

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 20, 2021

Jenkins, test this please

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 20, 2021

d56ee3a should work...

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 20, 2021

@emcastillo I will run d56ee3a through the CI twice, remove all unnecessary changes (stream, batched workspace, etc), and then run the CI for a few times. Hopefully this will be ready by Monday.

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit d56ee3a, target branch master) succeeded!

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 20, 2021

Jenkins, test this please

@leofang leofang changed the title [WIP][DO NOT MERGE] Fix random errors in batched SVD Fix random errors in batched SVD Feb 20, 2021
@leofang leofang marked this pull request as ready for review February 20, 2021 22:02
@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 20, 2021

Still waiting for the CI to launch...

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit d56ee3a, target branch master) succeeded!

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 21, 2021

Test with all the unnecessary (I hope) codes removed.

Jenkins, test this please

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit 36c2bbd, target branch master) succeeded!

@emcastillo
Copy link
Copy Markdown
Member

Jenkins, test this please

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit 36c2bbd, target branch master) succeeded!

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 21, 2021

Jenkins, test this please

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 21, 2021

@kmaehashi @toslunar @emcastillo This is ready.

@emcastillo
Copy link
Copy Markdown
Member

Jenkins, test this please

@emcastillo
Copy link
Copy Markdown
Member

Running in daily shuffles
https://jenkins.preferred.jp/job/chainer/job/daily_master_cupy-shuffled/

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit 36c2bbd, target branch master) succeeded!

@kmaehashi kmaehashi added blocking Issue/pull-request is mandatory for the upcoming release cat:bug Bugs prio:high labels Feb 22, 2021
@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 22, 2021

Thanks, @emcastillo! All CI are green on test_decomposition.py! (A few CIs timed out before reaching 100% completion, but I suppose it's ok.)

Comment on lines +302 to +303
# TODO(leofang): the current approach may be memory hungry, try
# setting either job_u or job_vt to 'O' to overwrite the input?
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me revisit this after the release...

@emcastillo emcastillo added this to the v9.0.0b3 milestone Feb 22, 2021
@emcastillo emcastillo added the st:test-and-merge (deprecated) Ready to merge after test pass. label Feb 22, 2021
@mergify mergify bot merged commit 73e939e into cupy:master Feb 22, 2021
@leofang leofang deleted the fix_batched_svd branch February 22, 2021 05:59
@leofang
Copy link
Copy Markdown
Member Author

leofang commented Feb 22, 2021

Thanks @emcastillo @kmaehashi @toslunar! Sorry for not identifying the bugs sooner...

@leofang leofang mentioned this pull request Feb 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blocking Issue/pull-request is mandatory for the upcoming release cat:bug Bugs prio:high st:test-and-merge (deprecated) Ready to merge after test pass.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Batched SVD test failures

4 participants