Implement cupy.linalg.eig / cupy.linalg.eigvals clone of PR #8854 (#8980)
Conversation
Check version range
/test mini
@mfoerste4 found that the OOM issue is due to unguarded calls to the new cuSOLVER routines, which are only available in recent CUDA 12.x (I forget which release exactly; it is in the release notes and the linked issue). The evidence is that the CUDA 12.8 CIs passed. We need to add a version guard like the ones we added in the past for other cuSOLVER routines.
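A minimal sketch of the kind of guard meant here. The helper name and the 12060 threshold (CUDA 12.6, mentioned later in this thread) are assumptions for illustration, not CuPy's actual internals:

```python
# Hypothetical version guard; the function name and the 12060 threshold
# (CUDA 12.6) are illustrative assumptions, not CuPy internals.
def geev_available(runtime_version: int) -> bool:
    """Return True if the new cuSOLVER geev routines can be called.

    runtime_version is the integer form reported by
    cupy.cuda.runtime.runtimeGetVersion(), e.g. 12080 for CUDA 12.8.
    """
    return runtime_version >= 12060

print(geev_available(12080))  # True: consistent with the passing CUDA 12.8 CIs
print(geev_available(11040))  # False: raise a clear error instead of failing later
```

With such a guard in place, unsupported toolkits can raise a descriptive `RuntimeError` up front rather than hitting the unguarded call.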
/test mini
@mfoerste4 Thank you for taking over the pull request! Considering the
Looks like it's fixed now! (The remaining CI failures are irrelevant.)
@mfoerste4 could you kindly take a look at Akifumi-san's comments?
@mfoerste4 @leofang Sorry for keeping you waiting so long (I have been very busy recently). Please keep me up to date!
@asi1024 , thanks for your response, comments inline
I don't think we can make any assumptions about these in general. For stacked computations you are probably right.
Yes, I will add that
I don't think that is possible. IIUC we need each matrix in Fortran format, but each matrix should also remain coalesced in memory. The strides for a (B x M x M) input would then be (M*M, 1, M), which is not Fortran style. So I guess the best we can do is a local transpose for each matrix in _geev (we need a copy anyway, since cuSOLVER might overwrite the input).
The stacking for multiple geev computations seems to be broken in the current PR; the tests only cover 2D. For pre-allocation, the same data-layout issue arises as mentioned above. As this is my first contact with CuPy, if you have a proposal on how to implement the stacking I would be happy to hear it.
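The layout point above can be illustrated with plain NumPy (used here as a stand-in for CuPy; the stride arithmetic is identical):

```python
import numpy as np

B, M = 4, 3
a = np.zeros((B, M, M))  # C-contiguous batch of matrices

# Element strides of the C-contiguous batch: (M*M, M, 1).
print(tuple(s // a.itemsize for s in a.strides))  # (9, 3, 1)

# Per-matrix Fortran layout would need strides (M*M, 1, M), which is
# neither C- nor F-contiguous as a single 3-D array. Hence each matrix
# gets a local Fortran-ordered copy before a geev-style call:
ai = np.asfortranarray(a[0])
print(ai.flags.f_contiguous)  # True
```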
@asi1024 , I moved the batching into the inner loop and used the pre-allocated output(s) as much as possible. For real input we still need a local instance for the real eigenvectors because of the way the cuSOLVER API is designed.
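A rough NumPy sketch of the batching structure described above (an inner loop over the stack, writing into pre-allocated complex outputs); the names and structure are illustrative, not the PR's actual code:

```python
import numpy as np

def batched_eig(a):
    """Loop a 2-D eig routine over a (B, M, M) stack.

    Outputs are pre-allocated once as complex arrays; each iteration fills
    one slice, mirroring the 'batching in the inner loop' idea. In the real
    PR the per-matrix call would be a cuSOLVER geev invocation.
    """
    B, M, _ = a.shape
    w = np.empty((B, M), dtype=np.complex128)     # eigenvalues
    v = np.empty((B, M, M), dtype=np.complex128)  # right eigenvectors
    for i in range(B):
        w[i], v[i] = np.linalg.eig(a[i])
    return w, v

a = np.stack([np.diag([1.0, 2.0]),
              np.array([[0.0, -1.0], [1.0, 0.0]])])
w, v = batched_eig(a)
print(w.dtype)  # complex128
```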
@asi1024 kindly ping
It's great to see this, and I'm looking forward to being able to use it! On the dtype of eigenvalues: it can be argued that NumPy's decision to have a value-dependent dtype is a mistake. Other array libraries (notably jax.numpy and pytorch) always return eigenvalues as a complex array, and it is up to the user to decide whether they want to check for zero imaginary parts.
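NumPy's value-dependent behavior, versus the always-complex policy described above, can be seen directly (NumPy used for illustration):

```python
import numpy as np

sym = np.array([[2.0, 1.0], [1.0, 2.0]])   # real eigenvalues (1 and 3)
rot = np.array([[0.0, -1.0], [1.0, 0.0]])  # complex eigenvalues (+1j, -1j)

# NumPy returns a real array when every imaginary part is exactly zero:
print(np.linalg.eigvals(sym).dtype)  # float64
print(np.linalg.eigvals(rot).dtype)  # complex128

# An always-complex policy (jax.numpy / pytorch style) leaves the
# realness check to the caller:
w = np.linalg.eigvals(sym).astype(np.complex128)
print(bool(np.all(w.imag == 0)))  # True
```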
@asi1024 , I don't think there are any open items left. Would you mind triggering the main CI tests?
/test mini |
Our team concluded that it is acceptable to avoid changing the output array's dtype based on the output values. This is also reasonable in that it avoids a DtoH synchronization.
Could you skip the eig/eigvals tests on older CUDA using this?

@pytest.mark.skipif(
    cupy.cuda.runtime.runtimeGetVersion() < 12060,
    reason='Requires CUDA 12.6+')

Also, let's uncomment these two lines to list these APIs in the docs: cupy/docs/source/reference/linalg.rst, lines 62 to 64 (at 380e629).
This pull request is now in conflicts. Could you fix it @mfoerste4? 🙏 |
Thanks @kmaehashi for the review. I have made the requested changes. Please note that the tests had to be adjusted to account for cupy now always returning complex types. |
Thanks @mfoerste4! The change looks good to me, let me kick the CI again. /test mini |
asi1024 left a comment:
Sorry for my late review. LGTM!
This is PR #8854 with an updated main branch
CC @leofang