-
-
Notifications
You must be signed in to change notification settings - Fork 11.9k
ENH: Make FPE blas check a runtime check for all apple arm systems #30102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If an arm system uses SME some BLAS versions just set FPEs spuriously. The culprit is really Accelerate so we might limit this to Mac OS as well (and then with an SME check also -- see changes prior to this commit). However, some OpenBLAS versions also caused this, although OpenBLAS is likely to clear the FPEs on their side. Closes numpygh-29820
numpy/_core/src/common/blas_utils.h
Outdated
| * This checks always on all ARM (it is a small check overall). | ||
| */ | ||
| #if defined(__APPLE__) && defined(__aarch64__) && defined(ACCELERATE_NEW_LAPACK) | ||
| #if defined(__aarch64__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure what to put here, so went with the broad __aarch64__. We could add __APPLE__ if we think this is largely Accelerate only (i.e. OpenBLAS considered this a bug on their side, but for now I am not sure...).
It's not like it is utterly terrible if we have the runtime check for the global...
@mattip can you test it on an M4, my M1 hits the checks, but wouldn't actually trigger the spurious FPEs.
numpy/_core/src/common/blas_utils.c
Outdated
| #endif | ||
| CBLAS_FUNC(cblas_dgemm)( | ||
| CblasRowMajor, CblasNoTrans, CblasNoTrans, 20, 20, 20, 1., | ||
| x, 20, y, 20, 0., res, 20); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the jobs segfaulted (previous run), did I get these parameters wrong?
|
Why do the check in C when we can do it in python in |
|
No reason, I think I just ended there because it seemed like the smaller change to keep some of the logic in-place to skip it on unaffected machines (and a single call is simple enough, although something is off on windows!?). Happy to just move it to Python fully as well. |
|
I think it would be better in python, no need to worry about calling conventions and allocations. You could even limit the check to |
|
OK, let me just move it to Python.
I first had that, but then read this:
|
|
Ahh, oops, right. Here is what the M4 reports: |
Which of these do you actually want to add? Right now it's hardcoded to just ARM on the C side, although because of that I am also not sure I really prefer moving this to Python. Of course now it would be easier to add this, but I am not sure what (or if it even matters if we limit to ARM already). |
7fccc18 to
4c6f597
Compare
4c6f597 to
7c8625e
Compare
|
The circleci failure is unrelated. |
mattip
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could _blas_supports_fpe() (called with no argument) return the current state? Then
- I would prefer we not blanket-disable FPEs on ARM64, and use the import test to disable it if needed.
- add the current state of
_blas_supports_fpe()toshow_runtime().
| x @ x | ||
| except FloatingPointError: | ||
| res = _core._multiarray_umath._blas_supports_fpe(False) | ||
| if res: # res was not modified (hardcoded to True for now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
res is not modified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_blas_supports_fpe
I made it _blas_supports_fpe(None) just to keep the C code a very bit simpler, didn't seem useful enough to care?
res is the state after changing it. If we are not on an ARM machine, it will still be True.
add the current state of _blas_supports_fpe() to show_runtime().
Hmmm, yeah that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the show runtime
I would prefer we not blanket-disable FPEs on ARM64, and use the import test to disable it if needed.
Sorry, but what do you mean? On non ARM64, it's blanket enabled. On ARM64 it's disabled based on this test. I have not bothered to skip this test on non-ARM64, instead it might give that warning when things seem off.
|
Is this ready? I need to make a decision about backporting it. |
|
I suspect so, unless you/Matti have thoughts on it. Unfortunately, I can't confirm for sure that it works :(. |
|
I'll probably put this in, but wait for @mattip tomorrow. I do wonder why arm did things that way for sme, too hard? |
|
Did we have reports of aarch64 (linux) emitting FPEs? I thought the problem was only on macOS. |
Co-authored-by: Matti Picus <[email protected]>
Not yet at least, I thought openblas may well be broken on others also (But I guess openblas may work around it). |
|
Thanks @seberg |
If an arm system uses SME some BLAS versions just set FPEs spuriously. The culprit is really Accelerate so we might limit this to Mac OS as well (and then with an SME check also -- see changes prior to this commit).
However, some OpenBLAS versions also caused this, although OpenBLAS is likely to clear the FPEs on their side.
Closes gh-29820