sageWithDoc: python 3.12 fixes#323426
Conversation
dc04e82 to
edc3e6d
Compare
c2c040d to
31f320d
Compare
| + | ||
| +cdef extern from *: | ||
| + """ | ||
| + #pragma GCC optimize("O1") |
There was a problem hiding this comment.
This is a big hack, but Sage is missing a lot of volatile variable qualifiers and sagemath/sage#37951 wasn't enough to stop the crashes with Python 3.12 and the newest GCC. I will keep investigating this, but this seems like an acceptable compromise for now. I don't think clang implements the exact same optimizations, so a GCC-only fix seems OK.
| lrcalc-python | ||
| matplotlib | ||
| memory-allocator | ||
| meson-python |
There was a problem hiding this comment.
I think numpy should be propagating this because it is required to use numpy.f2py (see https://numpy.org/doc/stable/f2py/buildtools/distutils-to-meson.html), but in this case I think it should also make the ninja binary accessible. In the meantime, I will add meson-python here and ninja to Sage's runtime path.
(cc @mweinelt)
7c6f434c
left a comment
There was a problem hiding this comment.
Looks good on reading through
|
Thanks for the review! |
Description of changes
gcc 13.3.0 seems to exploit some undefined behaviour around setjmp/longjmp to do some optimizations that completely break Sage's GAP interface, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114872 for the upstream investigation and https://trofi.github.io/posts/312-the-sagemath-saga.html for @trofi's analysis. This only affects Python 3.12.
Unfortunately, the minimal upstream patch at sagemath/sage#37951, which was supposed to address the issue, is not enough to fix the issue on Nixpkgs: GCC still emits code in which an always-null pointer is dereferenced with no null checks. Arch is also seeing crashes even with the upstream patch applied. As a stopgap, we compile the relevant file with
-O1(seegap-element-crash.patch)I also tried other patches similar to sagemath/sage#37951, like having volatile
v0,v1andv2variables and populating the relevant variables in the 1-arg, 2-arg and 3-arg cases (see here), but that makes crashes more frequent, presumably GCC can then easily infer that__pyx_t_4(a temporary used in Cython-generated code to hold expressions of the form<GapElement>a[i]) is always non-null when jumping to the error handling cleanup code without longjmp. It then removes the null check, but it also performs other optimizations which notice that__pyx_t_4is always zero. The case where we jump to the error handling code after longjmp, which is what Sage tests, is undefined behaviour and leads to a crash.Things done
nix.conf? (See Nix manual)sandbox = relaxedsandbox = truenix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/)Add a 👍 reaction to pull requests you find important.