BUG: SIGABRT on using ThreadPoolExecutor with linalg.eigvalsh in v1.26.0b1 #24512

@lagru

Describe the issue:

In scikit-image, we have started to encounter unexpected crashes in numpy.linalg.eigvalsh when used via a ThreadPoolExecutor with NumPy 1.26.0b1.

I have now managed to reduce the reproducing example from scikit-image/scikit-image#6970 (comment) to one that uses only NumPy (see below and also scikit-image/scikit-image#7101 (comment)). That's why I am reasonably confident that the error originates on NumPy's side.

Some additional observations:

Reproduce the code example:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

assert np.__version__ == '1.26.0b1'

rng = np.random.default_rng(32)
matrices = (
    rng.random((5, 10, 10, 3, 3)),
    rng.random((5, 10, 10, 3, 3)),
    # rng.random((5, 10, 10, 3, 3)),
)

with ThreadPoolExecutor(max_workers=None) as ex:
    list(ex.map(lambda m: np.linalg.eigvalsh(m), matrices,))

Error message:

The failure is not deterministic. Most of the time I get a SIGABRT with free(): invalid pointer, sometimes the traceback below about an illegal value, and very rarely no error at all. Which outcome occurs seems to depend on the size of the passed arrays and the number of concurrent tasks.
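Because the outcome varies from run to run, it can help to tally results over many fresh interpreter processes. The following is a minimal sketch, not part of the original report: the helper names `classify` and `run_trials` and the outcome labels are hypothetical, and it assumes a POSIX system, where `subprocess` reports a process killed by a signal as a negative return code.

```python
import collections
import os
import signal
import subprocess
import sys


def classify(returncode, stderr):
    """Map one run's exit status and stderr to a coarse outcome label."""
    if returncode == -signal.SIGABRT:
        # On POSIX, a negative return code is the number of the killing signal.
        return "SIGABRT"
    if "DSYEVD" in stderr:
        # The LAPACK error surfaces as a Python ValueError mentioning DSYEVD.
        return "illegal value"
    if returncode == 0:
        return "ok"
    return "other"


def run_trials(script, trials=20, extra_env=None):
    """Run `script` (Python source) in fresh interpreters and count outcomes."""
    env = dict(os.environ, **(extra_env or {}))
    counts = collections.Counter()
    for _ in range(trials):
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            env=env,
        )
        counts[classify(proc.returncode, proc.stderr)] += 1
    return counts
```

Passing the source of the reproducing script, for example `run_trials(open("local/debug-pr7101.py").read())`, returns a `Counter` of outcomes. As an experiment, `extra_env={"OPENBLAS_NUM_THREADS": "1"}` could be passed to see whether limiting OpenBLAS threads changes the tally. The report does not establish that OpenBLAS threading is involved, so this is only a probe.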

Traceback (most recent call last):
  File "/home/lg/Res/scikit-image/local/debug-pr7101.py", line 14, in <module>
    list(ex.map(lambda m: np.linalg.eigvalsh(m), matrices,))
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lg/Res/scikit-image/local/debug-pr7101.py", line 14, in <lambda>
    list(ex.map(lambda m: np.linalg.eigvalsh(m), matrices,))
                          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 1181, in eigvalsh
    w = gufunc(a, signature=signature, extobj=extobj)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: On entry to DSYEVD parameter number 8 had an illegal value
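
For reference, the parameter number in this message is a 1-based position in the LAPACK routine's argument list. The sketch below maps it to a name, using the argument order of DSYEVD from the reference LAPACK documentation. The helper `illegal_argument` is hypothetical and only parses the message text.

```python
import re

# Argument order of LAPACK's DSYEVD (reference LAPACK documentation).
DSYEVD_ARGS = (
    "JOBZ", "UPLO", "N", "A", "LDA", "W",
    "WORK", "LWORK", "IWORK", "LIWORK", "INFO",
)


def illegal_argument(message):
    """Return the DSYEVD argument name for the 1-based number in `message`."""
    match = re.search(r"parameter number\s+(\d+)", message)
    if match is None:
        raise ValueError("no parameter number found in message")
    return DSYEVD_ARGS[int(match.group(1)) - 1]


msg = "On entry to DSYEVD parameter number 8 had an illegal value"
print(illegal_argument(msg))  # LWORK
```

So the rejected argument is LWORK, the length of the floating-point workspace array WORK. This points to the workspace handling around the call, but it does not by itself show where the invalid value comes from.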

Runtime information:

Pre-built v1.26.0b1

1.26.0b1
3.11.3 (main, Jun  5 2023, 09:32:32) [GCC 13.1.1 20230429]

[{'numpy_version': '1.26.0b1',
  'python': '3.11.3 (main, Jun  5 2023, 09:32:32) [GCC 13.1.1 20230429]',
  'uname': uname_result(system='Linux', node='hue', release='6.4.11-arch2-1', version='#1 SMP PREEMPT_DYNAMIC Sat, 19 Aug 2023 15:38:34 +0000', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Haswell',
  'filepath': '/home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23.dev'}]

In-place build v1.26.0b1 from source

1.26.0b1
3.9.17 | packaged by conda-forge | (main, Aug 10 2023, 07:02:31)
[GCC 12.3.0]

[{'numpy_version': '1.26.0b1',
  'python': '3.9.17 | packaged by conda-forge | (main, Aug 10 2023, '
            '07:02:31) \n'
            '[GCC 12.3.0]',
  'uname': uname_result(system='Linux', node='hue', release='6.4.11-arch2-1', version='#1 SMP PREEMPT_DYNAMIC Sat, 19 Aug 2023 15:38:34 +0000', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL',
                                    'AVX512_SPR']}},
 {'architecture': 'Haswell',
  'filepath': '/home/lg/.local/lib/micromamba/envs/numpy-dev/lib/libopenblasp-r0.3.23.so',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23'}]

gdb --args python local/debug-pr7101.py

debug-pr7101.py contains the minimal example above.

$ gdb --args python local/debug-pr7101.py
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...

This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for /usr/bin/python3.11
Reading symbols from /home/lg/.cache/debuginfod_client/9efa8fdb1fce89c7a9f29802398a366b6c913a3e/debuginfo...
(gdb) r
Starting program: /home/lg/.local/lib/venv/skimagedev/bin/python local/debug-pr7101.py
Downloading separate debug info for /lib64/ld-linux-x86-64.so.2
Downloading separate debug info for system-supplied DSO at 0x7ffff7fc8000
Downloading separate debug info for /usr/lib/libc.so.6
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/../../numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/../../numpy.libs/libgfortran-040039e1.so.5.0.0
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/../../numpy.libs/libquadmath-96973f99.so.0.0.0
[New Thread 0x7ffff3bff6c0 (LWP 10872)]
[New Thread 0x7ffff33fe6c0 (LWP 10873)]
[New Thread 0x7ffff0bfd6c0 (LWP 10874)]
[New Thread 0x7fffec3fc6c0 (LWP 10875)]
[New Thread 0x7fffe9bfb6c0 (LWP 10876)]
[New Thread 0x7fffe73fa6c0 (LWP 10877)]
[New Thread 0x7fffe6bf96c0 (LWP 10878)]
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_tests.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/linalg/_umath_linalg.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/fft/_pocketfft_internal.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/mtrand.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/bit_generator.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_common.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_bounded_integers.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_mt19937.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_philox.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_pcg64.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_sfc64.cpython-311-x86_64-linux-gnu.so
Downloading separate debug info for /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/random/_generator.cpython-311-x86_64-linux-gnu.so
[New Thread 0x7fffe170d6c0 (LWP 10879)]
[New Thread 0x7fffe0f0c6c0 (LWP 10880)]
free(): invalid pointer
free(): invalid pointer

Thread 9 "python" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe170d6c0 (LWP 10879)]
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
at pthread_kill.c:44
Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6,
no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff748e8a3 in __pthread_kill_internal (signo=6, threadid=<optimized out>)
at pthread_kill.c:78
#2  0x00007ffff743e668 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff74264b8 in __GI_abort () at abort.c:79
#4  0x00007ffff7427390 in __libc_message (fmt=fmt@entry=0x7ffff75a3550 "%s\n")
at ../sysdeps/posix/libc_fatal.c:150
#5  0x00007ffff74987b7 in malloc_printerr (str=str@entry=0x7ffff75a102b "free(): invalid pointer")
at malloc.c:5765
#6  0x00007ffff749aa74 in _int_free (av=<optimized out>, p=<optimized out>,
have_lock=have_lock@entry=0) at malloc.c:4500
#7  0x00007ffff749d353 in __GI___libc_free (mem=<optimized out>) at malloc.c:3391
#8  0x00007fffe201d0a9 in void eigh_wrapper<double>(char, char, char**, long const*, long const*) [clone .constprop.0] ()
from /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/linalg/_umath_linalg.cpython-311-x86_64-linux-gnu.so
#9  0x00007ffff6a4cef5 in generic_wrapped_legacy_loop ()
from /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
#10 0x00007ffff6a5b9ce in ufunc_generic_fastcall ()
from /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
#11 0x00007ffff79f20d7 in _PyObject_VectorcallTstate (kwnames=<optimized out>,
nargsf=<optimized out>, args=<optimized out>, callable=0x7ffff3fb3840, tstate=0x555555ae04c0)
at ./Include/internal/pycore_call.h:92
#12 PyObject_Vectorcall (callable=0x7ffff3fb3840, args=<optimized out>, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:299
#13 0x00007ffff79e4379 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>,
throwflag=<optimized out>) at Python/ceval.c:4773
#14 0x00007ffff7a0a9e0 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff76ce318, tstate=0x555555ae04c0)
at ./Include/internal/pycore_ceval.h:73
#15 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7ffff76ce310, locals=0x0,
func=0x7fffe2369bc0, tstate=0x555555ae04c0) at Python/ceval.c:6438
#16 _PyFunction_Vectorcall (func=0x7fffe2369bc0, stack=0x7ffff76ce310, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:393
#17 0x00007ffff79f20d7 in _PyObject_VectorcallTstate (kwnames=<optimized out>,
nargsf=<optimized out>, args=<optimized out>, callable=0x7fffe2369bc0, tstate=0x555555ae04c0)
at ./Include/internal/pycore_call.h:92
#18 PyObject_Vectorcall (callable=0x7fffe2369bc0, args=<optimized out>, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:299
#19 0x00007ffff694407d in dispatcher_vectorcall ()
from /home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
#20 0x00007ffff79f20d7 in _PyObject_VectorcallTstate (kwnames=<optimized out>,
nargsf=<optimized out>, args=<optimized out>, callable=0x7fffe2362db0, tstate=0x555555ae04c0)
at ./Include/internal/pycore_call.h:92
#21 PyObject_Vectorcall (callable=0x7fffe2362db0, args=<optimized out>, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:299
#22 0x00007ffff79e4379 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>,
throwflag=<optimized out>) at Python/ceval.c:4773
#23 0x00007ffff7a0a9e0 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff76ce2b0, tstate=0x555555ae04c0)
at ./Include/internal/pycore_ceval.h:73
#24 _PyEval_Vector (kwnames=<optimized out>, argcount=1, args=0x7fffe1755498, locals=0x0,
func=0x7ffff77984a0, tstate=0x555555ae04c0) at Python/ceval.c:6438
#25 _PyFunction_Vectorcall (func=0x7ffff77984a0, stack=0x7fffe1755498, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:393
#26 0x00007ffff79e7e77 in do_call_core (use_tracing=<optimized out>, kwdict=0x7fffe17706c0,
callargs=0x7fffe1755480, func=0x7ffff77984a0, tstate=<optimized out>) at Python/ceval.c:7356
#27 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>,
throwflag=<optimized out>) at Python/ceval.c:5380
#28 0x00007ffff7a0a9e0 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff76ce188, tstate=0x555555ae04c0)
at ./Include/internal/pycore_ceval.h:73
#29 _PyEval_Vector (kwnames=<optimized out>, argcount=4, args=0x7ffff6f2f528, locals=0x0,
func=0x7fffe176d3a0, tstate=0x555555ae04c0) at Python/ceval.c:6438
#30 _PyFunction_Vectorcall (func=0x7fffe176d3a0, stack=0x7ffff6f2f528, nargsf=<optimized out>,
kwnames=<optimized out>) at Objects/call.c:393
#31 0x00007ffff79e7e77 in do_call_core (use_tracing=<optimized out>, kwdict=0x7ffff77f3b40,
callargs=0x7ffff6f2f510, func=0x7fffe176d3a0, tstate=<optimized out>) at Python/ceval.c:7356
#32 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>,
throwflag=<optimized out>) at Python/ceval.c:5380
#33 0x00007ffff7a2c403 in _PyEval_EvalFrame (throwflag=0, frame=0x7ffff76ce020, tstate=0x555555ae04c0)
at ./Include/internal/pycore_ceval.h:73
#34 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=0x7fffe170ce28,
locals=0x0, func=0x7fffe1fe6d40, tstate=0x555555ae04c0) at Python/ceval.c:6438
#35 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=0x7fffe170ce28,
func=0x7fffe1fe6d40) at Objects/call.c:393
#36 _PyObject_VectorcallTstate (tstate=0x555555ae04c0, callable=0x7fffe1fe6d40, args=0x7fffe170ce28,
nargsf=<optimized out>, kwnames=<optimized out>) at ./Include/internal/pycore_call.h:92
#37 0x00007ffff7a2c0c8 in method_vectorcall (method=<optimized out>,
args=0x7ffff7d6f6b0 <_PyRuntime+58928>, nargsf=<optimized out>, kwnames=0x0)
at Objects/classobject.c:67
#38 0x00007ffff7af4fe0 in thread_run (boot_raw=0x7fffe1756760) at ./Modules/_threadmodule.c:1092
#39 0x00007ffff7acad28 in pythread_wrapper (arg=<optimized out>) at Python/thread_pthread.h:241
#40 0x00007ffff748c9eb in start_thread (arg=<optimized out>) at pthread_create.c:444
#41 0x00007ffff7510dfc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
(gdb)

Context for the issue:

This is currently blocking us from upgrading our dependency on NumPy to 1.26.0b1 for scikit-image in scikit-image/scikit-image#7101. It's been a very tricky thing to debug and I am a bit out of my depth now. :)
