Skip to content

Conversation

@jarrodmillman
Copy link
Contributor

@jarrodmillman jarrodmillman commented Aug 18, 2023

@jarrodmillman jarrodmillman added the 🔧 type: Maintenance Refactoring and maintenance of internals label Aug 18, 2023
@jarrodmillman jarrodmillman requested a review from lagru August 18, 2023 17:30
@lagru
Copy link
Member

lagru commented Aug 19, 2023

Is it possible that gufunc (in this case _umath_linalg.eigvalsh_lo or something in that machinery is not thread-safe? I am struggling to find a cause for this on our side.

@seberg, I hope it's okay to ping you in the hope that you have some insight or know whom to ask about this. 🙏 E.g. might numpy/numpy@7508afd be responsible?

@lagru
Copy link
Member

lagru commented Aug 19, 2023

I'll try my luck with git bisect later. Perhaps I can identify a commit on our or NumPy's side. Otherwise I am out of ideas.

@lagru
Copy link
Member

lagru commented Aug 22, 2023

An even more minimal reproducing example:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

assert np.__version__ == '1.26.0b1'

rng = np.random.default_rng(32)
matrices = (
    rng.random((5, 10, 10, 3, 3)),
    rng.random((5, 10, 10, 3, 3)),
    # rng.random((5, 10, 10, 3, 3)),
)

with ThreadPoolExecutor(max_workers=None) as ex:
    list(ex.map(lambda m: np.linalg.eigvalsh(m), matrices,))

on the pre-build NumPy 1.26.0b1, I get one of these errors:

  • free(): invalid pointer
    free(): invalid pointer
    
    Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
    
  •   File "/home/lg/.local/lib/venv/skimagedev/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 1181, in eigvalsh
      w = gufunc(a, signature=signature, extobj=extobj)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ValueError: On entry to DSYEVD parameter number 8 had an illegal value
    
  • Sometimes it even passes! 🙈

When building NumPy 1.26.0b1 from source with python setup.py build_ext --inplace -j 4 it passes all the time. So git bisect is no use right now.

Maybe something todo with CBLAS etc.? Not sure how to read the output of numpy.show_config().

@lagru
Copy link
Member

lagru commented Aug 23, 2023

Reported upstream in numpy/numpy#24512.

@lagru lagru added the ⬆️ Upstream Involves an upstream project label Aug 23, 2023
@lagru
Copy link
Member

lagru commented Aug 24, 2023

Just noticed, the failing test in eb97c5e is a different one

def test_trainable_segmentation_oo():

compared to the one mentioned in #6970 (comment). But it's the same function np.linalg.eigvalsh that fails. Though I can't reproduce this failure locally. Only the previous one fails for me locally.

@seberg
Copy link
Contributor

seberg commented Sep 4, 2023

Is it possible that gufunc (in this case _umath_linalg.eigvalsh_lo or something in that machinery is not thread-safe? I am struggling to find a cause for this on our side.

Sorry, just on the process of coming back from vacation. I am very sure that the lapack lite stuff is not threadsafe, but if you install wheels they shouldn't be using it, the np.show_runtime() would be useful. You may be using it if you install NumPy yourself locally though.
(show_runtime() is often more helpful than show_config())

Next, the warning state may be unsafe, but this doesn't seem the problem (fixed on NumPy main finally). I am not aware there are other threading issues and I somewhat doubt it, but neither am I quite sure there are not. (Or whether openblas might have issues.)

@seberg
Copy link
Contributor

seberg commented Sep 4, 2023

Ah, hand't seen the NumPy issue where some nice sleuthing found that it is indeed the lapack-lite being used (due to build problems) which were fixed.

@jarrodmillman
Copy link
Contributor Author

Copy link
Member

@lagru lagru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@jarrodmillman jarrodmillman merged commit 5fb4d72 into scikit-image:main Sep 7, 2023
@stefanv stefanv added this to the 0.22 milestone Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⬆️ Upstream Involves an upstream project 🔧 type: Maintenance Refactoring and maintenance of internals

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants