Skip to content

UfuncHelpers is not thread-safe #11220

@bmerry

Description

@bmerry

Description

The UfuncHelpers class is not thread-safe, and this can occasionally lead to strange errors when using astropy from multi-threaded code (e.g. with Dask).

Expected behavior

Multiple threads should be able to call ufuncs simultaneously.

Actual behavior

Errors like this sporadically occur:

  File "./astropy-ufunc-threading.py", line 13, in thread_code
    erfa.s2c(1 * u.rad, 2 * u.rad)
  File "/home/bmerry/work/sdp/env3/lib/python3.8/site-packages/erfa/core.py", line 19788, in s2c
    c = ufunc.s2c(theta, phi)
  File "/home/bmerry/src/astropy/astropy/units/quantity.py", line 457, in __array_ufunc__
    converters, unit = converters_and_unit(function, method, *inputs)
  File "/home/bmerry/src/astropy/astropy/units/quantity_helper/converters.py", line 157, in converters_and_unit
    ufunc_helper = UFUNC_HELPERS[function]
  File "/home/bmerry/src/astropy/astropy/units/quantity_helper/converters.py", line 86, in __missing__
    raise TypeError("unknown ufunc {}.  If you believe this ufunc "
TypeError: unknown ufunc s2c.  If you believe this ufunc should be supported, please raise an issue on https://github.com/astropy/astropy

People have also seen errors about collections mutating while being iterated, which I suspect is closely related.

Steps to Reproduce

It's a little tricky to reproduce reliably since it relies on thread timing. There's a script below, but I've found the easiest way to make it reliable is to add sleeps into the UfuncHelpers code at strategic points.

#!/usr/bin/env python3

import threading
import time

import astropy.units as u
import erfa


def thread_code(excs):
    try:
        erfa.s2c(1 * u.rad, 2 * u.rad)
    except Exception as exc:
        excs.append(exc)


def main():
    excs = []
    threads = [threading.Thread(target=thread_code, args=(excs,))
               for i in range(32)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    if excs:
        raise excs[0]


main()

The races I can immediate see are:

  • The modules property has a race between checking if there is a _modules attribute and returning it. That's probably harmless in most cases since the first registration will likely be done when first importing astropy and before launching threads.
  • The call to list(self.modules.items()) in __missing__ iterates the modules dictionary to construct the list, and at the same time another thread might mutate it (possibly with register_module, but more likely with import_module which pops).
  • import_module removes the module from the list before updating the helper dictionary. If another thread calls __missing__ in that gap, it won't be able to find a helper (leading to the exception above).
  • __setitem__ modifies both UNSUPPORTED and the helper dictionary. There are probably some race conditions if __missing__ is called in the middle of that process.

Recommendations

I think the following should address most of the issues. It still depends on Python treating accesses to the core dict class as atomic, which as far as I know is an implementation detail of CPython but not guaranteed anywhere:

  1. Add a lock, that is held during all the member functions. That shouldn't affect the common case (a ufunc for which the helper is already registered) because this __getitem__ won't take the lock.
  2. Give the class a constructor and initialise modules there, instead of doing it lazily with a property.

Let me know if you'd like a PR.

System Details

Linux-5.4.0-58-generic-x86_64-with-glibc2.29
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0]
Numpy 1.19.1
astropy 4.3.dev439+g4b242e6f5
Scipy 1.5.2
Matplotlib 3.3.0

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions