Skip to content

Lazy properties are not threadsafe #11221

@bmerry

Description

@bmerry

Description

The @lazyproperty decorator and @classproperty decorator with lazy=True have race conditions when accessing the cache, which can lead to sporadic errors when using astropy from multithreaded code e.g. dask. One such case is with the Unit parser.

The Python 3.8 cached_property (mentioned in #9036) is thread-safe, and looking at the implementation it seems to use a double-checked locking pattern so the approach shouldn't hurt performance. Let me know if you'd like me to make a PR.

Expected behavior

I expect to be able to parse unit strings safely from multi-threaded code.

Actual behavior

Running the script below sometimes leads to this exception (I run the script in a loop in the shell and it triggers within a second):

Traceback (most recent call last):
  File "/home/bmerry/src/astropy/astropy/units/format/generic.py", line 562, in _do_parse
    return cls._parse_unit(s, detailed_exception=False)
  File "/home/bmerry/src/astropy/astropy/units/format/generic.py", line 485, in _parse_unit
    raise ValueError()
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bmerry/src/astropy/astropy/units/format/generic.py", line 565, in _do_parse
    return cls._parser.parse(s, lexer=cls._lexer, debug=debug)
  File "/home/bmerry/src/astropy/astropy/extern/ply/yacc.py", line 333, in parse
    return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
  File "/home/bmerry/src/astropy/astropy/extern/ply/yacc.py", line 1201, in parseopt_notrack
    tok = call_errorfunc(self.errorfunc, errtoken, self)
  File "/home/bmerry/src/astropy/astropy/extern/ply/yacc.py", line 192, in call_errorfunc
    r = errorfunc(token)
  File "/home/bmerry/src/astropy/astropy/units/format/generic.py", line 443, in p_error
    raise ValueError()
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bmerry/src/astropy/astropy/units/core.py", line 1850, in __call__
    return f.parse(s)
  File "/home/bmerry/src/astropy/astropy/units/format/generic.py", line 547, in parse
    result = cls._do_parse(s, debug=debug)
  File "/home/bmerry/src/astropy/astropy/units/format/generic.py", line 570, in _do_parse
    raise ValueError(f"Syntax error parsing unit '{s}'")
ValueError: Syntax error parsing unit 'm / (s)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./astropy-constants-threading.py", line 29, in <module>
    main()
  File "./astropy-constants-threading.py", line 26, in main
    raise excs[0]
  File "./astropy-constants-threading.py", line 11, in thread_func
    3 * u.Unit('m / (s)') + 1 * (u.km / u.s)
  File "/home/bmerry/src/astropy/astropy/units/core.py", line 1870, in __call__
    raise ValueError(msg)
ValueError: 'm / (s)' did not parse as unit: Syntax error parsing unit 'm / (s)' If this is meant to be a custom unit, define it with 'u.def_unit'. To have it recognized inside a file reader or other code, enable it with 'u.add_enabled_units'. For details, see https://docs.astropy.org/en/latest/units/combining_and_defining.html

I'm not totally sure that this is the cause (or is the only cause), but astropy.units.format.generic.Generic uses lazy class properties for the lexer and parser, and if several threads try to access the property at the same time, they could each try to build their own copy. I don't know about the lexer, but the comments at the top of astropy/extern/ply/yacc.py suggest that building parsers in parallel is a Bad Idea.

Steps to Reproduce

Run this code in bash with while ./astropy-constants-threading.py ; do :; done until it errors.

#!/usr/bin/env python3

import sys
import threading

import astropy.units as u


def thread_func(excs):
    try:
        u.Unit('m / (s)')
    except Exception as exc:
        excs.append(exc)


def main():
    sys.setswitchinterval(1e-6)
    excs = []
    threads = [threading.Thread(target=thread_func, args=(excs,))
               for i in range(32)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    if excs:
        raise excs[0]


main()

System Details

Linux-5.4.0-58-generic-x86_64-with-glibc2.29
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0]
Numpy 1.19.1
astropy 4.3.dev439+g4b242e6f5
Scipy 1.5.2
Matplotlib 3.3.0

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions