Skip to content

candidate fix for deadlock#7

Merged
rhettinger merged 1 commit intorhettinger:cached_property_lockingfrom
Erotemic:crall/fix_cached_property_deadlock
Apr 16, 2022
Merged

candidate fix for deadlock#7
rhettinger merged 1 commit intorhettinger:cached_property_lockingfrom
Erotemic:crall/fix_cached_property_deadlock

Conversation

@Erotemic
Copy link
Copy Markdown

@Erotemic Erotemic commented Aug 31, 2021

Candidate fix for deadlock proposed on twitter:

https://twitter.com/erotemic/status/1431081138332835844

https://twitter.com/erotemic/status/1431118636568268800

The reasoning is as follows:

In the original code you have:

        cache = instance.__dict__

        key = id(self)
        this_thread = get_ident()
        with self.updater_lock:
            reentrant = self.updater.get(key) == this_thread
            wait = self.updater.setdefault(key, this_thread) != this_thread

        if wait:
            with self.cv:
                while key in self.updater:
                    self.cv.wait()

        val = cache.get(self.attrname, _NOT_FOUND)
        if val is not _NOT_FOUND:
            # 
            try:
                val = self.func(instance)
                cache[self.attrname] = val
            except Conditions: 
                # 

        # not writing all conditions for space-reasons
        # run if there is an error, unless we are reentrant and there is no error.
        # run at the end of a non-reentrant call
        if not reentrant and <there-was-no-error>:
            with self.updater_lock:
                self.updater.pop(key, None)
            with self.cv:
                self.cv.notify_all()

So say thread 1, 2, and 3 with the same instance id key are running.

  • thread 1, 2, and 3 start running but thread 1 makes it past self.updater_lock first, and it updates
  • updater will now contain key (i.e. the instance-id)
  • thread1 releases the lock, with wait set to False, and thread2 and 3 get through the lock, but wait is set to True.
  • thread1 tries to compute and cache the value
  • thread2 and thread3 are waiting because the instance-id is in the updater.
  • thread2 aquires the self.cv lock and starts waiting on self.cv
  • thread1 encounters an error and pops key out of self.updater, and notifies self.cv, which thread2 is waiting on
  • thread2 see that key is no longer in the updater, it releases the self.cv lock
  • thread3 aquires the self.cv lock, sees it is not in the updater and does not enter the loop, so it releases self.cv
  • !!! thread2 and thread3 are now simultaneously in the protected part of the code and will both have a a cache miss
  • !!! thread2 and thread3 call self.func which may not be thread-safe (as in my MWE)
  • !!! Deadlock

In the new code we replace the if wait with:

        while wait:
            with self.cv:
                while this_thread != self.updater.get(key, this_thread):
                    self.cv.wait()
                with self.updater_lock:
                    reentrant = self.updater.get(key) == this_thread
                    wait = self.updater.setdefault(key, this_thread) != this_thread

It fixes this problem because before thread2 releases self.cv, it marks adds it's thread-id to updater, so thread3 will know that it should wait until another thread (in this case thread2) notifies self.cv.

I'm not sure about the second reentrant check, that might be superfluous, but I do think the above order of operations is a reason why the code deadlocks.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Oct 1, 2021

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Oct 1, 2021
@Erotemic
Copy link
Copy Markdown
Author

Erotemic commented Oct 1, 2021

@rhettinger Ping

@github-actions github-actions bot removed the stale label Oct 2, 2021
@github-actions
Copy link
Copy Markdown

github-actions bot commented Nov 1, 2021

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Nov 1, 2021
@Erotemic
Copy link
Copy Markdown
Author

Erotemic commented Nov 1, 2021

@rhettinger Ping to prevent staleness (let me know if you'd like me to close this)

@github-actions github-actions bot removed the stale label Nov 2, 2021
@github-actions
Copy link
Copy Markdown

github-actions bot commented Dec 2, 2021

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Dec 2, 2021
@rhettinger rhettinger merged commit 6530d93 into rhettinger:cached_property_locking Apr 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants