-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Fix #13713 and re-fix the runtime events deadlock #13714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
gasche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is consistent with our discussions in #13713.
|
Can we add the I think the test suite is showing its value, both in this case - and in #13580 - and hope to see label usage pick up on runtime changing PRs 🙏 |
| /* [user_events_lock] can be acquired during STW, so we must use | ||
| caml_plat_lock_blocking and be careful to avoid triggering any | ||
| STW while holding it */ | ||
| caml_plat_lock_blocking(&user_events_lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gadmm I wonder if the following alternative fix would work:
- take the lock
non_blockingfor all the critical sections outside the STW code - in the STW section, do not take
users_events_lock, with a barrier after the user_events-using section to guarantee that no mutators run in parallel
More generally, I think that we can have resources that use non-blocking locks in mutators and are followed by a barrier in STW.
As noticed by @gasche, after @jmid's report of a deadlock at #13713, the channel finalizer cannot call
caml_plat_lock_non_blocking. This reverts the parts of the change that caused the bug at #13713 and better documents what happens here.Better reasoning about the interaction between
caml_plat_lock_non_blockingandcaml_plat_lock_blockingleads to tighter constraints about how they should be mixed. This shows that the previous patch did not fix the deadlock inside runtime events as thought, so I fix it differently here.