Deadlock - Ensure resumed flight tasks are still fetched #5426
c7e6e8b to de1bcd9
FWIW, the power user who originally reported this issue tried this PR, and it looks promising: it no longer deadlocks for them.
crusaderky
left a comment
cosmetic tweaks
74abd60 to 4138c3d
I see couldn't reproduce, yet
jrbourbeau
left a comment
Thanks @fjetter
This is another deadlock introduced by #5160. In a nutshell, all tasks entering gather_deps must be handled / transitioned out of flight even though we're not trying to fetch all of the keys.
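To illustrate the invariant described above, here is a minimal sketch. It is not the actual worker code from this PR: the `Task` class, the `finish_gather` helper, and the choice of `"fetch"` and `"memory"` as target states are assumptions made for illustration only. The point is that the loop runs over every task that entered the gather step, not just over the keys that were requested, so nothing can be left behind in `"flight"`.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical stand-in for a worker-side task record.
    key: str
    state: str = "flight"


def finish_gather(entered, requested, received):
    """Move every task that entered the gather step out of 'flight'.

    entered:   dict of key -> Task, all currently in 'flight'
    requested: keys we actually asked the peer for (may be a subset)
    received:  dict of key -> data the peer returned
    """
    for key, ts in entered.items():
        if key in requested and key in received:
            # Data arrived: the task is now held locally.
            ts.state = "memory"
        else:
            # Never requested, or requested but not returned:
            # release it so it can be scheduled for fetching again.
            ts.state = "fetch"
    return entered


tasks = {k: Task(k) for k in ("a", "b", "c")}
finish_gather(tasks, requested={"a", "b"}, received={"a": 1})
print({k: ts.state for k, ts in tasks.items()})
# prints {'a': 'memory', 'b': 'fetch', 'c': 'fetch'}
```

If the loop iterated over `requested` only, task `c` would stay in `"flight"` indefinitely, which is the kind of stuck state that can show up as a deadlock.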
A second problem that surfaced is that TaskState.done was not properly (re-)set, which caused dependencies that still had to be fetched to be transitioned incorrectly.
This also closes #5406
Follow up PR to establish an enum as the worker task state name: #5444