-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Tweak pollTimedOutRPCs thread synchronization #30355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Make processTimedOutFutures hold lock - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
- Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
- Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
- Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
Pull Request resolved: #30355 - Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. ghstack-source-id: 94476372 Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/)
- Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
- Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
Pull Request resolved: #30355 - Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. ghstack-source-id: 94487097 Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/)
mrshenli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM. Do we know why we hit failures in MacOS test?
- Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
- Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
- Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
Pull Request resolved: #30355 - Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. ghstack-source-id: 95326061 Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/)
- Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
- Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
jjlilley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
- Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
CircleCI build failures summaryAs of commit e29a7c8:
Detailed failure analysisOne may explore the probable reasons each build failed interactively on the Dr. CI website. 6 new failures recognized by patternsThe following build failures don't appear to be due to upstream breakage:
|
| Job | Step |
|---|---|
| Build |
This comment was automatically generated by Dr. CI.
Follow this link to opt-out of these comments for your Pull Requests.
Please report bugs/suggestions on the GitHub issue tracker.
This comment has been revised 2 times.
- Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/) [ghstack-poisoned]
Pull Request resolved: #30355 - Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. ghstack-source-id: 95409528 Differential Revision: [D5516149](https://our.internmc.facebook.com/intern/diff/D5516149/)
|
|
||
| void ProcessGroupAgent::pollTimedOutRPCs() { | ||
| while (true) { | ||
| while (rpcRunning_.load()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to hold the lock here while checking rpcRunning? That's why there was the while(true) before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock was to avoid ::shutdown() noitifying when the watchdog thread is not waiting on the cond var, thus avoid the watchdog thread missing the notification.
With the predicate, we can allow the watch dog missing the notification.
Even if the watch dog thread misses the notification, the predicate will evaluate to false (since rpcRunning_ has already been set to False), the thread will not wait on the cond var, and gets back to the loop condition and decides to exit the loop.
| if (!rpcRunning_.load()) { | ||
| return; | ||
| if (shouldUpdateMinEndTime) { | ||
| continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this continue, I guess this means processing of the timed out futures is going to be skipped until we do another round of updating the minEndTime? If there is a timed out future, then there will be some delay in processing it, do we need to have this continue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I thought we are closed on #30355 (comment).
Two conditions it will stop the thread from waiting on the cond var.
-
The cond var is notified. It means there is an earlier
endTimekey inserted into the map.
So it will go back to the beginning of the loop, update the thread-localminEndTimeand- if
now >= the latest minEndTime, it will process the futures immediately without waiting on the cond var. - if
now < the latest minEndTime, it will wait untilminEndTime
- if
-
Time is up.
now >= the latest minEndTime. It will wake from the wait and and process the timed out futures.
|
This pull request has been merged in 2488231. |
Summary: Pull Request resolved: pytorch#30355 - Make processTimedOutFutures hold lock. - Reduce unnecessary scan on future and future timeout maps. - Reduce the scope of lock at a spot. - Avoid repeatedly wake up if user set timeout = 0. ghstack-source-id: 95409528 Test Plan: # Unit tests ``` buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_rpc_timeouts buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rpc_timeouts ``` ``` buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_rpc_timeouts buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_rpc_timeouts ``` Differential Revision: D5516149 fbshipit-source-id: 4bb0bd59fa31d9bfaef9f07ac0126782da17f762
Stack from ghstack:
Differential Revision: D5516149