Reduce recovery times caused by saturating the cluster controller #2430
Conversation
…rs to avoid having a saturated cluster controller recruit a master without all available workers
…ruitment over sending serverDBInfo
…luster controller if possible
…e part of the new generation
```cpp
if(req.maxOldLogRouters > 0) {
	auto oldLogRouters = tlogs;
	for(int i = 0; i < oldLogRouters.size(); i++) {
		result.oldLogRouters.push_back(oldLogRouters[i].interf);
```
This seems to be assuming that req.maxOldLogRouters is less than or equal to tlogs.size() + 1. Either that, or we're just ignoring maxOldLogRouters.
Can we make that an ASSERT if that's the case?
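For concreteness, a minimal sketch of the suggested check, assuming the intended invariant really is that the request never asks for more old log routers than tlogs.size() + 1 (this is an assumption about the intent, not part of the actual change):

```cpp
// Sketch only: documents the assumed invariant discussed above.
ASSERT(req.maxOldLogRouters <= (int)tlogs.size() + 1);
```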
The code is already written so that it can recruit multiple log routers on the same worker, so maxOldLogRouters was meant as a suggested maximum number of workers; returning fewer is okay.
My concern was the opposite -- namely what if maxOldLogRouters is smaller than the number of logs. If such a thing were possible, this would result in there being more old log routers than the max.
Looking at the code, the list of workers does not determine the number of log routers, so in that case it would just ignore the additional interfaces.
```cpp
	}
	if(foundCC) {
		result.oldLogRouters.push_back(oldLogRouters.back().interf);
	}
```
So this logic means that we will always have 1 fewer old log router than tlogs (except when there's only 1)? Is there any reason not to use all of the tlogs when none of them are the cluster controller?
If what's written is the intent, the else block could be simplified to something like:
```cpp
for(int i = 0; i < oldLogRouters.size() && result.oldLogRouters.size() < oldLogRouters.size() - 1; ++i) {
	if(oldLogRouters[i].interf.locality.processId() != clusterControllerProcessId) {
		result.oldLogRouters.push_back(oldLogRouters[i].interf);
	}
}
```
```cpp
for(auto& it : rep.oldLogRouters) {
	self->db.requiredAddresses.insert(it.address());
	if( it.tLog.getEndpoint().addresses.secondaryAddress.present() ) self->db.requiredAddresses.insert(it.tLog.getEndpoint().addresses.secondaryAddress.get());
}
```
Since this loop is written a bunch of times, it might be a good candidate for extracting into a function.
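One possible shape for such a helper (a sketch only; the name addRequiredAddresses and the endpoint-accessor parameter are hypothetical, not part of the existing code):

```cpp
// Hypothetical helper sketched from the review comment above. It records the
// primary address and, when present, the secondary address of each recruited
// interface in self->db.requiredAddresses.
template <class Container, class GetEndpoint>
void addRequiredAddresses(ClusterControllerData* self, const Container& interfaces, GetEndpoint getEndpoint) {
	for (const auto& it : interfaces) {
		self->db.requiredAddresses.insert(it.address());
		auto addresses = getEndpoint(it).addresses;
		if (addresses.secondaryAddress.present()) {
			self->db.requiredAddresses.insert(addresses.secondaryAddress.get());
		}
	}
}

// Usage for the old log router loop shown above would then be something like:
// addRequiredAddresses(self, rep.oldLogRouters, [](const auto& i) { return i.tLog.getEndpoint(); });
```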
code cleanup